The nature of the World Wide Web is to always provide the latest information to its users. Websites evolve and update constantly. This can be seen as a great strength, but it also shows how ephemeral the web is. How do you archive web pages and keep track of changes?
The information you want to capture today for a historical or business purpose might be gone soon, leaving you with nothing but regret. We have already lost a lot of web data from the 90s. Many studies that were published online are no longer available anywhere because the website broke down or the domain expired.
With the power of web archiving, you can now ensure important data is preserved and can be fetched whenever you need it. Having the data in the form of screenshots ensures that the visual context is retained.
- What is Website Archiving?
- Types of Web Archiving
- Why Web Archiving?
- How to Archive Web Pages?
What is Website Archiving?
Web Archiving is the process of preserving websites in an archive. By capturing screenshots at specific moments, the data of each web page is retained. These screen captures preserve the original context, containing both content and appearance. Safeguarding screenshots in an archive ensures long-term accessibility for analysis or reference.
This process is somewhat similar to traditional archiving, where people preserved papers or documents manually. The basic idea is the same: you select the information, store and preserve it, and make it available to people for future use.
As the internet contains a huge amount of data (more than 1.5 billion websites), web archivists use an automated process to capture these web pages. With the help of crawlers, archivists move across multiple web pages and capture the details from the sources. Once this data is stored, it is made available as snapshots in the web archive collection.
This can be done for multiple purposes:
- For building heritage
- To comply with regulations
- To fight legal battles
- To get an edge over competitors
In earlier days, officials used to keep a record of everything important that happened. It might surprise you to know that the term “archiving” dates back to the 1st century.
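To make the crawler idea from this section a bit more concrete, here is a minimal sketch of the step a crawler repeats on every page: collect the links and keep only those that stay on the same site. It uses only the Python standard library. The names `LinkCollector` and `same_site_links` are my own, and a real archiving crawler does far more (politeness rules, robots.txt, storage), so treat this as an illustration rather than how any particular archive works.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute, de-duplicated links that share page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

A crawler would capture the current page, call `same_site_links` on its HTML, and add the results to the queue of pages still to visit.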
Types of Web Archiving
Now that you are aware of the basic concept, let’s move towards different types of web archiving.
One of the most common methods is Client-Side Web Archiving. It is popular mostly because of its scalability and simplicity. The method lets you create an archive of any web page that is freely available on the Internet. People who want to preserve their own website as well as the websites of other organizations often use client-side archiving.
Unlike client-side archiving, Transaction-Based Web Archiving requires permission from the server hosting the web content, which means an agreement with the server owner. Due to this complexity, it is used less often.
This method is conducted on the server side and captures all of the transactions that occur between user and server. With this type of archiving, you can capture exactly what was seen and when. This is often useful for internal, corporate, or institutional archiving where compliance or legal accountability is of great importance.
In Server-Side Web Archiving, crawlers copy all the information directly from the server. Like transaction-based archiving, it requires consent between the server owner and the archivist. It gets complicated when you want to capture a web page that pulls in dynamic content, such as ads, from an external source.
Why Web Archiving?
Over the past few years, web archiving has gathered a lot of attention. Previously, it was limited to being a method of keeping a record of the page for the sake of heritage. However, today we are more aware of how archiving can be used for a lot more. Here are a few scenarios where it is helping a lot of businesses.
Measuring SERP presence
Last week I got a call from Steve. He was excited to see a sudden increase in our website’s ranking. Surprisingly, I was not able to see any changes in my browser.
But I soon received this screenshot from Steve and, to my surprise, Stillio was ranked second. <3 It turned out that location plays a huge role in SERP results. I took my screenshot from India while Steve took his from the USA, and that affected the results.
Since our most preferred search engine started focusing on personalized search results, it should be said that SERP results vary based on demographics and behavior. Your website may be “ranking” in Google in your country and at the same time, it may not rank in some other location. In such cases, it becomes very important to keep a log of the search results in order to avoid confusion.
Keeping Your Marketing Strategy Up-to-Date
Want to keep track of offers run by your competitors? Want to monitor your brand across multiple web properties? Or want to stay up-to-date on the latest changes made by your rivals? Web archiving is the answer. With regular screenshots by your side, you can be up-to-date on what changed and when. Marketing is all about having an edge over your competitors. Keeping a keen eye on them makes it pretty easy.
Safeguarding Yourself from False Claims
My friend Jack owns an ecommerce store that sells apparel. Every Wednesday he runs a 49% Off deal on the first 15 orders. One of his customers tried to use the deal, but the sale ended before he completed his order. As a result, he ended up paying the original price instead of the discounted one, and he was furious about it. He assumed that if he added the product to the Cart during the sale period, he would get it for the discounted price. In fact, the company allowed the discount only if the whole process, including checkout and payment, was completed before the end of the sale. What looked like a simple misunderstanding turned into a lawsuit for Jack. Not only did the customer want his money back, he also asked for $10,000 for damages and harassment.
In such cases, having screenshots of everything that is said on your website makes the process much easier. Since Jack kept a record of every page of his website, he used the General Terms and Conditions page as evidence and got rid of the lawsuit faster than you might believe.
Website archiving is a must-have for combating legal issues. With regular screenshots, you can stay carefree in case someone makes a false claim. The demand for older or historical content is growing rapidly. Web archiving is a great option for website owners who no longer want to keep legacy information on a live site but might need it in the future.
I Already Keep Website Backups from Time to Time, Do I Still Need Archiving?
Website backups and archives work in very different ways. Regular backups ensure that your website stays safe even if something gets messed up and files are removed from the server, whereas archiving preserves how your pages actually looked at a given moment.
Another difference is that a backup allows you to rebuild a website from the saved files in case of any issues, whereas archiving ensures that the website can be captured, preserved, and navigated by users just like the live website.
Web Archiving Is No Longer Optional
Many businesses are required to keep a detailed record of their electronic communications, for example under rules from the SEC, FINRA, the IDA of Canada, the FSA of the UK, and the Sarbanes-Oxley Act of 2002. Failing to do so may result in serious problems. With your own archive at hand, you can stay prepared for such issues and make sure you are on the winning side.
Apart from what I have mentioned above, web archiving can be helpful in trend tracking and analyzing your competitors as well as brand management.
How to Archive Web Pages?
Here comes the real meat you have been waiting for. There are several ways to archive web pages, and I will share each option along with the scenario it suits. Before I do that, here are some points to consider:
- Choosing the right content is very important. While planning to archive, you need to ask yourself some questions, such as
- Does this content or web page hold any historical value for my business?
- How is this content related to other records that I am required to keep?
These questions will help you identify the right content and duration of archiving. Not all content needs to be stored for years. For example, you are generally required to keep financial records for a minimum of 7 years.
- The next thing to consider is the frequency of archiving. Do you want daily archives, or would once a month be enough? This depends on how often the website is updated. For example, if there is an event going on, your website will be updated quite often, and you will have to set the archival frequency accordingly.
- Keep in mind that anything published and then changed between two archival sessions will not be collected or stored anywhere, so make sure the sessions are frequent enough that no update slips through.
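The frequency question can be turned into two small checks: is a new capture due, and did the page actually change since the last one? Below is a minimal sketch using only the Python standard library. The function names are hypothetical, and comparing SHA-256 fingerprints of the raw page is an assumption on my part: it is a simple way to spot changes, but pages with rotating ads or embedded timestamps will look "changed" on every visit.

```python
import hashlib
from datetime import datetime, timedelta


def content_fingerprint(page_bytes):
    """Return a SHA-256 hex digest that identifies this exact page content."""
    return hashlib.sha256(page_bytes).hexdigest()


def capture_is_due(last_capture, now, interval):
    """True if the page was never captured or the interval has elapsed."""
    return last_capture is None or now - last_capture >= interval


def has_changed(previous_fingerprint, page_bytes):
    """True if the page content differs from the previously stored capture."""
    return previous_fingerprint != content_fingerprint(page_bytes)
```

For a page that changes often, such as an event page, you might pass `timedelta(hours=1)` as the interval, while `timedelta(days=30)` could be enough for a rarely updated terms page.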
Always keep in mind that, although web archiving is a great way to keep a record of everything online, not all elements are captured with 100% accuracy. If a website is not “machine readable”, it becomes a bit difficult to archive. Web crawlers usually can’t reach password-protected sites or content that only appears after using a search box, so these cannot be captured.
Let’s get started with the process:
1. When You Just Want to Archive a Single Web Page Offline
A. Check out this Chrome extension from Fireshot. All you have to do is install the extension in Chrome and click on the little icon at the top right. Fireshot gives the option to save the page both as PDF and PNG.
B. If you are flooded with too many Chrome extensions, here is an alternative:
- Open the target webpage.
- Press Ctrl+Shift+I to open the developer tools, then press Ctrl+Shift+P to open the command menu.
- Search for screenshot and select “Capture full-size screenshot” and you are done!
Pros: These are free and easy to use.
Cons: There is no option to store data online. Automation is missing too.
C. Press Ctrl+P in Chrome and it opens the print option. You can save that as PDF. This is great when you are focused on content only.
Pros: It allows you to save the PDF to Google Drive too.
Cons: This process might cause some compatibility issues in print and screenshot format. As mentioned above, use this one if content is your major focus. If visuals are important, you will want to stay away from this one.
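If you find yourself repeating the screenshot or print-to-PDF steps for several pages, Chrome can also be driven from the command line in headless mode. The sketch below only builds and runs those commands from Python. A few assumptions: the executable name `google-chrome` varies by system (it may be `chrome`, `chromium`, or a full path), the function names are my own, and `--screenshot` captures the window area rather than the full page, so the tall default window height here is a rough workaround, not a true full-page capture.

```python
import subprocess


def screenshot_command(url, out_png, width=1280, height=2000, chrome="google-chrome"):
    """Build a headless Chrome command that saves a PNG of the given URL."""
    return [
        chrome,
        "--headless",
        f"--window-size={width},{height}",
        f"--screenshot={out_png}",
        url,
    ]


def pdf_command(url, out_pdf, chrome="google-chrome"):
    """Build a headless Chrome command that prints the given URL to a PDF."""
    return [chrome, "--headless", f"--print-to-pdf={out_pdf}", url]


def run(command):
    """Run a command built above; raises CalledProcessError if Chrome fails."""
    return subprocess.run(command, check=True)
```

For example, `run(screenshot_command("https://example.com", "example.png"))` would write `example.png` to the current folder, provided Chrome is installed under that name.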
I don’t have Chrome, how can I archive a web page?
There is a SaaS tool called Url2png that you can use to take screenshots and archive any web page. Url2png is primarily focused on creating thumbnails and screenshots for multiple websites.
P.S. Url2png doesn’t allow full-page screenshots in free versions and I wouldn’t recommend the paid version unless you plan to integrate the API with a tool like Woorank. As I said, the target market for Url2png is businesses looking for bulk screenshots of their applications.
Other similar tools are Browshot, Thum.io, Screenshotlayer, and Webthumb.bluga.net.
Cons: These tools don’t allow you to schedule screenshots, so you have to do it all manually. They also don’t archive your screenshots, so you need to store them yourself. Furthermore, these tools are aimed at technically minded users, as the primary interface is an API that you call to program your own capture jobs.
2. When You Want to Archive Web Pages Online
These tools also allow you to check the periodical data of a web page.
A. Wayback Machine
Wayback Machine is solely designed to store web pages across the Internet. Saving any URL to Wayback Machine is pretty easy.
- Go to http://web.archive.org/.
- Enter the target URL in the “Save Page Now” box and click on “Save Page”.
- That’s it, guys. Your desired web page is now stored on Wayback.
Pros: With Wayback Machine, you can also check historical data of any web page. All you need to do is enter the URL into the search bar and you will get a complete timeline of the web versions.
Cons: The process is completely manual. There is also no guarantee for the stability of archived content. Results are not nearly as accurate as a full-page screenshot. Lastly, there is no support provided.
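Although the steps above are manual, both actions (saving a page and looking up its history) can also be approached from a script. The sketch below is a minimal illustration, not an official client: it builds a Save Page Now URL, builds a query for the Wayback CDX server, and turns a CDX JSON response into snapshot links. The function names are my own, and the service may apply rate limits or restrict some features to logged-in users, so check the Internet Archive documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK = "https://web.archive.org"


def save_page_url(url):
    """URL that asks the Wayback Machine to capture `url` when requested."""
    return f"{WAYBACK}/save/{url}"


def cdx_query_url(url, limit=10):
    """URL that lists known captures of `url` as JSON via the CDX server."""
    query = urlencode({"url": url, "output": "json", "limit": limit})
    return f"{WAYBACK}/cdx/search/cdx?{query}"


def parse_cdx_rows(rows):
    """Convert CDX JSON (first row is the header) into a list of dicts."""
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, capture)) for capture in captures]


def snapshot_url(capture):
    """Link to one archived version, given a dict from parse_cdx_rows."""
    return f"{WAYBACK}/web/{capture['timestamp']}/{capture['original']}"


def fetch_timeline(url, limit=10):
    """Download and parse the capture list (performs a network request)."""
    with urlopen(cdx_query_url(url, limit)) as response:
        return parse_cdx_rows(json.load(response))
```

Each capture carries a 14-digit timestamp (year, month, day, hour, minute, second), so a loop over `fetch_timeline("example.com")` that calls `snapshot_url` on each entry gives you the same timeline you would otherwise browse by hand.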
Follow this link if you want to know more about common Wayback Machine alternatives.
Despite a few flaws, we all love Wayback Machine for the contributions it has made to the Internet.
Archive.is is another tool you can use to archive any web page, just like Wayback Machine. The process is simple: add the URL you want to submit and hit the submit button. Within minutes, your web page will be archived. The tool also provides a Chrome extension, which offers a one-click way to get the work done.
Pros: Free and shows old data for most of your desired URLs.
Cons: It can’t archive ads, and certain code is excluded. Furthermore, if you want to schedule archiving, this tool won’t work.
3. When You Want to Archive a Whole Website
Httrack is a great, nifty tool that uses a completely different approach. Instead of taking screenshots like Fireshot and other tools, Httrack downloads the whole website, including the code and images.
Pros: It can download the complete front end along with the code. This is somewhat like taking a backup of a website in HTML format.
Cons:
- It sometimes misses images.
- It is buggy, crashes sometimes, and is a bit complicated.
4. When You Want to Capture and Archive Website Screenshots Automatically
You will often want this done automatically rather than handling the whole website archiving process manually. This is where Stillio can save the day. Let me explain how.
With Stillio, creating your web archive is quite easy. Whether it concerns your organization’s homepage and key landing pages, a SERP, your competitor’s website, or any of the social media profiles, this tool can archive most pages. You can set up the whole process quickly and it can save a lot of time. There’s no need to install any software; you just need to enter the URLs you want to preserve, select the schedule, and you are good to go.
All of the screenshots can be saved with most cloud storage providers, such as Dropbox, Google Drive, Box, Microsoft OneDrive, and Amazon S3, and even offline.
- You can schedule the archival process to take place daily, weekly, monthly, or anywhere in between.
- Unlike Wayback Machine and Archive.is, Stillio captures ads, images, and any other elements with close to 100% accuracy.
- While Wayback Machine is not able to capture Google SERP, Stillio does that quite easily.
- Submit your sitemap.xml and have all your web pages added at once.
- If needed, you can also share these screenshots with others.
- Geo-specific screenshots: As discussed above, web pages or SERPs may vary depending on the location from which the URL is accessed. With Stillio, you can also take geo-specific screenshots to archive website pages.
- With Stillio, you can also capture a screenshot of the mobile version of the website. Responsive mobile archival can come in handy in cases where the content differs from the desktop version of the page.
- No need to buy and install any software. Just create an account and let Stillio capture your website or any other URL.
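The sitemap option mentioned in the list above works because a sitemap.xml is simply a machine-readable list of a site's pages. If you are curious which URLs yours contains before submitting it anywhere, the sketch below extracts them with the Python standard library. It assumes a sitemap that follows the sitemaps.org 0.9 schema, and the function name is my own. It says nothing about how Stillio itself processes sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in a sitemap document, in order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text and loc.text.strip()
    ]
```

For a sitemap index file, the returned entries are the addresses of child sitemaps rather than pages, so you would run the same function again on each of them.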
5. Taking Full-Page Screenshots on a Phone
IFTTT: Taking screenshots while browsing on your phone is quite common and easy. The problem arises when your phone and email are not in sync. Well, here is a small trick that may work for you.
- Create an account on https://ifttt.com.
- Now, visit this recipe and turn it on.
- That’s it, folks. Now whenever you take a screenshot on your Android phone, you will automatically get an email.
- You can send these screenshots to whomever you want from your inbox.
Managing Data Offline
I know many of you might already be aware of this, but for those who have just started out, here’s how you can do it:
- Create a specific folder for each website you want to archive.
- Now store each version based on its date.
- Additionally, you can save this data in Google Drive.
- Ideally, this data should be saved to a network drive that should also be backed up to another data source.
- If you have access to Dropbox or a similar storage provider, sync your web archive there too. This way you will have 24/7 access to everything.
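The folder scheme described above (one folder per website, one version per date) is easy to keep consistent if a small helper decides where each file goes. Here is a minimal sketch using the Python standard library. The function name and the exact naming convention (`site/date/page.png`) are my own choices for illustration, so adapt them to whatever layout suits you.

```python
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def snapshot_path(archive_root, url, captured_on, extension="png"):
    """Return archive_root/<site>/<YYYY-MM-DD>/<page>.<extension> for a capture."""
    parts = urlparse(url)
    site = parts.netloc or "unknown-site"
    # Turn "/pricing/plans/" into "pricing_plans"; the front page becomes "home".
    page = parts.path.strip("/").replace("/", "_") or "home"
    return Path(archive_root) / site / captured_on.isoformat() / f"{page}.{extension}"
```

Before saving a file you would create the folder with `path.parent.mkdir(parents=True, exist_ok=True)`. If `archive_root` points to a folder that Dropbox or Google Drive already syncs, the online copy comes along for free.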
Before starting the archiving process, it is essential to know the objective behind it. Only then will you be able to maintain the heritage and provide access to the right personnel. It will also help you figure out how often you should collect this information, how long it should be kept, and who should have access to this data.
So what are you waiting for? Start building your heritage today!