The nature of the World Wide Web is to always provide the latest information to its users. Websites evolve and update constantly. This can be seen as a great strength, but it also shows how ephemeral the web is. How do you archive web pages and keep track of changes?
The information you want to capture today for historical or business purposes might be gone tomorrow, leaving you with nothing but regret. We have already lost a lot of web data from the 90s. Many of the studies published online are no longer available anywhere. The reason? Either the website broke down or the domain expired.
With the power of web archiving, you can now ensure important data is preserved and can be fetched whenever you need it. Having the data in the form of screenshots ensures that the visual context is retained.
Web Archiving is the process of preserving websites in an archive. By capturing screenshots at specific moments, the data of each web page is retained. These screen captures preserve the original context, containing both content and appearance. Safeguarding screenshots in an archive ensures long-term accessibility for analysis or reference.
This process is somewhat similar to traditional archiving, where people preserved papers or documents manually. The basic idea is the same: you select the information, store and preserve it, and make it available to people for future use.
As the internet contains a huge amount of data (more than 1.5 billion websites), web archivists use an automated process to capture these web pages. With the help of crawlers, archivists move across multiple web pages and capture the details from the sources. Once this data is stored, it is made available as snapshots in the web archive collection.
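To make the crawler idea concrete, here is a minimal sketch of the step where a crawler finds the next pages to visit. It uses only the Python standard library and works on HTML you have already downloaded. The names `LinkCollector` and `extract_links` are made up for this illustration and the same-site rule is an assumption; real archiving crawlers do a lot more (politeness delays, robots rules, rendering).

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute same-site links found in html, in order, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts, which point
        # to the same page.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        # Assumption for this sketch: only follow links on the same host.
        if urlparse(absolute).netloc == site and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A crawler would call `extract_links` on every page it captures and add the results to its queue of pages still to visit.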
This can be done for multiple purposes:
In earlier days, officials used to keep a record of everything important that happened. It might surprise you to know that the term “archiving” dates back to the 1st century.
Now that you are aware of the basic concept, let’s move towards different types of web archiving.
In this process, you can create an archive of any web page that is freely available on the internet.
One of the most common methods is Client-Side Web Archiving. It is widely popular because of its scalability and simplicity. People who want to preserve their own website, as well as the websites of other organizations, often use the client-side approach.
In this process, you can archive websites for which you have permission from the server hosting the content.
Unlike the previous one, the approach requires an agreement with the server owner.
Due to its complexity, this process is less preferred. This method is conducted on the server side and captures all of the transactions that occur between user and server.
With this type of archiving, you can capture exactly what was seen and when. This is often useful for internal, corporate, or institutional archiving where compliance or legal accountability is of great importance.
In this process, crawlers copy all the information directly from the server.
Similar to transaction-based archiving, this type of web archiving also requires an agreement between the server owner and the archivist. It gets complicated when you want to capture a web page that pulls in dynamic content, such as ads or similar elements, from an external source.
Over the past few years, web archiving has gathered a lot of attention. Before, it was limited to being a method of keeping a record of the page for the sake of heritage. However, today we are more aware of how archiving can be used for a lot more. Here are a few scenarios where it is helping a lot of businesses.
Last week I got a call from Steve. He was excited to see a sudden increase in our website’s ranking. Surprisingly, I was not able to see any changes in my browser.
But I soon received this screenshot from Steve and, to my surprise, Stillio was ranked second.
Since our most preferred search engine started focusing on personalized search results, SERPs vary based on demographics and behavior. Your website may be “ranking” in Google in your country and, at the same time, not rank at all in some other location. In such cases, it becomes very important to keep a log of the search results in order to avoid confusion.
Want to keep track of offers run by your competitors? Want to monitor your brand across multiple web properties? Or want to stay up-to-date on the latest changes made by your rivals? Web archiving is the answer. With regular screenshots by your side, you can be up-to-date on what changed and when. Marketing is all about having an edge over your competitors. Keeping a keen eye on them makes that much easier.
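If you store captures yourself, "what changed and when" can be approximated by fingerprinting each capture and logging an entry only when the fingerprint differs from the previous one. The sketch below is one way to do that, not how any particular tool works. The function names and the log layout are invented for this example, and it compares raw bytes, so pages with rotating ads or timestamps would register as changed on every capture.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the captured bytes."""
    return hashlib.sha256(content).hexdigest()


def record_if_changed(log, url, content, captured_at):
    """Append a log entry only when content differs from the last capture of url.

    Returns True when a change (or a first capture) was recorded.
    """
    digest = fingerprint(content)
    previous = [entry for entry in log if entry["url"] == url]
    if previous and previous[-1]["sha256"] == digest:
        return False  # identical to the last capture, nothing to log
    log.append(
        {"url": url, "sha256": digest, "captured_at": captured_at.isoformat()}
    )
    return True


if __name__ == "__main__":
    log = []
    now = datetime.now(timezone.utc)
    record_if_changed(log, "https://example.com/offers", b"20% off", now)
    record_if_changed(log, "https://example.com/offers", b"30% off", now)
    for entry in log:
        print(entry["captured_at"], entry["url"], entry["sha256"][:12])
```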
My friend Jack owns an ecommerce store that sells apparel. Every Wednesday he runs a 49% Off deal on the first 15 orders. One of his customers was about to take advantage of the deal when his session timed out. Instead of the discounted price, he ended up paying the original price of the product, and he was furious about it. He assumed that if he added the product to the Cart during the sale period, he would get it for the discounted price.
The contrary was true: the company allowed the discount only if the whole process, including payment and checkout, was completed before the end of the sale. Now, something that looked like a simple misunderstanding turned into a lawsuit for Jack. Not only did the customer want his money back, he also asked for $10,000 for damages and harassment. In such cases, having screenshots of everything that is said on your website makes the process much easier.
Since Jack kept a record of every page of his website, he used the General Terms and Conditions page as evidence and got rid of the lawsuit faster than you might believe. Website archiving is a must-have for combating legal issues.
With regular screenshots, you can stay carefree in case someone makes a false claim. The demand for older or historical content is growing rapidly. Web archiving is a great option for website owners who no longer want to keep their legacy information on a live site but might need it in the future.
Website backups and archives work in very different ways. Regular backups ensure that your website stays safe even if something gets messed up and files get removed from the server. Archiving, on the other hand, preserves what the site looked like at a given moment.
Another difference is that a backup allows you to rebuild a website from the saved files in case of any issues, whereas archiving ensures that the website can be captured, preserved, and navigated by users just like the live website.
Many businesses need to keep a detailed record of all their electronic communication, as required by the SEC, FINRA, the IDA of Canada, the FSA of the UK, and the Sarbanes-Oxley Act of 2002. Failing to do so may result in serious problems.
With your own archive record handy, you can stay prepared for such issues and make sure you are on the winning side. Apart from what I have mentioned above, web archiving can be helpful in trend tracking and analyzing your competitors as well as brand management.
Here comes the real meat you have been waiting for. There are several ways you can perform archiving. I will be sharing all of the options with relevant scenarios. Before I do that, here are some points to consider:
These questions will help you identify the right content and duration of archiving. Not all content needs to be stored for years. For example, you are generally required to keep financial records for a minimum of 7 years.
Always keep in mind that, although web archiving is a great way to keep a record of everything online, not all elements are captured with 100% accuracy. If a website is not “machine readable”, it becomes difficult to archive. Web crawlers usually can’t reach password-protected sites or content behind search boxes, so those pages cannot be captured.
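If you manage your own archive, a retention rule like the 7-year example can be written down as a small date check. This is only a sketch: the function names are invented here, the retention period is a parameter you would set from your own legal requirements, and rolling a 29 February capture forward to 1 March is an assumption chosen to err on the side of keeping data longer.

```python
from datetime import date


def retention_end(captured_on: date, years: int) -> date:
    """Return the first day on which a capture is past its retention period."""
    try:
        return captured_on.replace(year=captured_on.year + years)
    except ValueError:
        # captured_on is 29 February and the target year is not a leap year.
        # Assumption: roll forward to 1 March rather than back to 28 February.
        return date(captured_on.year + years, 3, 1)


def may_delete(captured_on: date, years: int, today: date) -> bool:
    """True once the retention period for this capture has fully elapsed."""
    return today >= retention_end(captured_on, years)


if __name__ == "__main__":
    # A capture from 1 May 2018 with a 7-year retention period.
    print(may_delete(date(2018, 5, 1), 7, today=date.today()))
```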
Let’s get started with the process:
A. Check out this Chrome extension from Fireshot. All you have to do is install the extension in Chrome and click on the little icon at the top right. Fireshot gives the option to save the page both as PDF and PNG.
B. If you are flooded with too many Chrome extensions, here is an alternative:
Pros: These are free and easy to use.
Cons: There is no option to store data online. Automation is missing too.
C. Press Ctrl+P in Chrome and it opens the print option. You can save that as PDF. This is great when you are focused on content only.
Pros: It allows you to save screenshots on Google Drive too.
Cons: This process might cause some compatibility issues in print and screenshot format. As mentioned above, use this one if content is your major focus. If visuals are important, you will want to stay away from this one.
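For readers comfortable with a little scripting, the same print-to-PDF idea can be run without clicking through the dialog, using Chrome's headless mode. The sketch below builds the command line with the documented `--headless`, `--print-to-pdf`, `--screenshot`, and `--window-size` flags. The binary name `google-chrome` is an assumption (it varies by operating system and install), and the helper names are made up for this example. Note that the screenshot flag captures the window area rather than the full page, so the height here is simply set generously.

```python
import subprocess


def chrome_capture_command(url, output_path, binary="google-chrome",
                           width=1280, height=2000):
    """Build a headless Chrome command that saves url as a PDF or PNG."""
    lowered = output_path.lower()
    if lowered.endswith(".pdf"):
        capture_flag = f"--print-to-pdf={output_path}"
    elif lowered.endswith(".png"):
        capture_flag = f"--screenshot={output_path}"
    else:
        raise ValueError("output_path must end in .pdf or .png")
    return [binary, "--headless", f"--window-size={width},{height}",
            capture_flag, url]


def capture(url, output_path, **options):
    """Run the capture; raises CalledProcessError if Chrome exits with an error."""
    subprocess.run(chrome_capture_command(url, output_path, **options), check=True)


if __name__ == "__main__":
    # Requires Chrome to be installed and reachable under the assumed binary name.
    capture("https://example.com/", "example.pdf")
```

The same compatibility caveat applies: the PDF uses the page's print layout, so it may not match what you see on screen.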
I don’t have Chrome, how can I archive a web page?
There is an SAAS tool called Url2png that you can use to take screenshots and archive any web page. Url2png is primarily focused on creating thumbnails and screenshots for multiple websites.
P.S. Url2png doesn’t allow full-page screenshots in free versions and I wouldn’t recommend the paid version unless you plan to integrate the API with a tool like Woorank. As I said, the target market for Url2png is businesses looking for bulk screenshots of their applications.
Other similar tools are Browshot, Thum.io, Screenshotlayer, and Webthumb.bluga.net.
Cons: These tools don’t allow you to schedule screenshots; you have to do it all manually. They also don’t archive your screenshots, so you need to store them yourself. Furthermore, these tools are aimed at technically minded users, as their primary interface is an API that you call to program your own capture jobs.
These tools also allow you to check the periodical data of a web page.
Wayback Machine is solely designed to store web pages across the Internet. Saving any URL to Wayback Machine is pretty easy.
Pros: With Wayback Machine, you can also check historical data of any web page. All you need to do is enter the URL into the search bar and you will get a complete timeline of the web versions.
Cons: The process is completely manual. There is also no guarantee for the stability of archived content. Results are not nearly as accurate as a full-page screenshot. Lastly, there is no support provided.
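The manual steps map onto a few public URL patterns, which is handy if you want to bookmark them or build links in a spreadsheet. The sketch below only constructs the URLs and makes no network calls. The function names are invented for this example; the patterns themselves (`web.archive.org/save/`, `web.archive.org/web/<timestamp>/`, and the `archive.org/wayback/available` lookup) are the ones the Internet Archive exposes publicly, with timestamps written as `YYYYMMDDhhmmss` or a shorter prefix of that.

```python
from urllib.parse import urlencode


def save_url(page_url):
    """URL that asks the Wayback Machine to capture page_url now."""
    return "https://web.archive.org/save/" + page_url


def playback_url(page_url, timestamp):
    """URL of the capture closest to timestamp (e.g. '20200101')."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def availability_url(page_url, timestamp=None):
    """URL of the JSON lookup that reports the closest available capture."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


if __name__ == "__main__":
    page = "https://example.com/"
    print(save_url(page))
    print(playback_url(page, "20200101"))
    print(availability_url(page, "20200101"))
```

Opening the first URL in a browser triggers a capture; the other two are for looking up what already exists.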
Despite a few flaws, we all love Wayback Machine for the contributions it has made to the Internet.
This is another tool you can use to archive any web page, just like Wayback Machine. The process is simple: add the URL you want to archive and hit the submit button. Within minutes, your web page will be archived. The tool also provides a Chrome extension, which offers a one-click way to get the work done.
Pros: Free and shows old data for most of your desired URLs.
Cons: It can’t archive ads, and certain code is excluded. Furthermore, if you want to schedule archiving, this tool won’t work.
Httrack is a great, nifty tool that uses a completely different approach. Instead of taking screenshots like Fireshot and other tools, Httrack downloads the whole website, including the code and images.
Pro: It can download the complete front end along with the code. This is somewhat like taking a backup of a website in HTML format.
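Httrack also ships a command-line program, so a mirror can be started from a script. Here is a minimal sketch that assumes the `httrack` binary is installed and on your PATH; it uses only the URL argument and the `-O` option, which sets the output directory. The helper names and the example paths are made up for illustration.

```python
import subprocess


def httrack_command(url, output_dir, binary="httrack"):
    """Build the command that mirrors url into output_dir."""
    return [binary, url, "-O", output_dir]


def mirror(url, output_dir):
    """Run the mirror; raises CalledProcessError if httrack reports a failure."""
    subprocess.run(httrack_command(url, output_dir), check=True)


if __name__ == "__main__":
    # Requires httrack to be installed; downloads the site into ./example-mirror.
    mirror("https://example.com/", "./example-mirror")
```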
You are often more interested in having this done automatically, rather than taking care of this whole website archiver process manually. Well, Stillio here can save the day. Let’s explain how.
With Stillio, creating your web archive is quite easy. Whether it concerns your organization’s homepage and key landing pages, a SERP, your competitor’s website, or any of the social media profiles, this tool can archive most pages.
You can set up the whole process quickly and it can save a lot of time. There’s no need to install any software; you just need to enter the URLs you want to preserve, select the schedule, and you are good to go.
All of the screenshots can be saved on most of the cloud providers, such as Dropbox, Google Drive, Box, Microsoft OneDrive, Amazon S3, and even offline.
IFTTT: Taking screenshots while browsing on your phone is quite common and easy. The problem arises when your phone and email data are not in sync. Well, here is a small trick that may work for you.
I know many of you might already be aware of this, but for those who have just started out, here’s how you can do it:
Before starting the archiving process, it is essential that you know the objective behind it. Only then will you be able to maintain the heritage and provide access to the right personnel. It will also help you figure out how often you should collect this information, how long it should be kept, and who should have access to this data.
So what are you waiting for? Start building your heritage today!