It is usually essential to capture a web page or a number of pages as evidence that particular claims have been made.
There are several ways to capture a web page:
- Create a pdf of the page/pages;
- Copy the HTML of the page, including the graphics and other files;
- Capture complete websites;
- Use a page caching service to save a copy of the page.
Each method has advantages and disadvantages, and success will depend on the web page you are trying to capture. Capturing several different websites may require different techniques.
In many cases, a pdf writer will give a reasonably faithful copy of the web page. However, some web pages do not come out well (eg WordPress and Blogspot blogs): the text is usually captured, but the graphics may be left out and the layout of the page messed up. That may not matter much, but it is usually better to have a more accurate copy of the web page. The only way to find out how a particular page will come out is to try it.
Sometimes a pdf writer will produce a blank page or one with most of the content missing. You'll need to find a different solution.
Adobe ShockWave Flash pages are particularly difficult to capture.
Pdf writers
These are useful for creating a copy of a web page. Pdf files are ubiquitous and can be opened and read on most PCs.
The easiest way to capture a web page is to convert it to a pdf. pdfFactory is good and costs around £32, but there are some free alternatives:
Once a pdf writer is installed, it appears as a new printer in your list of printers. Although this may sound odd, to create a pdf of a web page all you have to do is print the page to the pdf printer. You will then have the opportunity to save the resulting pdf file to your hard drive. This isn't just for web pages: the pdf printer is available from any Windows program and can be used to, say, create a pdf of a Word document or of a graphic in your favourite graphics program.
If your pdf writer can't cope with the way a page has been created, you will need to find another way to capture it.
PDF Online doesn't create pdfs from web pages, but it is a useful free online way of converting a Word document to a pdf. Once you upload your Word document to the PDF Online website, it is converted to a pdf and emailed to you. This is fine for an occasional document, but not for regular use!
Saving a webpage
Individual web pages can be stored on your PC.
Firefox
Click on File: Save Page As… There are two useful options under Save as type: Web Page, complete and Web Page, HTML only.
The first option (Web Page, complete) creates a file with the same name as the web page you are capturing. A folder with the same name is also created, which stores everything else needed to reproduce the web page, including the graphics and style sheets. Once saved, the page can be viewed in your browser and should look the same as the original on the Internet. However, because the saved web page consists of a file and a folder, it's a bit more difficult to send to others.
The second option (Web Page, HTML only) does not store the graphics and style sheets, so you just get a basic web page with little or no formatting.
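If you need to save the HTML of a page outside the browser, for example from a script, the rough equivalent of Web Page, HTML only can be sketched in Python using just the standard library. This is a minimal sketch, not a replacement for the browser: the function name save_html_only is hypothetical, and it stores the HTML exactly as the server sends it, so any content a page adds later with JavaScript will be missing, as will its graphics and style sheets.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def save_html_only(url, out_file):
    """Save only the HTML of a page, similar to 'Web Page, HTML only'.

    Graphics, style sheets and scripts are NOT downloaded, so the saved
    copy will usually look much plainer than the original.
    """
    # Some sites refuse requests without a User-Agent header; this
    # particular value is an arbitrary assumption.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (page capture sketch)"})
    with urlopen(request, timeout=30) as response:
        data = response.read()
    # Write the raw bytes unchanged so the copy matches what was served.
    Path(out_file).write_bytes(data)
    return len(data)
```

For example, save_html_only("https://example.com/", "example.html") would save that page's HTML to example.html in the current folder and return the number of bytes written.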
Internet Explorer
Click on File: Save As… There are three useful options under Save as type: Webpage, complete (*.htm, *.html); Web Archive, single file (*.mht); and Webpage, HTML only (*.htm, *.html).
The first and third options are similar to the Firefox options. The second option (Web Archive, single file) creates a single file that is easier to move around, but it may not open in browsers other than Internet Explorer.
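If your browser has no single-file option, a similar convenience can be had by packing the saved file and its folder into one zip archive before sending it. This is a minimal Python sketch, assuming the page was saved with the Web Page, complete option so that the HTML file and its folder sit side by side; the function name bundle_saved_page is hypothetical. The recipient simply unzips the archive and opens the HTML file.

```python
import zipfile
from pathlib import Path


def bundle_saved_page(html_file, resources_folder, zip_file):
    """Pack a 'Web Page, complete' save (one file plus one folder) into a zip."""
    html_path = Path(html_file)
    folder = Path(resources_folder)
    with zipfile.ZipFile(zip_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The HTML file goes at the top level of the archive.
        zf.write(html_path, arcname=html_path.name)
        # The folder's contents keep the folder's own name as a prefix, so the
        # relative links inside the HTML still work after unzipping.
        for item in sorted(folder.rglob("*")):
            if item.is_file():
                arcname = (Path(folder.name) / item.relative_to(folder)).as_posix()
                zf.write(item, arcname=arcname)
    return zip_file
```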
Google Chrome
Click on Customise and control Google Chrome: Save page as… There are two options under Save as type: Web Page, Complete and Web Page, HTML Only. These are the same as the Firefox options.
Other browsers should be similar to those above.
Caching pages
An online cached copy of a web page can be very useful. One excellent service is FreezePage. It offers 5 MB of free storage space, or 10 MB if you register. Larger capacities, up to 300 MB, can be bought.
A page can be captured by going to the FreezePage website and entering the URL of the page you want to capture. Alternatively, a browser toolbar button lets you capture a page directly.
There are some restrictions: the web page must be less than 1.5 MB in total, have fewer than 250 embedded elements (images, style sheets, script files, etc) and be retrievable within 90 seconds. In practice, though, it will capture the vast majority of pages. If you don't take out a subscription, you must also visit the site regularly (within 30 days for unregistered users or 60 days for registered users) or you will lose your saved pages. Pages saved under a subscription are not deleted.
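To get a rough idea of whether a page is within the size and element limits before freezing it, you could first save it with the Web Page, complete option and then check the saved copy. This minimal Python sketch adds up the size of the HTML file and its folder and counts tags that load a separate file. It is an approximation only: FreezePage's own counting rules are not known here, treating 1.5 MB as 1.5 × 1024 × 1024 bytes is an assumption, the 90-second limit cannot be checked this way, and the names check_saved_page and EmbeddedCounter are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path

# Limits from the restrictions above; the byte interpretation of "1.5 MB"
# is an assumption.
MAX_TOTAL_BYTES = int(1.5 * 1024 * 1024)
MAX_EMBEDDED = 250


class EmbeddedCounter(HTMLParser):
    """Roughly counts tags that pull in a separate file."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "iframe", "embed", "source") and attrs.get("src"):
            self.count += 1  # images, external scripts, frames, media
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            self.count += 1  # external style sheets
        elif tag == "object" and attrs.get("data"):
            self.count += 1  # embedded objects such as Flash


def check_saved_page(html_file, resources_folder=None):
    """Return the total size, embedded-element count and a within-limits flag."""
    html_path = Path(html_file)
    counter = EmbeddedCounter()
    counter.feed(html_path.read_text(encoding="utf-8", errors="replace"))
    counter.close()
    total = html_path.stat().st_size
    if resources_folder is not None and Path(resources_folder).is_dir():
        # Add every file in the folder the browser saved alongside the page.
        total += sum(p.stat().st_size for p in Path(resources_folder).rglob("*") if p.is_file())
    return {
        "total_bytes": total,
        "embedded_elements": counter.count,
        "within_limits": total < MAX_TOTAL_BYTES and counter.count < MAX_EMBEDDED,
    }
```

A page that fails this check may still freeze successfully, and one that passes may still fail, so it is only a guide.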
Once FreezePage has captured a page, it is given a URL with a unique 20-character identifier, and the page can be seen by anyone you give this URL to. Part of the URL is the date and time you froze the page, in Unix time format (the number of seconds since 1 January 1970 UTC). See their FAQ for further details.
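To turn a Unix timestamp into a readable date, for instance when citing when a page was frozen, a couple of lines of Python are enough. This sketch assumes you have already copied the timestamp digits out of the URL yourself (their FAQ describes which part holds them) and that the value is in seconds; the function name unix_to_utc is hypothetical.

```python
from datetime import datetime, timezone


def unix_to_utc(seconds):
    """Convert Unix time (seconds since 1 January 1970 UTC) to a UTC date string."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


print(unix_to_utc("1300000000"))  # prints 2011-03-13 07:06:40 UTC
```

Giving the time in UTC avoids any ambiguity about time zones or daylight saving when the date is quoted as evidence.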
Another great feature of FreezePage is that it allows cached pages to be organised into folders. This helps keep web pages captured for different purposes separate.
When writing about a web page, it is useful to link both to the page itself and to its cached FreezePage copy, so that readers can see what may have changed on the page since it was captured.
