DON'S FREEWARE CORNER -- JUL 2014
USING HTTRACK TO DOWNLOAD WEBSITES
FOR PRESERVATION AND READING OFFLINE

This page was last updated 2014-07-08



©2014 Donald R. Snow
These notes are published in TAGGology, our Utah Valley Technology and Genealogy Group (UVTAGG) monthly newsletter, and are posted here on  http://uvtagg.org/classes/dons/dons-classes.html  where there may be updates, corrections, or additions.

CAUTION
The UVTAGG webpage has malware that we are trying to get rid of that adds unwanted links and even pages sometimes when you click on links.  This is NOT in my notes, but comes in from an external source from our website.  This happens with multiple browsers on multiple computers in multiple places, so it's not just on one computer.  I have noticed that it sometimes opens up a new tab in the browser and takes me there without me clicking on the new tab.  Sometimes deleting the new tab clears the problem and it doesn't recur until I open the browser again.  I am sorry for the problem, but there is nothing I can do about it at present.  BTW, I have noticed that downloading files from my website using HTTRACK doesn't download the malware with the webpages, so they show up clean in the downloads.

WHY USE HTTRACK TO DOWNLOAD A WEBSITE
Websites change all the time and some disappear altogether.  You may want to preserve a copy of a site, or just have it on your own computer to read without being connected to the Internet.  Several free programs will do this, and this note discusses one of them, HTTRACK, available from  http://www.httrack.com/ .  The home page has information about the current version, a download button, and links to a manual, a forum, a blog, and other information.  The Windows version of HTTRACK is called WinHTTrack.  There is a set of step-by-step instructions with suggestions on how to use it at  http://www.httrack.com/html/step.html , along with some warnings about using it incorrectly.  For older versions of webpages that have since changed, you can frequently find copies at the Internet Archive at  https://archive.org/ , which has been taking "snapshots" of much of the public web at irregular intervals since 1996.  These are copies of the static parts of the web, not the dynamic parts that are generated when you fill in a form.  HTTRACK has the same limitation.  For example, on FamilySearch Family Tree you have to enter your name or someone else's before any data is shown, so you can't use HTTRACK to download the entire Family Tree.

USING HTTRACK
After downloading and installing HTTRACK, when you want to save a website, open HTTRACK and click on File > New Project > Next.  Here you give your project a name, such as HTTRACKWEBSITE-FH-HELPS in the category FAMILY HISTORY, and enter the folder you want to save it in, e.g. C:\DownloadedWebsites.  I have found that including HTTRACKWEBSITE in the name allows me to find these projects easily on my computer using the freeware program EVERYTHING that I have discussed repeatedly in these notes.  The pick arrows (downward-pointing triangles at the ends of the lines) show the other projects and categories you have set up earlier.  You can save many websites into the category you select, but give the projects descriptive names, and maybe even dates, so you can tell exactly what you downloaded and when, e.g. HTTRACKWEBSITE-UVTAGG-VideoLibrary-2014-07-08.  Then click Next and fill in the URL of the website you want to download, e.g. http://uvtagg.org/videolibrary/ .
Now click on Set Options.  Only a few of the many options here need to be changed from the defaults.  Click on the tab Limits and set the Maximum Mirroring Depth, which is how many levels down in the website you want to download.  This will depend on the website you are downloading.  For the UVTAGG Video Library you would probably only need 2 levels since there are no links that go below those levels.  You can set the number of levels low to start and then update the download later, if you need more levels.  The Maximum External Depth refers to levels of links that take you to pages that are not on the website you are downloading.  To start, set this at 0 until you see if you need more.  On the tab Log, Index, Cache, put a check in Store All Files in Cache.  Leave the options in all the rest of the tabs at their defaults and click OK, then Next > Finish.  As it works you see a list of the files it is downloading, with progress bars to indicate how it's doing.  You can cancel the operation at any time.
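The same settings can also be written down as a command line, which is handy for keeping a record of how a download was made.  The following is only a sketch in Python.  It assumes the command-line program httrack is installed and on your PATH, and that its options -O (output folder), -rN (mirroring depth), -%eN (external depth), and -k (store all files in cache) correspond to the settings described above.  The function names project_name and build_httrack_command are made up for this illustration.  The code only builds and prints the command; it does not download anything.

```python
from datetime import date
from pathlib import PureWindowsPath


def project_name(site_label, when):
    """Build a dated project name that starts with HTTRACKWEBSITE."""
    return f"HTTRACKWEBSITE-{site_label}-{when.isoformat()}"


def build_httrack_command(url, base_folder, name, depth=2, external_depth=0):
    """Return the argument list for one httrack run (nothing is executed here)."""
    target = PureWindowsPath(base_folder) / name
    return [
        "httrack", url,
        "-O", str(target),        # folder that receives the downloaded site
        f"-r{depth}",             # assumed equivalent of Maximum Mirroring Depth
        f"-%e{external_depth}",   # assumed equivalent of Maximum External Depth
        "-k",                     # assumed equivalent of Store All Files in Cache
    ]


name = project_name("UVTAGG-VideoLibrary", date(2014, 7, 8))
command = build_httrack_command(
    "http://uvtagg.org/videolibrary/", r"C:\DownloadedWebsites", name
)
print(name)
print(" ".join(command))
```

To actually start the download from Python, the list could be handed to subprocess.run(command, check=True), but check the option letters against the HTTRACK manual for your version first.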
If you have set it to download many levels, the download may take a long time (hours).  If you set it for only a few, it will probably take only a couple of minutes.  When it finishes, click Next and you see a panel on the left with your computer's file structure and the folder holding the projects you have downloaded.  You can click on the Browse the Downloaded Websites button, or click on the file labeled index.html, to see a list in HTTRACK of all the projects you have downloaded in that folder.  Clicking on any one opens it in your browser.  In the address bar at the top of your browser you will see a URL starting with something like file:///C:/ ..., which indicates you are looking at the downloaded webpage as it is now stored on your computer.  If there are links beyond the levels you have downloaded and you are connected to the Internet, clicking one will take you to the online page and you will see its full online address in the address bar.  So, by watching the address in the browser you can tell whether you are looking at a downloaded copy or the online version.  Once you have downloaded the website you can read it in your browser without being connected to the Internet, and you can copy the website folder to a flash drive and transfer it to another computer, if you want.  This is also a good way to archive a copy of the website yourself.  Be careful about the size of the websites you download, since some are very large, take a long time to download, and take up a lot of storage space.  Unfortunately, there seems to be no way to tell how much of the website you have downloaded until it finishes.  While it is working you can be working on other things on your computer.  If it seems to be taking too long, there is a Cancel button which gives you the option of stopping and keeping what you have already downloaded, or going back and deleting all the already-downloaded files.
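Two of the checks described above are easy to do with a few lines of Python: telling a downloaded copy from the online version by the start of the address, and seeing how much storage space a downloaded folder takes.  This is a small sketch, not part of HTTRACK.  The function names is_local_copy and folder_size_bytes are made up for the illustration, and the size check is demonstrated on a temporary folder rather than a real download.

```python
import os
import tempfile
from urllib.parse import urlparse


def is_local_copy(address):
    """True when a browser address points at a file on this computer."""
    return urlparse(address).scheme == "file"


def folder_size_bytes(folder):
    """Add up the sizes of all files under folder, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for filename in files:
            total += os.path.getsize(os.path.join(root, filename))
    return total


print(is_local_copy("file:///C:/DownloadedWebsites/index.html"))  # True
print(is_local_copy("http://uvtagg.org/videolibrary/"))           # False

# Demonstrate the size check on a throwaway folder holding one 5-byte file.
with tempfile.TemporaryDirectory() as demo:
    with open(os.path.join(demo, "index.html"), "w") as handle:
        handle.write("hello")
    demo_size = folder_size_bytes(demo)
    print(demo_size)  # 5
```

For a real download you would pass your own project folder to folder_size_bytes and divide by 1024 * 1024 to get megabytes.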
Keep copyright in mind so you don't break copyright laws.  I don't think it is breaking copyright laws to have a downloaded copy of a website on your own computer to read later, as long as you don't pass it on, change it, or sell it, but I'm not an attorney.  You can update the copies of websites on your computer by opening HTTRACK and using the update feature.  Once a website has been downloaded, you can use it in your browser to make PDFs, text files of the pages, screenshots, and even scrolling-window screenshots with freeware.  But that's another article.
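The update can also be started from a script instead of the HTTRACK window.  The sketch below assumes the command-line program httrack is installed, that it accepts the option --update, and that the update has to be run from inside the project folder that was created by the original download.  The function names are made up for this illustration, and nothing is run until you call update_mirror yourself.

```python
import subprocess


def update_command():
    """Arguments that ask httrack to refresh an existing download."""
    return ["httrack", "--update"]


def update_mirror(project_folder):
    """Run the update from inside the project folder (needs httrack installed)."""
    return subprocess.run(update_command(), cwd=project_folder, check=True)


print(update_command())
```

Calling update_mirror with the path of one of your project folders would then refresh that project, provided the assumptions above hold for your version of HTTRACK.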

VIEWING YOUR DOWNLOADED WEBSITES
There are two ways to view your downloaded websites.  First, open HTTRACK and click on the link for the project you want to view.  Second, without opening HTTRACK, go to the download folder and click on the index.html file.  This opens your browser and you see the list of downloaded websites.  With either method, the links will take you to downloaded pages on your own computer as far down in levels as you saved and, after that, will take you online, if your computer is connected to the Internet.
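For the second method, the address your browser shows can be worked out from the download folder.  Here is a short Python sketch that turns a Windows folder such as C:\DownloadedWebsites into the file:/// address of its index.html.  The function name index_address is made up for this illustration, and the folder is the example one used earlier in these notes.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def index_address(download_folder):
    """Return the file:/// address of index.html inside a Windows folder."""
    index_file = PureWindowsPath(download_folder) / "index.html"
    # as_posix() switches backslashes to forward slashes; quote() encodes spaces.
    return "file:///" + quote(index_file.as_posix(), safe="/:")


address = index_address(r"C:\DownloadedWebsites")
print(address)  # file:///C:/DownloadedWebsites/index.html
```

On the computer that holds the download, Python's standard webbrowser.open(address) should then open that page in your default browser.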

WEBSITES WITH PASSWORDS
In the FAQs (Frequently Asked Questions) on the Helps page there is an example of how to download a website that requires a user name and password.  Here's the format of the URL to put in the HTTRACK box, where you replace [user] and [password] with your own:  http://[user]:[password]@www.somewebpage.com/mybox.html .  However, if the website requires you to enter additional information before continuing on, I don't know how to do that.
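If the user name or password contains characters such as @ or a space, the URL in that format becomes ambiguous.  The usual convention for URLs is to percent-encode those characters, and the Python sketch below builds the URL that way.  Whether HTTRACK decodes percent-encoded passwords correctly is an assumption that has not been checked here, so test it on your own site.  The function name with_credentials and the user name and password shown are made up for the illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, user, password):
    """Insert a percent-encoded user name and password into a URL."""
    parts = urlsplit(url)
    login = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{login}@{parts.netloc}"))


# Hypothetical user name and password, for illustration only.
protected = with_credentials(
    "http://www.somewebpage.com/mybox.html", "don", "p@ss word"
)
print(protected)  # http://don:p%40ss%20word@www.somewebpage.com/mybox.html
```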

CONCLUSIONS
You may want to experiment with downloading some useful websites to learn how to use HTTRACK and then keep it in mind for saving something you really need later.
