
INTERNET ARCHIVE TEXT ITEMS

©2020 Donald R. Snow -- Page last updated 2020-05-13
Return to the  Utah Valley Technology and Genealogy Group Home Page  or  Don Snow's Class Notes Page .
ABSTRACT:  Internet Archive is a free website supported by donations, but donating is not required to use it. Their goal is to store the entire world's knowledge electronically and make it available to everyone for free. They do this by digitizing and posting books, magazines, music, movies, radio and TV broadcasts, concerts, software, and more, and every few days they make and store a "snapshot" of the internet. Items are organized in thousands of collections and everything that can be is searchable and downloadable. They have contracts with libraries, universities, churches, and other organizations to store and make available their items. This class will discuss the text items that Internet Archive contains and how to search, use, and download them.  Additional classes and notes deal with the other types of materials they collect. The notes for this class and related articles, all with active internet links, are posted on my website  https://uvtagg.org/classes/dons/dons-classes.html .

    WELCOME AND INTRODUCTION

  1. Instructor is Donald R. Snow ( snowd@math.byu.edu ) of Provo and St. George, Utah. 
  2. The notes for this class and related articles, all with active internet links, are posted on my website  https://uvtagg.org/classes/dons/dons-classes.html
  3. Tips:  (1)  To put an icon on your desktop for these notes, or for any webpage, drag the icon in front of the address in your browser and drop it on your desktop.  (2)  To open a link while keeping your place in the original page, hold down the Control key while clicking the link, so the link opens in a new tab. 
  4. The problem for today:  What is the Internet Archive, and how do you search it to find and use text items?
  5. Other classes and notes discuss the WayBack Machine, websites, audio, video, music, and more on their website.

    INTERNET ARCHIVE

  7. The Internet Archive is a free website at  https://archive.org/.  ("archive" is singular here.)
  8. Quote from Wikipedia article about it -- https://en.wikipedia.org/wiki/Internet_Archive  -- "The Internet Archive is an American Digital Library with the stated mission of 'universal access to all knowledge.' It provides free public access to collections of digitized materials, including websites, software applications/games, music, movies/videos, moving images, and millions of books."
  9. To accomplish their goal they have contracts with libraries, universities, churches, and other organizations to store and post their materials and, periodically, they make a snapshot of the entire internet -- everything they post is available for free to anyone.  
  10. Was founded in 1996 by Brewster Kahle in San Francisco, California -- https://en.wikipedia.org/wiki/Brewster_Kahle , and is supported entirely by donations, but you don't have to donate to use it -- Kahle has been a keynote speaker and presenter at many family history conferences, including several RootsTech genealogy conferences in Salt Lake City.  
  11. Free accounts are available to receive information, store information, upload data and websites, and several other things, but you don't need an account to access most things.   
  12. The Internet Archive blog is at -- https://blog.archive.org/ .   

    INTERNET ARCHIVE COLLECTIONS

  14. The icons in the middle of the home page (hover to see their titles) are: Web, Texts, Video, Audio, TV, Software, Image, Concerts, and Collections. Under each is the number of those items, usually in the millions or 100,000s.
  15. The search box ABOVE the icons is for the WayBack Machine, which shows past versions of webpages and is a discussion for another class.
  16. The search box BELOW the icons is for items or collections and we will discuss it later.
  17. Clicking one of the icons gives you an idea of the great numbers of items they have 
    1. Texts -- 25 million items, including books, ebooks, magazines, and other texts  
    2. Audio -- 10 million items  
    3. Collections -- more than 800,000 collections, some containing millions of items     
  18. To get back to the home page from any other page click the Internet Archive icon in the upper left corner.
  19. Scrolling down the home page shows their Top Collections (I don't know what that means.) and you can keep clicking at the bottom to open more and more pages (hundreds of pages); the numbers in the boxes are the number of items in that collection; some have millions of items and some just a few
  20. Items may be listed in more than one collection 
  21. After clicking any collection, near the center left is the word About -- gives a description of that collection:  Examples
    1. American Libraries shows the collections from libraries in the U.S.:  university and public libraries, the Family History Library in Salt Lake City, the Library of Congress, and more -- this entire collection has more than 3M items
    2. California Digital Library has almost 200,000 items
    3. Click on the word Collection (center left) to go back to view the collection
  22. The collections can be sorted by the tabs in various ways, Title, Date Published, etc.
  23. To show the collection in a list click on the icon to the right above the collection; click on the box there to show more details of each item in the list; if the list is not too long, you can hold down the Page Down key and keep "fetching" more records until you have the details of the entire collection in the list; then use the CONTROL+F Find keyboard shortcut to find what you want   
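
For readers who would rather use a script than hold down the Page Down key, the Internet Archive also has a search address (advancedsearch.php) that can return the items of a collection as JSON. The Python sketch below only builds the request URL; it does not contact the site. The function name collection_listing_url is hypothetical, and the collection identifier "americana" (assumed here to be American Libraries) and the sort field "titleSorter" are assumptions, so check the identifier shown in the address bar of the collection you are viewing.

```python
from urllib.parse import urlencode


def collection_listing_url(collection_id, sort="titleSorter asc", rows=50, page=1):
    """Build (but do not fetch) a URL that lists the text items in one collection.

    collection_id is the short identifier of a collection; the default
    sort value is an assumption and may need adjusting.
    """
    params = [
        ("q", f"collection:{collection_id} AND mediatype:texts"),
        ("fl[]", "identifier"),   # fields to return for each item
        ("fl[]", "title"),
        ("sort[]", sort),
        ("rows", rows),           # items per page
        ("page", page),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


if __name__ == "__main__":
    # "americana" is an assumed identifier used only for illustration.
    print(collection_listing_url("americana"))
```

Pasting the printed URL into a browser should show the first page of results; raising the page value steps through the rest, which plays the same role as "fetching" more records in the list view.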

    SEARCHING

  25. Their search helps guide is:  Search - A Basic Guide for Internet Archive -- https://help.archive.org/hc/en-us/articles/360018359991-Search-A-Basic-Guide
  26. Searching is not always easy in Internet Archive, since it contains so much, and I can't find anything about their search syntax, e.g. does it make a difference what order the words are in?  Is there a way to get exact expressions by using something like quote marks?  Does punctuation matter?  It does seem to do stem searches; that is, if you search for Don, you get Don, Donald, Donnie, etc.
  27. Search metadata vs Search text contents
    1. Click in the search box to show the options for searches  
    2. The search box defaults to Search metadata which is the information the indexers included about the source; searching for "utah genealogy" gets more than 900 hits and "utah family history" gets more than 1,600 hits 
    3. The search box for Search text contents for "utah genealogy" gets more than 1M hits and "utah family history" gets about 300,000 hits   
    4. "united states" (without the quotes) gets 2M hits for Search metadata and 13M for Search text contents
    5. "genealogy" (without the quotes) gets 150,000 in metadata and 400,000 in contents
    6. "yearbook" (singular and no quotes) gets 40,000 in metadata and 300,000 in contents 
  28. On the left side of the search results are filters to narrow your search
    1. Media types -- if you start in Texts, you won't see other media types here
    2. Year, Topics & Subjects, Collection, Creator, Language, etc. -- the numbers of items in that collection are shown
    3. Clicking one or more of these selects only those and it may take a few seconds to respond -- you'll see the names of your selected groups highlighted at the top of their categories
    4. For some searches you see an icon to the right of Topics & Subjects which opens up many pages of subtopics; click as many as you want and click Apply Your Filters
  29. The Advanced Search section allows you to include or exclude search terms in various sections 
  30. Ideas to try for searches -- go to Texts and modify these for your own interests -- if there are many, keep scrolling down and at the bottom it will say "Fetching More Results ..." and keep adding more items
    1. [your name] in Search text contents; also your name in various orders, surname first, etc.; also nicknames
    2. A state of the U.S., e.g. California    
    3. "mormon migration" Denmark
    4. "fakebooks" in Search text contents -- these are books with music lead sheets with the melody line and chord symbols (I use these for accordion concerts :=) )
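
The search ideas above can also be written down as a small script. One pattern commonly used with the Internet Archive's metadata search is a Lucene-style query: quote marks around an exact expression, AND between terms that must all appear, and field:value pairs such as mediatype:texts. The Python sketch below assembles such a query and the matching search address without contacting the site. The function names build_query and search_url are hypothetical, and the sin=TXT parameter for Search text contents is an assumption based on the addresses the site produces, so verify it in your own browser.

```python
from urllib.parse import urlencode


def build_query(terms, phrase=None, mediatype="texts", collection=None):
    """Join search terms into one Lucene-style query string."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')            # exact expression in quotes
    if mediatype:
        parts.append(f"mediatype:{mediatype}")  # limit to one media type
    if collection:
        parts.append(f"collection:{collection}")
    return " AND ".join(parts)


def search_url(query, full_text=False):
    """Build a search page URL; full_text=True asks for Search text contents.

    The 'sin=TXT' parameter is an assumption, not a documented guarantee.
    """
    params = {"query": query}
    if full_text:
        params["sin"] = "TXT"
    return "https://archive.org/search?" + urlencode(params)


if __name__ == "__main__":
    # Metadata search for the "mormon migration" Denmark idea above.
    print(search_url(build_query(["Denmark"], phrase="mormon migration")))
    # Plain words searched in the text contents.
    print(search_url("utah genealogy", full_text=True))
```

The fielded form (mediatype:texts and so on) is meant for Search metadata; for Search text contents, plain words or a quoted phrase are the safer choice.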

    BOOKS ON INTERNET ARCHIVE

  31. The Internet Archive has millions of books, many of them helpful in family history
  32. Some books still in copyright can only be checked out for 14 days and not downloaded permanently -- you need an Open Library membership (free) for Internet Archive to borrow books
  33. Books and text items are every-word searchable and most can be downloaded and the downloads are every-word searchable  
  34. Examples of books on Internet Archive for Utah family history research
    1. Pioneers and Prominent Men of Utah by Frank Esshom - pictures, biographies, and histories
    2. Church Chronology (several editions) by Andrew Jenson - day-by-day chronology of ships, immigration, happenings, travels, details - completely searchable
    3. Times and Seasons - 6-year periodical published in Nauvoo 1839-1846
    4. Journal of Discourses, Conference Reports, Improvement Eras, Woman's Exponent, many more 
    5. Ward and stake records
    6. Books by Orson Pratt, Parley P. Pratt, John Taylor, B.H. Roberts
    7. Early histories of the Church of Jesus Christ of Latter-day Saints
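
Once you have found a book, its page address has the form https://archive.org/details/[identifier], and the same identifier leads to the item's download folder and its metadata record. The Python sketch below builds those addresses from an identifier; it does not contact the site. The function name item_urls, the identifier "example-book-id", and the file name are placeholders for illustration, not real items; copy the real identifier from the address bar of the book's page.

```python
from urllib.parse import quote

BASE = "https://archive.org"


def item_urls(identifier, filename=None):
    """Return the common addresses for one Internet Archive item."""
    ident = quote(identifier, safe="")
    urls = {
        "details": f"{BASE}/details/{ident}",    # the item's web page
        "metadata": f"{BASE}/metadata/{ident}",  # description of the item as JSON
        "download": f"{BASE}/download/{ident}",  # folder listing of its files
    }
    if filename:
        # Address of one specific file, such as the PDF version.
        urls["file"] = f"{BASE}/download/{ident}/{quote(filename)}"
    return urls


if __name__ == "__main__":
    # Placeholder identifier and file name, used only for illustration.
    for name, url in item_urls("example-book-id", "example book.pdf").items():
        print(name, url)
```

Books that can only be borrowed will not offer a permanent download, so the file address is useful only for items that are freely downloadable.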

    SOME FAMILY HISTORY TEXT COLLECTIONS OF INTEREST

  36. Below are a few of the many collections of interest to genealogists -- besides family history you will find many other collections for your other interests        
  37. These lists are only a sampling; when you look at these collections on Internet Archive, you will see the numbers of items included in each and it's in the millions for some 
  38. There is a search folder for records added in different time periods, so if you keep track of when you search, you can just search additions since the last time you looked. 
  39. LIBRARIES -- Family History Library, Brigham Young University Library, Church History Library (LDS), David O. McKay Library (BYU-Idaho), European Libraries, Boston Public Library, Indianapolis Public Library Yearbooks, The Boston Library Consortium, The Newberry Library, The Library of Congress, New York Public Library
  40. UNIVERSITIES -- BYU Campus Publications, University of Michigan Books, University of Pennsylvania Libraries, Kansas State University Yearbooks, Kansas State University Newspapers, UCLA Yearbook Collection
  41. MILITARY -- U.S. Military Pensions, British Army List, British Navy Lists, WWII Archive
  42. UNITED STATES -- California Digital Collection, Minnesota Historical Society, North Carolina Digital Heritage Center, Pennsylvania Germans, State Library of Massachusetts
  43. NEWSPAPERS -- Kentucky Digital Newspapers, Newspapers, Daily Colonial Newspaper Collection, The PastPages News Homepage Archive, Giganews Usenet Collection
  44. BOOKS, MAGAZINES, and JOURNALS -- Internet Archive Books, Journals, JSTOR Early Journal Content, Computer Magazine Archives, Computer Newsletters: User Groups and Flyers, Million Book Project, The Magazine Rack
  45. FAMILY HISTORY -- Genealogy, Passenger and Crew Vessel Lists for New York, NY 1897-1957, Reclaim the Records (mostly vital records of Eastern U.S.), Rutland Historical Society, Scottish Family History, Congregational Library of the Congregational Association, UF Family Search
  46. MAPS -- USGS Maps, USGS Maps of Arizona, California, Colorado, Idaho, Nevada, Texas, Utah, etc.
  47. UNITED STATES CENSUSES
    1. All U.S. Federal Censuses
    2. To find them for a state, search for "federal census [state]" (without the quotes); can be sorted in several ways and you can print or download any you want in several formats, including pdf   
    3. Has a Census Reader to show two pages of the census side-by-side; click to turn pages; works fast for showing the pages 
    4. Internet Archive does NOT have a U.S. census index, so use the index on FamilySearch or Ancestry, etc., to find the film number before looking on Internet Archive 
    5. U.S. censuses were filmed several times and some parts are clearer in some sets than others, so check here, too -- other online sources are FamilySearch Historical Records, Ancestry, and HeritageQuest Online
  48. LDS PUBLICATIONS - some were mentioned above  
    1. General Conference Reports
    2. LDS Church Magazines -- includes Ensign, Improvement Era, Relief Society Magazine, Children's Friend, Woman's Exponent, etc.; magazines since 1971 are on Gospel Library
    3. LDS Church Magazines in other languages - Liahona of Mexican Mission, Church magazines in German, Scandinavian, and Dutch
    4. Latter-day Saints Millennial Star (England)

    COMMENTS AND CONCLUSIONS

  50. The number and variety of collections and items here are staggering, and they are adding new collections all the time -- my problem is that I lose track of time while looking through it  
  51. For your genealogy research consider university and library collections near the locations of your ancestors, since they may have local collections  
  52. For other interests, or if your kids or grandkids need help for school, there are collections such as the Khan Academy videos; search for things like  "mathematics khan" (without the quotes) 
  53. In other classes we will discuss the WayBack Machine (old versions of websites), audio, video, and other types of collections on Internet Archive.  
