DON'S FREEWARE CORNER - JAN 2019

EXTRACTING PAGES FROM A PDF USING PDF-XCHANGE EDITOR

©2019 Donald R. Snow - Last updated 2019-01-11

Don's Freeware Corner notes are printed in the UTAH VALLEY TECHNOLOGY AND GENEALOGY GROUP (UVTAGG) Newsletter TAGGology each month and are posted on his Class Notes Page at http://uvtagg.org/classes/dons/dons-classes.html, where there may be corrections, updates, and additions.

PDF = PORTABLE DOCUMENT FORMAT

Portable Document Format is a standard format used worldwide for documents, because a pdf appears exactly the same on any computer or operating system. So, if you save a text file as a pdf, and most word processors have this feature, you know exactly how it will appear on any computer anywhere. Because of this universality many large companies and organizations, including the Church of Jesus Christ of Latter-day Saints, use pdfs for all their articles, handbooks, manuals, etc. When books are scanned and posted on the Internet, they are usually in pdf. Besides books online, there are thousands of documents that can be downloaded in pdf format. After downloading pdfs, you frequently want to work with them to separate them into parts or extract certain pages. This Freeware Corner article will show a way to do this using the free-for-non-commercial-work program, PDF-XCHANGE EDITOR. Most family history is non-commercial, so this program can be used for free for it.

PDF-XCHANGE EDITOR

This program is available from https://www.tracker-software.com/product/pdf-xchange-editor and, as mentioned, is free for non-commercial work. An earlier version was called PDF-XCHANGE VIEWER. The program has many features for working with pdfs, including OCR (Optical Character Recognition), which will make a non-searchable pdf into a searchable one. That's the subject for another Freeware Corner article another time. The program has the ability to extract pages from a pdf and I recently discovered that it shows you the pages while you are working, so you can see what's on them without having to use another viewing program first. I have been looking for such a program for several years and didn't know I already had one on my computer. With this program you don't have to use one program to find the pages that you want to extract, note the page numbers, and then open a separate extraction program and type those numbers into it. Using this feature of PDF-XCHANGE EDITOR is the subject of this article.

EXAMPLE OF A PDF FROM WHICH TO EXTRACT PAGES

The Internet has thousands of pdfs that might interest you. For example, on FamilySearch Books, I found and downloaded a pdf of the book Woburn Massachusetts Vital Records Part 1 - Births 1640-1873. There are many early New England Snows in Woburn Massachusetts and I wanted a copy of every page of this book that had one. The book has 297 pages and downloads as searchable. When downloading any book from FamilySearch Books, you get both the picture layer and the text layer, so the book is every-word-searchable. That's not the case with Google Books and some other websites. To open a pdf in PDF-XCHANGE EDITOR go to File > Sessions > Browse, find the pdf on your computer, and select it. You'll see the file open in a large panel with a smaller Search panel open on the right. The screenshot below illustrates this.



[Screenshot: PDF-XCHANGE EDITOR with the pdf open in the large panel, the Organize Menu open, and the Search panel on the right]

There is a scroll bar on the large panel, so you can scroll through the entire book. Since the text layer is included with this pdf, you can search for any word in the entire file. When I searched for Snow, it immediately generated a list of every page with Snow entries, pages 13, 243, 244, 245, and 257. Knowing these pages, I could use any pdf extraction program and extract those pages, but PDF-XCHANGE EDITOR has that feature included. The procedure is described in the next section of this article. Keep in mind that the pages 13, 243, etc., are the pdf pages, not necessarily the numbers on the printed pages in the book, since the pdf usually contains introductory material, etc. It turns out that for this particular book, the pdf page number is the same as the printed page number, but that's not usually the case.
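For readers who like to script this kind of bookkeeping, the relation between pdf page numbers and printed page numbers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the front_matter_pages count are hypothetical, and the sketch assumes the printed numbering starts at 1 right after the unnumbered introductory pages and runs without gaps.

```python
def pdf_page_for(printed_page, front_matter_pages=0):
    """Return the pdf page that holds a given printed page number.

    front_matter_pages is a hypothetical count of unnumbered
    introductory pages that come before printed page 1.
    """
    if printed_page < 1 or front_matter_pages < 0:
        raise ValueError("page numbers start at 1; offsets cannot be negative")
    return printed_page + front_matter_pages


def printed_page_for(pdf_page, front_matter_pages=0):
    """Return the printed page number shown on a pdf page.

    Returns None when the pdf page is still in the introductory material.
    """
    printed = pdf_page - front_matter_pages
    return printed if printed >= 1 else None


# The Woburn book from this article: pdf and printed numbers agree (offset 0).
print(printed_page_for(13))      # 13

# A hypothetical book with 8 unnumbered introductory pages.
print(pdf_page_for(13, 8))       # 21
print(printed_page_for(5, 8))    # None
```

When citing a page as a source, the printed number is the one readers expect, while the pdf number is the one to type into the extraction window.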

USING PDF-XCHANGE EDITOR TO EXTRACT PAGES

The pdf page number is shown in the box in the middle of the bottom of the panel. Clicking on the first Snow hit, page 13, goes to pdf page 13 and highlights the name Snow on the page. In this case, the hit is near the bottom of the page and shows that Susanna, daughter of Lawrence Bailey and Joanna Snow, was born 23 Jun 1719. Now, with this page open and viewable in the panel, click on Organize on the top line of PDF-XCHANGE EDITOR and then Extract Pages. The above illustration has the Organize Menu open and you see Extract Pages near the top left. Clicking here opens an Extraction Window where you fill in the options: what name to give the extracted file, where to save it, and additional things. [Note: PDF-XCHANGE EDITOR indicates that there is a keyboard shortcut for this, namely CTRL+SHIFT+E, but that seems to open a window to email the page somewhere, and not the Extraction Window.]

The Extraction Window has boxes to check for only the current page or certain page numbers. It also asks if you want to save all extracted pages in one file or in separate files, into which folder, and under what name. There are additional options that make it very convenient to save extracted pages. If the pages go together as an article, for example, you can save them in the same file. If you want each page separate, you can save them as separate files. When you give it the name you want, PDF-XCHANGE EDITOR remembers that name for the next time you open the Extraction Window, so you can just change the page number or name and keep the rest of the information the same. My labeling for page 13 of this book is Snows-WoburnMassachusettsVitalRecords-Births-Part1-1640-1873-Page013-FamilySearchBooks--2019-01-06.pdf. Also, I'm saving the file in my Screenshots folder. When everything is set the way you want, click OK.

Then go to the next page(s) to save, in this case pages 243-245, change the dot from Extract This Page to Extract Pages and enter 243-245, change the page number in the name so the name has these page numbers, and click OK. The last page I wanted was 257, so I went to that page, clicked Extract Pages, checked Current Page, changed the title to Page 257, and clicked OK. I labeled the first page as 013, instead of just 13, so the files would sort in order on my computer. I now have three files extracted from this book with the Snows on them and I can use these as sources for births.
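The grouping and naming in this walk-through can also be planned ahead with a short Python sketch. It is not part of PDF-XCHANGE EDITOR; it only prepares the page ranges and file names to type into the Extraction Window. The function names are hypothetical, and the "Page243-245" label for a multi-page file is my assumption, since only the page 13 name is spelled out above.

```python
def group_consecutive(pages):
    """Collapse page numbers into (first, last) runs of consecutive pages."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], page)   # extend the current run
        else:
            runs.append((page, page))        # start a new run
    return runs


def page_label(first, last, width=3):
    """Zero-pad page numbers so file names sort in page order."""
    if first == last:
        return f"Page{first:0{width}d}"
    # Assumed label style for a multi-page extraction.
    return f"Page{first:0{width}d}-{last:0{width}d}"


def extraction_names(pages, prefix, suffix, width=3):
    """Build one file name per run of consecutive pages."""
    return [
        f"{prefix}-{page_label(first, last, width)}-{suffix}.pdf"
        for first, last in group_consecutive(pages)
    ]


PREFIX = "Snows-WoburnMassachusettsVitalRecords-Births-Part1-1640-1873"
SUFFIX = "FamilySearchBooks--2019-01-06"
hits = [13, 243, 244, 245, 257]   # the Snow pages found by the search

print(group_consecutive(hits))    # [(13, 13), (243, 245), (257, 257)]
for name in extraction_names(hits, PREFIX, SUFFIX):
    print(name)
```

The zero padding matters because file names sort as text, character by character: an unpadded "Page13" would sort after a hypothetical "Page100", while "Page013" sorts before it.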

COMMENTS

When extracting pages from a pdf where you want nearly every page separate, you can leave the Extraction Window set for Extract Current Page, then enter the name for the page, and click OK. Where there are two or three pages that belong to the same topic, I just name the next page with Page2, Page3, etc., and then combine them later. For me this is easier than trying to look ahead to see which pages go together, when there are only a few such groups. It takes time to go through a book this way when you need every page, but it saves time in the long run and you end up with the extracted pdfs with the correct titles. I've done this with music Lead Sheet Fake Books from Internet Archive where nearly every page is a separate song, so I want each in a separate file. Lead Sheet Fake Books are music books with just the melody line and chord symbols. I use the saved music for my accordion concerts, because I can add my own harmonies, once I know the chord sequence.
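When many pages have been saved this way, a short Python sketch can list which files still need to be combined. It does not merge any pdfs; it only groups file names. The "-Page2", "-Page3" suffix convention, the function name, and the sample file names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed convention: continuation pages end in "-Page2", "-Page3", etc.
PAGE_SUFFIX = re.compile(r"-Page(\d+)$")


def merge_groups(filenames):
    """Map each title to its files in page order, for multi-page titles only."""
    groups = defaultdict(list)
    for name in filenames:
        stem = name[:-4] if name.lower().endswith(".pdf") else name
        match = PAGE_SUFFIX.search(stem)
        if match:
            title, page = stem[:match.start()], int(match.group(1))
        else:
            title, page = stem, 1            # a file without a suffix is page 1
        groups[title].append((page, name))
    return {
        title: [name for _, name in sorted(items)]
        for title, items in groups.items()
        if len(items) > 1                    # single-page titles need no merging
    }


# Hypothetical file names, not taken from any real book.
files = ["SongA.pdf", "SongA-Page3.pdf", "SongA-Page2.pdf", "SongB.pdf"]
print(merge_groups(files))
# {'SongA': ['SongA.pdf', 'SongA-Page2.pdf', 'SongA-Page3.pdf']}
```

The page numbers are compared as numbers, so a Page10 would come after Page2 even though the names are not zero-padded.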

CONCLUSIONS

PDF-XCHANGE EDITOR is a helpful program and I use it a lot. As mentioned at the start, it has built-in OCR and many other pdf features and I recently discovered that it will show me the pages and let me do extractions, so I can immediately decide which pages to extract. In short, it's a major help that is free for non-commercial work and that's nearly everything I do. I'll write about other features of the program in other articles.
================================