DON'S FREEWARE CORNER - JAN 2019
EXTRACTING PAGES FROM A PDF USING PDF-XCHANGE EDITOR
©2019 Donald R. Snow - Last updated 2019-01-11
Don's Freeware Corner notes are printed in the UTAH VALLEY TECHNOLOGY AND GENEALOGY GROUP (UVTAGG)
Newsletter TAGGology each month and are posted on his Class Notes Page at
http://uvtagg.org/classes/dons/dons-classes.html
where there may be corrections, updates, and additions.
PDF = PORTABLE DOCUMENT FORMAT
Portable Document Format is a standard format used world-wide for documents, since
pdfs appear exactly the same on
any computer or operating system. So, if you save a text
file as a pdf, and most word-processing programs have this feature,
you know exactly how it will appear on any computer anywhere.
Because of this universality, many large companies and
organizations, including the Church of Jesus Christ of Latter-day
Saints, use pdfs for all their articles, handbooks, manuals, etc.
When books are scanned and posted on the Internet, they are
usually in pdf. Besides books online, there are thousands of
documents that can be downloaded in pdf format. After
downloading pdfs, you frequently want to work with them to
separate them into parts or extract certain pages. This
Freeware Corner article will show a way to do this using the
free-for-non-commercial-work program, PDF-XCHANGE EDITOR.
Most family history is non-commercial, so this program can be
used for it at no cost.
PDF-XCHANGE EDITOR
This program is available from
https://www.tracker-software.com/product/pdf-xchange-editor and, as mentioned,
is free for non-commercial work. An earlier version was called
PDF-XCHANGE VIEWER. The program has many features for
working with pdfs, including OCR (Optical Character
Recognition), which will make a non-searchable pdf into a
searchable one. That's a subject for another Freeware Corner
article. The program also has the ability to extract
pages from a pdf and I recently discovered that it
shows you the pages while you are working, so you can see what's on them
without having to use another viewing program first. I have
been looking for such a program for several years and didn't
know I already had one on my computer. With this program you
don't have to use one program to find the pages that you want to
extract, note the page numbers, and then open a separate extraction
program and type those numbers into it.
Using this feature of PDF-XCHANGE EDITOR is the subject of this article.
EXAMPLE OF A PDF FROM WHICH TO EXTRACT PAGES
The Internet has thousands of pdfs that might interest you. For
example, on FamilySearch Books, I found and downloaded a pdf
of the book Woburn Massachusetts Vital Records Part 1 - Births
1640-1873. There are many early New England Snows in
Woburn, Massachusetts, and I wanted a copy of every page of this
book that had one. The book has 297 pages and downloads as a searchable pdf. When downloading
any book from FamilySearch Books, you get both the
picture layer and the text layer, so
they are every-word-searchable. That's not the case with
Google Books and some other websites. To open a pdf in PDF-
XCHANGE EDITOR go to File > Sessions > Browse, find the
pdf on your computer, and select it. You'll see the file open in a
large panel with a smaller Search panel open on the right. The
screenshot below illustrates this.
There is a scroll bar on the large panel, so you can scroll
through the entire book. Since the text-layer is included with
this pdf, you can search for any word in the entire file. When I
searched for Snow, it immediately generated a list of every page
with Snow entries, pages 13, 243, 244, 245, and 257. Knowing
these pages, I could use any pdf extraction program and extract
those pages, but PDF-XCHANGE EDITOR has that feature
included. The procedure is described in the next section of this
article. Keep in mind that the pages 13, 243, etc., are the pdf
pages, not necessarily the numbers on the printed pages in the
book, since the pdf usually contains introductory material, etc. It turns
out that for this particular book, the pdf page number is the
same as the printed page number, but that's not usually the
case.
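The search hits above can be grouped into runs of neighboring pages, which is how I ended up extracting them (page 13, pages 243-245, and page 257). The short Python sketch below shows that grouping step. It is only an illustration and has nothing to do with how PDF-XCHANGE EDITOR works internally; the function name group_consecutive is made up for this example.

```python
def group_consecutive(pages):
    """Collapse page numbers into (first, last) runs of neighboring pages."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            # This page directly follows the current run, so extend the run.
            runs[-1] = (runs[-1][0], page)
        else:
            # Otherwise start a new run containing just this page.
            runs.append((page, page))
    return runs


# The pdf pages where the search found "Snow" in the Woburn book.
snow_pages = [13, 243, 244, 245, 257]
print(group_consecutive(snow_pages))  # [(13, 13), (243, 245), (257, 257)]
```

Each run corresponds to one trip through the Extraction Window.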
USING PDF-XCHANGE EDITOR TO EXTRACT PAGES
The pdf page number is shown in the box at the bottom center of
the panel. Clicking on the first Snow hit, page 13, takes you to
pdf page 13 and highlights the name Snow on the page. In this
case, the hit is near the bottom of the page and shows Susanna,
daughter of Lawrence Bailey and Joanna Snow, was born 23 Jun
1719. Now, with this page open and viewable in the panel, click
on Organize on the top line of PDF-XCHANGE EDITOR and then
Extract Pages. The above illustration has the Organize menu open and you can see
Extract Pages near the top left. Clicking it opens an
Extraction Window where you fill in the options: the
name for the extracted file, where to save it, and other settings. [Note: PDF-
XCHANGE EDITOR indicates that there is a keyboard shortcut for
this, namely CTRL+SHIFT+E, but that seems to open a window to
email the page somewhere, and not the Extraction Window.] The
Extraction Window has boxes to check for only the current page
or certain page numbers. It also asks if you want to save all
extracted pages in one file or in separate files, into which
folder, and under what name.
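To make the idea of "certain page numbers" concrete, here is a small Python sketch that turns a page entry such as 243-245 into the list of pdf pages it covers and checks it against the length of the book. The comma-separated form, the function name parse_page_spec, and the error message are assumptions made for this illustration; they are not taken from PDF-XCHANGE EDITOR itself.

```python
def parse_page_spec(spec, page_count):
    """Expand an entry like "13, 243-245" into a list of pdf page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty pieces such as a trailing comma
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
        else:
            first = last = int(part)
        # Reject anything outside the book or written backwards.
        if not 1 <= first <= last <= page_count:
            raise ValueError(f"page range {part!r} is outside 1-{page_count}")
        pages.extend(range(first, last + 1))
    return pages


# The Woburn book has 297 pdf pages.
print(parse_page_spec("243-245", 297))           # [243, 244, 245]
print(parse_page_spec("13, 243-245, 257", 297))  # [13, 243, 244, 245, 257]
```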
There are additional options that make it very convenient to
save extracted pages. If the pages go together as an article, for
example, you can save them in the same file. If you want each
page separate, you can save them as separate pages. When you
give it the name you want, PDF-XCHANGE EDITOR remembers
that name for the next time you open the Extraction Window, so
you can just change the page number or name and keep the rest
of the information the same. My labeling for page 13 of this
book is Snows-WoburnMassachusettsVitalRecords-Births-Part1-
1640-1873-Page013-FamilySearchBooks--2019-01-06.pdf. Also,
I'm saving the file in my Screenshots folder. When everything is
set the way you want, click OK. Then go to the next page(s) to save, in
this case pages 243-245, change the dot from Extract This Page
to Extract Pages, and enter 243-245. Change the page number
in the name so the name has these page numbers, and click OK.
The last page I wanted was 257, so I went to that page, clicked Extract
Pages, checked Current Page, changed the title to Page 257, and clicked
OK. I labeled the first page as 013, instead of just 13, so the files
would sort in order on my computer. I now have three files extracted from this
book with the Snows on them and I can use these as
sources for births.
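The naming pattern described above, with the page number padded to three digits, can be sketched in Python as follows. The function name extract_name and the Page243-245 form for a multi-page file are assumptions for this illustration; only the Page013 file name comes from my own labeling. Pages 9 and 13 at the end are hypothetical and are there just to show why the padding matters.

```python
def extract_name(prefix, first_page, last_page, source, date, width=3):
    """Build a file name whose page numbers are zero-padded so files sort in order."""
    pages = f"Page{first_page:0{width}d}"
    if last_page != first_page:
        # Assumed form for a run of pages, for example Page243-245.
        pages += f"-{last_page:0{width}d}"
    return f"{prefix}-{pages}-{source}--{date}.pdf"


prefix = "Snows-WoburnMassachusettsVitalRecords-Births-Part1-1640-1873"
runs = [(13, 13), (243, 245), (257, 257)]
names = [extract_name(prefix, a, b, "FamilySearchBooks", "2019-01-06")
         for a, b in runs]
for name in names:
    print(name)

# Without padding, text sorting puts page 13 ahead of page 9.
print(sorted(["Page9", "Page13"]))    # ['Page13', 'Page9']
print(sorted(["Page009", "Page013"])) # ['Page009', 'Page013']
```

A width of three digits is enough for this 297-page book; a longer book would need a larger width.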
COMMENTS
When extracting pages from a pdf where you want nearly every
page separate, you can leave the Extraction Window set for
Extract Current Page, then enter the name for the page, and
click OK. Where there are two or three pages that belong to the
same topic, I just name the next page with Page2, Page3, etc.,
and then combine them later. For me this is easier than trying
to look ahead to see which pages go together, when there are only a
few such cases. It takes time to go through a book this way when you
need every page, but it saves time in the long run and you end
up with the extracted pdfs with the correct titles. I've done this
with music Lead Sheet Fake Books from Internet Archive where
nearly every page is a separate song, so I want each in a
separate file. Lead Sheet Fake Books are music books with just
the melody line and chord symbols. I use the saved music for
my accordion concerts, because I can add my own harmonies,
once I know the chord sequence.
CONCLUSIONS
PDF-XCHANGE EDITOR is a helpful program and I use it a lot. As
mentioned at the start, it has built-in OCR and many other pdf
features and I recently discovered that it will show me the
pages and let me do extractions, so I can immediately decide
which pages to extract. In short, it's a major help that is free for
non-commercial work and that's nearly everything I do. I'll write
about other features of the program in other articles.
================================