DON'S FREEWARE CORNER - AUG 2019
SCANNING AND TRANSCRIBING HANDWRITTEN DOCUMENTS SUCH AS LETTERS
Don's Freeware Corner articles are printed in the UTAH VALLEY
TECHNOLOGY AND GENEALOGY GROUP (UVTAGG) Newsletter TAGGology each
month and are posted on his Class Notes Page https://uvtagg.org/classes/dons/dons-classes.html
where there may be corrections and updates.
©2019 Donald R. Snow - Last updated 2019-10-19
HANDWRITTEN DOCUMENTS
Most of us have collections of handwritten documents in our family
history archives: letters, deeds, wills, censuses, contracts, etc. Some of
these we would like to transcribe so they are more readable and
searchable. There are people and companies that will do this for a fee, but
most of us want to do it ourselves or have our family members help.
Such handwritten documents could be from
ancestors or even from yourself, e.g. our missionary letters home.
Digitizing them by scanning is the first step, so they can be backed up and
distributed. Keeping digital copies in several places means they won't all be destroyed
in the event of a disaster such as a fire or flood. Scanned copies are
easier to work with too, since they can be shown on screen, enlarged,
darkened, etc., to make them more readable. In text format, after transcription,
they can be examined and searched for names, words, or phrases, and people
can read them more easily. Once scanned, they can be transcribed in various ways. You may even
want to "farm out" some of the scans to other family members to help
with the transcription.
SCANNING DOCUMENTS
Scanners are not expensive and you may
have access to a good one in your local Family History
Center (FHC). These scanners scan directly to flash drives, so you take your
hardcopy documents and a flash drive to the FHC and scan them to your
flash drive. At home you copy them to your computer and rename the files so
the names tell you what's in them without opening them and so that
they sort in order for the person they pertain to. There are some helpful
ideas and programs in class notes on my website
https://uvtagg.org/classes/dons/dons-classes.html . Scanned documents can be edited to make them more readable or to take out extraneous
parts.
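One way to get file names that describe their contents and sort in order is to build them from the person's name, the date, and the page number. The following is only a minimal sketch in Python; the naming pattern, the function name scan_filename, and the sample person and dates are all made up for illustration and are not a standard.

```python
import re
from datetime import date

def scan_filename(surname: str, given: str, written: date, kind: str, page: int) -> str:
    """Build a file name that sorts by person, then date, then page."""
    def clean(text: str) -> str:
        # Replace spaces and punctuation with hyphens so the name is safe everywhere.
        return re.sub(r"[^A-Za-z0-9]+", "-", text.strip()).strip("-")
    # ISO dates (YYYY-MM-DD) sort correctly as plain text.
    return f"{clean(surname)}_{clean(given)}_{written.isoformat()}_{clean(kind)}_p{page:02d}.pdf"

# Hypothetical letters, listed out of order on purpose.
names = [
    scan_filename("Doe", "John Q.", date(1958, 3, 9), "letter home", 2),
    scan_filename("Doe", "John Q.", date(1957, 11, 21), "letter home", 1),
    scan_filename("Doe", "John Q.", date(1958, 3, 9), "letter home", 1),
]
for name in sorted(names):
    print(name)
```

Because the date is written year first and the page number is padded with a zero, an ordinary alphabetical sort in any file manager puts the pages in chronological order.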
Bleed-through is ink soaking through the paper, and some old
documents have so much bleed-through that the words are hard to read. There
are electronic ways to reduce it, and some scanners at FHCs
have settings that minimize it during scanning. My
missionary letters from Mexico and Guatemala to my parents in Los Angeles were written on
very thin "onion skin" paper so we didn't have to pay so much for airmail
postage. For some reason, unknown to me now, I wrote them all with a green-ink
fountain pen and it bled through the thin paper. (sigh) I
didn't discover the bleed-through setting on the FHC scanners until after I had already
scanned all my letters, so I went back and rescanned them all. The ink is
still green, but most of the bleed-through is gone, so you only see what
I wrote on one side of the paper, and the scans are much more readable than the originals. Other things you can do to make scans more
readable are to darken the very light ones or lighten the very
dark ones.
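For readers curious how darkening, lightening, and bleed-through removal can work in principle, here is a minimal sketch in Python of a simple "levels" adjustment on grayscale values. It is not what the FHC scanners or any particular image editor actually do; the function name adjust_levels, its parameters, and the sample pixel values are assumptions made up for illustration.

```python
def adjust_levels(pixels, black_point=0, white_point=255):
    """Stretch grayscale values so black_point maps to 0 and white_point to 255.

    pixels: list of rows, each a list of ints from 0 (black) to 255 (white).
    Anything at or above white_point becomes pure white, which is one simple
    way to drop faint marks such as bleed-through.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            scaled = round((value - black_point) * 255 / span)
            new_row.append(max(0, min(255, scaled)))  # keep within 0..255
        result.append(new_row)
    return result

# Made-up 2 x 3 patch: dark ink (40), faint bleed-through (200), paper (235).
patch = [[40, 200, 235],
         [235, 40, 200]]
cleaned = adjust_levels(patch, black_point=40, white_point=200)
print(cleaned)
```

In this sketch, raising black_point darkens light writing, and lowering white_point lightens a dark scan and pushes faint marks to white. Real bleed-through is not always fainter than the writing on the front, so a fixed cutoff like this is only a starting point.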
To help you decide what scanner settings to use, see other articles and
class notes on my webpage. I usually scan
black and white documents to pdf at 150 dpi (dots per inch). Most
handwritten documents don't contain photos, but if they do, I use a higher
resolution. For photos the rule of thumb from the Library of Congress is to
scan them so the final copy is about 250 dpi, that is, each final inch has
250 dots or pixels. That means that, if you are scanning a 2 x 3 inch photo
and want to have it 4 x 6 inches, use 2 x 250 = 500 dpi to scan the
original. For most handwritten documents scanning at 150 dpi is sufficient.
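The rule of thumb above is just a ratio, and it can be written as a small Python function. This is a sketch only; the function name scan_dpi and its parameter names are made up for illustration, and the default of 250 dpi is the figure quoted above.

```python
def scan_dpi(original_inches: float, final_inches: float, final_dpi: int = 250) -> int:
    """Scanning resolution needed so the enlarged copy still has final_dpi dots per inch."""
    if original_inches <= 0 or final_inches <= 0:
        raise ValueError("sizes must be positive")
    enlargement = final_inches / original_inches  # e.g. 4 / 2 = 2 times larger
    return round(enlargement * final_dpi)

print(scan_dpi(2, 4))  # short side of a 2 x 3 inch photo enlarged to 4 x 6
print(scan_dpi(3, 6))  # long side of the same photo
```

Both sides of the 2 x 3 inch photo give the same answer, 500 dpi, because the photo is enlarged by the same factor in each direction.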
Once the documents are scanned you can back them up and put copies on
flash drives and/or email them to family members for backups or to help you
with transcription. You can also
post them on websites such as FamilySearch Family Tree, so they are
preserved and others can benefit from them.
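If you keep copies in several folders or drives, it helps to confirm that each copy really matches the original. Here is a minimal sketch in Python that copies one scan to several backup folders and compares SHA-256 checksums. The function names sha256_of and back_up are made up for illustration, and the folders you pass in are up to you.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(scan: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy one scan into each backup folder and confirm every copy matches."""
    expected = sha256_of(scan)
    copies = []
    for folder in backup_dirs:
        folder.mkdir(parents=True, exist_ok=True)  # create the folder if needed
        copy = Path(shutil.copy2(scan, folder / scan.name))
        if sha256_of(copy) != expected:
            raise OSError(f"copy in {folder} does not match the original")
        copies.append(copy)
    return copies
```

Calling back_up with the path to a scan and a list of folders (for example a flash drive and a synced cloud folder) returns the paths of the verified copies, or raises an error if a copy is damaged.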
TRANSCRIBING BY TYPING
The National Archives has a Transcription Tips website
at https://www.archives.gov/citizen-archivist/transcribe/tips . It suggests
you type exactly what you see, including misspellings, words out of place,
strike-outs, etc. No one has come up with a satisfactory program to read
handwriting yet, so you have to do the work yourself. The simplest way is
to just read and type what you see into your favorite text editor. That
always works and is sometimes the best way to do it. Several programs are
available to help with this; see --
https://abundantgenealogy.com/word-word-document-transcribing-technology/ .
TRANSCRIPT
A free-for-non-commercial-use program called TRANSCRIPT is available from http://www.jacobboerema.nl/en/Freeware.htm .
It requires that the image be in a digital (graphic) format such as .jpg or .tif.
The program doesn't have many image editing features, so you want the image
already color-corrected and clear enough to read. It has
two panels, one above the other in the free version, with the image in the
top panel and what you type in the bottom panel. One nice feature is that, as
you type and press enter to go to the next line on your typed part, the
image moves up too, so you don't have to stop and move the image yourself.
The text you type can be saved in several formats including .rtf (rich text
format) and .odt (OpenDocument text format), both of which are readable by most
text editing programs. The main problem that I have found with using
TRANSCRIPT is getting the scanned image into a readable form before
opening it in TRANSCRIPT. Below is a screenshot of TRANSCRIPT in action.
GENSCRIBER
Another program to help with transcription is GENSCRIBER. This is also free
for non-commercial use. Dick Eastman wrote an article about it in his
Eastman's Online Genealogy Newsletter; see --
https://blog.eogn.com/2014/12/16/genscriber-a-free-transcription-tool-for-genealogy-research/ .
The program can be downloaded from
http://genscriber.com/genapps/en/start . To get it to install, I had to right-click the installation file and run
it as an Administrator, even though it says you shouldn't have to. To handle
pdfs you download and install a free add-on. I haven't had much experience
with this program, so I can't say much more about it, but it looks helpful.
TRANSCRIBING BY READING INTO SPEECH-RECOGNITION PROGRAMS
Another way to transcribe documents is to read them into speech- or
voice-recognition programs. There are several such programs, both free and
commercial. The premier commercial one is Dragon NaturallySpeaking, which is
updated regularly and costs about $100 when you find it on sale. All
speech-recognition programs have you train them to recognize your voice by
having you read a script into a microphone while the program analyzes the way you
speak. Dragon NaturallySpeaking gives good accuracy for slow and clear
speech, but it can't handle "continuous speech" very well. Continuous speech is what they
call the way we all talk to each other. Several years ago I talked to the
Dragon support group and they told me that it was not really capable of
getting high accuracy for continuous speech. I had phoned them to ask how to
transcribe my daily journal, which I record on a digital recorder and used to record
on cassette tape. So, with any speech-recognition program, you have to
plan on going back to correct errors. However, the result you get without
editing may be sufficient to search and find names, etc., until you have
time to correct it all. As far as I know, there are no programs that will
index audio speech yet.
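Searching an uncorrected transcript works better if the search tolerates small spelling errors. Here is a minimal sketch in Python using the standard difflib module to find words that are close to a name. The function name find_name, the cutoff of 0.8, and the sample sentence with its names are all made up for illustration.

```python
import difflib
import re

def find_name(transcript: str, name: str, cutoff: float = 0.8) -> list[str]:
    """Return words in the transcript that closely match the given name."""
    words = sorted(set(re.findall(r"[A-Za-z']+", transcript)))
    # Compare in lower case, but report the word as it appears in the text.
    lowered = {word.lower(): word for word in words}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=10, cutoff=cutoff)
    return [lowered[hit] for hit in hits]

# Made-up sample with two misrecognized names.
raw = "Today I wrote to Elder Jonson and visited the Gonzales family in Antigua."
print(find_name(raw, "Johnson"))
print(find_name(raw, "Gonzalez"))
```

A lower cutoff finds more misspellings but also more false matches, so the value is worth adjusting for your own transcripts.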
WINDOWS 10 has speech-recognition software built in,
and that will be the subject of another Freeware Corner article. Also, there
are several apps on smartphones and tablets that could be useful for
speech recognition to transcribe what you read from handwritten documents.
One of these is Ava, whose icon is an ampersand "&". It is free for 5
hours every month. If you buy the commercial version, your use time is unlimited. I learned about it from the Utah State Hearing Impaired program
classes I attended, since it can be used to transcribe a person's speech when
you can't hear them. You hold your smartphone near their mouth and they
talk and you read what they say. When using such a mobile device to
transcribe something, you have to know how to get the text from that app
into whichever text program you use.
CONCLUSIONS
Automatic transcription of
handwriting is still in the future, but it is coming. Some computer scientists told me recently that neural networks are
proving more accurate at reading handwriting than anything else. These are
computer networks that are "trained" by giving them lots of examples and
telling them "yes" or "no" when they "think" a word is a particular word. The
computer scientists told me that such networks were getting better accuracy
than programs that are told which curves mean which letters.
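To give a feel for the "yes" or "no" training idea, here is a toy sketch in Python of a perceptron, the simplest kind of trainable unit. It is far simpler than the neural networks used for real handwriting, and it is not the method the computer scientists described. The 3 x 3 pixel "strokes", the function names, and the number of training passes are all made up for illustration.

```python
def predict(weights, bias, pixels):
    """Answer 1 ("yes, a vertical stroke") when the weighted sum is positive."""
    score = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if score > 0 else 0

def train(examples, epochs=20):
    """Perceptron rule: adjust the weights only when the guess was wrong."""
    weights, bias = [0] * 9, 0
    for _ in range(epochs):
        mistakes = 0
        for pixels, answer in examples:
            error = answer - predict(weights, bias, pixels)  # +1, 0 or -1
            if error:
                mistakes += 1
                weights = [w + error * p for w, p in zip(weights, pixels)]
                bias += error
        if mistakes == 0:  # every example answered correctly, so stop
            break
    return weights, bias

# 3 x 3 images written row by row: 1 = ink, 0 = paper.
examples = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), 1),  # vertical stroke: "yes"
    ((0, 0, 0, 1, 1, 1, 0, 0, 0), 0),  # horizontal stroke: "no"
    ((0, 1, 0, 0, 1, 0, 0, 1, 1), 1),  # vertical stroke with a stray dot
    ((1, 0, 0, 1, 1, 1, 0, 0, 0), 0),  # horizontal stroke with a stray dot
]
weights, bias = train(examples)

# Two strokes the program has never seen, each missing one pixel.
print(predict(weights, bias, (0, 1, 0, 0, 1, 0, 0, 0, 0)))
print(predict(weights, bias, (0, 0, 0, 1, 1, 0, 0, 0, 0)))
```

No one tells the program which pixels matter. It works that out from the examples and the corrections, which is the point the computer scientists were making.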
Most of us have documents that need transcribing to make them
readable and searchable. This can always be done by hiring someone to do it,
e.g., a grandkid. The first step for any document is to scan it so you can
back it up, work with it to make it more readable, and give copies to
others. The next step is to find a way to transcribe it and for most of us,
that means doing it ourselves. This article has shown a few computer
programs that help and there are many others.
=================================