Class: WORKING WITH LARGE DATABASES: THE EARLY LDS MEMBERS DATABASE
©2007 by Donald R. Snow
Return to Don's Class Listings page or to the home page of Utah Valley Technology and Genealogy Group. This page was last updated 2 Mar 2007.
INTRODUCTION
- Instructors are Donald R. and Diane M. Snow, 801-225-7123 in Provo, Utah and 435-673-1932 in St. George, Utah (snowd@math.byu.edu, dms34@juno.com)
- These notes with active Internet links are posted on the Utah Valley Technology and Genealogy Group website - http://uvtagg.org under Class Outlines, Don's Listings.
- See Don's notes on Cleaning Up Your Database on Utah Valley Technology and Genealogy Group for how data should be entered and for more details on some of these ideas and additional procedures
- Will discuss here working with large databases (thousands of names)
- Brings in different problems, e.g. identifying duplicates, errors, problems with places, multiple marriage links for the same people
- Will discuss new techniques we have developed while working on the large Early LDS Members database
- These techniques work on databases of any size
- About our Early LDS Database posted on the Internet at http://earlylds.com
- Has about 54,000 names and is for genealogists and LDS Church historians
- Goal is to include all early LDS members up through the early Utah period, with their families and many sources
- Originally started by missionaries in the Illinois Nauvoo Mission many years ago -- has many data entry errors since it has been worked on by so many people
- To read the Introduction to the file, click on the Search button with nothing in the search fields and look for Introduction near the top
- Has our email address - snowdr@gmail.com - for people to send in corrections and additions, with primary sources, to consider for inclusion
- Examples of why you need to check for errors
FINDING DUPLICATES
- In small databases duplicates are usually easy to spot and PAF’s merge option finds most
- In large databases duplicates may not be obvious since they can be far apart alphabetically and PAF's merge option doesn't find them all -- also, you may have to go through many suggested merges that aren't duplicates at all and PAF has no way to mark these as "non-duplicates"
- PAF Insight and other programs find more, but still don't find all
- Checking by finding same dates, e.g. same birth, marriage, or death dates
- Use GENViewer (http://mudcreeksoftware.com/ -- commercial program, but has a free lite version) to show columns with dates, sort on that column, and look for people with the same birth date, marriage date, or death date
- Picks up duplicates like Abigail and Nabby, or maiden name and married names, which aren't recognized as duplicates by other programs
- Soundex-sorted lists
- GENViewer columns can show Soundex of Surname and Soundex of Given Name and can sort on any column, but not on two columns
- By exporting to a CSV (comma separated values) list, can import into a spreadsheet and sort first on Soundex of Surname and second on Soundex of Given Name -- helps to put Soundex duplicates close to each other so you can recognize them
- Some duplicates are never recognized this way, e.g. Kuhn and Coon, since Soundex keeps the first letter and one starts with K and the other with C -- these are only recognized by "chance" or by using a more powerful phonetic system, e.g. the Daitch-Mokotoff system
- For name variants see http://www.namethesaurus.com/ -- shows surname and given name variant spellings
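As a rough illustration of the two checks above (matching on identical dates, and sorting first on Soundex of Surname and second on Soundex of Given Name), here is a minimal Python sketch. It is not how GENViewer or a spreadsheet works internally. The record layout (`given`, `surname`, `birth`) and the sample people are made up for the example, and `soundex` is a plain implementation of standard American Soundex, not the Daitch-Mokotoff system.

```python
from collections import defaultdict

# Standard American Soundex letter groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name ("" if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":                  # H and W do not separate equal codes
            continue
        code = _CODES.get(ch, "")       # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def same_date_groups(people, field):
    """Group records that share the same non-empty value in a date field."""
    groups = defaultdict(list)
    for person in people:
        if person.get(field):
            groups[person[field]].append(person)
    return [g for g in groups.values() if len(g) > 1]


def soundex_sorted(people):
    """Sort on Soundex of surname first, then on Soundex of given name."""
    return sorted(people,
                  key=lambda p: (soundex(p["surname"]), soundex(p["given"])))


# Hypothetical records, invented for this example only.
people = [
    {"given": "Abigail", "surname": "Kuhn", "birth": "12 Mar 1801"},
    {"given": "John", "surname": "Smith", "birth": "4 Jul 1810"},
    {"given": "Nabby", "surname": "Coon", "birth": "12 Mar 1801"},
    {"given": "Jon", "surname": "Smyth", "birth": ""},
]

# Possible duplicates by identical birth date.
for group in same_date_groups(people, "birth"):
    print([p["given"] + " " + p["surname"] for p in group])

# Two-level Soundex sort puts similar-sounding names next to each other.
for p in soundex_sorted(people):
    print(soundex(p["surname"]), soundex(p["given"]), p["given"], p["surname"])
```

In this made-up data the Smith/Smyth pair lands together in the Soundex-sorted list, while the Kuhn/Coon pair is only caught by the shared birth date, since their Soundex codes start with different letters.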
FINDING OTHER DATA ERRORS
- See more details on some of these and other techniques in my notes on Cleaning Up Your Database on Utah Valley Technology and Genealogy Group.
- Be sure to check for and correct file structure errors before working on data errors.
- Many data errors are misspellings or incomplete places, etc., and can be corrected without much genealogical research
- Finding errors and getting corrections in place names
- Place Sorted List in PAF -- helpful to recognize problems with places
- U.S. Cities Galore2 -- http://www.uscitiesgalore.com/ -- edits GEDCOM's, but not PAF files; allows checking locations
- World Place Advisor (current version is called Map My Family Tree) -- http://www.progenygenealogy.com/ -- not an editor, but will check your entire database for errors in locations
- Family Atlas -- http://www.familyatlas.com/ -- matches your place list against a world-wide directory to see if places are spelled correctly
- Use Internet search engines to find correct info, e.g. type "Illinoin" (without quotes) into Google and you get "Did you mean Illinois?", or type "Nauvoo Illinois county" (without quotes) into Google and you get Hancock County
- Finding loops in data, i.e. someone who is linked as his own ancestor; not usually obvious -- two programs that find loops
- PAF Companion 5.2 -- bundled with PAF 5.2 from LDS Church Distribution Centers
- Using GENViewer to find data entry errors -- http://mudcreeksoftware.com/ -- set highlight conditions as follows:
- Married - Same Name – finds duplicate marriages where the same spouse link appears two or more times
- Surname Not Father’s – finds children in wrong families
- Surname Not Parents – finds children in wrong families
- Surname is Mothers – finds problems with children
- # Families > 1 – finds duplicate marriages and families
- Children with Same name – shows duplicate individuals in families, but be careful since it was a common practice to name another child after one that had died
- Couple Same Surname – shows spouses with same surname (may be OK)
- Married - No Spouse + # Descendants = 0 – shows “Unknown” spouse links with no children
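Two of the checks in this section can be sketched in a few lines of Python: finding loops (someone linked as his own ancestor) and getting a "Did you mean ...?" style suggestion for a misspelled place. This is only a sketch under assumptions, not what PAF Companion, GENViewer, or Google actually do. The `parents` mapping (person ID to list of parent IDs), the function names, and the sample data are hypothetical; `difflib.get_close_matches` is from the Python standard library.

```python
import difflib


def find_ancestor_loops(parents):
    """Return the IDs of people who appear among their own ancestors.

    `parents` maps a person's ID to a list of that person's parent IDs.
    """
    looped = set()
    for start in parents:
        seen = set()
        stack = list(parents.get(start, ()))
        while stack:                      # iterative walk up the pedigree
            person = stack.pop()
            if person == start:           # came back to where we started
                looped.add(start)
                break
            if person in seen:
                continue
            seen.add(person)
            stack.extend(parents.get(person, ()))
    return looped


def suggest_place(name, known_places):
    """Return the closest known spelling, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, known_places, n=1)
    return matches[0] if matches else None


# Hypothetical links: 1 -> 2 -> 4 -> 1 is a loop; 3 and 5 are fine.
parents = {1: [2, 3], 2: [4], 4: [1], 3: [], 5: [3]}
print(sorted(find_ancestor_loops(parents)))

# Hypothetical list of known place names.
print(suggest_place("Illinoin", ["Illinois", "Iowa", "Indiana", "Ohio"]))
```

The loop check walks each person's ancestors separately, which is simple but repeats work; for a very large file a single pass with cycle detection would be faster.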
UPDATING AND ADDING DATA FROM OTHER FILES
- Problem of combining best data when several people are updating info simultaneously
- Can't use PAF's (or Legacy's or other programs') Unique Serial ID numbers since all the changes you made to records show up there, too -- what you need is a way to select out just what others have changed
- Method we developed
- For the secondary file make a Custom Report of "Last Date Changed is Greater Than [date they started working on it]" -- include columns for RIN's, names, spouses, birth info, parents; sort it alphabetically or by RIN's, and print it landscape, so it has one record per line
- Circle obvious problems on this custom report
- Use PAF Insight to Compare and Synchronize names and data from secondary file into main database by using names or RIN's from custom report -- write on custom report things you find that need correcting
- Run Merge on all the names added from the secondary database
- Make corrections in main file that you noted on the custom report
- Use World Place Advisor to find more place errors for the added data
- Check for more than one set of parents, since PAF Insight brings in extra sets of parents when sync'ing in data
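The selection step in the method above ("Last Date Changed is Greater Than ..."), and the final check for extra sets of parents, can be pictured with a small Python sketch. This only mimics the idea on an exported list; it does not read PAF files or call PAF Insight. The field names (`rin`, `name`, `changed`, `parent_sets`) and the sample records are hypothetical.

```python
from datetime import date


def changed_since(records, start):
    """Records whose last-changed date is later than `start`, sorted by RIN."""
    return sorted((r for r in records if r["changed"] > start),
                  key=lambda r: r["rin"])


def extra_parent_sets(records):
    """RINs of people linked to more than one set of parents."""
    return [r["rin"] for r in records if len(r.get("parent_sets", [])) > 1]


# Hypothetical records from the secondary file, invented for this example.
records = [
    {"rin": 310, "name": "Mary /Jones/", "changed": date(2006, 11, 3),
     "parent_sets": [(12, 13)]},
    {"rin": 42, "name": "John /Smith/", "changed": date(2007, 1, 15),
     "parent_sets": [(7, 8), (90, 91)]},
    {"rin": 57, "name": "Ann /Smith/", "changed": date(2005, 6, 1),
     "parent_sets": []},
]

# Date the other person started working on the file (assumed for the example).
start = date(2006, 1, 1)

# One record per line, like the landscape custom report.
for r in changed_since(records, start):
    print(r["rin"], r["name"], r["changed"].isoformat())

print(extra_parent_sets(records))
```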
PUTTING FAMILIES TOGETHER AND GETTING FURTHER INFORMATION
- Find links for families by looking at info in the notes -- spouse, parents, children
- Entering these as real people helps identify that individual and link them into families
- Check other databases for family links, e.g. Ancestral File, IGI, Ancestry.com, Censuses
- Wives with unknown maiden surnames
- Our file had hundreds of Elizabeths, Marys, etc., with no surnames
- We used a non-standard way to show these in our database -- we entered them as Mary /Jones (Mrs.)/
- Makes them alphabetize near their husbands, so they are not just Mary // or Elizabeth // with the other hundreds like that
- Helps to identify them for this database, but this is NOT the way to do it for temple work
- Easy to find all of these, if needed, by searching for (Mrs.) in surname field
- Putting Mrs. in Married Name field in PAF doesn't work, since names don't show up in the Individual View with Mrs.
- Using Nauvoo Databank -- see my class notes on The Nauvoo Databank on Utah Valley Technology and Genealogy Group
- Available at Nauvoo FHC, the BYU Regional FHC, and a few other FHC's in Utah
- Contains texts of hundreds of journals, books, articles, databases, and the entire Susan Easton-Black Membership of the LDS Church 1830-1858 database
- Data is text in Folio 4.2 format, so it is every-word searchable by combinations of names, dates, places, and even by misspellings
- We are also adding stories from other sources such as Church News FH articles
ADDING DATA EVENTS
- Have developed a way of entering web links in PAF events so the generated web page shows them as “hot” links; we are entering User Events for Internet links, Residences, Overland Travels, etc.
- Examples -- see Residence Events for Mormons living in
Mormon communities in Iowa -- do an Advanced Search on http://earlylds.com for Other Events, Residence contains Montrose
- Will add other Residence Events from sources such as journals, when we find them
- Are adding Overland Travels Events for when they crossed the plains with links to the
Pioneer Company on http://lds.org/portal/site/LDSOrg (Click on About the Church/Church History, then link to Mormon Pioneer Overland Travels)
MOVING SOURCES FROM NOTES
- Problem of moving sources from notes into real sources -- major problem
- PAF 2.31 (1994) -- sources were kept in notes for the person, so sources entered at different times or by different people were usually in different formats and not uniform
- PAF 3 (about 1997) -- allowed entering real sources and citations, instead of putting them in notes
- Advantages of entering real sources instead of putting them in notes
- Source only has to be entered once, then cite that reference, page number, etc.
- All references to that source are uniform and can be edited one time to change it throughout entire database
- Reports are uniform
- Can enter an active Internet link for that source, if it has one
- Can generate a report in PAF 5.2 of all individuals citing that source -- go to Print Menu/Lists/Citations Referencing a Source
- For small databases easy to move sources, but not for large databases; e.g. if there is an average of 4 sources for each individual in our 54,000-name database, that means moving over 200,000 source citations!
- Many of us have old PAF files in which we never moved the sources out of the notes, since it's too much work.
- New program written to help us move sources from notes into real sources and citations -- see the results in http://earlylds.com
- For now we have left the sources in the notes, too.
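To picture why real sources and citations are easier to maintain than sources typed into notes, here is a minimal Python sketch of the idea: each source is stored once, and each citation only points to it. This is an illustration of the advantages listed above, not PAF's internal format and not the new notes-to-sources program. All class names, field names, and sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    title: str
    url: str = ""       # optional active Internet link


@dataclass
class Citation:
    source_id: int      # key into the shared `sources` table
    page: str = ""


@dataclass
class Person:
    rin: int
    name: str
    citations: list = field(default_factory=list)


def people_citing(people, source_id):
    """Names of all individuals with at least one citation of the source."""
    return [p.name for p in people
            if any(c.source_id == source_id for c in p.citations)]


def citation_text(sources, citation):
    """Format a citation using the single shared copy of its source."""
    source = sources[citation.source_id]
    if citation.page:
        return f"{source.title}, p. {citation.page}"
    return source.title


# Hypothetical data, invented for this example only.
sources = {1: Source("Exampel Journal")}          # typo on purpose
people = [
    Person(1, "John Smith", [Citation(1, "12")]),
    Person(2, "Mary Jones", [Citation(1, "40")]),
    Person(3, "Ann Brown"),
]

sources[1].title = "Example Journal"              # one edit fixes every citation
for p in people:
    for c in p.citations:
        print(p.name, "--", citation_text(sources, c))

# Like the "Citations Referencing a Source" list.
print(people_citing(people, 1))
```

With sources kept in notes, the same correction would have to be typed into every person's notes separately.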
GETTING COMPLETED TEMPLE DATA INTO THE FILE
- For historical reasons we want the earliest completed temple ordinance data in the file, not later proxy data, unless original data can't be found
- No easy way to find earliest ordinance dates automatically -- new FamilySearch may help when it's released
- May have to use a combination of PAF Insight, FamilySearcher, and PAF's ALT-S-F trick to find earliest temple data
COMMENTS AND THINGS STILL PLANNED FOR THE EARLY LDS MEMBERS DATABASE
- We are still working on this project even though we were released from our missions in Nauvoo more than a year ago.
- Susan Easton-Black's (SEB) Early Church Membership Database
- Are forming a GEDCOM of the entire SEB file, including the sources in the notes, so we can run the new Notes-To-Sources program and then merge it into our database
- Already have a version of the SEB file in a large PAF
file with 132,000 names -- probably half duplicates, so merging will put families together
- SEB's file in PAF format gives a way to find problems, typos, dates that don't make sense -- very helpful format
- Much of the SEB data is already in our Early LDS database, but not all -- it will help put families together when we can merge it in
- Overland Travels data on http://lds.org/portal/site/LDSOrg
- Will use Church Historian's Office spelling of names as a standard
- Will enter more Internet links in Events in our file showing the company and year they crossed the plains
- Are working on a way to convert the online Overland Travels html files to GEDCOM’s so we can enter the data more easily
- Mormon Communities spreadsheet list
- Have made a Mormon Communities Spreadsheet File with info such as the dates Mormons settled that community and when they left
- Have info about communities around Kirtland, around Nauvoo in both Illinois and Iowa, and some Mormon communities in the west
- List helps when we come across places like Zenos, Arizona -- Zenos is the old name of Mesa, Maricopa, Arizona
- Further data to check
- Harvey Black's Early Seventies Data to be sure it is all included
- Deaths on the Trail to be sure we have as complete a list as we can
- Data on web sites such as Illinois Marriages, Illinois Deaths, Utah Marriages, Utah Deaths, etc., to see that we have all data available and correct and with sources and links to the web sites
- Several other projects to incorporate since so much new information about early LDS is coming online all the time
CONCLUSION
- This is a major project and we hope it is helpful to family and Church historians.