CLEANING UP YOUR DATABASE
©2010 by Donald R. Snow
This page was last updated 2010-04-01.
WELCOME AND INTRODUCTION TO CLASS
- Instructors are Donald R. and Diane M. Snow (snowd@math.byu.edu and dms34@juno.com) of Provo, Utah; St. George, Utah; and Nauvoo, Illinois.
- These notes, with active Internet links, are posted on the Utah Valley Technology and Genealogy Group website http://uvtagg.org > Class Outlines > Don's Listings. Many other class notes for family history are posted there, too.
- These notes are written for PAF (Personal Ancestral File), but the ideas are much the same for any genealogy program, e.g. Ancestral Quest, RootsMagic, Legacy, and Family Tree Maker.
- Any database being worked on needs cleaning up periodically -- e.g. our Early LDS online database at http://earlylds.com (54,000 names of early LDS Church members): we have made thousands of corrections, but many more are still needed
CLEANING UP YOUR DATABASE - WHY
- Reasons you need complete, uniform, and accurate names, dates, and places
- To be able to find what you are looking for when filtering and making reports and lists for library visits and/or genealogy field trips
- To eliminate duplicates, i.e. the same person entered twice with different spellings or even the same spelling
- To help with combining and correcting all the duplicates and data in New FamilySearch
- For temple work
- To be able to find already-completed ordinances so you don't duplicate them
- To be able to send names to the temple with accurate data
- To have countries, states, and locations recognized
- To avoid submitting duplicate names to the temple, since you may have the same person in your database twice -- it is estimated that 25% of current temple duplications come from the same person submitting the name twice
HOW DATA SHOULD BE ENTERED
- NAMES -- names (in English and most languages) should be entered in PAF as spoken
- PAF Preferences can be set to put the slashes around the surname, e.g. [Given Names] /[Surname]/ [Title, e.g. Jr., Sr., III]
- See Tools/Preferences in PAF 5 for options to automatically enter slashes // around surnames depending on the way the surname occurs in that language
- In English enter the name as spoken [Given Names Surname Jr.] and PAF enters the slashes and knows that Jr. is a suffix and not part of the surname
- Names and places should be entered in mixed case (upper and lower case letters), e.g. Richard Daniel Smith, not as RICHARD DANIEL SMITH nor Richard Daniel SMITH
- But, for ease of reading later in PAF 5.2, click Tools/Preferences/Names and check the box for Surnames in Caps
- That way surnames display in caps in PAF, but they are still stored in mixed case and can be shown that way when you want
- PLACES -- places should be entered completely from smallest to largest jurisdiction, e.g. "City, County, State, Country", separated by comma-space
- The commas are required and act as placeholders, the way zeros do in numbers, e.g. 321 is different from 3210
- The spaces are for uniformity and ease of reading
- PAF 5.2 is NOT limited to 4 fields for places as old PAF was, and some locations require more than 4 fields
- Names and places should be complete and uniform
- No abbreviations -- spell everything out to avoid confusion -- examples of confusion from abbreviations:
- "CO" can mean County or Colorado
- "CA" can mean California or Central America
- "DE" can mean Delaware or Germany (Deutschland)
- Include the country to avoid problems -- e.g., does "Georgia" mean the U.S. state or the country?
- New FamilySearch wants United States spelled out, not USA, since USA is also the abbreviation for the Union of South Africa
- Places should normally be entered as they were at the time of the event -- see the PAF 5.2 Users Guide, p. 45 (open it from Help > Users Guide in PAF)
- Most people don't follow this rule completely, e.g. they write Woburn, Middlesex, Massachusetts, USA, for a location in the 1600's even though there was no USA before 1789
- The crucial things are (1) to identify the individual and (2) to let someone find the location and its records
- Can include information on county changes in the notes
- Get exact places, where possible
- If you don't have proof of the place yet, but you know there is some connection, put "of, [Town], [County], [State], [Country]", or "of, , [County], [State], [Country]"
- Separate the "of" with a comma so the place alphabetizes near where it belongs instead of with the o's -- without the comma, TempleReady also might not recognize it
- Dates and places inside angle brackets, < >, were estimated by computer, as in the Ancestral File -- it helps to change them to "Abt xxxx" and "of, ..." until you find the exact data.
- DATES -- genealogy format is DD MMM YYYY with no hyphens between, e.g. 12 Mar 1845
- PAF will automatically put most dates typed in any manner into this genealogy format -- experiment by typing a date in some other format
- Until you find the exact date for birth and marriage, it helps to distinguish people by putting "Abt [year]" in the birth and marriage fields
- For suggestions on estimating dates see page 222 in A Users Guide to the New FamilySearch Website (March 2010) -- download the PDF from New FamilySearch > Help Center > Additional Resources - New FamilySearch Overviews and Guides
- DON'T use these "Abt" dates for temple work unless you can't find the exact dates with reasonable effort
- For temple work include all dates and information you can find for the person (birth, christening, marriage, death, and burial dates) since these are included and indexed in the IGI, so someone searching the IGI on any of these terms will find the person.
- With New FamilySearch you can correct and add data even after the ordinances have been done.
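If you keep a copy of your places and dates outside PAF (for example, a list exported from a GEDCOM), you can check the place and date conventions above mechanically. The following is a minimal Python sketch, not part of PAF or any program named in these notes. The helper names `normalize_place`, `flag_abbreviations`, and `to_genealogy_date` are hypothetical, the abbreviation set contains only the few examples given above, and the input date formats (including U.S. month/day order for numeric dates) are assumptions.

```python
from datetime import datetime
from typing import List, Optional

# Illustrative subset of the ambiguous abbreviations mentioned in these notes;
# a real list would be much longer.
AMBIGUOUS_ABBREVIATIONS = {"CO", "CA", "DE", "USA"}

# Input formats tried, in order, when converting a date (assumed for illustration;
# "%m/%d/%Y" assumes U.S. month/day order).
INPUT_DATE_FORMATS = ["%d %b %Y", "%d %B %Y", "%B %d, %Y", "%Y-%m-%d", "%m/%d/%Y"]


def normalize_place(place: str) -> str:
    """Rejoin jurisdictions (smallest to largest) with comma-space.

    Empty fields are kept, so the commas still act as placeholders,
    e.g. "of, , Middlesex, Massachusetts".
    """
    fields = [field.strip() for field in place.split(",")]
    return ", ".join(fields)


def flag_abbreviations(place: str) -> List[str]:
    """Return the jurisdiction fields that are ambiguous abbreviations."""
    fields = [field.strip() for field in place.split(",")]
    return [f for f in fields if f.rstrip(".").upper() in AMBIGUOUS_ABBREVIATIONS]


def to_genealogy_date(text: str) -> Optional[str]:
    """Convert a full date to DD MMM YYYY, or return None if no format matches."""
    for fmt in INPUT_DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next input format
        return parsed.strftime("%d %b %Y")
    return None


if __name__ == "__main__":
    print(normalize_place("Woburn ,Middlesex,Massachusetts,United States"))
    print(flag_abbreviations("Denver, Denver, CO, USA"))
    print(to_genealogy_date("March 12, 1845"))
```

Estimated entries such as "Abt 1845" are not recognized by `to_genealogy_date` and come back as None, so you still review those by hand.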
FINDING AND CORRECTING FILE STRUCTURE ERRORS
- File structure errors are computer-science type errors that you have no control over and that get in your database from various operations.
- Open the File > Check-Repair menu in PAF
- Run the Check option in this menu (NOT Check-Repair at first) and look at the report
- If file structure errors exist, but none involve notes for any individual, then run the Check-Repair option in this menu to try to repair the database
- If file structure errors exist and some involve notes for an individual, DO NOT run the Check-Repair option in this menu, since there is a bug in PAF 5.2 that may disconnect the notes for some individuals in your database -- use other methods to correct your database, e.g. Family Insight, or try GEDCOM'ing everything out and into a new database
- (REPEAT FOR EMPHASIS) -- DO NOT run the File/Check-Repair/CHECK-REPAIR option until you know there are no structure errors involving notes for any individual since there is a bug in PAF 5.2
- Family Insight (old version was PAF Insight) -- http://www.ohanasoftware.com -- will correct some file structure errors that PAF won't and vice versa -- Family Insight doesn't have the notes bug that the PAF 5.2 Check-Repair has
- Always check your file for structure errors before exporting any data
FINDING DATA ERRORS
- KINDS OF ERRORS
- Data errors -- duplicate names, sources, repositories, typos, logical data errors, "loops", incomplete places, places not separated by commas, county not shown, abbreviations
- Many location errors can be corrected without much research since they are just misspellings, wrong jurisdictions, etc.
- Errors in dates usually take more research to correct
- Long-range goal should be to verify and get copies and have sources for each piece of data in your file
- For large databases there are additional complications -- see my notes Working With Large Databases on Don's Class Listings page on http://uvtagg.org
- FINDING AND CORRECTING DUPLICATES
- Several ways to find and correct duplicates -- PAF Tools > Match-Merge, Family Insight -- http://www.ohanasoftware.com -- or another program like GenMerge -- http://www.genmerge.com/index.php
- Always make a backup before any merging operation since merging is a "dangerous" operation and can't be undone without going to a backup -- see my Basic PAF class notes about backups on Don's Class Listings
- Finding duplicates in a large database may be difficult since names may be spelled slightly differently and not alphabetize near each other at all
- http://www.namethesaurus.com/ is a website that shows surname and given name variant spellings, plus Soundex and Metaphone codes
- Merging duplicate sources in PAF -- run Tools > Match-Merge Duplicate Sources and Repositories
- Everything in two sources (or two repositories) has to be exactly the same before they will merge -- you may have to edit one to match the other exactly, including its repository, before you can get them to merge
- Other programs may help in merging sources, e.g. http://www.rootsmagic.com
- FINDING LOGICAL DATA ERRORS
- POSSIBLE PROBLEMS REPORT in PAF -- Print > Lists > Possible Problems
- Preview this list (Print > Lists > Possible Problems > Preview) and begin correcting errors, e.g. "Children in a family are in wrong birth order" or "Child was born before its mother", etc.
- Can save the Problems list as a text file so you can have it open in WordPad while you are making corrections in PAF
- Click Print > Lists > Possible Problems > Print-to-File (lower right side of screen), then Print -- save it as "problems.rtf" file (a rich text file)
- Remake this list periodically, clear WordPad, and load the new list which will include the changes you have already made
- Can change some options of what PAF shows as errors -- click on the Options button on Lists > Possible Problems page and set these the way you want
- LOOPS in the data, e.g. someone who is linked as their own ancestor -- sometimes very hard to find
- One way to find a loop is to check the Possible Problems report to see if a child was born before its parents -- this may not indicate a loop, only a typo
- Programs other than PAF 5.2 to use to find loops
- PAF Companion 5.2 (bundled with PAF 5.2 CD) -- do a Pedigree Chart preview and it will stop on RIN's that are in loops
- GenMerge finds loops in preparing a file for merging -- http://www.genmerge.com/index.php
- To break a loop, unlink the child from the parents where the first "repeat" occurs
- FINDING PLACE NAME ERRORS
- SETTING COLUMNS IN INDIVIDUAL VIEW to find problems
- In Individual View you can set the columns to show data you want -- may see some obvious spelling errors or no county shown, etc.
- To select the columns to view in Individual View, right-click any column heading to open the menu, then left-click Add or Modify Columns, and select the columns you want
- Move items between the left and right panes by the arrows ">" and "<"
- Move items up or down in the right pane to get the order you want the columns in; then click OK
- Can control width of columns by dragging the column title separators right or left
- Can also move entire columns by dragging and dropping their headings right or left
- Can sort the data by RIN's or alphabetically by clicking in title bars of those two columns -- second click there reverses the sort order
- Limitations of Individual View
- Can't sort the data by any column except RIN's or alphabetically
- Can't show marriage places or dates
- Other programs will show these, e.g. Ancestral Quest and GENViewer (see below in Tools paragraph)
- Use Individual View to see if names or places are entered in mixed case
- Go to Tools > Preferences > Names and uncheck Show Surnames in Caps
- If surnames or places still show as capitalized, except for temple codes, they were entered in caps -- to change them all, see MIXED CASE under Correcting Data Errors below
- PLACE SORTED LIST in PAF to find problems with places
- To find problems in places, can generate a Places Sorted Alphabetically list
- Preview the Places Sorted Alphabetically list by Print > Lists > Places Sorted Alphabetically > Preview
- Can save this Places list as a text file by checking Print-to-File (lower right-hand side of screen), then Print -- save it as a "places.rtf" file (a rich text file) and it opens in WordPad -- use it to make some corrections, remake the list, clear WordPad, load the new list, and make more corrections
- If the Places Sorted Alphabetically list doesn't show the RIN's, they are turned off in that database; to turn them on go to Tools > Preferences > Names and tell PAF you want RIN's appended to names -- then remake the Places Sorted Alphabetically list
- OTHER PROGRAMS TO FIND PLACE ERRORS (see details in later paragraphs in these notes)
- STANDARD FINDER on http://labs.familysearch.org/ -- open to everyone, but you have to search it yourself
- FAMILY INSIGHT (old program was PAF Insight) -- free for FHC's -- very helpful viewer and editor -- http://www.ohanasoftware.com
- Has an alphabetical listing of all locations in your PAF database, and they can be edited right there to make them uniform
- If you have access to new FamilySearch, Family Insight will check your locations against the place list on new FamilySearch, show you the problems, and give you options to change them very easily
- MAP MY FAMILY TREE (used to be WORLD PLACE ADVISOR -- commercial program) -- http://www.progenygenealogy.com/
- Finds location errors in your PAF, Legacy, Family Tree Maker, or other type of file
- Checks spelling, counties, provinces, ambiguities
- Has 3 million towns and cities world-wide in its database and you can add others
- Viewer, not an editor, so you use your genealogy program to correct the errors it finds
- FAMILY ATLAS -- free for FHC's -- viewer, not an editor, but can help find corrections needed -- http://www.familyatlas.com/
- U.S. CITIES GALORE2 -- free for FHC's -- http://www.uscitiesgalore.com/ -- viewer and editor, but only works on GEDCOM's
- GENVIEWER -- free version for FHC's -- http://www.mudcreeksoftware.com/ -- viewer, not an editor
- Very helpful program to find errors in data since you can set columns to show whatever data you want and sort on any column to find problems
- Runs on PAF files, GEDCOM's, Legacy, Family Tree Maker, and other genealogy programs
- Can download and try out the full version free for 15 days
- Also does Internet searches on selected data
- See my notes on Working With Large Databases on Don's Class Listings page on http://uvtagg.org for many uses to find data errors
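The loop and birth-order checks described above (Possible Problems, PAF Companion, GenMerge) can be illustrated with a short Python sketch. It assumes a hypothetical in-memory model in which each RIN maps to the RINs of that person's parents and each known birth year is keyed by RIN. The function names are made up for illustration. `find_loop` is an ordinary depth-first search for a cycle in the parent links. It returns the chain of RINs starting where the first "repeat" occurs, which is where the notes suggest unlinking the child from the parents.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: RIN -> RINs of that person's parents.
ParentMap = Dict[int, List[int]]


def find_loop(parents: ParentMap) -> Optional[List[int]]:
    """Return one loop as a list of RINs (first RIN repeated at the end), or None."""
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state: Dict[int, int] = {}
    path: List[int] = []  # current line of descent, child first

    def visit(rin: int) -> Optional[List[int]]:
        state[rin] = ON_PATH
        path.append(rin)
        for parent in parents.get(rin, []):
            parent_state = state.get(parent, UNVISITED)
            if parent_state == ON_PATH:
                # The parent is already in this line of descent, so someone
                # is linked as their own ancestor.
                return path[path.index(parent):] + [parent]
            if parent_state == UNVISITED:
                loop = visit(parent)
                if loop:
                    return loop
        path.pop()
        state[rin] = DONE
        return None

    for rin in parents:
        if state.get(rin, UNVISITED) == UNVISITED:
            loop = visit(rin)
            if loop:
                return loop
    return None


def born_before_parent(parents: ParentMap, birth_year: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (child RIN, parent RIN) pairs where the child's birth year is earlier."""
    problems = []
    for child, parent_list in parents.items():
        for parent in parent_list:
            if child in birth_year and parent in birth_year:
                if birth_year[child] < birth_year[parent]:
                    problems.append((child, parent))
    return problems


if __name__ == "__main__":
    links = {1: [2, 3], 2: [4], 4: [1]}  # RIN 4 is wrongly linked as a child of RIN 1
    print(find_loop(links))
    print(born_before_parent(links, {1: 1850, 2: 1820, 4: 1790}))
```

As the notes point out, a child born before a parent may be only a typo rather than a loop, but any real loop will produce at least one such pair. The recursion is fine for pedigrees of ordinary depth. A very large file might call for an iterative version.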
CORRECTING DATA ERRORS
- You can correct anything by hand, but there are easier ways to do many things.
- PAF PAL
- Free program now (since Fall 2009) -- download from http://www.ohanasoftware.com/
- PAF 5.2 requires version 5.1 of PAF Pal (PAF 4 requires the older PAF Pal version 2.3)
- Has to be installed on the computer with PAF and then runs from inside PAF 5.2 by clicking on Tools > PAF Pal
- Has many uses, including expanding or contracting names for states, Canadian provinces, and British counties, adding/removing "USA", removing "Submitted's", removing all LDS data, changing temple codes, and changing names or parts of names
- Can do any of these by hand, for example by doing each of the 50 U.S. states one at a time, but PAF Pal does them all at once
- PAF Pal only recognizes and changes to or from standard postal abbreviations for U.S. states, e.g. it will change MA to Massachusetts, but won't change Mass or even MA. (MA followed by a period)
- For British counties and Canadian provinces it uses the 4-letter codes that the FHL uses, not the standard Chapman country and county codes, and it does not work for counties in Scotland, Wales, the Channel Islands, and the Isle of Man since it places them in England rather than treating them as separate jurisdictions
- MIXED CASE -- names and places should be entered in mixed case with large and small letters
- To change names to mixed case
- Run PAF Pal to expand all state names and remove USA from all entries (see PAF Pal above) -- spell out state names and remove USA before you change to mixed case so it doesn't change things like MA to Ma and USA to Usa, which you would then have to correct later
- Run Tools > Change Names and Places to Mixed Case
- US Cities Galore (see below) will also change places to mixed case in a GEDCOM, but not in a PAF file
- After correcting names and places go back to Tools > Preferences > Names and turn Capitalize Surnames back on, so surnames are easy to find and read
- CORRECTIONS USING THE PICK ARROWS
- If only a few RIN's need the same place correction and you know the RIN's, do the following
- Find one individual needing the correction and correct the problem place there
- Now go to another field needing the same location, click on the pick arrow (small inverted triangle) at the right end of the line, and you see the last 15 locations you entered with the one you just corrected at the top
- Select the correct location and press Enter to copy it into the place field
- Repeat this for the other RIN's and fields needing that same place correction
- This method requires you to know the RIN's of the few individuals needing that correction
- CORRECTIONS USING GLOBAL SEARCH AND REPLACE
- If many RIN's need the same correction, or if you don't know how many or which ones need it, use Global Search and Replace, but MAKE A BACKUP FIRST!
- Locate an individual needing the correction, highlight the place, and right-click to copy the incorrect data onto the clipboard
- Go to Tools > Global Search and Replace and paste the incorrect data into both the Find field and the Replace With field, but then correct it in the Replace With field
- Check the "Case Sensitive" box to avoid problems and click "Show Report" to be able to see all the changes it made after you do the Replace
- Click Replace -- will change all instances of that incorrect data in any place field, whether you know the RIN's needing it or not
- Check the Corrections Done report and see if all changes are OK -- rerun Global Search and Replace to make further corrections, if needed
- Be careful using this option since it is easy to make mistakes -- but you can always search for the mistakes and correct them, or go back to a backup and start over
- CORRECTIONS USING FAMILY INSIGHT (old program was PAF Insight)
- Family Insight works with new FamilySearch as well as the IGI -- available from http://www.ohanasoftware.com/
- Place Corrections feature is very helpful to make all places uniform
- Edit Places feature shows an alphabetical list of all places in your database with each spelling and allows you to edit them right in the list and it makes the correction in your entire database
- When highlighting a place another window shows all the individuals having that place in some event
- Can also edit the place in the list or drag-and-drop a place onto another to change all occurrences to the new spelling
- Changes aren't saved to your database until you tell it to save the file -- you can give it a new name so you don't wipe out the old file in case you want to go back to it
- CORRECTIONS USING US CITIES GALORE2 (Version 2, May 2005)
- Commercial program, but free for FHC's -- http://uscitiesgalore.com/
- Only works on GEDCOM files and not PAF files
- Checks U.S. information, i.e. cities in counties, counties in states
- Can copy and paste any of the data from it into your PAF or other database
- Has an internal location database of about 300,000 U.S. cities, counties, and states
- Does some automatic checking and correcting for you
- Checks spelling and counties, changes all caps to mixed case, and sets spacing correctly
- But first use PAF Pal to remove USA before making the GEDCOM, since US Cities Galore may not recognize USA -- it shows as errors all non-U.S. locations and anything with USA on the end
- Suggests changes automatically -- shows columns of the way you had the places and what it thinks you might want
- Doesn't make the changes in your GEDCOM until you save it -- save the GEDCOM with a new name so you don't wipe out the old GEDCOM in case you want to go back to it
- Can then have it open the corrected GEDCOM and analyze that, so you can correct more locations
- Expands state abbreviations and makes the spacing uniform
- The location columns can be sorted alphabetically by clicking on the column headings -- makes finding and correcting problems very easy, but only works on GEDCOM's
- Has a "global search and replace" function for the GEDCOM's "corrected locations" and this works even on non-U.S. places -- very helpful
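The reason the notes give for expanding state names and removing USA before changing to mixed case can be seen in a small Python sketch. The postal-code table is only a three-state sample, and the helper names are hypothetical. This is not PAF Pal's actual logic. Like the PAF Pal behavior described above, it changes only a field that is exactly a postal abbreviation, so "Mass" and "MA." are left alone.

```python
from typing import Dict

# Sample only: a real table would cover all 50 states.
STATE_NAMES: Dict[str, str] = {"MA": "Massachusetts", "UT": "Utah", "IL": "Illinois"}


def expand_states_and_drop_usa(place: str) -> str:
    """Expand fields that are exactly a postal code and drop a trailing "USA" field."""
    fields = [field.strip() for field in place.split(",")]
    if fields and fields[-1] == "USA":
        fields = fields[:-1]
    return ", ".join(STATE_NAMES.get(field, field) for field in fields)


def to_mixed_case(place: str) -> str:
    """Capitalize the first letter of each word in every field."""
    return ", ".join(field.strip().title() for field in place.split(","))


if __name__ == "__main__":
    place = "PROVO, UTAH, UT, USA"
    # Recommended order: expand and drop USA first, then change case.
    print(to_mixed_case(expand_states_and_drop_usa(place)))
    # Wrong order: "UT" becomes "Ut" and "USA" becomes "Usa", which
    # the expansion step no longer recognizes.
    print(expand_states_and_drop_usa(to_mixed_case(place)))
```

Python's `str.title()` is a simple rule. For example, it turns "MCDONALD" into "Mcdonald" and a leading "of" into "Of", so you should still review any mixed-case result.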
OTHER TOOLS TO FIND CORRECT PLACE NAMES -- Finding correct spelling of places, county that a city is in, etc.
- INTERNET SEARCH ENGINES
- FHLC -- FAMILY HISTORY LIBRARY CATALOG
- Use Search by Place and enter the city; if the FHL has any records from there, you will see a list of locations around the world with parish, county, state, etc.
- Online version at http://www.familysearch.org/Eng/Library/FHLC/frameset_fhlc.asp is updated daily
- Also gives a brief history of U.S. county formation, and of countries, when you click on Related Places
- GETTY MUSEUM THESAURUS from Getty Art Museum in Los Angeles, California -- very helpful web site to find info about places world-wide
- http://www.getty.edu/research/conducting_research/vocabularies/tgn/index.html
- To find this web site without having to remember this URL, do a Google search -- http://www.google.com -- for "Getty Thesaurus" and it comes up at or near the top
- Getty Thesaurus gives all places, rivers, mountains, etc., containing that name world-wide
- Shows a hierarchy of the places from smallest jurisdiction to largest, historical names of the location, and geographical coordinates
- Can copy and paste this data into your PAF notes, if you want
- Black entries at bottom of note are old names of current locations, e.g. Zenos is old name for Mesa, Arizona
- FUZZY GAZETTEER -- http://dma.jrc.it/services/common/xsltransform.asp?xsl=fuzzyG2html.xsl&xml=http://dma.jrc.it/fuzzyg/xml/?fuzzy=0!start=0!end=10!q=
- Shows locations with spellings near what you have
- Can also find it with a Google search for fuzzy gazetteer
- NORTH AMERICA -- maps and gazetteers
- BRITISH ISLES -- maps and gazetteers
UPDATING TEMPLE CODES
- Temple codes are currently 4 or 5 characters -- old temple codes were two letters
- See a list of temple codes, both old and new on http://www.geocities.com/rgpassey/temple/abclist.htm
- The pick list of temple codes is in PAF 5.2 in any temple field, but it is only current up to about 2004.
- When you find the incorrect ones, do a Global Search and Replace on Temple Codes to correct them.
- Four ways to determine incorrect temple codes in your PAF database
- Use GENViewer with the columns set to show the ordinance temples and sort on these columns -- incorrect codes show up alphabetically between correct ones
- Open your PAF database in Ancestral Quest (free at FHC's) and set the Name List columns the way you want -- you can sort on any column to find the wrong ones
- Make a GEDCOM of entire file and import it into a new temporary database -- the listing file for the import will show the incorrect temple codes, but not in order and will repeat the incorrect ones each time they occur
- Make a custom report -- select all individuals with any ordinance done and have the report show RIN, Name, Sex, Baptism Temple, Endowment Temple, Sealing to Parents Temple, and Sealing to Spouse Temple
- Sort the custom report on Baptism Temple, save it to a text file, and open it in WordPad -- incorrect Baptism Temple codes show up alphabetically between the correct ones -- make the corrections by Global Search and Replace on Temple Codes
- After correcting the codes in the Baptism Temple list, remake the custom report and sort it on the Endowment Temple to find additional incorrect ones there; correct these, and repeat this for the Sealing to Parents Temple -- each correction made carries over to the other fields
- Note: The custom report in PAF 5.2 will not sort correctly on the Sealing to Spouse Temple (bug in PAF 5.2), so you have to look over that list without it being alphabetized, but most of the incorrect temple codes will have already been found and corrected from the earlier sortings
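The temple-code methods above all rely on the same idea: once the codes are sorted or compared against a list of valid codes, the wrong ones stand out. Below is a minimal Python sketch of that idea. It assumes you have (RIN, temple code) pairs, e.g. copied from the custom report, and a set of valid codes taken from a current temple code list. The function name and the sample codes are illustrative only, so check real codes against a current list.

```python
from typing import Dict, Iterable, List, Set, Tuple


def audit_temple_codes(
    records: Iterable[Tuple[int, str]], valid_codes: Set[str]
) -> Dict[str, List[int]]:
    """Map each unrecognized temple code to the sorted RINs that use it.

    Blank codes (no ordinance recorded) are skipped. Codes come back in
    alphabetical order, like a sorted custom report.
    """
    bad: Dict[str, List[int]] = {}
    for rin, code in records:
        code = code.strip()
        if code and code not in valid_codes:
            bad.setdefault(code, []).append(rin)
    return {code: sorted(rins) for code, rins in sorted(bad.items())}


if __name__ == "__main__":
    # Illustrative data: "SL" stands for an old two-letter code and "SLAK"
    # for a typo; the valid set would come from a current temple code list.
    baptism_temples = [(1, "SLAKE"), (2, "SL"), (3, "LOGAN"), (4, "SLAK"), (5, "SL"), (6, "")]
    valid = {"SLAKE", "LOGAN"}
    for code, rins in audit_temple_codes(baptism_temples, valid).items():
        print(code, rins)
```

You can then fix each code found this way in PAF with a single Global Search and Replace on Temple Codes.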
CONCLUSION
- Cleaning up your database is a never-ending process and needs repeating regularly
- There are many other procedures and tools, e.g. how to find duplicates in large databases, how to change sources from Notes into real Sources -- see more ideas and procedures in some of my other notes, e.g. Working With Large Databases on Don's Class Listings page on http://uvtagg.org
ASSIGNMENT
- Form a Places Sorted Alphabetically list from your database and look through it to recognize some errors you have.
- Use the Family History Library Catalog to find the county for some town in the U.S. or some parish in the U.K.
- Go to a FHC and use Family Insight, Ancestral Quest, and/or US Cities Galore 2 on your database to see and correct some place errors.
- Check the temple codes in your database and correct the ones you need to.