Wednesday, February 28, 2018

Progress report for February 28

Here is our progress as of today, February 28, 2018.


                                                     Project 1    Project 2 - USCT
                                                     n = 8,500    n = 4,500
Total soldiers complete                              7891         3875
Soldiers completed during last week                  55           35
Soldiers w/o children (to be removed from sample)    691          719
% of completed soldiers w/o children                 8.8          18.6
Soldiers with children complete                      7200         3156
% with children complete                             84.7         70.1
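For anyone who wants to check the arithmetic behind these figures, here is a minimal Python sketch. It assumes (the report does not say so explicitly, but the numbers are consistent with it) that "% of completed soldiers w/o children" is taken against the total completed and "% with children complete" is taken against n. The function and dictionary names are made up for illustration.

```python
# Figures copied from the February 28 progress report.
projects = {
    "Project 1": {"n": 8500, "complete": 7891, "no_children": 691},
    "Project 2 - USCT": {"n": 4500, "complete": 3875, "no_children": 719},
}


def summarize(project):
    """Return (soldiers with children complete,
               % of completed soldiers w/o children,
               % with children complete).

    Assumption: the first percentage uses total completed as the
    denominator, and the second uses n.
    """
    with_children = project["complete"] - project["no_children"]
    pct_no_children = round(100 * project["no_children"] / project["complete"], 1)
    pct_with_children = round(100 * with_children / project["n"], 1)
    return with_children, pct_no_children, pct_with_children


for name, figures in projects.items():
    print(name, summarize(figures))
```

Under those assumptions the sketch reproduces the reported rows for both projects.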

Friday, February 9, 2018

Social Security numbers

Please make sure you always record (if available) the Social Security number for the soldier, his spouse, and his children.

If your death source is the Social Security Death Index, select that as the source and input the number in the box that appears.

If you find the U.S. Social Security Applications and Claims Index, 1936-2007, enter the Social Security number in the Remarks field of the appropriate individual's Death screen. Type SSN: and the number.

If you don't find death information for an individual, but you did find her/his Social Security number, please enter it in the Remarks field of the last census decade input. Type SSN: and the number.


We want to make sure we ALWAYS collect the Social Security number.

Thursday, February 8, 2018

Some thoughts on linking

I was discussing quality code 4 matches and census linking with Irene this week. She had some good comments that I'd like to share with you.

We both agreed that sometimes people take a QC4 match because they don't want to leave a no find on the grid. They think that any match is better than no match. That is not the case. We don't want to collect false matches. Just because you only found one person with the right name living in the right county doesn't mean you've found the correct person. If you've found two possible matches, you can't necessarily choose the person whose age is only off by 3 years instead of 5 years. Beware of common names. When an individual lives in a large city, there might be multiple people with the same name on the census and others who just got missed by the enumerator.

Here is what Irene had to say about linking records:
I see linking records and clues like stepping stones across a creek - you need to step on each stone to get you to the next one without skipping over any of them. There needs to be a clear link between the starting info and the records we use and attach to the family.

Please keep this in mind as you do census linking.

Monday, February 5, 2018

January 2018 checking stats

In January, we checked 31 soldiers under our system. 

I've reviewed all of the checks, and I've tallied the number of differences. Some of these are errors, and some are judgment calls. Here are the categories and the total number of differences for each category.

GRID Errors

MILIN?/MAR? - 3
Missing HH member - 3
Duplicate people - 1
Wrong person - 0
Other - 1

Inferred Relationships

Incorrect relationships - 10

Census Errors

Name - 8
Typo/Reading/Wrong - 35
State Code - 0
Missing/Wrong URL - 3
Missing data - 13
Additional finds - 23
Quality Code - 15

Death Errors

Typo/Reading/Wrong - 4
Missing data - 23
Missing/Wrong URL/Source - 4
Quality Code - 6
Additional finds - 8

Tree Errors

Missing/Incorrect information/relationships - 11

The total number of differences for all 31 soldiers is 171. This is 41 fewer differences than in November, when we checked 38 soldiers. The number does not include errors found while assigning mothers to children. Probably 15-20 children who were listed on the Mil Info were left off the grid. That is a major error in a study that is collecting data on these children. Some differences/errors affect the data more than others. If we checked other pensions, we'd probably find similar differences. Some mistakes are inevitable, but please pay close attention so that the number of errors can be minimized. Please make sure all the soldiers' children are searched and added to the grid.
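As a rough cross-check of the tallies, here is a small Python sketch that adds up the categories listed above and works out differences per soldier checked. The November total of 212 is not stated in this post; it is inferred from "41 fewer differences than November" and should be treated as an assumption. The variable names are illustrative only.

```python
# Counts copied from the January 2018 category lists.
differences = {
    "Grid": {
        "MILIN?/MAR?": 3,
        "Missing HH member": 3,
        "Duplicate people": 1,
        "Wrong person": 0,
        "Other": 1,
    },
    "Inferred relationships": {"Incorrect relationships": 10},
    "Census": {
        "Name": 8,
        "Typo/Reading/Wrong": 35,
        "State Code": 0,
        "Missing/Wrong URL": 3,
        "Missing data": 13,
        "Additional finds": 23,
        "Quality Code": 15,
    },
    "Death": {
        "Typo/Reading/Wrong": 4,
        "Missing data": 23,
        "Missing/Wrong URL/Source": 4,
        "Quality Code": 6,
        "Additional finds": 8,
    },
    "Tree": {"Missing/Incorrect information/relationships": 11},
}

# Subtotal per section, then the grand total for January.
subtotals = {section: sum(counts.values()) for section, counts in differences.items()}
january_total = sum(subtotals.values())

# Assumption: November's total is January's total plus the 41 mentioned above.
november_total = january_total + 41

# Differences per soldier checked (31 in January, 38 in November).
january_rate = round(january_total / 31, 1)
november_rate = round(november_total / 38, 1)

print(subtotals, january_total, january_rate, november_rate)
```

On these figures the category counts do sum to 171, and the per-soldier rate comes out at about 5.5 for January against about 5.6 for November, so the lower total may largely reflect the smaller number of soldiers checked.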

Friday, February 2, 2018

Use quality code 4 sparingly

Some of you probably remember that back when we collected the Urban and USCT samples, we were sometimes inclined to take soldiers at a quality code (QC) 4 match. The idea was that even though we didn't have much supporting evidence, the individual was demographically similar even if he wasn't a perfect match. Then we could tell users that they were weak matches, and they could decide to drop them from the data they analyzed.

Now that we are searching for the soldiers' children, this doesn't really work. First of all, we don't want to spend two days searching for the 12 children of a QC4 soldier. Why waste all that time if they might not be the people we're looking for? Additionally, we have many more records available to make a solid match than we did for the previous samples. That is to say, most of the time, you should be able to find solid supporting evidence for the match you've made.

Because of all the great records available, we should only add a QC4 match (of soldier, spouse, child) to the grid on rare occasions. Use it sparingly. Occasionally, you can make an assumption and justify a QC4 match. I was trying to come up with an example of this, but it is so infrequent that I couldn't. If you have to make multiple assumptions (even if those assumptions are logical and possibly correct) to add an individual to the grid, then you should not add that individual.

You should hardly ever add a QC4 individual to the grid. If you do, you should be able to defend the choice without making multiple assumptions about the individual or family. Just because a soldier is married doesn't mean he has kids. Just because an individual is the right age in the right town doesn't mean she is the soldier's child. Maybe the child is related but is a niece or nephew, not a child. Make decisions based on all the information you have at your disposal. Whenever I'm tempted to take someone who is a weak match, I ask myself, "How do I know this is my person and not some random person with the same name? What are the reasons to rule out this person?"

This applies to both Project 1 and Project 2, but it is especially relevant to Project 2. Many of the birth/death/marriage dates change from decade to decade, and sometimes the family structure has broken down. All this must be taken into account when matching individuals and families.

Many QC4 matches will turn out to be wrong. It is so much worse to have wrong data than missing data.

Thursday, February 1, 2018

Dates on family circulars

When you find information about the soldier's spouse and children in the Mil Info, most of it comes from the family circular in the pension record. When you see this information, check the living/dead dates for the kids, and you'll probably get a good idea which family circular the soldier submitted.

Family circulars were sent to the soldiers in 1898 and 1915. The questions were slightly different. In 1898, the soldier was only asked to list his living children. In 1915, the soldier was asked to list all his children, living or dead.

If you notice that the Mil Info lists four children living in 1898, you know the soldier has at least four children. You might find additional children on an 1880 census because some children who were alive in 1880 had died by 1898.

If you notice that the living/dead date is 1915, then you probably have a pretty good record of all the soldier's children, living and dead. If it says he has two children, think twice before accepting the census decade with six children.

The Mil Info is not perfect, but you can trust it. Sometimes the soldiers didn't follow instructions when filling out the form. Sometimes they couldn't remember when their children were born or when they married their wives. But, on the whole, you can trust the Mil Info even if it occasionally gets the details wrong. Paying attention to the dates gives you some extra context about the information you're seeing in the Mil Info.
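The rule of thumb about the two circulars can be written out as a small Python sketch. This is only an illustration of the reasoning above, not part of our input system; the function name, its arguments, and the wording of the results are all made up for the example.

```python
def review_census_children(circular_year, children_on_circular, children_on_census):
    """Suggest how to treat a census household that shows more children
    than the family circular in the Mil Info.

    Hypothetical helper: it encodes only the rule of thumb that the 1898
    circular asked for living children, while the 1915 circular asked
    for all children, living or dead.
    """
    if circular_year not in (1898, 1915):
        raise ValueError("family circulars were sent in 1898 and 1915")

    extra = children_on_census - children_on_circular
    if extra <= 0:
        # The census shows no more children than the circular lists.
        return "no conflict"
    if circular_year == 1898:
        # Children who died before 1898 would not appear on the circular.
        return "possible: extra children may have died before 1898"
    # The 1915 circular should already list every child, living or dead.
    return "think twice: 1915 circular should list all children"


# Four children living in 1898, six children on the 1880 census.
print(review_census_children(1898, 4, 6))
# Two children on the 1915 circular, six children on a census decade.
print(review_census_children(1915, 2, 6))
```

The result is a prompt for judgment, not a decision. The Mil Info occasionally gets details wrong, so a "think twice" result still needs a look at the rest of the evidence.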