Friday, February 2, 2018

Use quality code 4 sparingly

Some of you probably remember that back when we collected the Urban and USCT samples, we were sometimes inclined to take soldiers at a quality code (QC) 4 match. The idea was that even though we didn't have much supporting evidence, the individual was demographically similar, even if he wasn't a perfect match. We could then tell users that these were weak matches, and they could decide whether to drop them from the data they analyzed.

Now that we are searching for the soldiers' children, this approach doesn't really work. First of all, we don't want to spend two days searching for the 12 children of a QC4 soldier. Why waste all that time if they might not be the people we're looking for? Additionally, we have many more records available to make a solid match than we did for the previous samples. That is to say, most of the time, you should be able to find solid supporting evidence for the match you've made.

Because of all the great records available, we should only add a QC4 match (of a soldier, spouse, or child) to the grid on rare occasions. Use it sparingly. Occasionally, you can make a single assumption and justify a QC4 match. I was trying to come up with an example of this, but it is so infrequent that I couldn't. If you have to make multiple assumptions (even if those assumptions are logical and possibly correct) to add an individual to the grid, then you should not add that individual.

You should hardly ever add a QC4 individual to the grid. If you do, you should be able to defend the choice without making multiple assumptions about the individual or family. Just because a soldier is married doesn't mean he has kids. Just because an individual is the right age in the right town doesn't mean she is the soldier's child. Maybe she is related but is a niece, not a daughter. Make decisions based on all the information you have at your disposal. Whenever I'm tempted to take someone who is a weak match, I ask myself, "How do I know this is my person and not some random person with the same name? What are the reasons to rule out this person?"

This applies to both Project 1 and Project 2, but it is especially relevant to Project 2. Many of the reported birth/death/marriage dates change from decade to decade, and sometimes the family structure has broken down. All this must be taken into account when matching individuals and families.

Many QC4 matches will turn out to be wrong. It is so much worse to have wrong data than missing data.
