AncestryDNA is making several changes to its matching algorithm in the next week or two (an exact time is not yet available). You may recall an announcement that was made earlier this month entitled “New Advances in DNA Science Coming Your Way” (pdf) in which they stated the following:
“These advancements are expected to deliver more-precise predictions of whom you are related to, and how closely, among the million-plus others in the AncestryDNA database.”
There were no specifics in the announcement, however. Last night, AncestryDNA provided additional information about the changes that we will be seeing in our match lists in the next week or two.
Before I launch into the specifics, here is a very high-level summary, based on the information we were provided:
Most people will see MORE matches in their match list as a result of these changes. And most matches will be sharing MORE DNA with each other. Lost matches, if any, will most likely be at the very lowest matching levels.
Among the changes that are expected in this update are the following:
- Phasing Improvement – AncestryDNA has significantly increased the reference haplotype set used for phasing prior to cousin matching, meaning that the quality of AncestryDNA’s phasing will increase. This should result in fewer phasing errors, and thus fewer lost matches and false positives.
- Matching Improvement – AncestryDNA is changing how they identify matches between individuals. Previously, AncestryDNA was using “windows” or blocks of DNA to compare two individuals. They will now use a SNP-based method to compare people. The problem with the previous method is that if the windows didn’t line up properly with a segment, they could either miss the segment entirely or shorten it. The SNP-based method will no longer miss or shorten these segments. As a result of this switch, it is expected that many matching segments will increase in size. The total shared cM with many, and possibly most, matches will increase.
As a result of improved phasing and the SNP-based method, there will be a net gain in matches for most people. You may lose a very small number of matches at the lowest matching level, but you’ll also gain matches (more than you’ll lose) at this level. In other words, the matching changes will result in a gain of new matches at the lowest level, because more matches will now meet the minimum threshold. The matching changes will also likely result in the loss of some existing matches, thanks to improvements in phasing and implementation of the SNP-based method. I’m predicting (and hoping) that the lost matches will most likely have been false positives.
AncestryDNA does plan to let you download data about matches that you lose, if you’ve starred them or made a note for them. Since I think the minimum threshold is far too low anyway due to false positives, I’m personally not concerned about losing matches “down in the weeds.” Based on what I’m seeing, I’m seriously concerned that many people are spending way too much time fishing in the realm of false positives and drawing incorrect conclusions. But it is to AncestryDNA’s credit that they will allow us to save data for matches we’ve decided are important.
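To make the difference between the window method and the SNP-based method concrete, here is a toy sketch in Python. It is only an illustration of the general idea described above, not AncestryDNA's actual algorithm, which has not been published in this detail. Everything in it is an assumption made for the example: the two people are reduced to a list of True/False flags (one per SNP, True where they match), segment length is counted in SNPs rather than cM, and the function names, window size, and minimum length are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open range of SNP indices: [start, end)


def runs_of_true(flags: List[bool]) -> List[Segment]:
    """Return every maximal run of True values as a half-open index range."""
    runs = []
    start = None
    for i, flag in enumerate(flags + [False]):  # trailing False closes the last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    return runs


def snp_based_segments(matches: List[bool], min_snps: int) -> List[Segment]:
    """Toy SNP-based method: a segment is any run of matching SNPs that is long enough."""
    return [(s, e) for s, e in runs_of_true(matches) if e - s >= min_snps]


def window_based_segments(matches: List[bool], window: int, min_snps: int) -> List[Segment]:
    """Toy window method: a fixed window counts only if every SNP inside it matches."""
    n_windows = len(matches) // window  # leftover SNPs at the end are ignored
    window_ok = [all(matches[w * window:(w + 1) * window]) for w in range(n_windows)]
    return [
        (s * window, e * window)
        for s, e in runs_of_true(window_ok)
        if (e - s) * window >= min_snps
    ]


# Hypothetical data: a 10-SNP shared segment that does not line up with 4-SNP windows.
shortened = [False] * 3 + [True] * 10 + [False] * 3
print(snp_based_segments(shortened, min_snps=6))               # [(3, 13)]
print(window_based_segments(shortened, window=4, min_snps=6))  # [(4, 12)]

# Hypothetical data: a 6-SNP shared segment that the windows miss entirely.
missed = [False] * 2 + [True] * 6 + [False] * 4
print(snp_based_segments(missed, min_snps=6))                  # [(2, 8)]
print(window_based_segments(missed, window=4, min_snps=6))     # []
```

In the first case the window method reports only 8 of the 10 matching SNPs, and in the second the trimmed segment falls below the minimum and disappears. That is the "shorten or miss" behavior in miniature, and it is why total shared cM should go up for many matches once comparison happens SNP by SNP.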
- Match Confidence Changes – there will also be a change to the matching thresholds/confidence scores; specifically, the relationship prediction thresholds will be more stringent. As a result, we will see some of our existing matches SHIFT; they will NOT be disappearing. Thus, we will NOT be losing any 2nd, 3rd, or 4th cousin matches, in the sense that these matches are gone. Instead, we will see some 2nds go to 3rd or 4th (possibly, but this will be a rare shift for most existing customers), see some 3rds go to 4th or distant (slightly less rare), and we will definitely see many 4ths go to the distant category.
ALERT! The very last statement is where there may be an issue for most people. Remember that we can only use the Shared Matches tool to see shared matches that are fourth cousins or closer. If many of our fourth cousins shift to distant cousins – and possibly the majority will – then we will lose the ability to see these shifted fourth cousins as shared matches with the Shared Matches tool. This is my biggest concern about this shift, as much of the value and discovery I’ve made with AncestryDNA has been with this Shared Matches tool. My hope is that AncestryDNA will allow us to use the Shared Matches tool to see every shared match, not just fourth cousins or closer, understanding that many of the shared matches with very distant cousins will be fraught with risk. This hope was adamantly expressed to AncestryDNA.
We all know that relationship prediction is notoriously difficult (see The Shared cM Project), and each of the companies has a hard time with it. On the positive side, with this change, when you have a predicted 2nd cousin, there is almost no doubt that the person is a 2nd cousin (the very endogamous aside of course, sorry!). When you have a predicted 3rd cousin, there is almost no doubt that the person is a 3rd cousin or even closer. When you have a predicted 4th cousin, there is almost no doubt that the person is a 4th cousin or closer. The issue, however, may be 2nds, 3rds, and 4ths that have a lower-than-average sharing amount, as they will be processed into lower prediction buckets.
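The shift between prediction buckets can be sketched in a few lines of Python. This is a rough illustration under loud assumptions: AncestryDNA has not shared its thresholds, so the cM cutoffs below are invented placeholders, the match names and shared amounts are made up, and the function names are hypothetical. The only parts taken from the discussion above are that stricter thresholds move matches down a category without deleting them, and that the Shared Matches tool only shows fourth cousins or closer.

```python
from typing import Dict, List, Tuple

Thresholds = List[Tuple[str, float]]  # (category, minimum shared cM), closest first

# HYPOTHETICAL cutoffs for illustration only; these are NOT AncestryDNA's real values.
OLD_THRESHOLDS: Thresholds = [("2nd cousin", 200.0), ("3rd cousin", 90.0), ("4th cousin", 20.0)]
NEW_THRESHOLDS: Thresholds = [("2nd cousin", 230.0), ("3rd cousin", 110.0), ("4th cousin", 30.0)]


def predict(shared_cm: float, thresholds: Thresholds) -> str:
    """Return the closest category whose minimum the shared cM meets."""
    for category, min_cm in thresholds:
        if shared_cm >= min_cm:
            return category
    return "distant cousin"


def in_shared_matches(category: str) -> bool:
    """Shared Matches only lists fourth cousins or closer."""
    return category != "distant cousin"


def shifted_matches(matches: Dict[str, float], old: Thresholds, new: Thresholds) -> Dict[str, Tuple[str, str]]:
    """Map each match whose predicted category changes to (old category, new category)."""
    shifted = {}
    for name, shared_cm in matches.items():
        before, after = predict(shared_cm, old), predict(shared_cm, new)
        if before != after:
            shifted[name] = (before, after)
    return shifted


# Made-up matches and shared cM totals.
matches = {"match_a": 250.0, "match_b": 100.0, "match_c": 25.0}

for name, (before, after) in shifted_matches(matches, OLD_THRESHOLDS, NEW_THRESHOLDS).items():
    print(name, before, "->", after, "| still in Shared Matches:", in_shared_matches(after))
```

With these placeholder numbers, match_a stays a 2nd cousin, match_b moves from 3rd to 4th, and match_c moves from 4th to distant. All three are still in the match list, but match_c no longer appears in Shared Matches, which is exactly the concern raised in the ALERT above.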
NOTE: As a result of the phasing improvement, SNP-based method, and match confidence scores, we may see a loss of one or more DNA Circles and/or New Ancestor Discoveries. I don’t have any additional information about this at the current time.
Match Shifting – Not Loss
Unfortunately, we’re going to see a lot of “No, I really lost my match” over the next month, because AncestryDNA doesn’t let you search by username. As a result, people won’t be able to find their shifted matches (i.e., people who go from a higher matching category to a lower matching category) in their match list, and will assume they’ve lost that match. If your match shared enough DNA with you to be reliably out of the false positive range (say 7 cM or more), I can almost guarantee that you didn’t lose that match; instead, the match shifted. I hope that AncestryDNA implements the ability to search by username to resolve this issue.
EDIT (4/19): also see Judy Russell’s post at The Legal Genealogist.