AncestryDNA Plans Update to Matching Algorithm

Blaine Bettinger 19 April 2016 46 Comments

AncestryDNA is making several changes to its matching algorithm in the next week or two (an exact time is not yet available). You may recall an announcement that was made earlier this month entitled “New Advances in DNA Science Coming Your Way” (pdf) in which they stated the following:

“These advancements are expected to deliver more-precise predictions of whom you are related to, and how closely, among the million-plus others in the AncestryDNA database.”

There were no specifics in the announcement, however. Last night, AncestryDNA provided additional information about the changes that we will be seeing in our match lists in the next week or two.

Before I launch into the specifics, here is a very high-level summary, based on the information we were provided:

Most people will see MORE matches in their match list as a result of these changes. And most matches will be sharing MORE DNA with each other. Lost matches, if any, will most likely be at the very lowest matching levels.

Among the changes that are expected in this update are the following:

Phasing Improvement – AncestryDNA has significantly increased the reference haplotype set used for phasing prior to cousin matching, meaning that the quality of AncestryDNA’s phasing will increase. This should result in fewer phasing errors, and thus fewer lost matches and false positives.
Matching Improvement – AncestryDNA is changing how they identify matches between individuals. Previously, AncestryDNA was using “windows” or blocks of DNA to compare two individuals. They will now use a SNP-based method to compare people. The problem with the previous method is that if the windows didn’t overlap a segment properly, they could either miss the segment entirely or shorten the segment. The SNP-based method will no longer miss or shorten these segments. As a result of this switch, it is expected that many matching segments will increase in size. The total shared cM with many, and possibly most, matches will increase.
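To make the window problem concrete, here is a toy sketch in Python. It is not AncestryDNA's algorithm, whose details were not provided; it assumes a simplified model in which each SNP is marked as matching or not, a window counts only if every SNP in it matches, and the names `window_segments` and `snp_segments` are hypothetical.

```python
def window_segments(match, window):
    """Toy window method: a window counts only if every SNP in it matches.

    Returns (start, end) SNP index ranges built from consecutive matching windows.
    """
    segments = []
    current_start = None
    for w_start in range(0, len(match), window):
        w_end = min(w_start + window, len(match))
        if all(match[w_start:w_end]):
            if current_start is None:
                current_start = w_start
        elif current_start is not None:
            segments.append((current_start, w_start))
            current_start = None
    if current_start is not None:
        segments.append((current_start, len(match)))
    return segments


def snp_segments(match):
    """Toy SNP-based method: report every unbroken run of matching SNPs."""
    segments = []
    start = None
    for i, is_match in enumerate(match):
        if is_match and start is None:
            start = i
        elif not is_match and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(match)))
    return segments


# A 16-SNP segment that straddles window boundaries gets shortened to 10 SNPs.
shortened = [False] * 7 + [True] * 16 + [False] * 7
print(window_segments(shortened, 10))  # [(10, 20)]
print(snp_segments(shortened))         # [(7, 23)]

# A 9-SNP segment that never fills a whole window is missed entirely.
missed = [False] * 5 + [True] * 9 + [False] * 6
print(window_segments(missed, 10))     # []
print(snp_segments(missed))            # [(5, 14)]
```

In this simplified model the SNP-based scan recovers the full segment in both cases, which is consistent with the expectation that many segments, and so total shared cM, will grow.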

As a result of improved phasing and the SNP-based method, there will be a net gain in matches for most people. You may lose a very small number of matches at the lowest matching level, but you’ll also gain matches (more than you’ll lose) at this level. In other words, the matching changes will result in a gain of new matches at the lowest level thanks to more matches that meet the minimum threshold. They will also likely result in the loss of some existing matches, thanks to the improvements in phasing and the implementation of the SNP-based method. I’m predicting (and hoping) that the lost matches will most likely have been false positives.

AncestryDNA does plan to let you download data about matches that you lose, if you’ve starred them or made a note for them. Since I think the minimum threshold is far too low anyway due to false positives, I’m personally not concerned about losing matches “down in the weeds.” I’m seriously concerned that based on what I’m seeing, many people are spending way too much time fishing in the realm of false positives and making incorrect conclusions. But it is to AncestryDNA’s credit to allow us to save data for matches we’ve decided are important.

Match Confidence Changes – there will also be a change to the matching thresholds/confidence scores; specifically, the relationship prediction thresholds will be more stringent. As a result, we will see some of our existing matches SHIFT; they will NOT be disappearing. Thus, we will NOT be losing any 2nd, 3rd, or 4th cousin matches, in the sense that these matches are gone. Instead, we will see some 2nds go to 3rd or 4th (possibly, but this will be a rare shift for most existing customers), see some 3rds go to 4th or distant (slightly less rare), and we will definitely see many 4ths go to the distant category.
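The shifting described above can be sketched with a small Python example. The cutoff values below are invented purely for illustration; AncestryDNA has not published its thresholds, and the function name `predict` is hypothetical.

```python
def predict(shared_cm, cutoffs):
    """Return the first category whose minimum shared cM is met."""
    for label, minimum_cm in cutoffs:
        if shared_cm >= minimum_cm:
            return label
    return "distant cousin"


# ASSUMPTION: made-up cutoffs (in cM), not AncestryDNA's real values.
OLD_CUTOFFS = [("2nd cousin", 200.0), ("3rd cousin", 90.0), ("4th cousin", 20.0)]
# More stringent cutoffs: each category now requires more shared DNA.
NEW_CUTOFFS = [("2nd cousin", 230.0), ("3rd cousin", 110.0), ("4th cousin", 30.0)]

# Hypothetical matches and their total shared cM.
matches = {"match_A": 260.0, "match_B": 95.0, "match_C": 24.0}

shifts = {
    name: (predict(cm, OLD_CUTOFFS), predict(cm, NEW_CUTOFFS))
    for name, cm in matches.items()
}
for name, (before, after) in shifts.items():
    print(f"{name}: {before} -> {after}")
```

Every match is still present after the change; only its category moves. In this made-up data, the match sitting just above the old 4th cousin cutoff is the one that lands in the distant category.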

ALERT! The very last statement is where there may be an issue for most people. Remember that we can only use the Shared Matches tool to see shared matches that are fourth cousins or closer. If many of our fourth cousins shift to distant cousins – and possibly the majority will – then we will lose the ability to see these shifted fourth cousins as shared matches with the Shared Matches tool. This is my biggest concern about this shift, as much of the value and discovery I’ve made with AncestryDNA has been with this Shared Matches tool. My hope is that AncestryDNA will allow us to use the Shared Matches tool to see every shared match, not just fourth cousins or closer, understanding that many of the shared matches with very distant cousins will be fraught with risk. This hope was adamantly expressed to AncestryDNA.

We all know that relationship prediction is notoriously difficult (see The Shared cM Project), and each of the companies has a hard time with it. On the positive side, with this change, when you have a predicted 2nd cousin, there is almost no doubt that the person is a 2nd cousin (the very endogamous aside of course, sorry!). When you have a predicted 3rd cousin, there is almost no doubt that the person is a 3rd cousin or even closer. When you have a predicted 4th cousin, there is almost no doubt that the person is a 4th cousin or closer. The issue, however, may be 2nds, 3rds, and 4ths that have a lower-than-average sharing amount, as they will be processed into lower prediction buckets.

NOTE: As a result of the phasing improvement, SNP-based method, and match confidence scores, we may see a loss of one or more DNA Circles and/or New Ancestor Discoveries. I don’t have any additional information about this at the current time.

Match Shifting – Not Loss

Unfortunately, we’re going to see a lot of “No, I really lost my match” over the next month, because AncestryDNA doesn’t let you search by username. As a result, people won’t be able to find their shifted matches (i.e., people who go from a higher matching category to a lower matching category) in their match list, and will assume they’ve lost that match. If your match shared enough DNA with you to be reliably out of the false positive range (say 7 cM or more), I can almost guarantee that you didn’t lose that match; instead, the match shifted. I hope that AncestryDNA implements the ability to search by username to resolve this issue.

See Roberta Estes’ post about this update at DNAeXplained, and Tim Janzen’s comments at Rootsweb.

EDIT (4/19): also see Judy Russell’s post at The Legal Genealogist.

CarolynGM 19 April 2016 / 7:57 am

At this time, you can check for shared matches with ANY match at any confidence level and often you will see some. However, the only people you will see in the shared matches list are those of Very High confidence or higher.
- Blaine Bettinger 19 April 2016 / 8:13 am
  
  Yes, you’re right, I should have known better since I wrote about it here: https://thegeneticgenealogist.com/2015/08/28/ancestrydna-announces-new-in-common-with-tool/
  
  I changed the text of this post, thank you.
Ann Turner 19 April 2016 / 8:31 am

Thanks for the additional insights. One way to sidestep the issue of shifting levels would be to use the DNAGedcom Client, which will do a batch download of all your matches, the ancestral names of those matches, and the Shared With lists. You can search your file for user ID. There is a monthly subscription fee of $5.

https://www.dnagedcom.com/doc/welcome-to-the-dnagedcom-client/
- Blaine Bettinger 19 April 2016 / 9:20 am
  
  Great point Ann! I love the DNAGedcom client, it always runs smoothly for me.
- Crisa Baker 19 April 2016 / 3:02 pm
  
  Thanks Ann!
- Debbie Kennett 20 April 2016 / 4:43 am
  
  In case it’s of any help to anyone I wrote a blog post yesterday about using the DNAGedcom Client to download my AncestryDNA matches:
  
  http://cruwys.blogspot.co.uk/2016/04/changes-to-ancestrydna-matching.html
  
  It all worked very smoothly for me, and it was interesting to see all the results in the match list.
- Dan 22 April 2016 / 7:14 am
  
  When using the Client be sure to also run match, ICW and Tree reports on all of your other kits that have been shared with you. Also either save these latest runs with a different file name or to a separate folder so they are not overwritten the next time you run them.
  
  You would then be able to view and compare matches from the run made before Ancestry’s changes with the run made after them, and see how the changes impacted all your matches. You will need to import them into a spreadsheet, naturally.
Evan Rofheart 19 April 2016 / 8:42 am

My concern is losing very distant matches, who I know I am related to, as we both have very developed trees. Mostly I expect the most losses to be with others who go back ten generations, using well documented sources to the founders.

The unfortunate thing is that these documented matches will be replaced by people who have no interest in building a tree, and usually won’t even respond to enquiries.

I expect a net loss of useful matches.

Evan
- Blaine Bettinger 19 April 2016 / 10:06 am
  
  There’s a lot of huge assumptions there, but I’ll agree that it is so frustrating to match people who have no tree or no interest in sharing!
  
  Also, I’ll note – because I see this a lot – having a small shared segment and confirmed documentation of a shared ancestor does not, by itself, validate the small shared segment.
- Amby Rogers 22 April 2016 / 12:13 pm
  
  I feel the same way. Our research has been over the last 30 years and distant relatives have been a real treat with the shared matches and hints.
  
  Most people who do DNA today only care about their direct line, never adding information on “All” their families.
  
  The last “Update” Ancestry did from V1 I lost numerous 3rd cousins that are proven to be from the same line. This does not give me much faith in the direction Ancestry is going.
Steven Frank 19 April 2016 / 9:20 am

I hope they listen to you about the option to let advanced users see every shared match! Until then, I’m adding notes to all my “very high” confidence level 4th cousin matches on who they match so I can still reference that if they get bumped to distant matches.
Jim Bartlett 19 April 2016 / 11:05 am

I thought you could search for a match by username – I do it all the time. But maybe it’s an add-on I installed long ago…
- Blaine Bettinger 19 April 2016 / 11:19 am
  
  That’s probably the AncestryDNA Helper add-on (aka the Snavely tool). I always had trouble getting it to run properly in the “early days”, looks like a good time to try again!
  - Diana 19 April 2016 / 1:28 pm
    
    Yes, it’s the Snavely tool. Very helpful.
    - Sue Reed 20 April 2016 / 5:49 pm
      
      Where do I find the Snavely tool?
      - Sue Reed 20 April 2016 / 5:55 pm
        
        Got it. Thanks.
  - Amby Rogers 22 April 2016 / 12:15 pm
    
    Ancestry DNA Helper is the best tool that I have on Ancestry. Thanks go to the people who took the time to make it and share it with us.
  - Betty 29 April 2016 / 12:56 am
    
    You should definitely retry the Snavely tool, especially if you manage multiple kits and/or share your DNA results with family members. It allows you to search by username, and it allows you to tell if your other kits or shared DNA also match a given match. That’s not quite as beneficial as it was before Ancestry rolled out the “matches in common” feature, but it still has its uses. I don’t know how anyone does without the “search by username” feature. That ought to be a standard feature of Ancestry DNA.
- Wanda E. 19 April 2016 / 11:55 pm
  
  I search for surnames quite a lot. Just go to view all matches and then select search matches.
DJ Williamson 19 April 2016 / 1:43 pm

Maybe these changes will help me a little better than what I already have as far as matches then. I’m adopted and have pretty much zero information about either biological parent or the families, and I have several DNA matches that are third and fourth cousins, but I have no clue which way to go from there. I’ve uploaded to GEDmatch and I’ve sent messages on Ancestry to my highest matches, but the two that responded weren’t able to help me. I’m really kind of worried because my membership for Ancestry expires around the beginning of May and I don’t have the funds now to renew it, so I don’t know what I should do because I’m not sure if I’ll still be able to see my DNA information or not. When I got the membership and did the DNA test last year, I hoped that the six months I paid for would be long enough for me to find at least some of my close family.
- Dan 19 April 2016 / 5:12 pm
  
  People seeking bio-parent information should be researching their closest matches, not their farthest, so losing estimated very distant matches ought not negatively affect your search.
  
  And if the net result of these changes makes estimation of kinship (for the closer matches) more accurate, you may find the changes helpful in knowing how to weigh any match in your search.
- Kath 19 April 2016 / 10:33 pm
  
  I believe that you have ongoing access to your DNA results and ability to contact your matches, you just lose the ability to contact random Ancestry members who aren’t matches. See: http://ancestryau.custhelp.com/app/answers/detail/a_id/9087/~/ancestrydna-with-and-without-an-ancestry-subscription
- sara crystal 20 April 2016 / 9:26 am
  
  you may enjoy watching a couple recent episodes of “who do you think you are”, and long lost family, since they use dna matching to backtrack combined with genealogy research, tracing forward from shared ancestors. good luck in your search, there is plenty of help out there in the adoption reform movement, no need to give up. love, sara
- Billie 20 April 2016 / 5:34 pm
  
  Our local library has Ancestry access. You do have to go in person, but check and see if you can’t get free access through your local library.
Sheryl 19 April 2016 / 1:57 pm

Let’s hope they figure out a way to just let us search by name and surname.
Wallace Fullerton 19 April 2016 / 3:17 pm

“The issue, however, may be 2nds, 3rds, and 4ths that have a lower-than-average sharing amount, as they will be processed into lower prediction buckets.”

And do we have any way of knowing the actual ranges used by Ancestry to predict any given level of “cousinship” or confidence level? Their simplistic “you share x cM across Y segments” is of little help.
Marj 19 April 2016 / 3:57 pm

I think if you “star” the match it will stay. At least I am hoping that will happen.
- Vickie Belzer 19 April 2016 / 5:54 pm
  
  I read the same thing this morning so I went in and starred who I wanted to save starting with those with leaf hints.
  - Gail Zeigler 19 April 2016 / 6:35 pm
    
    Apparently so many are trying to do this that leaf hints has crashed.
chekwriter 19 April 2016 / 4:28 pm

I wonder if ANCESTRY has discovered just how popular the web site http://www.gedmatch.com has become, and once again they are investing in time, site upgrades, programming, etc., to further their customer base. This will, of course, derail many from going to the http://www.gedmatch.com web site and donating funds for its upper level “Tools”.

Hmmm, never count them out, to see a way to make another ‘buck’.

Wonder how soon Ancestry will ‘buy’ them out.
- Sally Jo Fuhr 20 April 2016 / 8:42 pm
  
  I notice my ancestry subscription price now covers six months where initially, it was annual. Apparently they’ve already come up with a way to make another buck.
Seaton Smithy 19 April 2016 / 8:24 pm

Thank you for this excellent summary of the changes. I’m looking forward to this. I will not be too concerned if some of my lower 4th-6th ranked matches drop to 5th-8th and I’m sure I won’t even notice if some of the matches at the end of my list of over 5,000 5th-8th ranked matches disappear.

The increased certainty for closer matches and the removal of false positives is a very welcome development.
- Blaine Bettinger 20 April 2016 / 9:24 am
  
  Thank you! I agree, I think there is always a lot of room for improvement when it comes to matching, and hopefully this will be a step in the right direction.
McNeil 20 April 2016 / 1:18 am

They will tighten the match confidence, but they are also saying they will find more shared DNA between you and your matches, so maybe these two aspects will compensate for each other. I have 3 real 3rd cousins predicted as 4th cousins. Only around 60% of my known real 4th cousins match me, and around half of those match as distant cousins. That makes the “sharing in common” tool not very useful with real 4th cousins. They must improve this.
Tim Skinner 20 April 2016 / 7:50 am

Great informative article! Hopefully this will be a change for the better. And if there are any unwelcome and unexpected side effects, ancestry will listen to their customers this time.
- Blaine Bettinger 20 April 2016 / 9:18 am
  
  Thank you! It will certainly be very interesting to see how my match lists look after the upgrade. I tend not to spend much time with very distant matches (say 7 cM or less), so I’m not expecting to notice a huge change.
Michelle Cole 20 April 2016 / 8:11 am

I haven’t had any new matches since April 14. Is anyone getting new matches?
Chris 20 April 2016 / 10:30 am

The biggest problem I find with any matches, strong or weak, is the lack of a tree attached and/or failure of the match to respond to an inquiry. With these problems, matches are worthless no matter at what confidence level, 2nd, 3rd, 4th, or whatever.
Jane 20 April 2016 / 10:36 am

What I would like to know is whether certain segments do not recombine, and, if so, are any of the companies doing autosomal dna matching using this information to determine more reliable distant matches.

Hope to see some of you in Burbank in June.
River 20 April 2016 / 1:44 pm

I was looking to understand the cM ranges associated with the various confidence levels, as I don’t think Ancestry provides that. Wondering if there are any informal estimates available on this. I am currently completely ignoring matches that are Good and lower, as I am not confident enough about whether these might be IBS, and these are like 90+% of my matches.
- Ernie Kapphahn 20 April 2016 / 10:55 pm
  
  Good is from about 6.3 to 11.9 cM. They are sorted best first and if you click on the “i” next to the rating you will see the cM of the match.
Judy 21 April 2016 / 2:22 am

Does this mean that the ethnicity reports will be tweaked as well?
Janet 30 April 2016 / 11:22 pm

Has anyone heard any news yet? This is taking way too long. No update on DNA matches for at least 2 weeks. Can’t really work on my tree, waiting on my sister’s DNA results, we may only be half siblings. I shared 790 centimorgans with someone that definitely isn’t my first cousin, I’m afraid I may be her half aunt. This wait is really testing my patience!
B. Maurer 8 May 2016 / 12:11 am

With Ancestry’s new dna analysis, my family members lost so many matches. We are so upset.
Juan Gonzales 12 May 2016 / 9:46 pm

Recently I was working on my family tree and requested a DNA test. My test matched with two male first cousins with an Anglo surname. I grew up with a Hispanic family. My DNA did not match either of my adoptive parents. All parents have passed away. I was never informed that I was adopted.

Can the raw data from Ancestry DNA test be analyzed to determine a half-brother?

Listed below are the results from GEDmatch for my two cousins and me.

| | Cuz-1 V Cuz-2 | Cuz-1 V Me | Cuz-2 V Me |
|---|---|---|---|
| Large segment | 104.3 cM | 95.5 | 109.9 |
| Total Segments | 844.4 | 749.5 | 1091.2 |
| Est. number of generations | 2.1 | 2.1 | 1.9 |

I have searched as far as I can go. I need the help of a professional. What do you suggest?
Juan Gonzales 23 February 2017 / 4:26 pm

is this posting active?