The Danger of Distant Matches

We know that small segments shared between two individuals can be problematic (see Small Matching Segments – Friend or Foe?), whether the two individuals are closely related or distantly related (or not related at all, as we’ll see). I call small segments (which I usually classify as 5 cM or less) POISON because it is currently impossible to tell which are real segments and which are not.

In the following analysis, I use the wonderful new Match-O-Match tool at DNAGedcom to compare my AncestryDNA match list with my parents’ match lists. The Match-O-Match tool is a powerful spreadsheet analysis tool developed by Don Worth. It is available to DNAGedcom subscribers as part of the DNAGedcom Client. For more, see page 10 of the PDF HERE. Thank you Don for this great new tool!

The following analysis shows that 32% of my matches are not shared with either parent. For my matches, there is a SAFE ZONE at 15 cM and above in which a match has a 99.3% probability of being shared by either or both parents.

Below 10 cM, there is only a 59% probability of being shared by either or both parents. And below 7 cM, there is only about a 41% probability of being shared by either or both parents. These numbers will vary for others, and I’d love to see others join in this analysis.

Conclusions

I conclude from this data that matching below 10 cM is poisoned by false matches. Yes, without a doubt a significant percentage of these matches are real, but how do you tell which ones? In other words, if the poison is colorless and odorless, how do you know what has been poisoned? Having both parents tested may help, but it is not a guarantee, as false segments might be inherited.

Do I think that all segments below 10 cM are forever poisoned? No, I believe that this is an area where whole genome sequencing will have the strongest impact. And other methodologies and technologies may help alleviate this issue in the future. In the meantime, however, genealogists must be aware of the issue of false matches and how it might impact their research.

NOTE: I’m not the first genetic genealogist to do this type of analysis, by far, but I’m just adding my voice to the mix.

My Matches at AncestryDNA

Right now, I have a total of 16,193 matches. The matches break down like this:

  • 39 share 50 cM or more (of these, I target-tested 15)
  • 169 share 25 cM or more
  • 411 share 20 cM or more
  • 1,130 share 15 cM or more
  • 4,190 share 10 cM or more
  • 12,003 share fewer than 10 cM (74%)
  • 5,224 share between 6-7 cM (32%)

Sharing With My Parents

Of these 16,193 matches, I share 3,977 (25%) with my mother and 7,144 (44%) with my father. There’s some overlap: my parents share 93 of my matches in common (excluding me and my children).

A total of 5,135 (32%) are not shared with either my mother or my father. These matches – not shared by either parent – break down like this:

  • The largest match is 19.1 cM
  • 8 share 15 cM or more
  • 261 share 10 cM or more (5%)
  • 1784 share between 7-10 cM (35%)
  • 3090 share between 6-7 cM (60%)
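The mother/father/neither split is, at heart, set arithmetic on match lists. A minimal sketch, assuming each match list has already been reduced to a set of match identifiers; the IDs and the `classify_matches` helper are hypothetical, and this is not how Match-O-Match itself is implemented.

```python
def classify_matches(mine, mother, father):
    """Split my match IDs by which parent's match list also contains them."""
    mine = set(mine)
    # Only parental matches that also appear on my own list are relevant here.
    mother = set(mother) & mine
    father = set(father) & mine
    return {
        "both": mother & father,
        "mother_only": mother - father,
        "father_only": father - mother,
        "neither": mine - mother - father,
    }


# Hypothetical match IDs, for illustration only.
my_matches = ["a", "b", "c", "d", "e", "f"]
mom_matches = ["a", "b", "x"]
dad_matches = ["b", "c", "d", "y"]

for group, ids in classify_matches(my_matches, mom_matches, dad_matches).items():
    print(group, sorted(ids))
```

With real lists, the size of the "neither" set corresponds to the count of matches not shared with either parent.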

Welcome to the Danger Zone

So the safe zone – meaning a match is almost guaranteed (99.3%) to match my mother and/or my father – is 15 cM or more, which is 1,130 of my matches. You might be able to lower this safe zone to 10 cM (4,190 of my matches), as there is a 94% probability that these matches will match my mother and/or my father.

The danger zone – meaning a match is NOT guaranteed to match my mother and/or my father – is below 10 cM, as 41% of my matches below 10 cM were not shared with either parent.

The really dangerous zone – meaning a match is not likely to match my mother and/or my father – is below 7 cM, as 59% of my matches below 7 cM were not shared with either parent.
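The zone percentages come from simple arithmetic on the counts listed above: for each band, the fraction shared is (matches in band − matches shared with neither parent) / matches in band. A short check using only those counts; the band labels and the `shared_rate` helper are illustrative, and the 6-7 cM band stands in for "below 7 cM" because the lists above start at 6 cM.

```python
# (total matches in band, matches in band shared with neither parent),
# taken from the counts reported above.
BANDS = {
    "15 cM or more": (1130, 8),
    "10 cM or more": (4190, 261),
    "below 10 cM": (16193 - 4190, 5135 - 261),
    "6-7 cM": (5224, 3090),
}


def shared_rate(total, not_shared):
    """Fraction of matches in a band shared with at least one parent."""
    return (total - not_shared) / total


for band, (total, not_shared) in BANDS.items():
    print(f"{band}: {shared_rate(total, not_shared):.1%} shared with a parent")
```

This gives roughly 99.3%, 94%, 59% and 41% for the four bands.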

Why do I have matches that neither parent has? These matches could be false positives, meaning that I have the match when I shouldn’t. These matches might also be false negatives, meaning that my parent fails to have the match when they should.

Having my parents tested may help me determine which of my matches might be real, although of course there is no guarantee that a distant match I share with a parent is a real match. A false positive match that I have could be a parent’s false positive match as well.

But, if you don’t have both parents tested, how do you determine which of your matches below 15 cM are solid matches?

 

 

60 Responses

  1. Don Worth 6 January 2017 / 2:57 pm

    Wow! It never would have occurred to me that you could get all that from M-O-M!

    I don’t have either of my parents tested (because I am old as dirt) so this isn’t something I could have attempted, although we just tested my daughter over Christmas – I don’t think she was all that excited about finding the Ancestry kit under the tree, though. So maybe I can follow your path here and give it a try. I often urge people to test at Ancestry first because I have 10x the matches there that I have at Family Tree DNA and 5x the matches I have at GEDMATCH (due to the size of the databases and privacy restrictions), but considering that 33% of them are likely junk matches, that attenuates the benefit a bit.

    With autosomal DNA I’m quite interested in “playing the odds” and working with mass analysis. For example, I’m thinking about using surname frequency analysis of a larger match database to pick out which surnames “pop out” as being dominant when you compare the trees of shared matches. I would think the junk matches would wash out as noise if you have enough matches to look at? I’m also interested to know whether a small segment that is part of a large triangulated group is more likely to be IBD than a small segment you find that is isolated. It seems like a preponderance of smaller, weaker clues might add up to a stronger one. What do you think?
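Don’s surname-frequency idea could start as a simple tally. A minimal sketch, assuming you have already pulled a list of surnames from each shared match’s tree; the data and the `surname_frequencies` helper are hypothetical. Counting each surname at most once per tree keeps one large tree from dominating, so that (if Don’s intuition holds) scattered surnames from junk matches stay near the bottom of the tally.

```python
from collections import Counter


def surname_frequencies(trees):
    """Count how many trees each surname appears in.

    Names are normalized with str.title() and deduplicated within a tree,
    so one tree listing a surname many times counts only once.
    """
    counts = Counter()
    for surnames in trees:
        counts.update({name.strip().title() for name in surnames})
    return counts


# Hypothetical surnames pulled from the trees of one cluster of shared matches.
shared_match_trees = [
    ["Carter", "Hale", "Baker"],
    ["carter", "Jones", "Carter"],
    ["Carter", "Hale"],
    ["Miller"],
]

for surname, n in surname_frequencies(shared_match_trees).most_common(2):
    print(surname, n)
```

Here "Carter" (3 trees) and "Hale" (2 trees) rise to the top.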

    Anyway, thanks for the shout-out! My programming is not to make money – I’m just doing this for the fun of it and to keep my programming skills going into retirement.

    Don

    • Blaine T Bettinger 6 January 2017 / 3:43 pm

      I agree completely that bioinformatics, or “playing the odds”, might be one way to work around the limitation of distant matches. With lots of data, we might be able to navigate through the issues. The real danger comes when someone picks a single match at 8 cM and wants to do lots of things with it.

      My pleasure, thanks for another great tool!

      • Gail Wilson 7 January 2017 / 12:04 pm

        Hi Blaine, you just answered a question I had on the Gedmatch FB page about a match my dad had on FTDNA that doesn’t show up at Gedmatch unless I lower the threshold, and then there are several small segments. I am going to see what comes up in the M-O-M app.

  2. Roberta Estes 6 January 2017 / 3:01 pm

    I would like to see this information using Family Tree DNA or GedMatch data. My concern with Ancestry is that since they strip out anything they feel is “too matchy” with Timber you’re really not seeing your real matching segments, or not all of them anyway. You don’t know how much of the parents’ DNA was removed. I wonder if you could compare the same people at FTDNA and/or at GedMatch that you compared at Ancestry, how the match numbers would compare. That would be the best scenario, if that’s possible, and I think more compelling than data that Ancestry has already manipulated in unknown ways.

    • Blaine T Bettinger 6 January 2017 / 3:41 pm

      Roberta,

      I too think that Timber and phasing at AncestryDNA greatly affect this analysis. I would like to repeat this with my Family Tree DNA matches. It isn’t the same list of people, so it will have to be viewed with that limitation.

      Since 23andMe and GEDmatch both have matching limits based on number of matches versus size of segment, we can’t do this analysis using any of that data, unfortunately.

      • Roberta Estes 7 January 2017 / 12:02 am

        I didn’t mention 23andMe because of the match limit, but you’re right about GedMatch having a limit as well. If you can use the same primary people, the matches won’t be the same of course, but the concept should work. Even if you can’t use the same primary people, the concept should still be comparable and more reliable because the data has not already been manipulated with parts removed. I hope you can find time to do this – probably between 3 and 5 AM during your two hours of nightly sleep:)

    • Don Worth 6 January 2017 / 4:49 pm

      I suppose you could massage a match file from FTDNA or GEDMATCH to look like an Ancestry match file and run it through M-O-M. Should work. And in the longer term I could have M-O-M detect the type of match file format and rearrange the columns internally to convert from FTDNA to Ancestry match file format, etc.
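Don’s "massage the file" suggestion amounts to renaming and reordering CSV columns. A minimal sketch, assuming both files are plain CSV; every header name in `COLUMN_MAP` is a hypothetical placeholder, not the real FTDNA or Ancestry/M-O-M header, and would need to be checked against actual exports first.

```python
import csv
import io

# Hypothetical header names: check the real FTDNA export and the
# Ancestry-style file M-O-M expects before using anything like this.
COLUMN_MAP = {
    "Full Name": "name",
    "Shared cM": "sharedCentimorgans",
    "Match Date": "date",
}


def remap_columns(src, dst, column_map):
    """Copy CSV rows from src to dst, renaming and reordering columns.

    Columns not listed in column_map are dropped; missing ones become "".
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(column_map.values()))
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row.get(old, "") for old, new in column_map.items()})


# An in-memory file standing in for an FTDNA export.
ftdna_csv = io.StringIO(
    "Match Date,Full Name,Shared cM,Notes\n"
    "2017-01-06,Ann Lee,12.5,maternal?\n"
)
out = io.StringIO()
remap_columns(ftdna_csv, out, COLUMN_MAP)
print(out.getvalue())
```

Keeping the mapping in one dictionary means supporting another company’s format is a matter of adding a new mapping rather than new code.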

  3. Jason Lee 6 January 2017 / 4:43 pm

    Prescription: Provide a chromosome browser, raise the cM threshold, apologize to the customers for the inconvenience.

  4. Deborah Kennett 6 January 2017 / 5:24 pm

    Thanks Blaine for a very interesting analysis and thanks to Don Worth for providing such a useful tool. I haven’t tested my parents at AncestryDNA yet but I have tested them at FTDNA and when I last checked I found that about 23% of my Family Finder matches didn’t match either of my parents. Of course FTDNA have much higher match thresholds though the matches there are not phased.

    It would be interesting if someone could find a way to drill down into these non-matches at AncestryDNA and determine how many are false positives and false negatives. The phasing is done in windows, which has the effect of breaking up the segments into smaller sections. I suspect this might account for some false negatives.

    AncestryDNA do only assign “moderate” confidence to the matches in the 6-16 cM range and point out that there is only a 15-50% likelihood of a recent common ancestor. They say: “You and your match might share DNA because of a recent common ancestor or couple, share DNA from very distant ancestors, or you may not be related.” However, this chart is buried deep in their help menu and I doubt that many people find it. I think it might be better if they could flag up these small matches and highlight the uncertainty rather than listing them all as possible 5th to 8th cousins. I do wonder if it’s worth them including all the 6 cM matches. The last time I checked I found that half my matches at Ancestry only shared 6 cMs with me. The vast majority of these 6 cMs matches are going to predate a realistic genealogical timeframe.

    • Blaine T Bettinger 6 January 2017 / 6:47 pm

      I wanted to add my Family Tree DNA analysis to this, but I ran out of time. It would have been much more intensive too, without the Match-O-Matic tool.

      Yes, if it were up to me, I would up the threshold to somewhere between 7 and 10 cM, with a flag that segments below 10 cM are likely to be false and beyond a genealogically relevant time frame.

      • Jim Bartlett 6 January 2017 / 6:52 pm

        Blaine,
        The Speed chart of IBD vs Probability, in the 5-10 cM band, shows IBD segments from the first 10 generations occurring about 20 percent of the time. So “likely” to be false is true, but some of them (1/5) being within a genealogically relevant time frame is also true.

        • Deborah Kennett 6 January 2017 / 7:21 pm

          If it were me I would up the AncestryDNA threshold to 7 cMs rather than 6 cMs, and I would highlight the 7-10 cM segments as potentially being problematic and only to be used with extreme caution.
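Deborah’s suggested policy (raise the floor to 7 cM and flag 7-10 cM matches) is easy to express as two thresholds. A minimal sketch, assuming matches are available as (name, shared cM) pairs; the `triage` function, the flag text and the data are hypothetical, and the defaults can be changed to try Blaine’s 7-10 cM range instead.

```python
def triage(matches, drop_below=7.0, caution_below=10.0):
    """Apply a two-threshold policy to (name, shared_cM) pairs.

    Matches under drop_below are removed; matches under caution_below are
    kept but flagged; everything else is kept unflagged.
    """
    kept = []
    for name, shared_cm in matches:
        if shared_cm < drop_below:
            continue
        flag = "use with extreme caution" if shared_cm < caution_below else ""
        kept.append((name, shared_cm, flag))
    return kept


# Hypothetical matches.
for row in triage([("A", 6.2), ("B", 8.0), ("C", 15.0)]):
    print(row)
```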

          The Speed chart is based on simulations. With simulations you have an idealised world where all the segments are real.

          I think we’re always going to have problems detecting IBD segments with the current chips. If the companies were to move to new chips with more SNPs then the accuracy might improve. The new Illumina chips can test up to 5 million SNPs but I imagine the cost is beyond the reach of the average genetic genealogist.

          • Alan P S D Mayer 7 January 2017 / 1:30 am

            Thanks Blaine once again for another excellent article.
            I was just thinking in relation to the 5E6 SNP chip issue – consider the progress that computers, etc., have made in the last 20 years, or even 50 years !!! The same with forensic profiling – as illustrated recently, it’s vastly easier to catch a criminal with a drop of blood nowadays than it was decades ago. Consider how the number of markers used in various police DNA tests has changed. Maybe in 5 to 10 years 5E6 SNPs will be the norm, and full genome testing will be a minor addition. An Intel i5 or i7 computer may be considered slow in 10 years’ time, and 5E6 SNPs may be able to be processed by the layperson in a matter of minutes. Maybe gedmatch will have a function to upload Y and mt DNA, and compare it, rather than just Y and mt “typed” haplogroups.
            Yea for progress !!!

        • Blaine T Bettinger 6 January 2017 / 8:18 pm

          Jim – I can’t argue with any of that! I can only say that if you give newbies an inch, they’ll take a mile. I see abuse of these segments every day, and it’s getting much worse by the day as the proportion of newbies taking atDNA tests grows. I think I’m a bit more dictatorial; I would rather withhold data that just 1% of test-takers can use responsibly (i.e., segments smaller than 10 cM) than see the small segments pollute genealogy.

      • Jan Noack 9 January 2017 / 4:02 am

        What do you call a genealogically significant time frame?
        I have many (most) smaller matches, as, of course, does everyone. All come from the same area, with similar first names. We can’t trace the records back, as they don’t exist… BUT I think, to me, that implies that is the area we came from and that somewhere a few generations or a few hundred years back we were related. As I’m trying to find where my ancestors came from in Ireland and Prussia… this is GOOD. Smaller matches tell me that I match them but not their parents or aunt, etc., and they find it strange… and yet we can trace our family trees back in books written decades ago with photos of the people. So far I have never discovered one false positive, i.e., one “match” that has not come from the general area where I suspect my ancestors lived. If I ever find a “match” who doesn’t think they originated from a similar area, I’ll change my mind. But most of the tests I run are NO matches, and by that I mean nothing above 2 cM. So I really don’t think I have seen “false positives” at all. That said, my father matches a number of those ice men (if I consider 2 cM to be a match; I don’t for genealogy unless there is a known tree as well!)… BUT again, the three he “matched” all came from each side of the area where his family is thought to have originated, and he matched NONE of those further away. So yes, I think it tells us something. Will the match be a third cousin? Well, not in any of my cases, but it’s theoretically possible. I haven’t kept this documented; it’s just what I’ve been observing and what I have found helpful to narrow down regions where my ancestry originated.
        My understanding (from what I’ve read, which may not be correct) is that false positives are parts of each of our two chromosomes being read and joined together to get a length of matching DNA segment on a chromosome. So, as far as I can see, this just means part of the match is from your father’s side and part from your mother’s side… nothing false there at all… actually expected?
        Of course, if you are only looking at the 4th generation and earlier, none of this probably applies. Unfortunately I’m looking at 6th- to 8th-generation or older matches to try to work out areas, and even more with US “matches”. Funny thing about the US “matches”, too: they are living in areas that have been passed down by word of mouth as where the rest of the family migrated to…
        I really hate it when, having a known word-of-mouth history of an ancestor, I find a small match with someone who has the same name in their family, and I am summarily dismissed because the match is OBVIOUSLY a false positive… when they may have the actual place they migrated from (happens too many times).
        Thing is, I compare most folk and I get ZERO for a match. “Small” matches are really only a small fraction of all the people in the database. If they were more numerous, I’d say you have a point, but they are more “rare”.

  5. Jim Bartlett 6 January 2017 / 6:47 pm

    I think that AncestryDNA’s algorithm mix reports most segments as smaller than the other two companies and GEDmatch do, and many Matches are missed. I have a lot of GEDmatch Matches with AncestryDNA kits whose owners I can find at AncestryDNA, along with a report that they are not a DNA Match with me… duh! I’m looking at it, at GEDmatch, in a TG.

    Also, based on analysis I did of one person’s trio data at FTDNA and GEDmatch, the FTDNA IBC rate was around 5-10 percent; the rest of the matches, not found at FTDNA, could be found at GEDmatch (which doesn’t have such a restrictive algorithm).

    And, I’ve found many shared segments in the 5-10 cM range at GEDmatch to NOT Triangulate with sufficiently overlapping segments found in TGs on either side. This clearly marks them as IBC, IMO. So I use Triangulation to eliminate many 5-15 cM segments which are IBC. Can I guarantee that all IBC shared segments are identified? No, I cannot (I’m not sure how we could prove that). Am I confident that almost all IBC shared segments are identified this way? Yes (and if a few slip through, so what? Over half of the Matches won’t even reply, and the identification of a CA for a TG takes much more than a single CA anyway.)
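The core test behind triangulation is whether the three pairwise shared segments (me–A, me–B, A–B) all overlap on the same chromosome. A minimal sketch, assuming those segments are already known as chromosome plus start/end positions (for example from one-to-one comparisons); the positions are hypothetical. It checks overlap only: it does not check which parental side a segment is on, any minimum overlap size in cM, or the TG-boundary reasoning Jim describes.

```python
from typing import NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    chrom: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def common_region(*segments: Segment) -> Optional[Tuple[int, int, int]]:
    """Region (chrom, start, end) covered by every segment, or None."""
    if len({s.chrom for s in segments}) != 1:
        return None
    start = max(s.start for s in segments)
    end = min(s.end for s in segments)
    return (segments[0].chrom, start, end) if start < end else None


def triangulates(me_a: Segment, me_b: Segment, a_b: Segment) -> bool:
    """True if the three pairwise shared segments all overlap one region."""
    return common_region(me_a, me_b, a_b) is not None


# Hypothetical pairwise segments on chromosome 5.
me_a = Segment(5, 10_000_000, 20_000_000)
me_b = Segment(5, 15_000_000, 25_000_000)
a_b = Segment(5, 12_000_000, 18_000_000)
print(triangulates(me_a, me_b, a_b), common_region(me_a, me_b, a_b))
```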

    • Louis Kessler 7 January 2017 / 10:32 am

      I’ve been using much of Jim Bartlett’s excellent segment analysis material and I agree with his observation that almost all triangulated segments are IBD down to 5 cM.

      I think Double Matching, Multiple Matching, and Crossover comparison methods are new methods that also work down to at least 5 cM, and they could and should be used to help you determine which of your matches below 15 cM are solid matches.

      • Blaine T Bettinger 7 January 2017 / 10:56 am

        Yes, Jim is in a very unique position to evaluate IBD! I value his insights immensely.

        And I agree, tools like triangulation may be helpful in identifying IBD, as long as it isn’t relying on close relatives on either side to be the third leg of the triangulation group.

  6. David Negus 6 January 2017 / 7:40 pm

    I reposted this comment on ISOGG because I don’t know how to attach a diagram here. If you are interested in the diagram, look at ISOGG on Facebook.
    Like Don Worth in the comments to Blaine’s article I am also older than dirt, and lack parents, aunts, uncles, or siblings to test. My son has tested at 23&me, and I have uploaded his kit to gedmatch.
    Although it is not a substitute for having tested parents, I can semi-phase and check for false positives using his kit for those kits which are uploaded to Gedmatch, and semi-phase, without the check on false positives, for kits at 23&me.
    I have attached a chromosome map from my tree so you can see how it works.
    I locate my son’s (abug) crossovers by looking for places where I know from my matches that he has switched sides, and more frequently for places where he matches neither side. The LL segment in the diagram is such a place. Using 3 cM / 300 SNP resolution on Gedmatch, I can see abug matches the first half of the LL segment only and 0abug (the evil twin) matches the second only. The evil twin utility is now one of my most used tools on Gedmatch. It doesn’t tell me which parent the segment comes from, but it is pretty accurate about which matches are on one side and which on the other.
    I don’t chart segments under 10 cM, so I can’t comment on that. I do find false positives in the 10-15 cM range. I think we would be more accurate if we used cM to assess how far back in time a match is, and SNPs to discuss the probability that the match is accurate. (I got this idea from Paddy Waldron, A Beginner’s Adventures in Genetic Genealogy.) Using SNPs, the 95% range is, I would guess, from 1000 to 2000 SNPs. I would love to see studies like Blaine’s using SNPs instead of cM.
    I have found it relatively easy to figure out false positives (MO FA MO segments) using Gedmatch at high resolution, comparing myself and the match to other segments in the triangulation group.
    Testing errors–false negatives mostly, I would think–are harder to detect. Many are due to the fuzzy boundaries of segments–especially comparing one testing company to another–so that a segment is over the threshold in one test, and under in another. I recently tested a second time at Ancestry. My V1 self had 5% fewer 4th-6th cousins matches than my V2 self. The V2 chip seemed to tend to slightly high cM numbers.
    Maybe I’m missing something, but the only ways I can see a false positive passing from parent to child are if both parents share a segment (which is easy to test for), or if the match is the false positive and falsely matches the parent-child shared segment. This can be discovered in Gedmatch.
    A lot of the problems which people are facing in using small segments are being solved by the ever-increasing size of the databases. In 2012, when I first tested, the matches were so few that I mined those 3 cM segments fruitlessly. Now that my 2 American colonial grandparents bring in unambiguous matches faster than I can log them, it is easy to focus only on those that clearly match. I am still waiting for my English grandfather and German grandmother, but the English have started to trickle in, and I have resigned myself to the probability that people in Opava, in the Czech Republic, won’t be testing in my lifetime.

    • Blaine T Bettinger 6 January 2017 / 8:20 pm

      I am mystified by the fascination with small matches. It will take a lifetime to work through the large matches I have!!

      • Maureen 7 January 2017 / 2:06 pm

        I would definitely agree with you, Blaine. I have so many large matches with unknown cousins (probably because they often have no/small tree, few, if any, shared matches and I have two brick walls after 2 great-great-grandfathers) that if I find small cM matches I really do not pursue it…life is too short. But your explanation was great and reinforces to me even more why pursuing small matches is not worth it until any large matches are explained.

      • Kathleen 7 January 2017 / 5:34 pm

        Well, aren’t you blessed, Blaine? LOL!

  7. Curtis Uhre 6 January 2017 / 7:41 pm

    Thanks Blaine for a very helpful study. I just checked one Ancestry DNA file that has approximately 120 Ancestry “leaf matches”. According to Ancestry, this means there is a DNA match as well as a common ancestor match in the respective trees. About half of Ancestry’s “leaf” matches have segments of less than 10 cM, and 10% are at 6 cM.

    While I completely agree that segments of 10 cM or less have a large percentage of false matches and people need to be aware of that fact, let’s not throw the baby out with the bathwater. I hope everyone will continue to work to find ways to dig the gold from those 10 cM matches. I think Don’s suggestions of possible methods are intriguing.

    • Blaine T Bettinger 6 January 2017 / 8:24 pm

      I agree completely! I wrote above:

      “Do I think that all segments below 10 cM are forever poisoned? No, I believe that this is an area where whole genome sequencing will have the strongest impact. And other methodologies and technologies may help alleviate this issue in the future.”

      And I firmly believe that. If anything does it, it will be better testing, better phasing, and bioinformatics. And we MUST pursue those avenues.

      But, in the meantime, we have to throw the baby out with the poison bathwater. Or, perhaps better, we have to put the baby and the bathwater away for a while.

  8. Caz Brymora 7 January 2017 / 1:09 am

    Great article Blaine.

    I agree: with more and more people coming on board with DNA but not understanding false positives, there is a lot of wasted time and effort. The question is how to work out what to put our time into.

    I wonder whether Ancestry already filters out small segments with small SNP counts?

    Some analysis (done by someone else) with my kit and both my parents’ kits indicated that the SNP count was important for the accuracy of segments.
    For example, at 1000 SNPs, a match of 7 cM had a 24% chance of being accurate, and at 12 cM it was 88%. At 2000 SNPs, a match of 7 cM had a 74% chance, and at 12 cM it was 99%.
    I think I have seen a few of my smaller matches upload to gedmatch, and the segments had higher SNP counts than are generally seen for 5-8 cM segments. Unfortunately I didn’t take notes of which kits these were.

    I would love to see more analysis of child/parent/parent triples using Gedmatch data.

    • Blaine T Bettinger 7 January 2017 / 4:01 pm

      Thanks Caz!

      Ancestry filters out all segments smaller than 6 cM, but I don’t know what their SNP threshold is. It might be in their white paper, I’m not sure (and too lazy to check!).

      I would love to have seen the 23andMe study on small segments include SNP density. Maybe that’s a citizen science study that we all need to do! It’s impossible to use GEDmatch data (or 23andMe data) to do the analysis I did, since they have an artificial threshold of 2000 matches.

  9. caith 7 January 2017 / 10:05 am

    Age and apples-to-apples: a consideration for exploring a 5 cM match. I am 73, and if I have a match with Mary, who is 25 years old, Mary could conceivably have been through recombination 2 more times than I have. Meaning: to compare apples to apples, I could possibly have had a match well over the threshold with her grandparent, who would have been approximately my age. Should we put our match on the same generational playing field to evaluate possibilities? Think about it…

    • Jim Bartlett 7 January 2017 / 11:54 am

      Caith,
      I have thought about it a lot, and studied it. In almost all cases, a small segment is passed down intact, or not at all! It is a popular misconception that all segments are halved in each generation – they are not. Only a few (34 plus or minus 10) crossovers occur each generation over all 22 autosomes, and generally only the largest segments are subdivided – the many, many smaller segments are left intact.
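Jim’s point can be put in rough numbers using the definition of the centiMorgan: 1 cM corresponds to about a 1% chance of a crossover per meiosis. A minimal sketch, assuming Haldane’s map function (crossovers treated as independent, ignoring interference, which is a simplification); it estimates only whether a crossover lands inside the segment, not whether the child receives that chromosome copy at all, which is roughly a 50/50 event regardless of segment size.

```python
import math


def p_split(length_cm: float) -> float:
    """Probability of at least one crossover inside a segment in one meiosis.

    Uses Haldane's map function: crossovers are treated as a Poisson process
    with mean length_cm / 100 per meiosis (no crossover interference).
    """
    return 1.0 - math.exp(-length_cm / 100.0)


for cm in (7, 15, 50):
    print(f"{cm} cM: {p_split(cm):.1%} chance of a crossover inside it per generation")
```

Under these assumptions a 7 cM segment is split only about 7% of the time per generation, versus roughly 14% for 15 cM and 39% for 50 cM, which is consistent with small segments usually being passed down intact or not at all.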

  10. Brooke 7 January 2017 / 11:05 am

    This sounds like a fantastically useful tool especially for those of us dealing with endogamy, but I am either really dense or my coffee has yet to kick in because for the life of me, I cannot locate this new little bit of magic anywhere on the site! And yes, I am a subscriber 🙂

    • Brooke 7 January 2017 / 11:07 am

      Yep, found it!

    • Blaine T Bettinger 7 January 2017 / 4:02 pm

      Great! You’ll love the tool, but beware the hours you’ll spend with it! 🙂

      • Brooke 7 January 2017 / 6:32 pm

        I gave up! I kept getting an error report… after hours of waiting for my father’s Ancestry data to be “gathered.” Probably user error, but still frustrating nonetheless. Erghhh.

        • Brooke 7 January 2017 / 6:33 pm

          And P.S. Blaine…love your book 🙂

        • Robin 17 March 2017 / 1:47 pm

          I found that you can’t save it to a folder where the file path has blank spaces (like My Stuff). It would have to be My_Stuff

  11. David 7 January 2017 / 11:47 am

    For our DNA matches at Ancestry.com, we are given a solid-looking number of centiMorgans (cM) in common, but I’ll offer a recent example of mine from Ancestry.com: a third person who matches my mother at 42 cM / 2 segments and me at 45 cM / 2 segments, which on the face of it is impossible. The same person on GEDmatch shows us with 53.1 and 52.2 cM in 2 segments, suggesting that perhaps the cM numbers from Ancestry.com are soft, by at least 3 cM and more likely by 11 cM, or roughly 20% off in this case. So, not exactly solid.

    If all or a lot of Ancestry.com’s numbers are this soft, it makes the ground we tread anywhere near their match/no-match limit especially treacherous. I am grateful for the size and growth of Ancestry.com’s DNA database, and that their matching works at all and as well as it does when it does, but am convinced their matching algorithms are the least trustworthy of anybody’s—they need to improve, a lot.

    Ancestry.com’s invisible, dirty bath water needs a much better filter so I can see my babies (matches) more clearly and return to view those I can’t even see anymore because of Ancestry.com’s ridiculous, muddying Timber routine, and while we’re at it, do an about-face on the indefensible position of not giving customers some method of seeing the details of how we match our matches. Is this all a cover-up for poor work?

    Or do we need to consider allowing the DNA department to be run by someone free of the dictates of management and a board of directors to reduce costs at any cost, forcing employees to lie to customers. For a solution to controlling the related Customer Service costs, just tell customers they are on their own regarding working with the DNA match details, or even shift DNA-details support to the Hire an Expert department if you have to, at some reasonable cost, with charges dropped if indeed there is an Ancestry.com technical problem, and perhaps even cash rewards for discovering problems that would benefit everyone. Please, keep me from having to try to remember where I put my torch and pitchfork. 😉

  12. Sue B 7 January 2017 / 12:26 pm

    Blaine, I have just this past week completed a chart of about 60 matching known cousins, (matches greater than 1st 1xR), showing where we all match each other. I used the one-to-one tool at Gedmatch, and 3cM as my lowest shared cM. I am well aware of the 10cM standard, yet I wanted to see what this chart would accomplish.

    We 60 cousins all share the surname Lewis from 2 known lines. Each of us matches with autosomal DNA to 5 male Lewis cousins who have also tested yDNA in the rare F Haplogroup. We know our trees only to the 5th generation, and are missing 2 generations of Lewis to tie our 2 lines together. We’ve paid researchers and only gotten so far. I devised a plan to work this as a pyramid built around the yDNA matches. I am lucky to have very eager cousins also helping to keep our group working.

    The results are amazing: we have multiple triangulations between 6th- and 7th-generation cousins, those between 3-5 cM. We hope to use this data to snag more matches and find yet another living male Lewis to add another line from that 6th-7th generation. I’m not sure what to do with it all yet, but we now have the data in one place. Your blog this morning made me wonder if you knew what I was doing!

    Sue

    • Blaine T Bettinger 7 January 2017 / 4:13 pm

      Nice work! Just use caution with those small segments. There’s no evidence, yet, that triangulation of small segments increases the probability that they are real. Plus, shared small segments are almost always beyond the genealogically relevant time frame, according to several scientific studies. But I encourage people who know what they are doing to push the envelope, as long as they are aware of the limitations! Keep up the great work!

  13. Andreas West 7 January 2017 / 2:22 pm

    I don’t agree with Blaine’s strict view that we shouldn’t work with small segments. Luckily it seems I’m not the only one who has had success in triangulating them and also mapping the TG to the correct side, as both my parents luckily are tested.

    I do agree with Blaine that bioinformatics is the way to go and as most of you know I’m working on an app that does help us indeed to go through thousands of matches to sort out the bad ones via triangulation.

    Then we can hopefully all concentrate on the 40% of small segments that are gold nuggets, while the rest is thrown away automatically.

    Now, if only I didn’t have to work at a proper (paid) job to finance this hobby, I would be happier, and we could all start putting bioinformatics to work and concentrate on communicating and collaborating with our true DNA cousins.

    Still, thanks for putting together those percentages, Blaine; they clearly show that not 100% of small matches are bad.

    To me, as a positive-thinking person, that is the important message of this study!

    • Blaine T Bettinger 7 January 2017 / 4:20 pm

      As long as there is a recognition (and perhaps a disclaimer) that the best scientific research we have shows that most small segments are false or beyond the genealogically relevant time frame, and that there is yet no evidence that triangulating small segments increases the probability that they are real, I don’t have an issue with working with them. Since genetic genealogy is entirely a science, we can only rebut research with research, and all rebutting data so far has been entirely anecdotal.

      My problem is that people triangulate these small segments, find a common ancestor, and think that means that the small segments are real and came from that common ancestor. There’s never a discussion of the problem with small segments, and NEVER is there a discussion of Tree Completeness. If three people share a segment and one or more of their trees is woefully incomplete yet they find a common ancestor, how can there be any confidence in that common ancestor?

      And I know this comes off as negative, but I see small segments abused on a daily basis. I see people making incredible claims about their ethnicity and shared ancestry using these small segments every day. And, this is getting much worse since most test-takers these days are genealogy newbies, so they take everything and run.

      For the first time, perhaps, we have an area of genealogy that can be treated entirely as a science, and yet it feels like very few do.

  14. Andreas West 7 January 2017 / 2:25 pm

    For those interested in my project, please click on my name; it leads directly to the website.

    Also thanks to Don Worth for continuing to build helpful tools for us. I too, keep my mind young by programming, so I feel very related (and can’t wait for retirement either).

  15. CathyD 7 January 2017 / 2:46 pm

    I am downloading my Dad’s matches from Ancestry to try this, and will attempt to finagle our FTDNA matches (where my mom and sibs have also tested) into an Ancestry match format so I can use M-O-M for my FTDNA matches. I’ve already ranted on the ISOGG FB page about Timber giving my dad and me 113 matching segments instead of 22 (autosomal), so I’ll spare everyone here. 🙂

    • Kevin Ireland 7 January 2017 / 4:33 pm

      Arggh… sorry about the typos; my keyboard is dying. 😉

    • Blaine T Bettinger 7 January 2017 / 4:47 pm

      Excellent, thank you for sharing! With the exception of the total number of matches not shared with your parents, all your other percentages are incredibly similar to mine. That supports my hypothesis that small segments are problematic.

  16. John Abbott 7 January 2017 / 4:52 pm

    Very interesting blog post. Thank you!

    One idea that occurred to me while reading is to follow your analysis procedure with our two children, looking at their small segments. Both of their parents (my wife and I) have tested, so your procedure works for them. This might also be of interest to others from an understanding perspective, even if they don’t have both of their own parents tested.

    I was also interested to read that triangulation doesn’t necessarily work for small segments; I need to study that comment.

    • Blaine T Bettinger 7 January 2017 / 5:03 pm

      To be clear, triangulation MIGHT work to help identify real small segments. There’s just no scientific evidence to show that, just anecdote. Now, granted, it is anecdote from someone (Jim Bartlett) who I respect deeply and is probably the best “triangulator” in the world, but as a science we need lots of data from many people.

  17. CathyD 7 January 2017 / 7:30 pm

    I posted my Ancestry comparison on the ISOGG FB page, and then took a look at FTDNA, where both my parents have test results. Of course, we know FTDNA has its own limits, discussed elsewhere in this comment thread. I looked at the data by segment length rather than total cM. Approximately half of my matches (615 in total) don’t match either parent, and only 2 of those matches had segments >= 15.0 cM. My sister and my brother each have approximately 42% of matches that don’t match either parent; of those matches, ALL are < 15.0 cM. If I limit to 15 cM, I’ve got 50 phased matches to work with, plus another 67 of my sister’s and 90 of my brother’s. Plenty, really!

    (I suspect, but can’t say for sure, that my match numbers are a bit different from my sibs’ because I transferred my 23andMe data, while my sibs and my mom took Family Finder tests. My dad also transferred, but from Ancestry.)

    • Ann Turner 8 January 2017 / 9:02 am

      CathyD, that’s the highest number I recall hearing for FTDNA. Are you using the longest segment column? The more typical number seems to be in the 20% range. The current figure for my son (a transfer from 23andMe), with my husband and me as his parents (both transfers from AncestryDNA), is 15.8% of matches not found in either parent. I don’t have the exact earlier number at hand, but it improved with their recent change: they no longer require a total of 20 cM if the longest segment is at least 9 cM.

      • CathyD 4 March 2017 / 1:58 pm

        Hi Ann,
        Yes, I was using FTDNA’s longest segment column in my match list for my analysis. 615 of 1236 matches were not assigned to either parent. I’m a transfer from 23andMe. My Dad transferred from Ancestry. My mom and my 2 siblings took FTDNA’s native “Family Finder” test. Of my sister’s 968 matches, 402 (41.5%) don’t match either parent, and my brother’s percentage is similar (456 of 1054, or 43.8%, don’t match either parent).
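      The tally CathyD describes (matches shared with neither parent, split by longest segment) can be sketched in a few lines once match lists are exported. This is a minimal sketch, not the Match-O-Match tool or FTDNA’s own logic: it assumes a hypothetical dictionary mapping each of the child’s match IDs to the longest shared segment in cM, plus the union of both parents’ match IDs. The function name `unshared_in_band` and the toy data are illustrative only.

```python
import math


def unshared_in_band(child_matches, parent_match_ids, min_cm=0.0, max_cm=math.inf):
    """Return (matches in band, matches in band shared with neither parent).

    child_matches maps a match ID to the longest shared segment in cM;
    parent_match_ids is the union of both parents' match IDs.
    The band is min_cm <= longest segment < max_cm.
    """
    in_band = [mid for mid, longest in child_matches.items()
               if min_cm <= longest < max_cm]
    unshared = [mid for mid in in_band if mid not in parent_match_ids]
    return len(in_band), len(unshared)


# Toy data for illustration only (not real match lists).
child = {"A": 42.0, "B": 16.2, "C": 8.1, "D": 6.5, "E": 7.3, "F": 5.0}
mother = {"A", "C"}
father = {"B"}
parents = mother | father

bands = [("all", 0.0, math.inf), (">= 15 cM", 15.0, math.inf), ("< 15 cM", 0.0, 15.0)]
for label, low, high in bands:
    total, unshared = unshared_in_band(child, parents, low, high)
    pct = 100 * unshared / total if total else 0.0
    print(f"{label}: {unshared} of {total} unshared ({pct:.1f}%)")
```

      Keying on the longest segment rather than total cM follows CathyD’s choice; using total cM would only change which value is stored in the dictionary.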

  18. steve echard musgrave 7 January 2017 / 11:37 pm

    Thanks, Don, it is a great tool. However, in my case, using my son’s, my wife’s, and my DNA, I am getting lots of matches for my son that match both my family and my wife’s. We are related, with a bunch of fifth-generation ancestors in common. Compounding that, they are from two very endogamous populations: Acadians and Colonial Virginians, particularly Shenandoah Valley colonists. For myself, I set the bar pretty high: 15 cM minimum, or 10 cM if there is matching genealogy.

  19. katy 15 January 2017 / 1:49 am

    Like I mentioned in a FB group a few days ago, I usually look at matches with a total of 11 cM and above. The exceptions are when I’m dealing with triangulated segments, or when comparing my grandpa’s kit with someone he didn’t share 7 cM with, but who shared over 10 cM with some of his matches that he shared over 15 cM with. When I compared my grandpa’s kit with theirs at a 3 cM/300 SNP threshold, I found that he shared a segment of about 4.3 cM with 1000 SNPs. Plus, that person’s last name isn’t that common and appears on his branch of my family tree.
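      Katy’s comparison applies two minimums at once, a segment length in cM and a SNP count. A minimal sketch of such a filter follows, assuming hypothetical segment records with `cm` and `snps` fields; the 3 cM / 300 SNP defaults mirror the settings she mentions and are not a recommendation, and GEDmatch’s actual matching logic is more involved than this. Apart from the 4.3 cM / 1000 SNP segment from her comment, the records are made up for illustration.

```python
from typing import NamedTuple


class SharedSegment(NamedTuple):
    """A shared segment (hypothetical record layout)."""
    chromosome: int
    cm: float   # segment length in centimorgans
    snps: int   # number of SNPs in the segment


def passes_thresholds(seg: SharedSegment, min_cm: float = 3.0, min_snps: int = 300) -> bool:
    """True if the segment meets both the cM minimum and the SNP-count minimum."""
    return seg.cm >= min_cm and seg.snps >= min_snps


segments = [
    SharedSegment(4, 4.3, 1000),  # the kind of segment Katy describes
    SharedSegment(9, 3.5, 250),   # long enough in cM, too few SNPs
    SharedSegment(12, 2.1, 600),  # enough SNPs, too short in cM
]
kept = [s for s in segments if passes_thresholds(s)]
print(kept)
# [SharedSegment(chromosome=4, cm=4.3, snps=1000)]
```

      Requiring both values reflects the advice later in this thread to look at SNP counts alongside cM; a segment that passes is still subject to the small-segment caveats above.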

  20. Dennis Lee Burman 18 January 2017 / 1:30 pm

    I see that GEDmatch has a new one-to-many feature where you can apparently now see all your matches, not just 2,000. Does this now allow you to do a similar analysis with the GEDmatch data?

  21. Mike 21 February 2017 / 12:58 pm

    My name is Mike. My grandmother is African American and part Afro-Puerto Rican (with some African Dominican mixed in there too). She also has distant Malagasy and colonial English roots intertwined with her African American roots. I remember seeing a match of my grandmother’s who was half Puerto Rican and half Dominican. The match was at 5.1 cM and 742 SNPs, and the chromosome painting showed African ancestry in that region. I saw three other people who are Puerto Rican who matched on that same small segment, at 5.7 cM, each with the same 800 SNPs, and all three also showed African ancestry in that area. The segment did not get passed down to me or my father. So my question is: what if it is a real match at 5 cM, and multiple people from the same population, race, nationality, ethnic origin, etc. match on it, but the segment did not get passed down? How do we decipher this? The match is real even though it is small. The match did not accept my request when I reached out, and since it was such a small match, I decided just to move on rather than try to convince someone we are related. This did open my eyes to the fact that some 5 cM matches can be useful, but, as everyone says, just be careful and use SNP counts alongside cM.

  22. Muriel Certo 20 March 2017 / 6:05 pm

    I have a 2700 cM match with one person and a 3400 cM match with another person who has the same name as that person, but no segment larger than 5 cM. Under GEDmatch these would be false positives, but 2700 and 3400 cM seem too high to be false. It says 1 and 1.2 generations.
    Any ideas?
