Sunday, March 12, 2017

A Technique for Endogamous DNA Using GedMatch

I've written many times about how much more difficult genetic genealogy is when you're from an endogamous population, such as the Ashkenazic Jewish population, as I am. Since Ashkenazim all descend from a relatively small group of people whose descendants married within that group, we tend to share a lot of DNA--and the various genetic genealogy programs tend to predict that matches are much more closely related than we actually are. But there are ways to deal with that--and I'll discuss one below using my uncle's DNA.
My uncle's DNA Matches on GedMatch

The above figure is the first screen of my uncle's one-to-many matches on GedMatch. (If you've tested at any of the big three DNA companies--FamilyTreeDNA, Ancestry or 23AndMe--upload your raw data to GedMatch for free.) GedMatch estimates how many generations back each match shares a most recent common ancestor with the kit being examined. You'd think that concentrating on the matches with the closest predicted relationships would be the way to find relatives--but that doesn't generally work well when you're from an endogamous population.

My uncle has lots of known relatives who have tested (mostly because I asked them to, plus a few whom I've connected with via DNA). And the closest predicted relatives include some of those known relatives. But there is a clear difference between the known relatives and the unknown predicted ones--and that is the largest segment shared. While "Unknown Relationship1" (UR1) is predicted to be 3.3 generations from my uncle, the same as his known third cousin who appears two lines below UR1, look at UR1's largest segment. To me, that says that UR1 is likely much further away than 3.3 generations, and that the amount we share is because of endogamy.
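The comparison above can be sketched in Python as a minimal illustration, assuming you've copied each match's GedMatch Gen estimate and largest segment into a list by hand. The `Match` record, the `compare_to_known` function, the 0.1-generation tolerance and the rule that a largest segment under half that of known relatives is worth flagging are all hypothetical choices for illustration, not anything GedMatch itself computes.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str
    gen_estimate: float  # GedMatch's estimated generations to a common ancestor
    largest_cm: float    # largest shared segment, in cM
    known: bool          # True if the relationship is already documented


def compare_to_known(matches, tolerance=0.1):
    """Flag unknown matches whose largest segment is much smaller than that of
    known relatives with a similar Gen estimate.

    Hypothetical assumptions: "similar" means within `tolerance` generations,
    and "much smaller" means under half of the smallest largest segment among
    those known relatives.
    """
    known = [m for m in matches if m.known]
    report = []
    for m in matches:
        if m.known:
            continue
        peers = [k for k in known if abs(k.gen_estimate - m.gen_estimate) <= tolerance]
        if not peers:
            continue  # no known relative at a comparable predicted distance
        peer_floor = min(k.largest_cm for k in peers)
        # A small largest segment alongside a "close" Gen estimate suggests the
        # total is built from many small, possibly endogamous, segments.
        report.append((m.name, peer_floor, m.largest_cm < peer_floor / 2))
    return report
```

Each result is a tuple of the match's name, the known relatives' smallest largest segment, and whether the match was flagged.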

There are two people on this first screen who do look interesting: "Unknown Relationship5" (UR5) and "Unknown Relationship9" (UR9). Each shares a segment of more than 20cM with me, along with quite a bit of DNA overall. Well, it turns out that UR5's father's family is from the same very small village as my maternal grandmother's family. I've been able to help UR5 trace his grandfather's family back multiple generations, though so far it doesn't intersect with mine. We're waiting for his grandmother's death certificate to arrive so we can learn her parents' names--and we'll likely be able to connect our families using that information.

As for UR9, it looks like I should contact him and see what we can figure out.  Maybe there'll be a future blog post on our relationship.

On GedMatch, you can sort by largest segment by clicking on that column's header. Doing that with my uncle's DNA yields the following:
My uncle's matches on GedMatch sorted by largest segment
Sorting by largest segment has my uncle's closest known relatives clustered at the top (similar to sorting by estimated relationship), but then there are multiple individuals with at least one large segment in common with him about whom I know of no connection.  That continues onto his next screen of results:
My uncle's matches on GedMatch sorted by largest segment--second screen
You can see that these known cousins all share reasonably large (20cM+) segments with my uncle. They also share significantly more DNA in total with him, which makes them jump out from the rest of the pack.

That doesn't mean the others on this list aren't related in a way we can trace, but it does demonstrate that in endogamous populations you need to look at both the largest segment and the total DNA shared, and start by investigating the matches that rank high on both.
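Here's a rough Python sketch of that two-factor screen, assuming the match list has been exported or typed in by hand. The 20cM largest-segment cutoff echoes the figure used in this post; the 100cM total-shared cutoff and the `shortlist` function itself are hypothetical and should be tuned to your own results.

```python
from typing import NamedTuple


class Match(NamedTuple):
    name: str
    largest_cm: float  # largest shared segment, in cM
    total_cm: float    # total shared cM


def shortlist(matches, min_largest_cm=20.0, min_total_cm=100.0):
    """Keep matches that clear BOTH thresholds, then order them by largest
    segment (descending), breaking ties with total shared cM.

    min_total_cm is a hypothetical default, not a recommended value.
    """
    keep = [m for m in matches
            if m.largest_cm >= min_largest_cm and m.total_cm >= min_total_cm]
    return sorted(keep, key=lambda m: (m.largest_cm, m.total_cm), reverse=True)
```

Sorting on the pair (largest segment, total) keeps the order close to GedMatch's largest-segment sort while letting total shared DNA separate ties.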

This technique can be used with FamilyTreeDNA as well, as it allows for sorting by largest segment and also shows total DNA shared.  Did it work for you?  Let us know in the comments!

Note:  I'm on Twitter.  Follow me (@larasgenealogy).


21 comments:

  1. My one true 3rd cousin that I know of at GEDmatch matches 21.4 cM maximum with me and 25.7 cM maximum with my uncle. I've got 50 people at GEDmatch who have at least a 20 cM maximum that don't look close at all. But maybe I can get lucky and find out how at least one or two of them are related.

    1. And that's still above that magic 20cM threshold. How much DNA do you & your uncle share with the cousin overall?

    2. At GEDmatch, I share 79.3 cM and my uncle shares 95.8 cM. I have 260 relatives at GEDmatch who share more than 79.3 cM with me, yet I have no idea who any of them might be or how they connect.

      Now that I look closer, I see that two of those 260 are people you administer. One shares 105.2 cM, but its maximum segment is only 9.4 cM; the other shares 83.9 cM with a good maximum of 17.7 cM. Hmmm. That 17.7 cM segment appears to be one that both you and I share. It's on Chromosome 1 between 81,704,406 and 101,397,056 and spans 5,211 SNPs, which should be significant enough to be considered relevant. You, my uncle and I triangulate on that segment, and from my DMT program, I have many others who triangulate with us there as well. Do you know who that segment maps to for you?

    3. That segment comes from my paternal grandfather (as my father shares it, but my paternal grandmother does not). However, none of my grandfather's 6 tested cousins share it, so I can't narrow it down further. The family was from Volhynia.

    4. That is very interesting. My mother's father's family (surnames Girman/Herman, Lapides/Lapedes, Zew/Zeff, Tenenbaum, Pisetsky, Gershfield, Sitner, Zimberg, Zaidman, Ginpil, Mindess, Shapiro, Gurevitch) comes from Mezhirichi, which is in the province of Volhynia. A large number of people from that town emigrated to Winnipeg in the early 1900s and built their own Mezhiricher shul here, where they congregated. I'm hoping in the future (when I get some time - ha ha) to do a one-place study of Mezhirichi.

      In addition, my uncle (my father's brother) triangulates on that same segment but only from 88,633,118 to 101,159,333 (10.7 cM). This could be indicative of a more distant common ancestor on my father's side.

    5. After looking through all your posts about your Volhynia relatives, I see that there is still a major gap in making a connection here. First, none of our surnames match. Second, our closest towns are about 100 km from each other.

      Seems like at least for now, this 17.7 cM segment (which may actually be two shorter overlapping segments - one on my maternal and the other on my paternal side) does not have enough genealogical evidence to put any substance to it.

      But I'm sure one day, we'll figure it out. Until then Lara, we'll just have to be satisfied being DNA-cousins.

    6. I'm honored to be your DNA cousin.

  2. Thank you for reminding me of this tip. When I sort my husband's results at GEDmatch this way, I see that the first unknown relation shares a 35 cM segment with him.

    A known maternal first cousin shares a largest segment of 80.3 cM and a known paternal second cousin shares a largest segment of 43.7 cM so the next few might be worth reaching out to.

  3. Thanks for a great review, Lara! Of course, for those of us who have fewer close relatives tested, the table may be dominated by spurious matches, I suppose. Gotta take a look...

    1. True, but this can help you decide which may not be spurious.

    2. I have two matches with 64 and 70 cM longest fragments (128 and 117 total). The Gen estimate is a bit longer than your top lines in the first table show, at 3.4, 3.5 (it's primarily a function of the overall share length). Both of our families have deep trees, and no common 2nd gg parents, albeit one town is shared. The segments fall apart when we try to phase, though...
      The way I look at your great data collection here is, for up to 2nd cousins, you can trust the estimate. Often, for 3rd cousins, too, the estimated relation comes close. Anything more distant is a wash... getting more and more equivalent-length spurious matches as the true relationship grows more distant...

  4. Generally I agree, but I've found that DNA comparisons are not an exact science. I have a couple of known 3rd cousins with segments as large as 77 cM; and one known 2nd cousin who only has a 17 cM segment as the largest common denominator.

    1. Oh, it's definitely not an exact science. This is just a good start in terms of who to contact, and then you can go further down the list.

  5. I also have found largest segment to be a better predictor IF there is also a fair amount of shared cM overall. I have decided that unless a kit shares at least 115 cM and a shared segment of at least 20 cM, it's not even worth contacting the person. And even in those cases, I have yet to be able to find a common ancestor. These are all on my maternal side where I have been unable to trace any of my own ancestors any further back than my great-great-grandparents. VERY frustrating!

    1. Don't give up. It'll come together at some point!

  6. What about matches that share a longest segment of 21.1 cM on chromosome 6 and another of 11.5 cM on chromosome 16, but only 32.6 cM in total across segments over 7 cM? This, by the way, is for the Tolchin kit you administer and my dad. My dad has 8 matches with you and my mom has 12. When the total cM is not very high but the longest segment is over 17 cM, is this something to look at?

    1. While numbers like that wouldn't make your kits high up on my list of people to contact, it's always possible. If you email me your dad's kit number, I can take a look.

  7. Thanks Lara, the wealth of examples and your point of view are very helpful! I would like to point out that the total shared cM that GEDmatch's one-to-many page presents for two matches can't be compared with FTDNA's "total cM," which you've mentioned in your other reviews on Ashkenazi endogamy. This is because GEDmatch most probably sums only the 5cM+ segments (with more than 500 SNPs each, in case one is an FTDNA client; it's different for most 23andMe clients, as I've noticed), while FTDNA presents on its match lists (in the "total" column) the sum of all the 1cM+ segments that have at least 500 SNPs each. This sometimes makes a very big difference, and GEDmatch's one-to-many totals are usually much lower than FTDNA's totals. In fact, in my case FTDNA categorizes as my 2nd-3rd cousin every match who shares a total of 130cM or more with me, no matter how long (or short) our longest shared segment is. In my brother's case it's not the same, and so far I haven't succeeded in finding out why matches who share about 150cM in total and a longest segment of about 20cM with him are still estimated to be his 3rd-5th cousins, while matches sharing lower totals and shorter longest segments with him are estimated to be his 2nd-3rd cousins... Anyway, when you write about high-enough totals like 140cM and higher, do you mean only GEDmatch's "5cM+" totals or also FTDNA's "1cM+" ones? Thanks in advance for your reply!

    1. True. Here I was using GedMatch numbers. I should (in my free time :) ) run some of the same things on FTDNA and do a comparison.

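To illustrate the point raised in the last comment thread, here's a minimal Python sketch showing how the same list of segments produces different totals depending on the minimum segment size counted. The 5cM and 1cM cutoffs and the 500-SNP minimum come from the commenter's description, which they themselves hedge ("most probably"). Treat them as assumptions rather than documented GEDmatch or FTDNA behavior. The `total_shared_cm` and `compare_totals` functions are hypothetical.

```python
def total_shared_cm(segments, min_cm, min_snps=500):
    """Sum the lengths (in cM) of segments that meet both minimums.

    segments: iterable of (length_cm, snp_count) tuples.
    min_cm / min_snps: cutoffs assumed from the comment above, not
    documented site behavior. A segment counts if it is at least min_cm
    long and has at least min_snps SNPs.
    """
    return sum(cm for cm, snps in segments if cm >= min_cm and snps >= min_snps)


def compare_totals(segments):
    """Return (total counting only 5cM+ segments, total counting 1cM+ segments)."""
    return total_shared_cm(segments, 5.0), total_shared_cm(segments, 1.0)
```

Because the 1cM floor admits every segment the 5cM floor does, plus many small ones, the second total can never be lower than the first; in an endogamous population those extra small segments can add up quickly.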