[EDIT – June 26, 2016: Updated and detailed histograms are now available and should be used. See: “Update to the Shared cM Project.”]
One issue with the Shared cM Project, however, is that it relies on user-submitted data, which invariably carries two inherent problems: (1) data entry errors; and (2) relationships that are not accurate.
Both of these issues can be addressed in a simple way: by providing the distributions for the data. The distributions show clearly where the outliers (the errors and the incorrect relationships) reside. To generate distributions, I enlisted the help of mathematician Ingrid Baade, who volunteered all of her time. I am forever in her debt for this contribution!
Here is a wonderful diagram that Baade generated with data from the Shared cM Project, showing distributions:
This image/file is shared pursuant to a CC 4.0 Attribution License (23 December 2015).
Reviewing the image, the outliers (errors) become clear. For example, looking at the ranges from “Visualizing Data From the Shared cM Project,” we see that someone reported 121 cM for an Aunt/Uncle/Niece/Nephew relationship. This relationship is reported as “Degree 2” in this chart. We see the 121 cM in the middle expansion box, which is an extreme outlier from the central distribution of “Degree 2” around 1700 cM.
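For readers who want to try the same idea on their own numbers, here is a minimal sketch of flagging outliers in a list of reported cM values. It is not the method Baade used to build the chart. It uses a common robust rule (the modified z-score, based on the median and the median absolute deviation), and the function name `flag_outliers`, the 3.5 threshold, and every value in the sample list except the 121 cM report are assumptions made up for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The modified z-score measures distance from the median in units of
    the median absolute deviation (MAD), so a single extreme report
    cannot drag the center toward itself.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All values (or at least half of them) are identical; nothing to scale by.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# HYPOTHETICAL "Degree 2" submissions in cM, invented for this example.
# Only the 121 cM value corresponds to a report discussed in the post.
degree2_reports = [1650, 1702, 1688, 1745, 1710, 1693, 1760, 1675, 121]

outliers = flag_outliers(degree2_reports)
print(outliers)
```

With these made-up numbers, only the 121 cM report is flagged, which mirrors what the chart shows visually.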
As another example, for half-siblings, the reported range was 787 to 2143 cM, with an average of 1731 cM. Half-siblings are also “Degree 2,” and the small handful of “Degree 2” relationships reported in the range of 787 to 1200 cM are extreme outliers. Accordingly, these individuals should consider another explanation for their observed genetic relationship. Indeed, thanks to the chart we see that these appear to fall squarely within the distribution for “Degree 3” rather than “Degree 2.”
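The same reasoning (a value that is extreme for one degree may be ordinary for another) can be sketched in code. This is only an illustration under stated assumptions: the helper names `robust_score` and `closest_group` are hypothetical, the sample values for both degrees are invented, and real distributions overlap, so a result like this is a prompt to investigate rather than a conclusion.

```python
from statistics import median


def robust_score(value, samples):
    """Distance of value from the sample median, in units of the MAD."""
    center = median(samples)
    mad = median([abs(s - center) for s in samples])
    if mad == 0:
        raise ValueError("samples have no spread; MAD is zero")
    return abs(value - center) / mad


def closest_group(value, groups):
    """Return the name of the group in which value is least extreme."""
    return min(groups, key=lambda name: robust_score(value, groups[name]))


# HYPOTHETICAL sample values in cM, invented for this example.
# They are not taken from the Shared cM Project data.
groups = {
    "Degree 2": [1650, 1702, 1688, 1745, 1710, 1693, 1760, 1675],
    "Degree 3": [790, 845, 880, 910, 865, 820, 935],
}

# 787 cM is the low end of the reported half-sibling range.
print(closest_group(787, groups))
```

With these invented samples, 787 cM sits far closer to the "Degree 3" group than to "Degree 2", matching what the chart suggests for the low half-sibling reports.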