As if there weren’t enough to worry about during the genetic revolution, researchers have found a way to characterize redacted genetic sequences from whole-genome or large-scale sequencing.
Here’s how it works. Let’s say that Mr. X has had his genome sequenced, but doesn’t want to know the results for some genes known to influence the development or progression of Alzheimer’s disease. So when he receives his sequence data, these genes have been ‘redacted’, or removed from the data. This is exactly what James Watson decided to do when he received his data.
Characterizing Redacted Genes
However, researchers have characterized one of Watson’s redacted genes by examining the sequences surrounding the gene in question. Often, when we inherit a gene from our parents, we receive that gene as well as some of the surrounding genetic sequence. By examining the surrounding sequence, we can gain some insight into the redacted gene. For example, if I gave you the quote “A penny _____ is a penny earned”, you could derive from the surrounding words that the missing word is “saved.”
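To make the idea concrete, here is a minimal Python sketch of guessing a hidden variant from its neighbors. Everything in it is an assumption for illustration: the reference panel, the marker strings, and the function name infer_hidden_allele are made up, and this is not the method used in the paper. It only shows the general logic: find people in a reference group who share the same surrounding sequence, then see which version of the hidden gene they tend to carry.

```python
from collections import Counter

# Hypothetical toy reference panel (invented data, for illustration only).
# Each entry pairs the alleles at three flanking markers with the variant
# carried at the site that our subject has redacted.
REFERENCE_PANEL = [
    ("AGT", "e3"),
    ("AGT", "e3"),
    ("AGT", "e4"),
    ("CGT", "e4"),
    ("CGT", "e4"),
    ("ACT", "e2"),
]


def infer_hidden_allele(flanking, panel=REFERENCE_PANEL):
    """Estimate the probability of each hidden variant from flanking markers.

    Counts the reference entries whose flanking markers match exactly and
    returns the share of each hidden variant among them. Returns an empty
    dict when nothing in the panel matches.
    """
    counts = Counter(allele for markers, allele in panel if markers == flanking)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    # The subject published the flanking markers but redacted the gene itself.
    print(infer_hidden_allele("AGT"))
    print(infer_hidden_allele("CGT"))
```

In this toy panel, everyone with the flanking sequence "CGT" carries the same hidden variant, so the redaction hides very little, while "AGT" leaves more uncertainty. Real analyses rely on far larger reference data and statistical models, but the intuition is the same as filling in the missing word of a familiar quote.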
From an article discussing the researchers’ work:
“When the researchers told Watson about the paper’s results prior to publication, he redacted an additional 2 million DNA letters surrounding his APOE gene. This will make determining his redacted sequences much more difficult to decode – but not impossible, the authors write.”
Ethical Concerns
This ability, of course, raises numerous ethical concerns. If we value the protection of privacy, even for people who make part of their genetic sequence available online, how do we protect their privacy? Asking people to avoid this type of analysis won’t work, of course. Is the only answer to redact huge portions of DNA surrounding redacted genes? Or are we faced with an all-or-nothing question: either people put their entire sequence online (or just portions but face the risk of this analysis) or they keep their sequence private?
The authors of the study are also concerned about the potential problems. From the paper:
“We believe the potential for such indirect estimation of genetic risk has considerable relevance to concerns about privacy, confidentiality, discriminatory and defamatory use of genetic data, and the complexities of informed consent for both research participants and their close genetic relatives in the era of personalized genomics.”
For more discussion, see the always-great Genetic Future. See also “DNA detectives can decode ‘censored’ genomes” in New Scientist.
The article: Dale R Nyholt, Chang-En Yu, Peter M Visscher (2008). On Jim Watson’s APOE Status: Genetic Information is Hard to Hide. European J. of Human Genetics (DOI: 10.1038/ejhg.2008.198).
Thank you for highlighting this very important development. I think promising subjects the ability to completely and permanently redact information they wish to keep private is not a wise approach. Informed consent protocols need to adjust, and fast, to this new reality, similar to the recent decision by NIH to pull their pooled GWAS data after researchers were able to discern individual identities. I don’t know what the answer is, except that the model of “selective redaction” of relatively small parts of the sequence is no longer viable. Evidence is mounting quickly in this regard, and I imagine there is much more to come.
Great point, Dana. As a number of academics in this field have pointed out, genetic anonymity is nearly impossible. This research really supports that conclusion.
I don’t know what the answer is, other than to inform people that complete redaction and anonymity are almost certainly impossible. And more than just informing them on paper, I would try to ensure that they understand the possible ramifications.