Study Finds Public DNA Databases Cannot Guarantee Anonymity
Computer-savvy individuals with Internet access could potentially identify the names of people who ‘anonymously’ donate their DNA to widely available public databases used by scientists conducting research studies. For that matter, these individuals may even be able to identify people who never provided a genetic sample themselves but whose relatives did.
This conclusion was reached by a consortium of researchers from Baylor College of Medicine, the Whitehead Institute in Cambridge, Mass., and Tel Aviv University who published their report in the Jan. 18 issue of Science.
A number of databases house genetic samples from people who are participating in research studies. In the interest of advancing science, those databases are publicly available on the Internet. The idea is that scientists around the world will have free access to samples, be able to compare DNA from many people, and link certain genes to certain diseases or traits, which ultimately will lead to treatments and cures.
While previously published scientific articles have looked at the vulnerabilities of such databases, “this is the first one that shows you can identify a person even without a genetic sample from that individual,” said Amy McGuire, Ph.D., director of the Center for Medical Ethics and Health Policy at Baylor.
The ability to identify study participants and even their relatives based on genetic samples is made possible by combining openly available genetic databases created for scientific and medical research purposes with genealogy databases that trace family history and are sorted by surname, McGuire said.
“One big elephant in the room is that you could potentially identify someone based on publicly available data that his or her fourth cousin decided to share in a genealogy database,” McGuire said.
To illustrate this point, Yaniv Erlich, Ph.D., a fellow at the Whitehead Institute, led a team of researchers who looked at genetic samples from 32 men participating in the 1,000 Genomes Project, an international research effort to sequence the genomes of 1,000 people from all ethnicities and thereby establish the most detailed catalogue to date of human genetic variation. The men’s genomes are publicly available in a shared database.
The researchers compared a pattern of repeating DNA letters from the men’s Y chromosomes to the corresponding patterns of men who had posted their genetic data on two popular genealogy websites. Y chromosomes correlate with surnames because both are passed directly from father to son.
By doing this comparison – and using public records such as obituaries and the Coriell Cell Repositories website, which handles 1,000 Genomes Project resources and includes the men’s ages and states of residence – Erlich’s team was able to identify five of the men and many of their family members. In all, they identified almost 50 people.
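The two-step logic described above (match a Y-chromosome repeat pattern to a surname, then narrow the candidates by age and state) can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' actual pipeline: the function names, the mismatch threshold, the surnames, the repeat counts and the public records are all invented for the example. Only the marker labels (such as DYS19) are names of real Y-chromosome repeat sites.

```python
from collections import Counter

# Hypothetical genealogy entries: (surname, repeat counts per Y-chromosome marker).
# All surnames and counts are invented for illustration.
GENEALOGY_RECORDS = [
    ("Rivera", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Rivera", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Okafor", {"DYS19": 15, "DYS390": 21, "DYS391": 10, "DYS393": 14}),
    ("Lindqvist", {"DYS19": 16, "DYS390": 25, "DYS391": 11, "DYS393": 12}),
]

# Hypothetical public records (for example, details gleaned from obituaries).
PUBLIC_RECORDS = [
    {"name": "A. Rivera", "surname": "Rivera", "age": 52, "state": "Iowa"},
    {"name": "B. Rivera", "surname": "Rivera", "age": 31, "state": "Ohio"},
    {"name": "C. Okafor", "surname": "Okafor", "age": 52, "state": "Iowa"},
]


def is_close_match(query, profile, max_mismatches):
    """True if the two profiles differ at no more than max_mismatches shared markers."""
    shared = query.keys() & profile.keys()
    if not shared:
        return False  # nothing to compare, so never treat it as a match
    mismatches = sum(query[marker] != profile[marker] for marker in shared)
    return mismatches <= max_mismatches


def candidate_surnames(query, records, max_mismatches=1):
    """Surnames whose genealogy entries closely match the query, most frequent first."""
    hits = Counter(
        surname
        for surname, profile in records
        if is_close_match(query, profile, max_mismatches)
    )
    return [surname for surname, _ in hits.most_common()]


def narrow_by_demographics(people, surname, age, state):
    """Names in the public records that fit the surname, age and state."""
    return [
        person["name"]
        for person in people
        if person["surname"] == surname
        and person["age"] == age
        and person["state"] == state
    ]


# An 'anonymous' research sample: repeat counts plus the age and state
# listed alongside it (all invented).
sample_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
surnames = candidate_surnames(sample_profile, GENEALOGY_RECORDS)
matches = narrow_by_demographics(PUBLIC_RECORDS, surnames[0], age=52, state="Iowa")
print(surnames, matches)
```

Even in this toy version, neither step identifies anyone on its own. The surname match leaves several people with the same name, and the age and state alone fit more than one record. It is the combination that singles out one person.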
The Whitehead researchers contacted McGuire about the ethical issues involved and subsequently shared their results with officials at the National Human Genome Research Institute and the National Institute of General Medical Sciences, which are both part of the National Institutes of Health. As a result of this disclosure, those two institutes moved some demographic information out of the publicly accessible portion of the Coriell cell repository, which is funded in part by the National Institutes of Health and accessed by scientists around the world.
The leaders of the two institutes, Judith Greenberg, Ph.D., who heads the National Institute of General Medical Sciences, and Eric Green, M.D., Ph.D., who heads the National Human Genome Research Institute, wrote a perspective on the issues arising from this work in the same issue of the journal Science, calling for renewed dialogue on the subject.
“It is time for public dialogue about the risks to people’s privacy,” McGuire said, “and how to promote research in this area while protecting individuals from unwanted intrusion.”