Inside every human is a story written in genetic code

A sequence of 3.2 billion letters, written in the chemical compound known as DNA, weaves together a narrative that stretches across eons back to the dawn of life on this planet. In 2000, for the first time ever, scientists were able to announce that, thanks to the Human Genome Project, they had finally read the entirety of that story from cover to cover, and that they were busy drafting a plot summary.

It was a hopeful moment, filled with expectations that new genetic insights would usher in the golden age of human health and wellness. US President Bill Clinton summed up some of those expectations as follows: 

“With this profound new knowledge, humankind is on the verge of gaining immense new power to heal. Genome science will have a real impact on all our lives and even more on the lives of our children. It will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases. In the coming years, doctors increasingly will be able to cure diseases like Alzheimer’s, Parkinson’s, diabetes and cancer by attacking their genetic roots.”

By 2019, however, it has become evident that the links between gene sequences, gene expression and disease are much more nuanced and complex than first envisioned. Nonetheless, the scientific dream of deciphering the genetic code continues to advance, and with the cost of sequencing falling steadily, it is now possible for any individual to have his or her genome sequenced.

Some companies provide this service through direct-to-consumer (DTC) testing. Customers supply a sample of their DNA, usually a tube of spit, pay a nominal fee, and then receive a report with their raw genomic data. According to the initial marketing efforts of these companies, the data was supposed to reveal incredible insights about the user’s ancestry or medical risks. Unfortunately, it turns out that these tests are not exactly the genie in the test tube they are made out to be. Depending on how the test was developed, individual results might be extremely helpful, badly misleading, or altogether unclear.

Historically, the problem with decoding a person’s DNA has been the time and resources required to process such a large amount of information and find minor abnormalities. Our 23 pairs of chromosomes contain our DNA as a neatly packaged version of this information. DNA is built from four nucleotides: Adenine, Cytosine, Guanine, and Thymine (ACGT). The 3.2 billion nucleotides are arranged in different orders to form patterns, called genes, which are recognized by a cell’s toolkit to build protein products. For many of these genes, researchers have identified the relationship between sequence and resulting protein, but for many others the function is still unknown. The burning question that arose once these relationships became known was what would happen if one of those nucleotides were missing, or in the wrong order. Would that disrupted sequence of nucleotides upset a critical biological function? These individual nucleotide disruptions, called single nucleotide polymorphisms (SNPs, pronounced “snips”), are what DTC testing is looking for.

A major drawback of DTC tests is that they can only look for known patterns. The process of going to one specific location on an individual’s DNA to identify which nucleotide is located there is called genotyping. Genotyping by design does not sequence an individual’s entire genome. In fact, the average DTC test only looks at 600,000 sites, which, considering the 3.2 billion sites available, means only about 0.02% of an individual’s DNA is examined when using such a test. If a sequence like TTTGTACG changes to TTTGCACG, a DTC test can detect the change, but it only looks for sequence changes whose function is understood. For example, the research community may already have established that the disrupted code TTTGCACG is associated with the development of cancer. However, identifying this change does not by itself imply that an individual will develop a particular disease.
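To make the idea of genotyping concrete, here is a minimal Python sketch. It checks the coverage arithmetic from the figures quoted above and then looks up a single known site in a made-up eight-letter sequence. The function name `genotype`, the site position, and the sequences are all hypothetical illustrations, not real markers or the method of any actual DTC test.

```python
# Figures quoted in the article.
GENOME_SIZE = 3_200_000_000   # nucleotides in the human genome
SITES_TESTED = 600_000        # sites examined by an average DTC test

# Share of the genome that a genotyping test actually examines.
coverage_percent = SITES_TESTED / GENOME_SIZE * 100


def genotype(sequence, known_sites):
    """Report what is found at each known site and ignore everything else.

    known_sites maps a position to a (reference, variant) pair of nucleotides.
    """
    findings = {}
    for position, (reference, variant) in known_sites.items():
        observed = sequence[position]
        if observed == variant:
            findings[position] = "known variant"
        elif observed == reference:
            findings[position] = "reference"
        else:
            findings[position] = "unrecognized"
    return findings


# Hypothetical panel: one site at position 4, where T is typical and C is the studied change.
known_sites = {4: ("T", "C")}

# A sample carrying the studied change at the tested site.
carrier_result = genotype("TTTGCACG", known_sites)

# A sample with a change at position 0, which the panel never looks at.
untested_change_result = genotype("ATTGTACG", known_sites)

print(f"Share of genome examined: {coverage_percent:.3f}%")
print("Carrier sample:", carrier_result)
print("Sample with untested change:", untested_change_result)
```

The second sample differs from the typical sequence, yet the report calls it "reference" because the change sits outside the panel. This is the limitation described above: a genotyping test only answers the questions it was built to ask.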

Part of the reason is that we carry two copies of each gene, and for certain diseases both copies need to be disrupted for the disease to occur. Additionally, the association between a change in sequence and disease prevalence is typically established by sequencing individuals who already have the disease. To be truly confident in that association, several large randomized studies need to be performed on thousands of individuals, and this level of research simply has not been done for many diseases.

“People need to have a great deal of caution when an asymptomatic person comes walking in with an unsolicited genetic result,” said Dr. Jill Hagenkord, a molecular pathologist speaking at this year’s American Association for Clinical Chemistry (AACC) conference.

Dr. Hagenkord is the former chief medical officer of several health tech startups, including 23andMe and Color Genomics. In her years working with DTC testing, she has become known for urging caution about the utility of information obtained from such testing. One of Dr. Hagenkord’s concerns, which is widely shared among her peers, relates to the population used to establish a genetic relationship to disease. The population used to study a connection may be enriched for individuals with the disease, known as a high-risk population, and as such may not reflect the true risk to an average person.

Dr. Hagenkord’s caution comes from personal experience working with HFE, a gene associated with the iron storage disorder hereditary hemochromatosis (HHC). Testing for this recessive condition, meaning both copies of the gene need to carry the sequence change, was thought to be the next big thing in population screening.

“We thought we were going to be able to identify these people early and treat the disease better, and even identify people before they show symptoms,” said Dr. Hagenkord. However, as it turns out, HHC was not as common as they thought, making the cost of screening entire populations impractical. A test to detect HFE abnormalities was developed on high-risk populations; however, when a more expansive study was performed it was revealed that only 2% of the group was at risk.

Additional problems arise with confirmation; there is no functional confirmatory test that can be done once an asymptomatic person’s DNA sequence reveals an abnormality. By comparison, if a screening test like a pap smear is positive, a colposcopy is done to rule out cancer.

For most of these conditions, we don’t know the penetrance in an untested population, and we also do not know the prevalence. Nonetheless, the risk is often inferred when research shows a genetic link to a disease.  It is because of such uncertainty that government regulators prohibit DTC tests from providing a report that advises a consumer on their risk for developing a disease based on their DNA genotyping. It matters which population was used to develop the test.

Assessing the positive predictive value of a test, or in other words, the likelihood that a positive result reflects a true finding, is challenging. Further complicating the problem are genotyping algorithms, which don’t work all that well yet. In order to scan even just 0.02% of a person’s genome, a computer has to learn to detect abnormalities. However, if the computer has not seen enough examples of an abnormality, it will identify it incorrectly.
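A short Python sketch can show why positive predictive value depends so heavily on the population being tested. It uses the standard definition (true positives divided by all positives) and assumes a hypothetical test with 99% sensitivity and 99% specificity. The two prevalence figures are invented for illustration and do not describe any real test or disease.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person with a positive result truly has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test performance (assumed, not taken from any real product).
SENSITIVITY = 0.99
SPECIFICITY = 0.99

# Hypothetical prevalence in a high-risk study group versus the general public.
ppv_high_risk = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.20)
ppv_general = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.001)

print(f"PPV in high-risk group:    {ppv_high_risk:.1%}")
print(f"PPV in general population: {ppv_general:.1%}")
```

Under these assumed numbers, the same test is right about most positives in the high-risk group but wrong about most positives in the general population, simply because true cases are so rare there. That is why it matters which population was used to develop and validate a test.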

With all of these caveats, it is difficult to distinguish useful information from unreliable noise. Dr. Theodora Ross, molecular pathologist at the University of Texas, discussed this problem in more detail at AACC 2019.

“Silicon Valley may think genetic counsellors and MDs that are seeing patients may be like taxi drivers, they don’t really need to exist in our day and age because we have Uber,” explained Dr. Ross.

Nevertheless, when it comes to taking action on disease treatment and prevention, interpreting the significance of genetic information is still tricky and requires expert review. At this stage one simply cannot teach an algorithm to detect disease reliably and accurately. A confounding problem is that the genetic test is taken out of the context of the other symptoms, or lack of symptoms, that a person may be experiencing. Certainly, these tests have potential for future utility, but in their current state experts are reluctant to rely solely on a genetic test as proof of illness.

That is not to say all genetic tests are useless. There are some instances where prevalence in an unselected population has been studied, notably Lynch syndrome, hereditary breast cancer, and familial hypercholesterolemia.

Thus, while your spit may be able to tell you whether you can smell asparagus in urine, it cannot yet tell you with certainty whether you will develop heart disease.