Exploring the Known Unknowns Using the Power of Metagenomics: Discovery of the crAssphage

By Connie Chen, Microbiology, ’15-’16

Author’s Note:

“Metagenomics is the study of genetic material directly from environmental samples such as the soil or the human gut. With whole metagenomic sequencing, it is possible to obtain and analyze every piece of genetic material in the sample. As we begin to learn more about the world, it becomes evident how much remains unknown. The crAssphage is an example of a “known unknown”: through metagenomics, the virus’s genome has been assembled and certain properties can be inferred from it, but the virus has never been seen under a microscope and much about it is still unknown. Metagenomics has opened the door to analyzing many sequences at once and determining the ecology of an environment. Because metagenomics is becoming more prevalent, it is essential to understand the potential of this growing field. I hope that by learning about the potential of metagenomics, new ideas can sprout from using this technology in order to help others.”

CrAssphage: A bacteriophage with a circular double-stranded DNA genome of about 97 kilobase pairs that contains 80 predicted open reading frames (ORFs). It was named after the cross-assembly analysis software (crAss) used to find the viral genome.


The crAssphage caught my attention because it was discovered very recently, in 2014, and was found solely through metagenomics. Metagenomics is the study of genetic material directly from environmental samples such as the soil or the human gut. With whole metagenomic sequencing, it is possible to obtain and analyze every piece of genetic material in the sample. According to Dr. Robert Edwards, the principal investigator of the crAssphage project, the bacteriophage is very widespread and found in about every other person. He also suggests that it has been around for as long as humans have (Price, 2014). This bacteriophage has never been viewed under a microscope, yet its entire genome has been constructed from DNA sequences found in metagenomic databases through cross-assembly analysis. Dr. Edwards describes the analysis strategy as “identifying the phage in different samples, putting it together in one bucket and analyzing them as one group” (Price, 2014). In other words, the team identified genetic sequences that were present in all the samples and then pieced them together based on similarity. Properties of the phage and its possible hosts have been inferred through analysis of the constructed genome. I decided to study this bacteriophage further and find out what potential it could have in future studies.
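The idea of finding sequences that are present in every sample can be illustrated with a small sketch. This is only a toy model and not the crAss software itself: the sample names and reads are made up, the function names `kmers` and `shared_kmers` are hypothetical, and real tools also handle reverse complements, sequencing errors, and actual assembly of reads into contigs.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(samples: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the k-mers that occur in at least one read of every sample."""
    per_sample = []
    for reads in samples.values():
        found: Set[str] = set()
        for read in reads:
            found |= kmers(read, k)
        per_sample.append(found)
    # With no samples there is nothing to intersect.
    return set.intersection(*per_sample) if per_sample else set()


# Made-up reads from three hypothetical individuals (assumption for illustration).
samples = {
    "person_A": ["ATGGCGTACC", "TTTTGGGAAA"],
    "person_B": ["GGCGTACCTA", "CCCCAAAATT"],
    "person_C": ["CATGGCGTAC", "AGAGAGAGAG"],
}

common = shared_kmers(samples, k=6)
print(sorted(common))
```

In this toy data, the first read of each person overlaps the same stretch of sequence, so a few k-mers survive the intersection while the unrelated reads contribute nothing. Overlapping reads like these are the kind of material that could then be pieced together into a longer sequence.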


In the first article published about the crAssphage, in Nature Communications, Dutilh et al. (2014) used metagenomic strategies to assemble the crAssphage genome, annotate the proteins it encodes, predict the phage's host, and determine how ubiquitous the crAssphage is in public metagenome databases. To assemble the genome, the team used a strategy termed cross-assembly analysis. Fecal samples from twelve different individuals were analyzed. The team noticed that a cluster of viral DNA about 97 kilobase pairs long (roughly ten times the size of the human immunodeficiency virus (HIV) genome) was present in all the samples (Price, 2014; Yong, 2014). According to Dutilh et al., around 75% of the DNA from any stool sample, and as much as 99%, won’t match any of the known viral sequences in databases (Price, 2014). After checking the genome against the known sequences in these databases and finding no match, Dutilh et al. determined that the crAssphage was a new virus.
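The database check described above can be pictured with a minimal sketch. It assumes a tiny made-up reference set and uses exact k-mer matching as a stand-in for the alignment tools a real study would use; the names `best_match`, `unknown_fraction`, `virus_X`, and `virus_Y` are all hypothetical.

```python
from typing import Dict, List, Optional


def best_match(read: str, reference: Dict[str, str], k: int = 8) -> Optional[str]:
    """Return the reference sharing the most k-mers with read, or None if none match."""
    read_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    best_name, best_hits = None, 0
    for name, seq in reference.items():
        hits = sum(1 for km in read_kmers if km in seq)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


def unknown_fraction(reads: List[str], reference: Dict[str, str], k: int = 8) -> float:
    """Return the fraction of reads with no match in the reference set."""
    if not reads:
        return 0.0
    unknown = sum(1 for r in reads if best_match(r, reference, k) is None)
    return unknown / len(reads)


# Made-up "annotated database" and reads (assumptions for illustration only).
reference = {
    "virus_X": "ATGCGTACGTTAGCCGATAG",
    "virus_Y": "GGGTTTAAACCCGGGTTTAA",
}
reads = ["CGTACGTTAGCC", "TTTAAACCCGGG", "ACACACACACAC", "TGTGTCTCTCAG"]

for read in reads:
    print(read, "->", best_match(read, reference))
print("unknown fraction:", unknown_fraction(reads, reference))
```

Reads that return `None` here play the role of the unmatched sequences in a stool sample: they are clearly present in the data, but nothing in the reference set explains them.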


The crAssphage is a bacteriophage, which is a virus that infects bacteria. Interestingly, the human gut virome is dominated by bacteriophages that are mostly unknown (Minot et al., 2013). Despite having gone unnoticed, the crAssphage turned out to be widespread and abundant when the team searched through many public metagenomic libraries. Dr. Edwards states that the phage is very common and is likely to be in every person, if not every other person. The researchers predicted that the host bacteria belong to Bacteroidetes, a common phylum of gut bacteria that live toward the end of the human intestinal tract and are suspected to play a major role in the link between gut bacteria and obesity. Perhaps the crAssphage influences the abundance of Bacteroidetes in the human gut and thereby affects gut ecology. Unfortunately, it is not certain that Bacteroidetes is the host of the phage. Homology searches were done, but the results did not provide clear clues as to what the bacterial host or hosts are. In order to determine the true host, the phage must be isolated and propagated, which has not been successful so far.

None of the sources mention whether the bacteriophage is harmful to humans. Given its abundance and wide distribution, it is unlikely that the phage is dangerous to humans. Although the method of transmission is unknown, it is predicted that the phage is transmitted from mother to child (Price, 2014). More research is needed to confirm the route of transmission. Dr. Edwards says he is curious about the mechanism of entry and what the phage does to Bacteroidetes, which are questions for future studies (Price, 2014).

Future Direction:

In order to study the mechanism of entry and replication of the phage, a large quantity of it must be produced for testing. However, this is not possible unless the host can be grown. Unfortunately, the crAssphage has not yet been isolated. Since most gut bacteria won’t grow easily in a lab, the phage is difficult to rear (Price, 2014; Yong, 2014). Not enough is known to determine the life cycle of the virus, such as whether it is a lytic phage like Bacteriophage T7 or a temperate phage like Bacteriophage λ. In an attempt to isolate the bacteriophage, a plaque assay was performed. However, the results were unfavorable, and Dutilh et al. determined that the plaques that did form came from different bacteriophages and not the crAssphage. The research team concluded that more information about the phage itself is needed in order to find better ways to isolate the virus.


Ever since metagenomics was introduced, there has been a great effort towards discovering viruses that are known to exist but have not been identified, also referred to as the “known unknowns.” After a sample has been processed, the data comes back as large amounts of genetic sequences, and most are identifiable after aligning or matching them against an annotated reference database. The “known unknown” viruses show up as genetic sequences that are difficult to identify even with these standard methods (Dutilh, 2014). The team that discovered the crAssphage consisted of both biologists and computer scientists who worked together in order to find the virus. They believe that metagenomics is the next step towards personalized phage therapy in the clinical world. By using methods for discovering unknown viruses, one could isolate and identify a virus and then find a personalized treatment for that one specific virus (Yong, 2014). This could be a new path in the world of medicine.


Dutilh, Bas E (2014). “Metagenomic ventures into outer sequence space”. Bacteriophage, 4:4, e979664. DOI: 10.4161/21597081.2014.979664.

Dutilh BE, Cassman N, Sanchez SE, et al. (2014). “A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes” Nature Communications. Vol. 5, Article Number 4498.

Mello, LV., Chen, X. & Rigden, DJ. (2010). Mining metagenomic data for novel domains: BACON, a new carbohydrate-binding module. FEBS Lett. 584, 2421–2426

Minot, S. et al. (2013). “Rapid evolution of the human gut virome”. Proc. Natl Acad. Sci. USA 110, 12450–12455. 

Price, Michael (2014). “Novel Virus Discovered in Half the World’s Population”. http://newscenter.sdsu.edu/sdsu_newscenter/news_story.aspx?sid=75082. Video.

Yong, Ed (2014). “Why Has This Really Common Virus Only Just Been Discovered?” National Geographic.

