IT News & Events

News about IT at Indiana University and the world


IU doctoral student explores the history of life on Earth

Bioinformatics Ph.D. student Gregg Thomas develops and tests new theories using genomics data

As human beings, many of us want to understand where we come from as a species. Thanks largely to emerging technology, the growing field of genomics gives us more ability than ever to understand our origins.

With the goal of studying the history of life on Earth, IU bioinformatics Ph.D. student Gregg Thomas has been hard at work developing and testing new theories using genomics data. His research aims to trace the pathways of life through time; one example is his work reconstructing the history of arthropods. According to Thomas, “We all have a common ancestor in the past, and knowing the transitions from the past to present plays an important role in anticipating the future, which is especially relevant to the fields of conservation and human health.”

IU Ph.D. student Gregg Thomas (and a furry friend)

Thomas’s work involves analyzing large files of encoded data that can be 2 to 3 billion characters long and represent multiple genomic sequences, and using these data to identify interesting differences and similarities between genomes. In a recent project, he and his collaborators identified genes that are important in AIDS resistance by sequencing and analyzing the genome of the sooty mangabey, an Old World monkey. He explains that evolution is often misunderstood as a step-by-step process, when it should instead be understood in terms of common descent and lineages splitting through isolation.

“The outcomes of evolution are shaped predominantly by adaptation to the environment and random chance,” he says.


An example of Thomas’s work: a phylogeny of 76 insects inferred using Carbonate.

To analyze genomes, Thomas needs far more computing power than a standard desktop machine offers. A single genome can range from a dozen to a hundred gigabytes in size, and he is juggling 30 genomes in his current project alone. The search for a better analysis tool led Thomas to IU’s Carbonate computing cluster.

“The large memory capacity of Carbonate means I’m able to analyze large chunks of the genome at once without worrying about running out of RAM, and the many processors available mean I can analyze several genomic chunks at once, which vastly cuts down on run times,” says Thomas. Working with these computing resources has also made Thomas more proficient at multiprocessor programming.
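For readers curious what analyzing several genomic chunks at once can look like in code, here is a minimal Python sketch. It is not Thomas’s actual pipeline: the function names, the chunk size, and the choice of GC content (the share of G and C bases) as the per-chunk analysis are all assumptions made for illustration. It splits a sequence into chunks and hands each chunk to a separate worker process using the standard library’s concurrent.futures module.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(sequence, chunk_size):
    """Cut a sequence string into consecutive pieces of at most chunk_size characters."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def gc_count(chunk):
    """Count G and C bases in one chunk (a hypothetical stand-in for a real analysis)."""
    upper = chunk.upper()
    return upper.count("G") + upper.count("C")


def gc_fraction(sequence, chunk_size=1_000_000, workers=4):
    """Return the fraction of G/C bases in the sequence.

    With workers > 1, each chunk is counted in a separate process;
    with workers <= 1, the chunks are counted one after another.
    """
    chunks = split_into_chunks(sequence, chunk_size)
    if not chunks:
        return 0.0
    if workers <= 1:
        counts = [gc_count(chunk) for chunk in chunks]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            counts = list(pool.map(gc_count, chunks))
    return sum(counts) / len(sequence)
```

Because each chunk is counted independently, adding workers can shorten the run time for large inputs, which is the effect Thomas describes. When calling gc_fraction with more than one worker from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes start cleanly on every platform.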

Thomas notes that the genomics era is just beginning to boom, and he’d like to raise public awareness of data privacy and actively work against misinterpretation of findings. While Thomas works mainly with data obtained from animals, he emphasizes that human data raises a host of ethical concerns that researchers have to be mindful of. In Thomas’s own research, genomic data, which can be easily accessed and shared, is a powerful tool for insight into animal disease and behavior. However, the same ease of access and use that makes genomic data so powerful for research could leave individuals more vulnerable to privacy invasions if the data is not carefully safeguarded and used only for specific, well-justified purposes.