
IT News & Events

News about IT at Indiana University and the world


Scientists use Jetstream to improve the usability of massive genomic sequence archive

Chances are, you’ve had the experience of searching a topic online and coming up with far more results than you could ever sift through in a human lifetime. You might find one article you like, but what are the odds of finding another? Some research databases (like JSTOR and Google Scholar) offer the handy option of seeing articles similar to the one you like, magically whittling the millions of options down to a few that contain exactly the information you need.

Now imagine you’re a microbiologist working to understand the human microbiome (the collective genome of the microorganisms living in and on the human body). Maybe there’s a particular type of bacteria you’re studying, and you want to know where else it has been found. On the bright side, for the last 20 years or so, scientists have been working to sequence the genetic codes of millions of organisms, viruses, bacteria, and environments. Based on long-standing agreements between scientists, completed sequences are deposited into one of three linked repositories, making them available to you and other scientists around the world. However, the volume of sequence data in the repositories sits at an impossibly large 10^16 bp (base pairs; each bp is like one letter in a word), rendering it inaccessible to researchers without the specialized skills necessary to search it.


Rob Edwards, Professor of Biology and Computer Science at San Diego State University, seeks to simplify the research process through the Sequence Read Archive (SRA), a searchable bioinformatics database that makes sequence data available and accessible. The "search SRA service" uses Jetstream, an on-demand, cloud-based high-performance computing resource, to create a platform through which scientists can compare their sequences to others throughout the archive. The search SRA service effectively eliminates the learning curve associated with searching the repositories: scientists submit a query for genes, bacteria, viruses, and so on, and within a few short hours receive a summary of similar sequences and of the other environments in which they have been found.

According to Edwards, Jetstream makes the computing side of the search SRA service easy, to the degree that he can enjoy not knowing the ins and outs of exactly how it works. When his users need computing resources, he notes that, with the assistance of the Jetstream team, “it just gets done.” Jetstream’s scalability also gives the project the flexibility to expand its services when user need grows and to use fewer resources when there’s less demand. Demand has been increasing steadily, though, since the search SRA service was released to the public in late 2018. The project began in the fall of 2017, with prototypes coming online in March of 2018. It was initially available only to a small group of colleagues, until Edwards tweeted about it in December of 2018, gaining hundreds of interactions almost immediately. The search SRA service, through Jetstream, makes the backend work, allowing scientists to focus on what’s important: their research questions.