IT News & Events

News about IT at Indiana University and the world

  • Thursday, June 20, 2019

Using high performance computing to mine the NCBI Short Read Archive (SRA)

The National Center for Genome Analysis Support (NCGAS) offers this workshop as part of ASM Microbe 2019. Early bird registration closes April 1.

Event details

  • Date & time
    Thursday, June 20, 2019
  • Location
    San Francisco, CA
    ASM Microbe 2019

Register here

About this event

The National Center for Genome Analysis Support (NCGAS) at Indiana University is offering this workshop as part of ASM Microbe 2019. Early bird registration closes April 1, 2019. You can register for the workshop only or for the whole conference.

The Sequence Read Archive (SRA) hosts ~14 PB of data from across the world. Mining it is useful for research because it lets researchers add more datasets to their analyses at no cost. The goal of this workshop is to help researchers who work with SRA, or are interested in doing so, run their bioinformatics workflows efficiently on the computational resources available to them through NCGAS/XSEDE.

The workshop will discuss the following topics:
Introduction to high performance computing (HPC) clusters. This important topic is often overlooked. We will cover the basics: what an HPC cluster is, how clusters are set up, how to run analyses on them, and how to manage data. These topics may seem trivial, but they provide insight into how to use these resources efficiently.

Bioinformatics programs available to mine the SRA. In this section, we will start with a brief introduction to SRA and then work with three bioinformatics programs we can use to mine SRA quickly:

  • SearchSRA - used to look up other SRA datasets that likely contain the sequence/genome of interest
  • E-utilities - used to look up metadata information using SRA accession IDs
  • Sratoolkit - used to download specific datasets for analysis
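As a rough illustration of how the last two tools fit together, the sketch below builds an E-utilities ESearch URL for an SRA accession and the SRA Toolkit command lines that would download the run and convert it to FASTQ. It is not taken from the workshop materials. The accession SRR000001 and the output directory name are placeholders, and the helper function names are hypothetical. Nothing is sent over the network or executed; the code only shows what the requests and commands look like.

```python
from typing import List
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(accession: str) -> str:
    """Build an ESearch URL that looks up an accession in the SRA database."""
    params = {"db": "sra", "term": accession, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def download_commands(accession: str, outdir: str = "fastq") -> List[List[str]]:
    """Return SRA Toolkit command lines for one run.

    prefetch downloads the run, and fasterq-dump converts it to FASTQ.
    The default output directory name is an assumption for illustration.
    """
    return [
        ["prefetch", accession],
        ["fasterq-dump", accession, "--outdir", outdir],
    ]


if __name__ == "__main__":
    run = "SRR000001"  # placeholder accession, replace with a run of interest
    print(esearch_url(run))
    for command in download_commands(run):
        print(" ".join(command))
```

On a cluster, the printed commands would typically go into a batch job script rather than being run on a login node.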

R to visualize the data. We will end the workshop by visualizing the metadata and taxa/function information using ordination plots in R. The goal of this section is not to provide ready-made R scripts, but to teach participants how to look up available R packages, write commands from documentation, and read R scripts.

Datasets and scripts. During the workshop, we will walk through the programs with test datasets, and the commands/scripts used in the workshop will be made available through a GitHub repository (coming soon).

Questions? Contact us