Supercomputing for Everyone Series: De novo assembly of transcriptomes using HPC resources workshop

The National Center for Genome Analysis Support at Indiana University seeks interested participants for this National Science Foundation-sponsored two-day workshop

The National Center for Genome Analysis Support, or NCGAS, at Indiana University is offering a National Science Foundation-sponsored, two-day workshop on high performance computing usage and de novo transcriptome assembly. It will take place October 1-2 on the IU Bloomington campus.

The workshop will include discussions, lectures, and hands-on tutorials covering topics important to getting started with constructing and analyzing transcriptomes without the use of a genome. Material will cover both the availability and use of high performance computing (HPC) resources and the task of assembling a new transcriptome, in order to provide more comprehensive preparation for this and future bioinformatics tasks.

Transcriptome assembly will consist of running four separate assemblers (Trinity, SOAP de novo, Velvet Oases, and TransABySS), each with multiple k-mer sizes, and then combining and curating the results with Evigenes. Combining assemblies built with multiple parameters is considered much more robust than using a single assembler, and the NCGAS pipeline streamlines the process while allowing for customization if desired.
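As a rough illustration of why this approach produces many assemblies to merge, the sketch below enumerates one run per assembler and k-mer size. It is not the NCGAS pipeline itself: the k-mer values, the `plan_runs` and `output_name` helpers, and the output file names are all hypothetical, and no real assembler is invoked.

```python
from itertools import product

# Assembler names come from the announcement; the k-mer sizes are
# placeholder values for illustration, not the pipeline's settings.
ASSEMBLERS = ["Trinity", "SOAP de novo", "Velvet Oases", "TransABySS"]
KMER_SIZES = [25, 35, 45]


def plan_runs(assemblers, kmer_sizes):
    """Return one (assembler, k) pair per assembly run to schedule."""
    return list(product(assemblers, kmer_sizes))


def output_name(assembler, k):
    """Build a hypothetical per-run output file name."""
    return f"{assembler.replace(' ', '_')}_k{k}.fasta"


runs = plan_runs(ASSEMBLERS, KMER_SIZES)

# Every per-run assembly would later be handed to the combining and
# curating step (Evigenes in the workshop's pipeline).
inputs_for_merge = [output_name(a, k) for a, k in runs]

print(f"{len(runs)} assembly runs planned")
for name in inputs_for_merge:
    print(name)
```

With four assemblers and three assumed k-mer sizes, this plans twelve independent runs, which is the kind of workload that is typically submitted as separate jobs on a cluster.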

While the material makes heavy use of XSEDE and IU machines, it is transferable to any cluster.


  • Workshop cost: Free (travel/lodging assistance will be provided for a limited number of participants)
  • Applicant deadline: Until all spots are filled
  • When: Monday, October 1 - Tuesday, October 2, 2018, 8am-5pm each day
  • Where: Indiana University - Bloomington, IN, USA
  • Hotel: A block has been reserved at $134 per night + tax
  • Travel: Fly into Indianapolis International Airport (IND), shuttle/drive (45 minutes) to Bloomington, IN
  • Attendees: By invitation only (use the call for participation link to receive an invitation)

Day 1 (Monday, October 1):

  • Introduction to NCGAS Staff
  • Introduction to Clusters and Usage
  • Optimizing Jobs
  • de novo Transcriptome Analysis Pipeline Overview
  • Data management and movement tutorial
  • National HPC resource availability discussion

Day 2 (Tuesday, October 2):

  • Galaxy Tutorial
  • Common Problems in HPC work
  • Using and Troubleshooting the Pipeline
  • Differential Expression Calculation
  • Discussion of other downstream analyses

Participants should leave with the following knowledge:

  • familiarity with nationally available compute resources
  • an understanding of the differences, pros, and cons of VMs, Gateways, Clusters, and Clouds
  • how to run and optimize a job submission on a cluster
  • how to manage large data sets and move data between resources
  • how to run NCGAS’s de novo transcriptome assembly pipeline to produce robust transcriptomes
  • how to check quality and clean up a de novo transcriptome
  • an understanding of some of the considerations in downstream analyses
  • how to get help with both genomic and computational questions

Attendees will not assemble their own data during the workshop (a full assembly takes more than two days!), but they will run the entire pipeline on smaller-scale demo data.

This workshop is aimed at beginners, but basic Unix commands will not be covered. As such, participants must have basic Linux proficiency (signing in, moving around the file system, etc.), but expertise is not required. It would be helpful if participants had some exposure to using a cluster for compute jobs and an idea of their end goals for their data.

Participation in this two-day workshop is by invitation only; you must submit the registration form here to be invited. Attendance is limited by the facilities to 30 people.