IT News & Events

News about IT at Indiana University and the world

Supercomputing for Everyone Series: De novo assembly of transcriptomes using HPC resources workshop

The National Center for Genome Analysis Support at Indiana University seeks interested participants for this National Science Foundation-sponsored two-day workshop

The National Center for Genome Analysis Support, or NCGAS, at Indiana University is offering a National Science Foundation-sponsored, two-day workshop on high performance computing usage and de novo transcriptome assembly. It will take place April 30-May 1 on the IU Bloomington campus.

The workshop will combine discussions, lectures, and hands-on tutorials covering the essentials of constructing and analyzing transcriptomes without a reference genome. Material will cover both the availability and use of high performance computing (HPC) resources and the task of assembling a new transcriptome, to prepare participants more fully for this and future bioinformatics tasks.

Transcriptome assembly will use four separate assemblers (Trinity, SOAPdenovo-Trans, Velvet/Oases, and Trans-ABySS), run with multiple k-mer sizes, and the resulting assemblies will be combined and curated with EvidentialGene. Combining assemblies built with multiple parameters is considered much more robust than relying on a single assembler, and the NCGAS pipeline streamlines the process while allowing customization if desired.
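As a rough illustration of the multi-assembler, multi-k-mer idea described above, the Python sketch below enumerates the runs whose outputs would be pooled before curation. It is not the NCGAS pipeline: the k-mer sizes, the uniform assembler-by-k-mer grid, and the output file names are all hypothetical placeholders chosen for the example.

```python
from itertools import product

# Assembler names follow the announcement; the k-mer sizes are
# illustrative placeholders, not the values the NCGAS pipeline uses.
ASSEMBLERS = ["Trinity", "SOAPdenovo-Trans", "Velvet/Oases", "Trans-ABySS"]
KMER_SIZES = [25, 35, 45]


def plan_assemblies(assemblers, kmer_sizes):
    """Return one (assembler, k) pair per assembly run."""
    return list(product(assemblers, kmer_sizes))


def output_name(assembler, k):
    """Build a hypothetical FASTA file name for one run."""
    safe = assembler.lower().replace("/", "_").replace("-", "_")
    return f"{safe}_k{k}.fasta"


runs = plan_assemblies(ASSEMBLERS, KMER_SIZES)
# Every per-run output would be pooled and handed to the curation step.
pooled_inputs = [output_name(a, k) for a, k in runs]
print(len(runs), "assemblies to combine")
```

The point of the sketch is only that the number of assemblies grows as assemblers times k-mer sizes, which is why a curation step is needed to reduce the pooled result to a single non-redundant transcript set.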

While the material makes heavy use of XSEDE and IU machines, it is transferable to any cluster.

Logistics:

  • Workshop cost: Free (travel/lodging assistance will be provided for a limited number of participants)
  • Application deadline: March 15, 2018 (or until all spots are filled)
  • When: Monday, April 30 - Tuesday, May 1, 2018, 8am-5pm each day
  • Where: Indiana University - Bloomington, IN, USA
  • Hotel: A block has been reserved at $134 per night + tax
  • Travel: Fly into Indianapolis International Airport (IND), shuttle/drive (45 minutes) to Bloomington, IN
  • Attendees: By invitation only (use the call for participation link to receive an invitation)

Agenda:
Day 1 (Monday, April 30):

  • Introduction to NCGAS Staff
  • Introduction to Clusters and Usage
  • Optimizing Jobs
  • de novo Transcriptome Analysis Pipeline Overview
  • Data management and movement tutorial
  • National HPC resource availability discussion

Day 2 (Tuesday, May 1):

  • Galaxy Tutorial
  • Common Problems in HPC work
  • Using and Troubleshooting the Pipeline
  • Differential Expression Calculation
  • Discussion of other downstream analyses

Objectives:
Participants should leave with the following knowledge:

  • familiarity with nationally available compute resources
  • an understanding of the differences, pros, and cons of VMs, Gateways, Clusters, and Clouds
  • how to run and optimize a job submission on a cluster
  • how to manage large data sets and move data between resources
  • how to run NCGAS’s de novo transcriptome assembly pipeline to produce robust transcriptomes
  • how to check quality and clean up a de novo transcriptome
  • an understanding of some of the considerations in downstream analyses
  • how to get help with both genomic and computational questions

Attendees will not assemble their own data during the workshop (a full assembly takes more than two days), but they will run the entire pipeline on smaller-scale demo data.

Prerequisites:
This workshop is aimed at beginners, but basic Unix commands will not be covered. Participants must therefore have basic Linux skills (signing in, moving around the file system, etc.), though expertise is not required. It would help if participants have some exposure to running compute jobs on a cluster and an idea of their end goals for their data.

Participation in this two-day workshop is by invitation only; you must submit the registration form here to be invited. Participation is limited by the facilities (up to 30 people).


About NCGAS

The mission of the National Center for Genome Analysis Support (NCGAS) is to enable the biological research community of the US to analyze, understand, and make use of the vast amount of genomic information now available. NCGAS focuses particularly on transcriptome- and genome-level assembly, phylogenetics, metagenomics/transcriptomics and community genomics.