IT News & Events

News about IT at Indiana University and the world

Trinity and NCGAS partnership thrives as software popularity continues to grow

Trinity Users 2017

The Broad Institute’s messenger RNA (mRNA) assembler, Trinity (developed by Aviv Regev, Brian Haas and others), is one of the most popular tools for assembling short mRNA reads (~100-150 base pairs) into the transcripts (~200-15,000 bp) produced by an organism. According to recent GitHub download statistics, more than 16,000 people have downloaded the program over 4.4 million times since 2015. The rate is increasing every year: approximately 1,000 downloads to unique IP addresses per month so far in 2017, up from ~770 per month in 2016 and ~500 per month in 2015. This trend is notable because shared installations of Trinity on HPC resources are heavily used, and each one counts as a single download even though hundreds of individuals may use it.

The Trinity software requires 1 GB of RAM for every million reads of sequence, which means a single lane of Illumina sequence (the low end for many projects) requires ~180 GB of RAM. As sequencing costs continue to decrease, the scale of mRNA sequencing projects has grown, resulting in many lanes of Illumina data and TB-scale memory requirements. Trinity therefore requires HPC resources, accessed either directly on systems such as IU’s Mason or through web interfaces such as the Trinity CTAT Galaxy, maintained by the National Center for Genome Analysis Support (NCGAS) at IU. Trinity CTAT Galaxy alone is used by 665 users across 486 institutions in 51 countries (see map), averaging about 130 jobs per month. Again, this high-volume use accounts for only a couple of the 4.4 million downloads of the software.
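To make the memory arithmetic concrete, here is a minimal sketch of the rule of thumb quoted above (1 GB of RAM per million reads). The function name and the figure of 180 million reads per lane are assumptions for illustration; the latter is simply what the ~180 GB per lane estimate implies, and real lane yields vary by instrument and run.

```python
# Rule of thumb from the article: 1 GB of RAM per million reads.
GB_PER_MILLION_READS = 1.0

# Assumption: ~180 million reads per Illumina lane, inferred from the
# ~180 GB per lane figure above. Actual yields differ between runs.
READS_PER_LANE = 180_000_000


def estimate_ram_gb(read_count: int) -> float:
    """Estimate Trinity RAM needs (GB) from a read count (hypothetical helper)."""
    return read_count / 1_000_000 * GB_PER_MILLION_READS


if __name__ == "__main__":
    for lanes in (1, 4, 8):
        ram_gb = estimate_ram_gb(lanes * READS_PER_LANE)
        print(f"{lanes} lane(s): ~{ram_gb:.0f} GB RAM")
```

Under these assumptions, a project with six or more lanes already exceeds 1 TB of RAM, which is why such jobs tend to land on large-memory HPC systems rather than local workstations.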

When hundreds of people use a single instance of a piece of software for very diverse projects, issues and complications become evident much faster than when individual users run their own local installations. NCGAS provides user support and consolidates user issues into direct feedback to the Trinity developers at the Broad Institute. This partnership between developers and user-facing centers like NCGAS contributes significantly to the continued success of the software, as it becomes more efficient and handles biological complexities better.

Earlier in its partnership with Trinity, NCGAS made improvements that boosted speeds by a factor of 4. Now NCGAS is working to pass jobs seamlessly between clusters so that TB-scale memory jobs are handled in a timely manner.