IT News & Events

News about IT at Indiana University and the world

Supercomputing for Everyone Series: Intro to R for Biologists Online

This six-part workshop covers the basics of R, with live instruction via Zoom and online course materials on IU Expand.

Event details

  • Date & time
    Monday, November 4, 2019
    1:30pm-3pm
  • Location
    Day 1 of 6
    Online via Zoom
    with a live instructor

Register here

About this event

This is a six-part online workshop. Material will be available via IU Expand as a lecture video, written text, and interactive questions. We will also hold a live “recitation” for the course, in which instructors lead a discussion of the material and provide extra content that elaborates on the concepts.

The goal of this course is to help participants get started in R, so they’ll be able to read and write code, and figure out where to get help when needed.

This workshop covers three major concepts in R:

  • The general syntax of the language, the basic data types, and how to manipulate them (days 1 and 2)
  • An introduction to the two plotting paradigms in R, using GIS data and ordination as examples of plotting different kinds of data (days 3 and 4)
  • How to read and write functions in R (days 5 and 6)

The course does not focus on any particular analysis, but uses DNA sequences as a case study to apply the material covered. We will also cover how to use Jetstream (the research cloud) to power analyses in RStudio. Use of personal installations on laptops is fine for the workshop; however, we will not troubleshoot individual installations during class.


Objectives:
By the end of this workshop, users will be able to:

  • Navigate and use RStudio (on and off Jetstream)—load files, export graphs, etc.
  • Understand how to install, load, and use new libraries.
  • Become familiar with the Bioconductor Project.
  • Understand basic data types, functions, objects, and classes in R.
  • Write and use a custom function.

Prerequisites:

  • Unix familiarity is a plus, but not required.
  • A laptop is required—if you do not have one, contact the organizer to borrow one.
  • Materials and activities must be COMPLETED before each online meeting. For example, day 1 is to be done BEFORE we meet on November 4.

Agenda:

  • Day 1: Introduction
    The goal of this section is to get you acquainted with R, both the environment and the language. We’ll discuss data types, manipulation, the structure of commands, how to get help and more information, how to load packages, and how to use the environment. The hope is that you will use R more intuitively. We will discuss some common errors and troubleshooting during the recitation meeting.

    This section does not focus on any individual analysis or demonstration; rather, it focuses on reading and making sense of the language. This is very helpful for new users or anyone currently copying, pasting, and hoping the command will work.

    Requirements: There are no requirements for this section. Basic Unix skills (how variables work, cat, pwd, etc.) are helpful: we will not be using the command line, but we will reference these concepts throughout.

  • Day 2: Introduction lab
    A guided activity to practice your skills from day 1. This will give you practice using R and working with sequence data/vectors with a bit more independence. We will answer questions and help troubleshoot the activity during the recitation meeting.

  • Day 3: Introduction to visualization
    We will build on the basic data types and syntax of R to explore visualization of geographical data. The two main families of plotting will be introduced (plot style and ggplot style), with examples of how to plot various types of data on geographical maps. This is a useful skill for ecologists and geneticists alike. During the online recitation meeting, we will further discuss graphing options, troubleshoot setting up Google Maps, and share some helpful tutorials and cheat sheets for plotting in R.

    Requirements: This section builds on the material covered in day 1, so familiarity with that material will be useful. Day 1 material will be available online.

  • Day 4: Introduction to visualization lab
    This activity will extend the same plotting styles to a different kind of data: ordination plots (PCA, PCoA, and nMDS), which are useful for exploring various data you may have. Microbiome, ecological, and population genetics data are common examples. We will discuss ordination, when to use different types, and some of the finer points in choosing packages during the recitation meeting.

  • Day 5: Making your own scripts and functions
    The goal of this section is to go into more depth on how to read, understand, and troubleshoot R code by introducing classes and functions. Classes and functions are a large part of R, and therefore a large part of understanding the syntax and function of the language. We will walk through creating your own function for summarizing tables of data (both ecological and genetic data sets are available for use). We will discuss more tips for designing and writing code in R during the recitation.

    Requirements: This material assumes basic usage of R covered in the previous two days, or a moderate familiarity with R basics.

  • Day 6: Making your own scripts and functions lab
    This activity builds on day 2's lab: you will create a function to graph a sliding window plot of GC content. The activity is meant as practice in building functions, but this particular example can easily be applied to visualize variation across any continuous data, such as an ecological measure through time or population variation over a genome. We will help answer questions and troubleshoot this activity during the online recitation.

 The Supercomputing for Everyone Series (S4ES) workshops are taught by personnel from Research Technologies, a division of University Information Technology Services. This workshop is taught in conjunction with the National Center for Genome Analysis Support. Both are centers in the Pervasive Technology Institute at Indiana University.

The Supercomputing for Everyone Series aims to bring more users into the realm of advanced computing, whether it be visualization, computation, analytics, storage, or any related discipline. Let the Research Technologies staff take you to the next level of computing.


View all the workshops in this series by visiting http://go.iu.edu/24xc.


RT-Education Outreach and Training