Weill Cornell Medical College

Applied Bioinformatics Core


Bioinformatics walk-in clinics

When: Thursdays, 1:30-3pm
Where: LC-504B conference room

Are you stuck with a bioinformatics analysis? Does your code not seem to do what it should? Do you feel overwhelmed by the myriad parameters of read aligners? Let us help you!

The clinics provide an informal setting to support faculty, staff, and students with bioinformatics analyses, and discuss how the ABC can help further research.

Our areas of expertise include:

  • transcriptomics (DNA microarrays & RNA-seq)
  • epigenomics (ChIP-seq, eRRBS, WGBS, …)
  • DNA sequencing & variant calling
  • sequence alignments & phylogeny
  • generation of publication-ready images
  • best practices for reproducible computational research

Differential gene expression analysis using RNA-seq
24 September - 2 October, 2015

The Applied Bioinformatics Core (ABC) at the Weill Cornell Medical College continues its series of workshops that promote the effective use of bioinformatics and computational methods in scientific research. These hands-on workshops focus on the practical application of contemporary bioinformatics and data analysis tools, demonstrating best practices for managing and analyzing scientific data, with a particular emphasis on the scalability needed to effectively deal with today’s large-scale datasets.

About the RNA-seq analysis workshop

Why learn about RNA-seq differential gene expression analysis? RNA-seq is a commonly used method of interrogating the transcriptome, enabling both measurement of gene expression levels and isoform quantification. There is a broad constellation of bioinformatics tools available for the analysis of the large datasets that result from an RNA-seq experiment. This workshop will review the appropriate selection and correct usage of these tools, and how these vary with the specific questions being investigated.

Why this workshop? This workshop will present current methods for mapping the reads generated by an RNA-seq experiment to the genome, assigning reads to genomic features, and using the expression levels of those features to identify differentially expressed genes between conditions. At each step, we will investigate some of the methods, their principles, and their biases. We will look at techniques to quantify our confidence in the results, and at some of the pitfalls to be aware of.

At the end of this workshop, participants will have analyzed a realistic dataset, from data retrieval through differential gene expression. They will appreciate the available data sources and tools, be acutely aware of biases, and be able to use these insights to critically interpret published results.

Pre-requisites: The workflows taught in this workshop will be executed at the UNIX command line or in R/RStudio. This is not a UNIX or R course; participants must have a working knowledge of UNIX and R/RStudio.

Specifically, participants should be comfortable with basic operations in a UNIX/Linux environment, including (1) moving around the directory structure and viewing and manipulating files (mkdir, ls, head, cat, cut); (2) running programs, redirecting standard input and output, and using the pipe (|) operator in the command shell; and (3) using the grep and sed commands and crafting basic regular expressions.
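As a rough self-check of the UNIX skills listed above, the following sketch exercises each of the three groups of operations on a tiny tab-separated file. It is only an illustration and not part of the course materials: the directory name, file names, gene names, and read counts are all made up for this example.

```shell
#!/bin/sh
# Illustrative self-check; every file name and value below is hypothetical.
set -eu

# (1) Create a scratch directory, move into it, and write a small
#     tab-separated table (gene, condition, reads).
workdir=$(mktemp -d)
cd "$workdir"
mkdir counts
printf 'gene\tcondition\treads\n' >  counts/toy.tsv
printf 'geneA\tcontrol\t10\n'     >> counts/toy.tsv
printf 'geneB\ttreated\t25\n'     >> counts/toy.tsv
printf 'geneC\ttreated\t7\n'      >> counts/toy.tsv

ls counts                   # list the directory contents
head -n 2 counts/toy.tsv    # view the header line and the first record
cat counts/toy.tsv          # print the whole file

# (2) Redirect standard output to a file, and chain commands with a pipe.
cut -f 1 counts/toy.tsv > counts/genes.txt
n_treated=$(cut -f 2 counts/toy.tsv | grep -c '^treated$')

# (3) Select lines with grep, then edit them with a sed regular expression
#     that strips the leading "gene" prefix; paste joins the lines with commas.
treated_genes=$(grep 'treated' counts/toy.tsv | cut -f 1 | sed 's/^gene//' | paste -sd ',' -)

echo "number of treated samples: $n_treated"
echo "treated gene suffixes: $treated_genes"
```

If each line of this script reads naturally, the command-line portions of the workshop should be easy to follow.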

Needed R skills include familiarity with all data types, ability to subset lists, matrices and data frames, use factor levels, and generate plots with base R graphics.

Instructors: Friederike Duendar, Paul Zumbo, Luce Skrabanek, Applied Bioinformatics Core, WCMC

Schedule: This workshop consists of four three-hour sessions.

  • 3:00 PM – 6:00 PM Thursday, September 24th, 2015
  • 3:00 PM – 6:00 PM Friday, September 25th, 2015
  • 3:00 PM – 6:00 PM Thursday, October 1st, 2015
  • 3:00 PM – 6:00 PM Friday, October 2nd, 2015

Course Materials: Freely available here.

Scientific Data Analysis Using R, 7-21 April, 2015

The Applied Bioinformatics Core (ABC) at the Weill Cornell Medical College hosts a series of workshops that promote the effective use of bioinformatics and computational methods in scientific research. These hands-on workshops focus on the practical application of contemporary bioinformatics and data analysis tools, demonstrating best practices for managing and analyzing scientific data, with a particular emphasis on the scalability needed to effectively deal with today’s large-scale datasets.

About the R workshop

Why learn R? R is a free software environment for statistical computing and graphics (www.r-project.org). It can effectively analyze large-scale datasets, such as those resulting from high-throughput sequencing experiments. It promotes automated and reproducible analyses of scientific data, creates a wide spectrum of publication-quality figures, and has an extensive library of add-on packages to facilitate many complex statistical analyses. Because it is free and widely available (it runs on Windows, Mac, and Linux computers), your investment in learning R will pay dividends for years to come.

Why this workshop? Much of the power of the R platform derives from the fact that it is NOT a point-and-click environment, and thus has a steeper than usual learning curve. This is compounded by the reality that much of the documentation for R is geared toward those with very advanced training in statistics. This hands-on workshop will help you overcome these obstacles by providing participants with the knowledge and experience to use R for practical scientific data management and analysis applications. This workshop will also demonstrate principles and best practices for handling scientific data at scale, which is becoming ever more important in the ‘big data’ era.

Instructors: Jason Banfelder and Luce Skrabanek, Applied Bioinformatics Core, WCMC

Schedule: This workshop consists of four three-hour sessions.

  • 10:00 AM – 1:00 PM Tuesday, April 7th, 2015
  • 12:00 PM – 3:00 PM Wednesday, April 8th, 2015
  • 10:00 AM – 1:00 PM Tuesday, April 14th, 2015
  • 10:00 AM – 1:00 PM Tuesday, April 21st, 2015

All workshops will take place in the WCMC Library Computer Laboratory (D0.02). The entrance to the library is in the lobby of 1300 York Ave, at 69th Street. The Computer Laboratory is located on the basement level of the library – enter the library on the first floor, walk towards the back and go down two flights of stairs; you will then be directly facing the Computer Laboratory room.

Course Materials: Freely available here.

Registration: This workshop is available to all WCMC students, postdocs, faculty, and staff. Due to the limited class size (18 participants), registration is required.

Software Carpentry Bootcamp, 1-2 April, 2015

The Applied Bioinformatics Core and the Samuel J. Wood Medical Library invite you to attend an intensive, two-day scientific computing bootcamp.

Computing has become an essential part of scientific research, especially as automated and high throughput methods become more widely adopted. Many scientists spend an increasing amount of time repeatedly carrying out tasks that could be automated. The Software Carpentry bootcamp provides scientists with an intensive two-day workshop that will introduce the computing tools and practices that enable you to streamline the analysis of your data and to implement computational pipelines that enable reproducible analysis.

Software Carpentry's aim is to teach researchers basic computing concepts and skills so that they can get more done in less time, and with less pain. Topics to be covered in this workshop include:

  • Introduction to the Unix shell
  • Version control with git
  • Introduction to databases and SQL
  • Programming with Python

See the Software Carpentry website for more information.

We gratefully acknowledge the support of the Weill Cornell Graduate School for this Software Carpentry Bootcamp.

Scientific Computing in Biomedicine, 8 September - 15 December, 2014

This is a two-semester graduate course which teaches students the fundamental skills and knowledge required for scientific computing in the biomedical sciences. Topics covered include scripting, working with large datasets, data and software management, and effective use of high-performance computing resources. Students learn the relevant theory as well as develop practical application skills using a number of contemporary tools and technologies including R for data analysis and presentation, SQL databases for structured data management, the Ruby scripting language for practical programming tasks, git for software and data revision control, Sun Grid Engine for batch job management on large clusters, and Maestro from the Schrödinger Suite for molecular modeling and visualization.

This course is intended for students who work regularly with computational tools and methods. Enrolling students should already be familiar with basic operations in a UNIX/Linux environment, including (1) editing text files using vi or emacs; (2) running programs, redirecting standard input and output, using the pipe (|) operator in the command shell; and (3) using the grep command and crafting basic regular expressions.

Instructors: Jason Banfelder, Luce Skrabanek, Paul Zumbo, Michael LeVine

Schedule: Classes are held on Mondays, from 5-7pm.

Scientific Data Analysis Using R, 10-19 March, 2014

Instructors: Luce Skrabanek and Jason Banfelder, Applied Bioinformatics Core, WCMC

Schedule: This workshop consists of four three-hour sessions.

  • 2:00 PM – 5:00 PM Monday, March 10th, 2014
  • 2:00 PM – 5:00 PM Wednesday, March 12th, 2014
  • 2:00 PM – 5:00 PM Monday, March 17th, 2014
  • 2:00 PM – 5:00 PM Wednesday, March 19th, 2014

Computational Workshop: Introduction to UNIX, 13-22 January, 2014

The Applied Bioinformatics Core (ABC) at the Weill Cornell Medical College hosts a series of workshops that promote the effective use of bioinformatics and computational methods in scientific research. These hands-on workshops focus on the practical application of contemporary bioinformatics and data analysis tools, demonstrating best practices for managing and analyzing scientific data, with a particular emphasis on the scalability needed to effectively deal with today’s large-scale datasets.

About the UNIX Workshop

Why learn UNIX? As mainstream science enters the “big data” era, researchers need tools and platforms that can effectively deal with ever larger datasets, such as those from genome-scale sequencing, 3D and 4D high-resolution imaging, and high-throughput screening methods. The UNIX operating system is engineered to work at this scale (it is no coincidence that 99% of the fastest supercomputers in the world run some flavor of UNIX). It enables collaborative data sharing, promotes automated, pipelined analyses by providing an easy mechanism for disparate computational tools to work together, and allows scalable (and often massively parallel) analyses to be performed on an ad hoc basis.

About this workshop: This hands-on workshop will introduce you to the UNIX operating system’s command line interface, commonly used commands, text editors, regular expressions, command pipelines, and shell scripting for automation. It provides the knowledge and skills needed to use UNIX for practical scientific data management and analysis applications. This workshop will also demonstrate best practices for handling large-scale scientific datasets.

Instructors: Jason Banfelder and Luce Skrabanek, Applied Bioinformatics Core, WCMC

Schedule: This workshop consists of four three-hour sessions.

  • 2:00 PM – 5:00 PM Monday, January 13th, 2014
  • 2:00 PM – 5:00 PM Wednesday, January 15th, 2014
  • 2:00 PM – 5:00 PM Monday, January 20th, 2014
  • 2:00 PM – 5:00 PM Wednesday, January 22nd, 2014

All workshops will take place in the WCMC Library Computer Laboratory (D0.02). The entrance to the library is in the lobby of 1300 York Ave, at 69th Street. The Computer Laboratory is located on the basement level of the library – enter the library on the first floor, walk towards the back and go down two flights of stairs; you will then be directly facing the Computer Laboratory room.

Course Materials: Freely available here.

Registration: This workshop is available to all WCMC students, postdocs, faculty, and staff. Due to the limited class size (18 participants), registration is required. Please note that priority will be given to WCMC graduate and medical students. To request registration, please email Maureen Milici (maureen@physbio-tech.net), and indicate your position at WCMC.

Feedback: If you have taken this workshop, please fill out this survey.

Benchmarking multi-threaded samtools packages, October 2013

The most recent samtools package (0.1.19) allows the parallelization of many commonly used tools in this package, including the view and sort commands. We have benchmarked the multi-threaded performance of these tools and have estimated the most efficient trade-off between resources used and time saved. Details can be found here.

Big Data Workshop, 19 September 2013

2:00 – 5:00 PM
Weill Cornell Medical College
Department of Physiology and Biophysics Conference Room LC-504
1300 York Ave (at 69th Street), New York, NY 10065

The Applied Bioinformatics Core at Weill Cornell held a workshop to discuss big data management issues and solutions for research-driven medical institutions. Short, focused expert presentations, interspersed with Q&A and a panel discussion, made this an interactive event where participants could learn from each other about how top medical research facilities in the region are successfully turning massive data growth into insights and results. Presentations covered real-world examples, problem identification, approaches considered, solutions applied, and outcomes.

  • 2:00 Welcome and Introductions
    Jason Banfelder, Scientific Director, Applied Bioinformatics Core, Weill Cornell Medical College
  • 2:15 iRODS for Big Data Management in Research Driven Organizations [slides]
    Dr. Charles Schmitt, Director of Informatics and Data Sciences at the Renaissance Computing Institute (RENCI)
  • 3:15 DDN In Life Sciences and Research [slides]
    Josh Goldenhar, Technical Evangelist, Solutions Architect, HyperScale Accounts, DataDirect Networks
  • 4:30 Speaker and Panel Discussion

Applied Bioinformatics Core

abc@med.cornell.edu