Weill Cornell Medical College

Applied Bioinformatics Core

Services offered

Data Lifecycle Management

This service provides full data lifecycle management for next-generation sequencing (NGS) applications. It is appropriate for core facilities, centers, and larger laboratories that wish to standardize their data handling and analysis workflows, particularly where consistent workflows and analyses are required, data provenance is of paramount concern, and access to specialized computational and data storage resources is needed. The scope of these activities covers the full NGS data lifecycle, including:

  • raw data capture from your NGS instruments, such as Illumina sequencers, or from intra- or extramural sequencing facilities, such as the WCMC Genomics Resources Core or the NY Genome Center;
  • consistent and timely analysis results via automated execution of NGS analysis pipelines that have been developed specifically for your needs (see custom software development) on the ABC’s supercomputer resources;
  • rigorous provenance tracking of all data, so you will know where all raw data and results are, and exactly how they were received or produced;
  • secure data dissemination and visualization via hosting of an integrated portal tailored to your needs (see custom software development);
  • backup and archiving of all data.
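As a rough illustration of what provenance tracking involves, the sketch below records where a file came from, what produced it, and which inputs it was derived from. It is a minimal sketch under stated assumptions, not a description of the ABC's actual system: the function name `provenance_record`, the field names, and the pipeline label are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(name, data, source, produced_by=None, inputs=()):
    """Build a hypothetical provenance entry for one data item.

    The SHA-256 checksum identifies the content, `source` says how the item
    was received, and `inputs` links a result back to the checksums of the
    data it was derived from.
    """
    return {
        "name": name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "source": source,
        "produced_by": produced_by,  # None for raw data that was only received
        "inputs": list(inputs),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Raw data as received (tiny in-memory stand-in for a FASTQ file).
raw = provenance_record("run1.fastq", b"@read1\nACGT\n+\nIIII\n", source="sequencer")

# A result derived from that raw data by a hypothetical pipeline.
result = provenance_record(
    "run1.counts",
    b"geneA\t1\n",
    source="pipeline",
    produced_by="hypothetical-pipeline v0.1",
    inputs=[raw["sha256"]],
)

print(json.dumps(result, indent=2))
```

Linking results to inputs by checksum rather than by file name means the record stays valid even if files are moved or renamed during archiving.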

This service seeks to develop an automated and highly optimized computing environment that is customized to your particular needs, and often involves significant custom software development and systems integration services (e.g., with your LIMS).

To facilitate predictable costs for planned experiments and rechargeable cost recovery for core facilities, fees for this service are per gigabase of raw data captured, and vary with the data retention period. Fees for the creation and configuration of customized analysis pipelines and portals are covered separately on a fixed-cost basis (see custom software development). [See pricing]

Computing Infrastructure

For laboratories with bioinformatics expertise of their own, the ABC offers raw access to its shared supercomputing and high-performance data storage infrastructure. This service is appropriate for individuals and laboratories that are comfortable performing their own analyses and data management in a Linux command-line environment with minimal technical support. The infrastructure emphasizes data processing on a compute cluster using a batch scheduler. While the ABC’s computational infrastructure was designed and built with data-intensive NGS applications in mind, the available resources are often appropriate for other computationally demanding scientific applications.

As our infrastructure is geared for data-intensive applications, the fees for this service are primarily based on the amount of storage allocated to your project. Each terabyte of leased storage includes one “share” of priority when accessing our cluster computing environment. You may lease any combination of backed-up and scratch space in 1TB increments (a minimum of 1TB of backed-up space is required). For projects that are more processing intensive, additional shares can be leased separately. [See pricing]
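The leasing rules above can be sketched as a small calculation. This is an illustrative reading only: the function name `priority_shares` is hypothetical, and it assumes that both backed-up and scratch terabytes count as leased storage that carries one share each.

```python
def priority_shares(backed_up_tb: int, scratch_tb: int = 0, extra_shares: int = 0) -> int:
    """Return the number of cluster priority shares for a storage lease.

    Assumptions (not confirmed by the ABC): every leased terabyte, whether
    backed-up or scratch, carries one share, and separately leased shares
    are simply added on top.
    """
    for value in (backed_up_tb, scratch_tb, extra_shares):
        # Storage is leased in whole 1TB increments, so only whole numbers make sense.
        if not isinstance(value, int) or value < 0:
            raise ValueError("amounts must be non-negative whole numbers")
    if backed_up_tb < 1:
        raise ValueError("a minimum of 1TB of backed-up space is required")
    return backed_up_tb + scratch_tb + extra_shares


# 2TB backed-up + 3TB scratch + 1 additional share
print(priority_shares(2, 3, extra_shares=1))  # prints 6
```

Under these assumptions, a storage-heavy project gains scheduling priority automatically, while a processing-heavy project with little data would add shares through the `extra_shares` amount instead.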

The ABC can also host specialized computational hardware and software resources that are integrated into its high-performance computing environment, and entirely dedicated to your lab or facility. For many computationally focused laboratories, use of a mixture of dedicated and shared resources is appropriate.

Custom Software Development and Systems Integration

The ABC can be engaged to develop, install, and maintain customized software and computing hardware that build on, and are tightly integrated with, its own resources, resulting in a scalable, tailored, and cost-effective computing environment for data-intensive science. Typical activities include the selection, installation, and maintenance of specialized, dedicated hardware; installation of software packages tuned to run in our environment; and the development of data analysis pipelines and secure portals for tailored data visualization and dissemination.

To mitigate the risks often associated with the development of software and computing infrastructure projects, these projects are formally scoped, and then executed by the ABC at a fixed cost to you. [See pricing]

Education and Training

As “big data” and computationally intensive applications become more pervasive in the basic and translational sciences, some laboratories, core facilities, and centers may find it beneficial to further develop their own capabilities in these specialized areas, and to forge closer technological ties to the ABC. To accommodate these goals, the ABC offers “peer residencies”. Peer residents are mutually selected members of your staff who work on your projects but are embedded in the ABC; they are provided with office space within the ABC, have direct access to ABC staff, and participate in the ABC’s scholarly and technical activities, including seminars and group workshops that focus on computational methods and techniques. [See pricing]

Consulting

While the ABC’s service offerings focus on data-intensive and large-scale computing projects that are formally defined, ABC staff can be engaged on an hourly basis for short-term or open-ended projects where the preparation of a formal scope may not be practical or desirable. This means of engagement allows us to respond quickly to immediate needs, as well as to engage in exploratory projects. [See pricing]

Applied Bioinformatics Core

abc@med.cornell.edu