Connectivity Map: Proof of Principle

See the Broad CMap website for details.

Library of Integrated Cellular Signatures (LINCS)

An important goal of the LINCS program is the development of comprehensive signatures of cellular states that could be used by the entire research community to understand protein function, small-molecule action, physiological states and disease states. The LINCS effort should include a diversity of cellular read-outs of experimental perturbation, and we propose here that one information-rich source of such signatures is mRNA expression. Our group anticipated the need for such an effort, and began in 2003 to develop a capability that would serve as a ‘functional look-up table’ enabling researchers to generate testable hypotheses that might otherwise be overlooked. By using genomic signatures as a common language with which to describe different cellular states, researchers could connect signatures of genetic perturbation with signatures of disease states, thereby linking disease physiology to the genome. In addition, the mechanism of action of small molecules (drugs) might be inferred by matching pharmacologic perturbations with genetic perturbations. Similarly, genes lacking functional annotation could be placed into pathways based on their common perturbational signatures. Because such an approach has the potential to reveal connections between cellular states, we have referred to the project as the Connectivity Map.
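The signature-matching idea described above can be illustrated with a minimal sketch. This section does not describe the scoring method the Connectivity Map actually uses, so cosine similarity is swapped in here purely for illustration. The function names, gene labels, and expression values are all hypothetical.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity computed over the genes present in both signatures."""
    shared = sorted(set(sig_a) & set(sig_b))
    if not shared:
        return 0.0
    dot = sum(sig_a[g] * sig_b[g] for g in shared)
    norm_a = math.sqrt(sum(sig_a[g] ** 2 for g in shared))
    norm_b = math.sqrt(sum(sig_b[g] ** 2 for g in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_connections(query, reference):
    """Rank reference signatures from most similar to most opposed to the query."""
    scores = [(name, cosine_similarity(query, sig)) for name, sig in reference.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy signatures: gene -> differential expression score.
reference = {
    "compound_A": {"GENE1": 2.0, "GENE2": -1.0, "GENE3": 0.5},
    "shRNA_geneX": {"GENE1": -2.0, "GENE2": 1.0, "GENE3": -0.5},
    "compound_B": {"GENE1": 0.1, "GENE2": 0.1, "GENE3": 2.0},
}
query = {"GENE1": 1.9, "GENE2": -1.1, "GENE3": 0.4}

ranked = rank_connections(query, reference)
for name, score in ranked:
    print(f"{name}: {score:+.3f}")
```

In this toy setup, a strongly positive score suggests a perturbation that mimics the query state and a strongly negative score suggests one that opposes it. Both ends of the ranking could therefore yield testable hypotheses.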

The present proposal builds on this Connectivity Map experience, and a) dramatically extends the scope and scale of the effort, and b) brings the Connectivity Map into the context of a broader, NIH-wide LINCS effort. Our existing Connectivity Map database consists of ~7,000 genome-wide Affymetrix expression profiles representing perturbational profiles of ~1,100 off-patent drugs and tool compounds in 3 cell lines. Despite the modest scale of this initial database, the Connectivity Map has over 10,000 registered users worldwide. This level of adoption indicates that the research community has found the approach highly enabling, and it warrants the expansion of the effort.

To facilitate the scale-up of the Connectivity Map, we have developed a new approach to expression profiling based on a reduced representation of the human transcriptome. Specifically, we have identified 1,000 transcripts from which the remainder of the transcriptome can be computationally inferred, and we measure these 1,000 ‘Landmark’ transcripts on Luminex beads. This new approach is cost-effective (< $4/sample in reagent costs) and is amenable to 384-well format (the standard experimental unit in high-throughput genomic and chemical biology studies). Our proposed LINCS project will fund the generation of 600,000 perturbational profiles (both genetic and pharmacologic perturbations) as well as the analytics to support this effort.
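The text does not specify how the non-measured transcripts are inferred from the Landmark transcripts. As a minimal sketch, assuming a simple linear model fitted by ordinary least squares, the idea can be shown for one target transcript and two landmarks. All names and data below are hypothetical, and the toy target is constructed to be an exact linear function of the landmarks.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: landmarks are collinear in the training data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_inference_weights(landmarks, target):
    """Least-squares weights (intercept first) predicting one target transcript."""
    rows = [[1.0] + list(sample) for sample in landmarks]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def infer_target(weights, sample):
    """Predict the target transcript level from measured landmark levels."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], sample))


# Hypothetical training data: 5 samples x 2 landmark transcripts.
train_landmarks = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
# Toy target built as 1 + 2*L1 - 0.5*L2 so the fit can be checked by eye.
train_target = [1 + 2 * l1 - 0.5 * l2 for l1, l2 in train_landmarks]

weights = fit_inference_weights(train_landmarks, train_target)
predicted = infer_target(weights, [2.5, 3.5])
print(weights, predicted)
```

At the scale described (1,000 landmarks and the rest of the transcriptome as targets), one would fit a set of weights per target transcript and would likely use a numerical library and regularization rather than hand-rolled elimination. The sketch only shows the shape of the computation.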

In Aim 1, we will profile the action of 4,000 small-molecule compounds in 20 different cell types. The compounds will be selected from multiple sources, including known drugs, pathway-specific tool compounds, and compounds of interest identified in NIH-sponsored small-molecule screening efforts. In addition, nominations of compounds will be sought from the research community. The cell lines will be selected based on their lineage diversity, and will span established cancer cell lines, immortalized (but not transformed) primary cells, and both cycling and quiescent cells. Again, community input will be sought for the specification of a subset of these cell lines.

In Aim 2, we will extend the perturbations to loss-of-function genetic studies of 3,000 human genes using lentivirally delivered shRNAs in the same set of 20 cell lines used in Aim 1. To address potential off-target effects of shRNAs, we will profile three different shRNAs for each gene, and will document that these shRNAs effectively down-regulate the gene of interest.

In Aim 3, we will profile the effect of over-expression of the same 3,000 genes studied in Aim 2, using a new collection of Open Reading Frame (ORF) constructs. Again, these gain-of-function experiments will be performed in each of the 20 cell lines utilized in Aims 1 and 2. As with the shRNA experiments in Aim 2, the 3,000 genes will represent a blend of pathways of interest, disease loci (e.g., candidate genes implicated in GWAS but for which function is unknown), and nominations from the community.
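The figures stated across the three aims can be tallied as a quick arithmetic check. The sketch below counts only distinct perturbation and cell line combinations, and it assumes one ORF construct per gene in Aim 3, which the text does not state. The text also does not say how the remainder of the 600,000 profiles is allocated, so that remainder is reported without interpretation.

```python
# Figures stated in the proposal text.
N_COMPOUNDS = 4_000
N_GENES = 3_000
N_CELL_LINES = 20
SHRNAS_PER_GENE = 3
TOTAL_PROFILES = 600_000
MAX_REAGENT_COST_PER_SAMPLE = 4.0  # stated as "< $4/sample", so this is a ceiling

# Assumption (not stated in the text): one ORF construct per gene in Aim 3.
ORFS_PER_GENE = 1

aim1 = N_COMPOUNDS * N_CELL_LINES                # compounds x cell lines
aim2 = N_GENES * SHRNAS_PER_GENE * N_CELL_LINES  # shRNAs x cell lines
aim3 = N_GENES * ORFS_PER_GENE * N_CELL_LINES    # ORFs x cell lines

distinct_conditions = aim1 + aim2 + aim3
unallocated = TOTAL_PROFILES - distinct_conditions
reagent_cost_ceiling = TOTAL_PROFILES * MAX_REAGENT_COST_PER_SAMPLE

print(f"Aim 1: {aim1:,}  Aim 2: {aim2:,}  Aim 3: {aim3:,}")
print(f"Distinct perturbation/cell-line conditions: {distinct_conditions:,}")
print(f"Profiles not accounted for by distinct conditions: {unallocated:,}")
print(f"Reagent cost ceiling for all profiles: under ${reagent_cost_ceiling:,.0f}")
```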

At the completion of this project, we expect to have generated 600,000 new perturbational profiles coupled with web-enabled analytical tools that promise to serve a broad range of biomedical researchers.