To analyze relationships between perturbations, we utilize the framework of connectivity. A connectivity score between two perturbations quantifies the similarity of the cellular responses evoked by these perturbations. A score of 1 means that these two perturbations are more similar to each other than 100% of other perturbation pairs. A score of -1 means that these two perturbations are more dissimilar to each other than 100% of other perturbation pairs.

Introspect means querying your dataset against itself. Make sure to "Include Introspect" if you would like to see connections within your dataset (in addition to connections between your dataset and Touchstone-P).

In computing connectivity, biological or technical replicates can be aggregated together. Please select which metadata fields should be used to recognize replicates. For example, if you wish to distinguish between different doses of the same compound, make sure to select "pert_dose" (or something similar) as one of the metadata fields by which to group replicates. The possible metadata fields by which to group replicates only appear after you have uploaded your GCT and selected "Yes" for "Are there replicates in your data?".
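To illustrate what grouping replicates by metadata fields means, the minimal sketch below collapses profiles that share the same values for the selected fields. The field name pert_iname, the sample values, and the use of a per-gene median to aggregate are assumptions made for illustration; this is not the code CLUE runs.

```python
from collections import defaultdict
from statistics import median


def group_replicates(profiles, group_fields):
    """Group profiles whose metadata agree on every field in group_fields."""
    groups = defaultdict(list)
    for profile in profiles:
        key = tuple(profile["meta"][field] for field in group_fields)
        groups[key].append(profile["values"])
    return groups


def aggregate(groups):
    """Collapse each replicate group to a per-gene median (an assumed choice)."""
    return {key: [median(column) for column in zip(*rows)]
            for key, rows in groups.items()}


# Hypothetical profiles: two genes measured in three samples.
profiles = [
    {"meta": {"pert_iname": "drugA", "pert_dose": "1 uM"}, "values": [1.0, -2.0]},
    {"meta": {"pert_iname": "drugA", "pert_dose": "1 uM"}, "values": [3.0, -4.0]},
    {"meta": {"pert_iname": "drugA", "pert_dose": "10 uM"}, "values": [5.0, -6.0]},
]

# Grouping by compound only merges all three samples into one group.
by_compound = aggregate(group_replicates(profiles, ["pert_iname"]))

# Adding pert_dose keeps the 1 uM and 10 uM treatments separate.
by_compound_and_dose = aggregate(
    group_replicates(profiles, ["pert_iname", "pert_dose"])
)
```

In this toy example, selecting only the compound name yields a single aggregated profile, while also selecting pert_dose yields one profile per dose.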

Access a suite of analysis apps by clicking on the menu (or press Command-K to open it).

The first step in using the Query App to compute connections with your gene expression data is to assign a name to your query.
Results will be stored in your Analysis History after your query is submitted.

Enter an up-regulated gene of interest, hit enter, and type in subsequent genes in the set you would like to query. You may also have down-regulated genes of interest. They can be entered in the box to the right.

Hit submit and the query algorithm will find connections between your genes of interest and perturbagens in CMap that have signatures most similar to your query. Data are generated in approximately 5 minutes and will be stored in your Analysis History.

The L1000 assay directly measures or infers the expression levels of 12,328 genes. By evaluating the current statistical model against a large compendium of RNA-Seq profiles from over 100 tissues from the GTEx consortium, we have identified a subset of 10,174 genes that are either measured or well inferred. This subset is known as the Best INferred Gene (BING) space. The Query App uses BING space to compute similarities between users' gene sets and the gene expression signatures in the CMap database. Each user entry is therefore mapped into one of the following three categories:

Invalid gene: Not a valid HUGO symbol or Entrez ID, and therefore not used in the query.

Valid gene: A valid HUGO symbol or Entrez ID that is also part of BING space, and therefore is used in the query.

Valid but not used in query: A valid HUGO symbol or Entrez ID that is not part of BING space, and therefore is not used in the query.
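The three categories can be sketched as a simple lookup against two sets. This is an illustrative sketch only: the function name, the placeholder gene identifiers, and the two toy sets are hypothetical, and the real lists of valid symbols and BING genes are maintained by CMap.

```python
def classify_entry(entry, valid_genes, bing_genes):
    """Return the category for one user entry.

    valid_genes: all recognized HUGO symbols / Entrez IDs (assumed input).
    bing_genes:  the subset of valid_genes that lies in BING space.
    """
    gene = entry.strip()
    if gene not in valid_genes:
        return "Invalid gene"
    if gene in bing_genes:
        return "Valid gene"
    return "Valid but not used in query"


# Placeholder identifiers, not real gene symbols.
valid_genes = {"GENE_A", "GENE_B", "GENE_C"}
bing_genes = {"GENE_A", "GENE_B"}

for entry in ["GENE_A", "GENE_C", "not_a_gene"]:
    print(entry, "->", classify_entry(entry, valid_genes, bing_genes))
```

Only entries in the "Valid gene" category would contribute to the query.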

Search for one of almost 9,000 small-molecule and genetic perturbagens in the Touchstone dataset.

Click on a perturbagen in this table to see a CLUE Card that contains all of the information available for this perturbagen. You can also select any compound in the table to query connections with all other compounds in Touchstone. Click on Detailed List to view connections in a table, or click Heatmap to see connections in a matrix powered by the Morpheus App.

Impact is assessed as a transcriptional activity score, which is calculated as a mean value of median replicate correlation and median signature strength of a perturbagen across multiple cell lines and doses. The score describes a perturbagen’s transcriptional activity, relative to all other perturbagens, as derived from its replicate reproducibility and magnitude of differential gene expression.

\[ \mathrm{PCTCC}_i = \frac{\mathrm{rank}(\mathrm{median}(CC_i))}{N} \]

\[ \mathrm{PCTSS}_i = \frac{\mathrm{rank}(\mathrm{median}(SS_i))}{N} \]

\[ \mathrm{TAS}_i = \frac{\mathrm{PCTCC}_i + \mathrm{PCTSS}_i}{2} \]

where:

$\mathrm{TAS}_i$ is the transcriptional activity score for the i-th perturbagen

$\mathrm{PCTCC}_i$ is the percentile, relative to all other perturbagens, of the i-th perturbagen’s median replicate correlation coefficient (CC) across all of its signatures

$\mathrm{PCTSS}_i$ is the percentile, relative to all other perturbagens, of the i-th perturbagen’s median signature strength (SS) across all of its signatures

$N$ is the total number of perturbagens
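The TAS definitions above can be sketched in a few lines. This is a minimal sketch under stated assumptions: rank 1 is assigned to the smallest median, ties are broken by input order, and the perturbagen names and values are invented. The source does not specify how CLUE handles ties or rank direction.

```python
from statistics import median


def percentile_ranks(values):
    """Map each perturbagen to rank / N, with rank 1 for the smallest value.

    Ties are broken by input order, which is an assumption of this sketch.
    """
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {name: (position + 1) / n for position, name in enumerate(ordered)}


def transcriptional_activity_scores(cc, ss):
    """cc and ss map a perturbagen name to its per-signature CC and SS values."""
    pct_cc = percentile_ranks({name: median(v) for name, v in cc.items()})
    pct_ss = percentile_ranks({name: median(v) for name, v in ss.items()})
    return {name: (pct_cc[name] + pct_ss[name]) / 2 for name in cc}


# Invented values for three hypothetical perturbagens.
cc = {"A": [0.1, 0.2, 0.3], "B": [0.5, 0.6, 0.7], "C": [0.8, 0.9, 0.9]}
ss = {"A": [2.0, 3.0], "B": [9.0, 11.0], "C": [5.0, 7.0]}

tas = transcriptional_activity_scores(cc, ss)
```

Here perturbagen A ranks lowest on both measures and receives the lowest score, while B and C each rank highest on one measure and receive equal scores.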

Signature diversity

Thick black bars signify Transcriptional Activity Scores (TAS) greater than or equal to 0.5; thinner black bars denote scores less than 0.5. Absence of a bar means that no data are available. Colored lines (chords) signify similar connectivity scores between cell lines: red for positive connectivity scores of 80-100 (pale to intense color according to the score) and blue for negative connectivity. Chords are only shown when TAS is > 0.5; thus the absence of a chord means either that the perturbagen's TAS is very low or that no data are available. Chords for individual cell lines can be isolated from the rest of the figure by hovering over the cell line name.

Baseline expression of this gene in each cell line is represented as a z-score (top numbers). Scores were calculated using the robust z-score formula:

\[ z_i = \frac{x_i - \mathrm{median}(X)}{1.4826 \cdot \mathrm{MAD}(X)} \]

where:

$z_i$ is the robust z-score of the gene in the i-th cell line

$x_i$ is the expression value of the gene in the i-th cell line

$X = [x_1, x_2, \ldots, x_n]$ is the vector of expression values for the gene across n cell lines

$\mathrm{MAD}(X)$ is the median absolute deviation of X

1.4826 is a constant that rescales the MAD so that, for normally distributed data, it estimates the standard deviation of X

Median and MAD expression values were calculated using RNA-Seq profiles from a total of 1022 cell lines, comprising data from the Cancer Cell Line Encyclopedia (CCLE; Barretina et al.) and cell lines nominated by the CMap team. Plots show z-score values only for the core LINCS lines used by CMap in L1000 experiments. Light red or light blue regions indicate positive or negative outlier expression, respectively, of the gene relative to the other lines shown; the z-score of a positive outlier is shown in dark red and that of a negative outlier in dark blue.
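As a worked illustration of the robust z-score formula above, the sketch below applies it to one gene across a handful of cell lines. The expression values are invented, and the function name is hypothetical; only the formula itself comes from the text.

```python
from statistics import median


def robust_z_scores(x):
    """Robust z-score of each value in x: (x_i - median) / (1.4826 * MAD)."""
    med = median(x)
    mad = median([abs(value - med) for value in x])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(value - med) / (1.4826 * mad) for value in x]


# Invented expression values for one gene in five cell lines.
expression = [1.0, 2.0, 3.0, 4.0, 100.0]
z = robust_z_scores(expression)
```

Because the median and MAD are barely affected by the single extreme value, the last cell line receives a very large z-score while the others stay close to zero, which is the behavior that makes outlier expression stand out in the plot.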

Summary class connectivity shows a boxplot that summarizes the connectivity of a class. Each data point, shown as a light gray dot, represents the median value of connectivity of one member to the other class members. (This corresponds to the median for each row, excluding the main diagonal, in the heatmap shown below.) The box is the distribution of those data points, where the box boundary represents the interquartile range, the vertical line within the box is the median, and the whiskers reflect the minimum and maximum values of the data (exclusive of extreme outliers, which may appear beyond the whiskers).
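The data points behind the boxplot can be sketched as row medians of the class connectivity matrix with the main diagonal left out. The matrix values below are invented, the function name is hypothetical, and the quartile convention used by `statistics.quantiles` is an assumption; the source does not say which convention the plot uses.

```python
from statistics import median, quantiles


def member_medians(matrix):
    """Median connectivity of each member to the other members (diagonal excluded)."""
    return [
        median([score for j, score in enumerate(row) if j != i])
        for i, row in enumerate(matrix)
    ]


# Invented connectivity scores among three class members.
connectivity = [
    [100.0, 90.0, 80.0],
    [90.0, 100.0, 70.0],
    [80.0, 70.0, 100.0],
]

points = member_medians(connectivity)       # one gray dot per member
box_median = median(points)                 # vertical line within the box
q1, _, q3 = quantiles(points, n=4)          # box boundaries (assumed convention)
whiskers = (min(points), max(points))       # ignoring any extreme-outlier rule
```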

Connectivity between members of class is a standard heat map of the connectivity scores, summarized across cell lines, between members of the class, where dark red represents the highest positive scores and deep blue the highest negative scores. Individual scores are revealed to the left below the map by hovering over each cell of the map.

Class inter-cell line connectivity is a plot of the median (black line) and Q25-Q75 connectivity scores (blue area around black line) for each cell line as well as the summary scores across cell lines. In some cases perturbations have not been tested in every cell line; the absence of data is indicated by a “0” for that cell line. The example shown reveals that these estrogen agonists show the strongest connectivity to each other in MCF7, a human breast cancer cell line that expresses the estrogen receptor.

Profile status

Colored portion of top bar indicates the Broad assays in which this compound has been profiled.

L1000 cell/dose coverage

For compounds profiled by L1000, the cell lines and dose range for which signatures are available are indicated by dark gray bars (a lighter gray bar indicates that no data are available for that cell line/dose combination). A bar displayed one row above the 10 uM row indicates that doses higher than 10 uM were tested. The 6 rows correspond to 6 canonical doses: 20 nM, 100 nM, 500 nM, 1 uM, 2.5 uM, and 10 uM. (In some cases non-canonical doses were tested; these are rounded to the nearest canonical dose for the purpose of this display. For example, if the dose tested was 3.33 uM, the 2.5 uM bar is shown in dark gray here.)
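The rounding to canonical doses can be sketched as a nearest-neighbor lookup. This is a sketch under assumptions: distance is measured on a log scale, which the source does not specify, and the function name and the label used for the extra row above 10 uM are hypothetical.

```python
import math

# Canonical doses in uM, paired with the row labels used in the display.
CANONICAL_DOSES = [
    (0.02, "20 nM"),
    (0.1, "100 nM"),
    (0.5, "500 nM"),
    (1.0, "1 uM"),
    (2.5, "2.5 uM"),
    (10.0, "10 uM"),
]


def display_row(dose_um):
    """Return the row label under which a tested dose (in uM) would be shown."""
    if dose_um <= 0:
        raise ValueError("dose must be positive")
    if dose_um > CANONICAL_DOSES[-1][0]:
        return ">10 uM"  # hypothetical label for the row above the 10 uM row
    # Nearest canonical dose on a log scale (an assumption of this sketch).
    _, label = min(
        CANONICAL_DOSES,
        key=lambda item: abs(math.log(dose_um / item[0])),
    )
    return label


print(display_row(3.33))
```

With these assumptions, a tested dose of 3.33 uM maps to the 2.5 uM row, matching the example in the text.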

The following discloses our information gathering and dissemination practices for the CLUE website (https://clue.io):

Information gathering

We may use your IP address to help diagnose problems with our server and to administer our website by identifying (1) which parts of our site are most heavily used, and (2) which portion of our audience comes from within the Broad Institute network. We do not link IP addresses to anything personally identifiable. This means that user sessions will be tracked, but the users will remain anonymous.

Use of information

CLUE staff uses the information gathered above to tailor site content to user needs, and to generate aggregate statistical reports. At no time do we disclose site usage by individual IP addresses. Web server logs are retained on a temporary basis and then deleted completely from our systems.

Security

This site has security measures in place to protect against the loss, misuse, and alteration of the information under our control. CLUE, however, is not liable for the loss, misuse, or alteration of information on this site by any third party.

Effective date

The effective date of this policy is Jan 31, 2015.

The CLUE website is intended to provide gene expression data and analysis tools for use in research. This site is not an attempt to provide specific medical advice, and should not be used to make a diagnosis or to replace or overrule a qualified health care provider's judgment. Users should consult with a qualified healthcare professional for answers to personal questions.

You assume full responsibility for using the information on this site, and you understand and agree that the Broad Institute is not responsible or liable for any claim, loss, or damage resulting from its use by you or any user. While we try to keep the information on the site as accurate as possible, we disclaim any warranty concerning its accuracy, timeliness, and completeness, and any other warranty, express or implied, including warranties of merchantability or fitness for a particular purpose. The Broad Institute also does not warrant that access to the site will be error- or virus-free.

By choosing to use the CLUE website, you acknowledge and agree to these Terms and Conditions and to our Privacy Policy. We encourage you to read them. We reserve the right to modify these terms and policies and recommend that you periodically review them, because your continued use of this site signifies your agreement with these terms.

Your access to and use of this site, and these terms and conditions, are governed by the laws of the Commonwealth of Massachusetts and applicable U.S. federal laws. You consent to the jurisdiction and venue of the state and federal courts located within Massachusetts and agree that any action related to your access to or use of this site and these terms and conditions must be brought in a state or federal court located within Massachusetts.

Nothing on this site grants any license or right to use any trademarks, logos or other names, including but not limited to those identifying CLUE, CMAP, the Broad Institute or any officer, director, employee, affiliated investigator, or agent of the Broad Institute, without express written consent of the Broad Institute or other such owner.

Access Keys, Code, and Data Files are provided on the following terms:

Access Keys, Code, and Data Files are single user and assigned to the particular named individual on the registration form.

If anyone else in your group seeks access, please have them fill out the request form and we will be glad to provide them a personalized key.

You agree not to redistribute Access Keys, Code, and Data Files.

Access to these resources is restricted to use by you within your research group. Please do not redistribute them.

If you have a derivative work that is significantly different from what we provide and you would like to distribute it, please contact us with the details. Our goal is to encourage significant improvements while maintaining provenance and reproducible research standards.

Any discoveries you make in the data are yours.

We encourage you to publish results from analyses of these data. Please see "Publication Policy" below.

Access Keys, Code, and Data Files are for research use only.

Usage of Access Keys, Code, and Data Files is restricted to academic use within not-for-profit institutions.

Publication Policy

We are glad if you have found the CLUE data to be useful and would like to incorporate it into your publications. You do not need to include us as authors when you publish your CLUE analysis results. The only exception to this is a paper describing the overall contours of the LINCS dataset (i.e., the manuscript that we at the Broad are working on in collaboration with LINCS). If your paper needs a citation to our work on L1000 or LINCS, please contact us at clue@broadinstitute.org.

Examples of groups that have published their work on this basis include:

Data and Tools

The CMap dataset of cellular signatures catalogs transcriptional responses of human cells to chemical and genetic perturbation. Here you can find the 1.3M L1000 profiles and the tools for their analysis.

A total of 27,927 perturbagens have been profiled to produce 476,251 expression signatures. About half of those signatures make up the Touchstone (reference) dataset, generated from testing well-annotated genetic and small-molecule perturbagens in a core panel of cell lines. The remainder make up the Discover dataset, generated from profiling uncharacterized small molecules in a variable number of cell lines.

Start exploring the data by using the text-box on this page to look up perturbagens of interest in Touchstone. To see the suite of tools, including apps to query your gene expression signatures and analyze resulting connections, click on Tools in the menu bar.