Rare Cancer Explorer

Rare Cancer Explorer (RaCE) is a dedicated database and analytical platform focused on rare cancers. It provides innovative tools to explore and visualize complex data related to these uncommon cancer types. Designed to be user-friendly, it enables clinicians, researchers, and biologists, including those without bioinformatics expertise, to comprehensively investigate the clinical significance and biological functions of rare cancer biomarkers. Accessible online, RaCE simplifies data analysis, allowing users to gain valuable insights and contribute to advances in rare cancer research.

Number of samples: 5451

Datasets

This database includes more than 50 different cancer datasets. These datasets encompass a wide range of cancer types, providing comprehensive data for research and analysis. Each dataset includes detailed patient information, overall statistics, and outcomes.

Guide

Select a dataset of interest by clicking one of the circle buttons on the left of the Data Overview table.
Details and summary plots are shown in the bottom panels once a dataset is selected. If a panel shows “No xxx data available”, the selected dataset does not contain that information; please try another dataset.
The Data Overview table provides high-level information on which functionalities are available for each dataset. For example, if a dataset has “Expression” checked, expression data are available and expression-related plots can be made in the “Expression” module.
Columns can be sorted by clicking the column header.
The search box can be used to filter the table by typing in keywords.

Data Overview

This table provides an overview of the data used for the analysis. Select a dataset to view more information about it below.

Summary

Gender Distribution

Cancer Stage

Survival Information

Kaplan-Meier Overall

Survival Time Overall

Diagnosis Age (Distribution)

Expression Data

This module provides the expression data for the selected cancer type and gene.

About this module

In this module, you are able to:

Select a cancer type and gene(s) of interest.
Visualize the expression data of the selected gene(s) in different datasets.

Guide

Please go through the generic guide on the Tutorial page for cancer type and gene selection.

Expression plot single/multiple genes

Unlike other modules, this module allows you to select multiple genes for expression data visualization. There are two modes for plotting using the same gene list:
- Single gene: This mode takes only the first gene in the list and plots its expression in different datasets.
- Multiple genes: This mode takes the whole gene list (limited to 20 genes) and plots the expression of these genes in different datasets.
Use the dropdown menu to add genes, or type in the box to search for the gene(s) of interest. To remove a gene, press Backspace, or left-click the selected gene and press the Delete key.
Switch between the single and multiple gene modes.
Single gene expression data plot: This plot shows the expression of the selected gene in different datasets. Within each dataset, the samples are sorted by expression from low to high.
Multiple gene expression data plot: This plot shows the expression of the selected genes in different datasets. The genes appear in exactly the same order in which they were selected.
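The mode and ordering rules above can be sketched in Python. This is a minimal illustration, not RaCE's code: the record layout `(dataset, sample, gene, value)`, the function names, and the choice to truncate (rather than reject) lists longer than 20 genes are all assumptions.

```python
from collections import defaultdict

MAX_GENES = 20  # multiple-gene mode limit stated in the guide


def genes_to_plot(selected, mode):
    """Return the genes used by each plotting mode, preserving selection order.

    Assumption: lists longer than MAX_GENES are truncated rather than rejected.
    """
    if mode == "single":
        return selected[:1]          # only the first gene in the list
    if mode == "multiple":
        return selected[:MAX_GENES]  # whole list, capped at 20 genes
    raise ValueError(f"unknown mode: {mode!r}")


def single_gene_panels(records, gene):
    """Group one gene's expression by dataset and sort samples low to high.

    records: iterable of (dataset, sample, gene, value) tuples (hypothetical layout).
    """
    by_dataset = defaultdict(list)
    for dataset, sample, g, value in records:
        if g == gene:
            by_dataset[dataset].append((sample, value))
    return {ds: sorted(pairs, key=lambda p: p[1]) for ds, pairs in by_dataset.items()}


records = [
    ("DS1", "s1", "TP53", 5.0),
    ("DS1", "s2", "TP53", 1.0),
    ("DS1", "s1", "MYC", 9.0),
    ("DS2", "s3", "TP53", 2.0),
]
gene = genes_to_plot(["TP53", "MYC"], "single")[0]
print(single_gene_panels(records, gene))
```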

Analysis Options

Step 1: Select a cancer type

Number of cancer types

Cancer Type

Step 2: Choose a gene to plot

Select or search for a gene

Expression Plots

Differentially Expressed Genes Analysis

Differentially Expressed Genes (DEG) Analysis provides a comprehensive view of gene expression changes between different groups of samples. In this module, one representative dataset is selected from each cancer type to perform DEG analysis.

About this module

In this module, you are able to:

Select a cancer type and apply filters to the DEGs.
See the filtered DEG table.
Visualize the DEG volcano plot.
Visualize the DEG PCA plot.
Perform five different enrichment analyses on the DEGs.

Guide

DEG filter

LogFC filter: Filter DEGs by log2 fold change; the cutoff is usually set between 1 and 2 (default 1). The larger the value, the more stringent the filter.
Adjusted P-value filter: Filter DEGs by adjusted p-value (adj.p); the cutoff is usually set to 0.05. The smaller the value, the more stringent the filter.
Every time you change the filter, click the “Show DEGs” button to update the results.
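The two filters can be expressed as a simple predicate. This sketch assumes the logFC cutoff is applied to the absolute value (so both up- and downregulated genes pass) and guesses at inclusive/exclusive bounds; `DEGRow` and the gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DEGRow:
    gene: str
    log_fc: float  # log2 fold change between the two sample groups
    adj_p: float   # adjusted p-value (adj.p)


def filter_degs(rows, logfc_cutoff=1.0, adjp_cutoff=0.05):
    """Keep genes with |log2FC| >= logfc_cutoff and adj.p < adjp_cutoff.

    Assumption: absolute logFC is used; the exact bound handling is a guess.
    """
    return [r for r in rows if abs(r.log_fc) >= logfc_cutoff and r.adj_p < adjp_cutoff]


rows = [
    DEGRow("GENE_A", 2.5, 0.001),
    DEGRow("GENE_B", -1.2, 0.01),
    DEGRow("GENE_C", 1.5, 0.2),
    DEGRow("GENE_D", 0.4, 0.0001),
]
print([r.gene for r in filter_degs(rows)])                    # default cutoffs
print([r.gene for r in filter_degs(rows, logfc_cutoff=2.0)])  # more stringent
```

Raising the logFC cutoff from 1 to 2 drops GENE_B, which illustrates why larger values make the filter more stringent.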

DEG table

This table shows the detailed information of the filtered DEGs.
Columns can be sorted by clicking the column header.
The search box can be used to filter the table by typing in keywords.

DEG volcano plot

This plot shows the upregulated (red), downregulated (green), and non-significant (grey) genes.
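A point on the volcano plot can be placed and colored as sketched below. The axes (log2 fold change vs. -log10 adjusted p-value) and the reuse of the DEG filter cutoffs for coloring are assumptions about how the plot is built.

```python
import math


def volcano_point(log_fc, adj_p, logfc_cutoff=1.0, adjp_cutoff=0.05):
    """Return (x, y, color) for one gene on a volcano plot.

    Assumed axes: x = log2 fold change, y = -log10(adjusted p-value).
    """
    significant = adj_p < adjp_cutoff
    if significant and log_fc >= logfc_cutoff:
        color = "red"    # upregulated
    elif significant and log_fc <= -logfc_cutoff:
        color = "green"  # downregulated
    else:
        color = "grey"   # not significant
    return log_fc, -math.log10(adj_p), color


print(volcano_point(2.0, 0.001))
print(volcano_point(-1.5, 0.01))
print(volcano_point(0.3, 0.5))
```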

DEG PCA plot

This plot shows sample clustering based on the expression of DEGs.
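A PCA of samples on the DEG expression matrix can be sketched with NumPy's SVD. The preprocessing here (samples as rows, genes centered but not scaled) is an assumption; RaCE may use a different PCA routine or normalization.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project samples onto the first principal components.

    expr: samples x genes matrix restricted to the DEGs (assumed layout).
    Returns (scores, explained_variance_ratio).
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)                      # center each gene (no scaling)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Two hypothetical sample groups measured on two DEGs
expr = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.1, 4.9]]
scores, explained = pca_scores(expr)
print(scores.round(2))
print(explained.round(3))
```

Samples from the same group land on the same side of PC1, which is the clustering the plot is meant to reveal.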

Enrichment analysis

Use the panel to switch between other utilities and enrichment analysis.
Based on the selected enrichment method, the full enrichment table of terms with p.adjust < 0.05 is shown.
Enrichment method selection and other plot options are available. Note that unlike other modules, you need to click the “Show XXX” button to update the results. All plots are responsive to plot options immediately (no need to click the button). Some options and plots are only available for certain enrichment methods.
Different enrichment plots.
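The p.adjust < 0.05 filter can be illustrated with the Benjamini-Hochberg (BH) adjustment. That RaCE uses BH is an assumption (it is the default `pAdjustMethod` in common R enrichment tools such as clusterProfiler); the term names below are placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(terms, pvalues, cutoff=0.05):
    """Keep enrichment terms whose adjusted p-value is below the cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return [(t, p) for t, p in zip(terms, adjusted) if p < cutoff]


terms = ["term_A", "term_B", "term_C", "term_D"]
pvals = [0.01, 0.04, 0.03, 0.2]
print(significant_terms(terms, pvals))
```

Note that term_B and term_C have raw p-values below 0.05 but drop out after adjustment.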

Differentially Expressed Genes Analysis

This page provides Differentially Expressed Genes (DEG) analysis.

Step 1: Select cancer type

Step 2: Select filters

logFC (log2 based fold change)

Adjusted P-value

Step 3: Show results

Volcano Plot

PCA Plot

Enrichment results

Enrichment table

Enrichment barplot

Enrichment dotplot

GSEA Plot

Enrichment method

Number of top terms to show

Number of characters in one line for terms

Additional options, only displayed when certain enrichment methods are selected.

Ontology Type (GO only)

Gene set to show (GSEA only)

Survival Analysis

Survival analysis provides a comprehensive view of patient outcomes over time. This module allows you to explore the survival analysis of different cancer types and genes of interest using the Kaplan-Meier (KM) or Cox proportional hazards model.
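For reference, the Kaplan-Meier estimator multiplies, at each event time, the fraction of at-risk patients who survive that time. A minimal sketch, not RaCE's code; the `(times, events)` input layout (1 = event, 0 = censored) is an assumption.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count events and all departures (events + censored) at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```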

About this module

In this module, you are able to:

Select a cancer type and gene of interest.
Select the survival analysis method (KM or Cox).
Visualize the survival forest plot.
Generate the survival analysis result table.

Guide

Please go through the generic guide on the Tutorial page for cancer type and gene selection.

Survival forest plot

Survival forest plot: This plot shows the survival analysis result of the selected cancer type and gene. The x-axis represents the hazard ratio, and the y-axis lists the datasets of the selected cancer type.
Confidence interval: The 95% confidence interval of the hazard ratio.
P-value indicator: The p-value of the hazard ratio from the log-rank test (KM) or the Wald test (Cox), indicating the significance of the result.
Different colors represent survival events.
- overall survival (OS)
- disease-specific survival (DSS)
- disease-free interval (DFI)
- progression-free interval (PFI)
- relapse-free survival (RFS)
Survival analysis result table: This table shows the detailed information of the survival analysis result. Significant results are highlighted in green.
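For a Cox model, the plotted quantities follow from the fitted coefficient β and its standard error: HR = exp(β), 95% CI = exp(β ± 1.96·SE), and a Wald p-value from z = β/SE. A sketch with hypothetical inputs; treating p < 0.05 as the "significant" highlight is an assumption, and the KM log-rank p-value is not computed here.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def cox_forest_row(beta, se, alpha=0.05):
    """Return HR, 95% CI, two-sided Wald p-value, and a significance flag."""
    hr = math.exp(beta)
    lower = math.exp(beta - Z_975 * se)
    upper = math.exp(beta + Z_975 * se)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {"hr": hr, "lower": lower, "upper": upper, "p": p, "significant": p < alpha}


print(cox_forest_row(math.log(2.0), 0.25))
```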

Analysis Options

Step 1: Select a cancer type, survival analysis method, and grouping

Cancer Type

Survival Analysis Method

Choose grouping method (KM only)

Step 2: Choose a gene to plot

Select or search for a gene

Survival Forest Plot

Survival Table

Rare Cancer Cell Line Encyclopedia

The Cancer Cell Line Encyclopedia (CCLE) is a comprehensive resource that provides detailed genetic and pharmacological information on a wide array of human cancer cell lines. The project is hosted by the Broad Institute. Instead of archiving data for all cancer types, the Rare Cancer Cell Line Encyclopedia (RCCLE) in this database focuses only on selected rare cancer types and on gene effects in their cell lines.

About this module

In this module, you are able to select your gene of interest and:

Visualize its dependency scores in different cancer types.
Visualize its dependency scores in top/bottom 10 associated cell lines.

Guide

Gene effect plot

Cell line distribution: This subplot shows the number of cell lines in this score range.
Cancer score boxplot: This subplot shows, for each cancer type, a boxplot of the selected gene's dependency scores across that cancer type's cell lines. The number on the y-axis is the number of cell lines available for that cancer type.
The Chronos dependency score is based on data from a cell depletion assay. A lower Chronos score indicates a higher likelihood that the gene of interest is essential in a given cell line. A score of 0 indicates that a gene is not essential; correspondingly, -1 is comparable to the median of all pan-essential genes.

Cell line dependency Score plot

Top 10: the cell lines most dependent on the gene, i.e., with the lowest Chronos scores (the lower the score, the higher the dependency).
Bottom 10: the cell lines least dependent on the gene, i.e., with the highest Chronos scores.
Cell line names.
The color code indicates the cancer type of the cell line.
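Selecting the top and bottom cell lines amounts to sorting by Chronos score. A minimal sketch; the input dictionary and cell line names are hypothetical, and if fewer than 20 cell lines are available the two lists overlap.

```python
def top_and_bottom_lines(scores, n=10):
    """Split cell lines by Chronos score of the selected gene.

    scores: dict mapping cell line name -> Chronos score (hypothetical input).
    Top n: most dependent lines (lowest scores).
    Bottom n: least dependent lines (highest scores).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:n], ranked[-n:]


scores = {f"LINE_{i}": -i / 10 for i in range(30)}
top, bottom = top_and_bottom_lines(scores)
print(top[:3])
print(bottom[-3:])
```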

Analysis Options

Step 1: Select a gene

Number of cell lines: 168

Number of genes: 17916

Select or search for a gene

Expression Plots

Gene Effect

Dependency Score

CCLE Data

Font size

Show points on plots (Gene Effect plot only)

Point size (Gene Effect plot only)

Bar width (Dependency plot only)

Cancer Mutation Signatures

Somatic mutations are a major driving force of cancer development. In this module, the top 15 most frequently mutated genes in each cancer type (according to the TCGA PanCancer Atlas) can be explored.

Samples of the selected cancer type are divided into high and low expression groups based on the expression of the selected gene. The mutation frequency and tumor mutation burden (TMB) are calculated for each group.

About this module

In this module, you are able to:

Select a cancer type and a gene of interest to explore the association between oncomarkers and the expression of the selected gene.
Visualize the mutation frequency, TMB, and expression of oncomarkers in high and low expression groups.

Guide

Please go through the generic guide on the Tutorial page for cancer type and gene selection.

Oncoprint plot

On the plot, each little tile represents a sample. The red tile indicates the presence of a mutation in the oncomarker.

First y axis: the overall percentage of samples with mutations in this oncomarker in both groups.
Second y axis: oncomarker names. A star (*) indicates a significant difference in mutation frequency between the high and low expression groups (chi-square test).
Mutation frequency subplot: this subplot shows the mutation percentage of the oncomarker in high and low expression groups.
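The chi-square comparison of mutation frequency can be sketched on a 2x2 table (expression group × mutated/not mutated). This sketch omits the Yates continuity correction, which R's `chisq.test` applies by default to 2x2 tables, so p-values may differ; returning p = 1 for a degenerate table is also a simplification. The counts are hypothetical.

```python
import math


def mutation_chi_square(mut_high, n_high, mut_low, n_low):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table:
    rows = high/low expression group, columns = mutated/not mutated.
    """
    table = [[mut_high, n_high - mut_high], [mut_low, n_low - mut_low]]
    row_totals = [n_high, n_low]
    col_totals = [mut_high + mut_low, (n_high + n_low) - (mut_high + mut_low)]
    total = n_high + n_low
    if 0 in col_totals or 0 in row_totals:
        return 0.0, 1.0  # degenerate table: simplification, no test possible
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = mutation_chi_square(mut_high=30, n_high=50, mut_low=10, n_low=50)
print(round(chi2, 3), p < 0.05)
```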

TMB plot

The difference in TMB between the high and low expression groups is tested with the Wilcoxon rank-sum test.

Expression plot

This plot uses the high/low grouping defined by the selected gene to plot the expression of each of the top 15 most frequently mutated genes in the selected cancer type.
The expression difference between the groups is tested with the Wilcoxon rank-sum test.
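The rank-sum comparison used for the TMB and expression plots can be sketched with the normal approximation, including tie and continuity corrections. For small samples without ties, R's `wilcox.test` computes an exact p-value instead, so values may differ slightly. The TMB values below are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie and continuity corrections.
    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0  # all values tied
    z = (abs(u - n1 * n2 / 2) - 0.5) / math.sqrt(var_u)
    p = math.erfc(max(z, 0.0) / math.sqrt(2))
    return u, p


tmb_high = [12.1, 9.8, 15.3, 11.0, 13.7]
tmb_low = [4.2, 6.1, 3.3, 5.8, 7.0]
print(wilcoxon_rank_sum(tmb_high, tmb_low))
```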

Analysis Options

Step 1: Select a cancer type

Number of cancer types

Cancer Type

Step 2: Choose a gene to plot

Select or search for a gene

Mutation Plots

Plots
Data Tables

Oncoprint

TMB

Gene Expression Panel

Mutation Data

TMB Data

Frequency Data

Font size

Point size (TMB plot only)

Log scale on expression

Low expression color

High expression color

Show outlier points on plots (TMB plot only)

Tumor Microenvironment Immune Infiltration

This module calculates the immune infiltration score of immune cells in the tumor microenvironment based on the selected cancer type and gene of interest.

About this module

In this module, you are able to:

Select a cancer type and gene of interest.
Generate the immune infiltration score of immune cells using different algorithms.
- CIBERSORT
- xCell
- MCPcounter
- TIMER
- quanTIseq
- EPIC

Guide

Please go through the generic guide on the Tutorial page for cancer type and gene selection.

Immune Infiltration Heatmap/Table

Heatmap: This heatmap shows the correlation between the infiltration scores of immune cells in the tumor microenvironment and the expression of the gene of interest, as represented by the color scale.
The color scale ranges from -1 to 1, where -1 represents a negative correlation, 0 represents no correlation, and 1 represents a positive correlation.
Immune cells are grouped by major type, such as T cells and B cells, with one subtype shown on each row.
Significant correlations are marked with star(s) *.
Table: This table shows the same information in numerical format.
The color scale is the same as the heatmap.

Analysis Options

Step 1: Select a cancer type

Number of cancer types

Cancer Type

Step 2: Choose a gene to plot

Select or search for a gene

Immune Infiltration Heatmap

Immune Infiltration Correlation Table

Methylation

About methylation

DNA methylation is a key epigenetic modification that regulates gene expression by altering chromatin structure, playing a critical role in cancer initiation, progression, and prognosis. Aberrant DNA methylation patterns are frequently observed in rare cancers, making them valuable biomarkers for understanding disease mechanisms and developing targeted therapies.

Basic methods

This module integrates DNA methylation data from the UCSC Xena platform. Data were downloaded using the UCSCXenaTools R package and cover 9 rare cancer types from two major sources: TCGA and TARGET. The methylation profiles were generated using the Illumina HumanMethylation450 (450K) BeadChip. Probe annotation was performed using the HM450.hg38.manifest.gencode.v36.probeMap file from UCSC Xena.

About this module

In this module, you are able to:

Visualize survival analysis results (Cox proportional hazards and Kaplan-Meier) of methylation sites associated with the gene.
Explore Spearman correlation results between gene expression and methylation levels of its associated probes.

Guide

Please go through the generic guide on the Tutorial page for cancer type and gene selection.

Survival Analysis Cutoff and plot settings

There are two major cutoffs in this module:

Cox analysis: Non-infinite HR, non-zero HR, significant p-values (p < 0.05), and valid CI ranges (non-zero lower CI, non-infinite upper CI).
KM analysis: Significant log-rank test results for both median-cut and optimal-cut grouping (p < 0.05), and biologically meaningful CI ranges (upper CI < 10, lower CI ≥ 0.1).
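The two cutoffs can be written as predicates over one probe's result. This is a sketch under assumptions: the `ProbeResult` fields and probe IDs are hypothetical, and for KM each grouping (median-cut and optimal-cut) is assumed to be checked separately with the same rule.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeResult:
    probe: str        # CpG probe ID (placeholder values below)
    hr: float
    lower_ci: float
    upper_ci: float
    p_value: float


def passes_cox_cutoff(r, alpha=0.05):
    """Cox: finite, non-zero HR; p < alpha; non-zero lower CI and finite upper CI."""
    return (
        math.isfinite(r.hr) and r.hr != 0
        and r.p_value < alpha
        and r.lower_ci != 0 and math.isfinite(r.upper_ci)
    )


def passes_km_cutoff(r, alpha=0.05):
    """KM: significant log-rank p and a CI within [0.1, 10)."""
    return r.p_value < alpha and r.upper_ci < 10 and r.lower_ci >= 0.1


results = [
    ProbeResult("cg_example_1", 2.1, 1.3, 3.4, 0.003),
    ProbeResult("cg_example_2", math.inf, 0.5, math.inf, 0.01),
    ProbeResult("cg_example_3", 3.0, 1.1, 15.0, 0.02),
]
for r in results:
    print(r.probe, passes_cox_cutoff(r), passes_km_cutoff(r))
```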

Plots

Survival forest plot

Survival forest plot: This plot shows the methylation survival analysis result of the selected cancer type and gene. The x-axis represents the hazard ratio, and the y-axis lists the CpG sites (probes) associated with the selected gene.
Confidence interval: The 95% confidence interval of the hazard ratio.
P-value indicator: The p-value of the hazard ratio from the log-rank test (KM) or the Wald test (Cox), indicating the significance of the result.
Different colors represent survival events.
- overall survival (OS)
- disease-specific survival (DSS)
- disease-free interval (DFI)
- progression-free interval (PFI)
- relapse-free survival (RFS)
If the HR (or its confidence interval) extends beyond 10, the long tail is cut off at 10 for aesthetic reasons and displayed as an arrow pointing to the right.
If even the lower CI is greater than 10, no line is drawn; instead, a dot is placed close to 10 and an arrow is drawn to the right. The plot control panel on the left has an option (Remove large HR) to filter out these large HRs.
Survival analysis result table: This table shows the detailed information of the survival analysis result. Significant results are highlighted in green.
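The capping rule for large HRs can be sketched as a function that decides how one forest-plot row is drawn. The return structure is hypothetical, and what exactly "Remove large HR" filters on (here, HR above the cap) is an assumption.

```python
HR_CAP = 10.0


def forest_glyph(hr, lower, upper, cap=HR_CAP):
    """Describe how one forest-plot row is drawn with the x-axis capped at `cap`.

    Returns the dot position, the CI line (or None), and whether a
    right-pointing arrow marks values beyond the cap.
    """
    if lower > cap:
        # Entire CI lies beyond the cap: no line, dot at the cap, arrow right.
        return {"dot": cap, "line": None, "arrow": True}
    arrow = hr > cap or upper > cap
    return {"dot": min(hr, cap), "line": (lower, min(upper, cap)), "arrow": arrow}


def drop_large_hr(rows, cap=HR_CAP):
    """Hypothetical 'Remove large HR' filter: keep rows whose HR is at most the cap."""
    return [r for r in rows if r["hr"] <= cap]


print(forest_glyph(2.0, 1.0, 3.0))
print(forest_glyph(4.0, 1.5, 25.0))
print(forest_glyph(30.0, 12.0, 80.0))
```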

Correlation plot

Correlation plot: This plot shows the correlation between the gene expression and methylation levels of the selected gene’s associated probes. The x-axis represents correlation value, and the y-axis represents the methylation sites.
The correlation coefficient
The p-value of the correlation coefficient, displayed as the size of the dots.
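The Spearman coefficient used in this module is the Pearson correlation of rank-transformed values. A minimal pure-Python sketch; variable names and data are hypothetical, and the p-value is omitted because it needs a t or exact distribution that the standard library does not provide.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical: expression of one gene and beta values of one probe in 5 samples
expression = [2.0, 3.5, 1.0, 4.2, 5.0]
beta_values = [0.80, 0.55, 0.90, 0.40, 0.35]
print(round(spearman_rho(expression, beta_values), 3))
```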

Analysis Options

Step 1: Select a cancer type, survival analysis method, and grouping

Cancer Type

Survival Analysis Method

Choose grouping method (KM only)

Step 2: Choose a gene to plot

Select or search for a gene

Methylation-Expression Correlation Plots

Plots
Data Tables

Methylation Survival Forest Plot

Methylation-Expression Correlation

Methylation Survival Table

Correlation Data

Font size

Remove large HR (forest plot)

Dot color for p-values (correlation plot)

Cancer Immunotherapy Survival Analysis

Cancer immunotherapy survival analysis provides a view of patient outcomes over time in different immunotherapy-treated cancer cohorts, based on the selected cohort (or primary cancer type) and gene of interest.

About this module

In this module, you are able to:

Select a cancer cohort and gene of interest.
Select the survival analysis method (KM or Cox).
Visualize the survival forest plot.
Generate the survival analysis result table.

About the data

The data used in this module come from the following cancer cohorts:

Cohort	Primary	Citation
Mariathasan	Bladder	Mariathasan S, Turley SJ, Nickles D, et al. TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature. 2018 Feb 22; 554(7693):544-548. doi: 10.1038/nature25501. PMID: 29443960; PMCID: PMC6028240.
Braun	Kidney	Braun DA, Hou Y, Bakouny Z, et al. Interplay of somatic alterations and immune infiltration modulates response to PD-1 blockade in advanced clear cell renal cell carcinoma. Nature Medicine. 2020 Jun; 26(6):909-918. doi: 10.1038/s41591-020-0839-y. PMID: 32472114; PMCID: PMC7499153.
Jung	Lung	Jung H, Kim HS, Kim JY, et al. DNA methylation loss promotes immune evasion of tumours with high mutation and copy number load. Nature Communications. 2019 Sep 19; 10(1):4278. doi: 10.1038/s41467-019-12159-9. PMID: 31537801; PMCID: PMC6753140.
Liu	Melanoma	Liu D, Schilling B, Liu D, et al. Integrative molecular and clinical modeling of clinical outcomes to PD1 blockade in patients with metastatic melanoma. Nature Medicine. 2019 Dec; 25(12):1916-1927. doi: 10.1038/s41591-019-0654-5. PMID: 31792460; PMCID: PMC6898788.
Padron	Pancreas	Padrón LJ, Maurer DM, O’Hara MH, et al. Sotigalimab and/or nivolumab with chemotherapy in first-line metastatic pancreatic cancer: clinical and immunologic analyses from the randomized phase 2 PRINCE trial. Nature Medicine. 2022 Jun; 28(6):1167-1177. doi: 10.1038/s41591-022-01829-9. PMID: 35662283; PMCID: PMC9205784.
Snyder	Ureteral	Snyder A, Nathanson T, Funt SA, et al. Contribution of systemic and somatic factors to clinical response and resistance to PD-L1 blockade in urothelial cancer: An exploratory multi-omic analysis. PLoS Medicine. 2017 May 26; 14(5). doi: 10.1371/journal.pmed.1002309. PMID: 28552987; PMCID: PMC5446110.

Guide

Please go through the generic guide on the Tutorial page for cancer type and gene selection.

Survival forest plot

Survival forest plot: This plot shows the survival analysis result of the selected cohort and gene. The x-axis represents the hazard ratio, and the y-axis lists the datasets of the selected cohort or primary cancer type.
Confidence interval: The 95% confidence interval of the hazard ratio.
P-value indicator: The p-value of the hazard ratio from the log-rank test (KM) or the Wald test (Cox), indicating the significance of the result.
Different colors represent survival events.
- overall survival (OS)
- disease-specific survival (DSS)
- disease-free interval (DFI)
- progression-free interval (PFI)
- relapse-free survival (RFS)
Survival analysis result table: This table shows the detailed information of the survival analysis result. Significant results are highlighted in green.

Analysis Options

Step 1: Select a cancer type, immunotherapy analysis method, and grouping

Cohort or Primary Cancer Type

Immunotherapy Analysis Method

Choose grouping method (KM only)

Step 2: Choose a gene to plot

Select or search for a gene

Immunotherapy Forest Plot

Immunotherapy Table

Drug Response Prediction

Database information

GDSC: The Genomics of Drug Sensitivity in Cancer (GDSC) project is a database focused on cancer cell drug sensitivity and molecular markers of drug response. GDSC provides a unique resource that combines drug sensitivity and genomic datasets to aid in the discovery of new therapeutic biomarkers for cancer treatment. The genomic information in the database includes point mutations, gene amplifications and deletions, and expression profiles, along with tissue types. The database has characterized 1,000 human cancer cell lines and screened them with over 100 compounds. https://www.cancerrxgene.org/
CTRP: The Cancer Therapeutics Response Portal (CTRP) links genetic, lineage, and other cellular features of cancer cell lines to small-molecule sensitivity with the goal of accelerating discovery of patient-matched cancer therapeutics. The CTRP is a living resource for the biomedical research community that can be mined to develop insights into small-molecule mechanisms of action and novel therapeutic hypotheses, and to support future discovery of drugs matched to patients based on predictive biomarkers. https://portals.broadinstitute.org/ctrp/
PRISM: Developed by the Broad Institute of MIT and Harvard, Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) is a novel DNA barcoding technology that allows rapid viability screening of more than 900 human cancer cell line models in mixtures. These 900 cell lines represent more than 45 major lineages of cancer. https://www.theprismlab.org/

Basic Methods

Test Set: A total of 13 cancer types were used, with one representative dataset selected for each cancer type as the test set.
Training Set: GDSC2, CTRP2, PRISM
Analysis Method and Screening Criteria: Ridge regression analysis was performed using the oncoPredict R package. The expression level of the target gene in each cancer type was correlated with the predicted score for each drug using Spearman correlation analysis. Drugs with correlation values less than -0.4 were selected, and the 10 drugs with the lowest correlation coefficients were chosen.
Prediction Values: GDSC2 and CTRP2 provide IC50 values, while PRISM provides AUC values. Lower IC50 and AUC values indicate greater predicted sensitivity to the drug.
Significance of IC50 and AUC Values:

IC50 (Half Maximal Inhibitory Concentration): IC50 refers to the concentration of a compound or drug required to inhibit a biological process or activity by 50% under certain conditions. It is commonly used to assess the biological activity of drugs. A lower IC50 value indicates a more potent drug, as it can inhibit the target biological molecule at lower concentrations.

AUC (Area Under the Curve): For PRISM, AUC is the area under the dose-response curve, which relates cell viability to drug concentration. A lower AUC means that viability drops more over the tested concentration range; in general, a lower AUC value therefore indicates increased sensitivity of the cells to treatment.

About this module

In this module, you are able to:

Select a cancer type and gene of interest.
Visualize the correlation between the selected gene and the drug response prediction.
Visualize the drug response prediction for low and high gene expression groups.

Guide

Please go through the generic guide on the Tutorial page for cancer type and gene selection.

Cutoff and plot settings

There are two major cutoffs in this module:

Gene-drug response correlation cutoff. This cutoff keeps only drugs whose correlation coefficient is lower than the cutoff value. A negative correlation coefficient indicates that samples with higher expression of the gene tend to have lower predicted IC50 or AUC values, i.e., they are predicted to be more sensitive to the drug.
Top N drugs filter. Even with a correlation cutoff, many drugs may remain. This filter keeps the top N drugs, ranked from the lowest to the highest correlation.
Other settings, including font size, color, etc.
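The two cutoffs combine into a filter-then-rank step, matching the screening criteria in Basic Methods (correlation below -0.4, then the lowest coefficients first). A minimal sketch; the drug names and correlation values are hypothetical.

```python
def select_drugs(correlations, cutoff=-0.4, top_n=10):
    """Apply the correlation cutoff, then keep the top N most negative drugs.

    correlations: dict mapping drug name -> Spearman rho between gene expression
    and predicted drug response (IC50 or AUC). Hypothetical input layout.
    """
    kept = [(drug, rho) for drug, rho in correlations.items() if rho < cutoff]
    kept.sort(key=lambda item: item[1])  # lowest (most negative) correlation first
    return kept[:top_n]


correlations = {"drug_A": -0.72, "drug_B": -0.35, "drug_C": -0.51, "drug_D": 0.20, "drug_E": -0.45}
print(select_drugs(correlations, top_n=2))
```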

Plots

Correlation Analysis

This plot shows the correlation between the selected gene and the drug response score (GDSC or CTRP), or AUC value (PRISM).

The top N drugs passing both cutoffs are displayed on both plots.
The correlation coefficient
The p-value of the correlation coefficient, displayed as the size of the dots.

Drug Response Prediction

This plot shows the drug response prediction for the selected gene in the low and high gene expression groups. The low-high groups are divided by the median expression of the selected gene.

The drug response IC50 (GDSC or CTRP), or AUC value (PRISM) is displayed.
The boxplot shows the distribution of IC50 or AUC values for the low and high gene expression groups.
Wilcoxon test p-value of the IC50 or AUC values between the low and high gene expression groups of each drug.
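The median split described above can be sketched as follows, collecting one drug's predicted values per group before testing them. Sample names and values are hypothetical, and assigning samples exactly at the median to the low group is an assumption.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into low/high groups at the median expression of the gene.

    Assumption: samples exactly at the median go to the low group.
    """
    cut = median(expression.values())
    low = [s for s, v in expression.items() if v <= cut]
    high = [s for s, v in expression.items() if v > cut]
    return low, high


def response_by_group(predicted, low, high):
    """Collect predicted IC50/AUC values of one drug for each expression group."""
    return [predicted[s] for s in low], [predicted[s] for s in high]


expression = {"s1": 2.0, "s2": 8.0, "s3": 5.0, "s4": 9.5}
predicted_ic50 = {"s1": 3.1, "s2": 1.2, "s3": 2.8, "s4": 0.9}
low, high = split_by_median(expression)
print(response_by_group(predicted_ic50, low, high))
```

The two returned lists are what a boxplot per group and a rank-sum test per drug would operate on.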

Analysis Options

Step 1: Select a cancer type

Number of cancer types

Cancer Type

Step 2: Choose a gene to plot

Select or search for a gene

Drug prediction plots

Plots
Data Tables

GDSC2

CTRP2

PRISM

Correlation Data

Drug Response Data

Cutoff for correlation

Display top N drugs

Font size

Dot color for p-values

Low expression color

High expression color

Get Started

This page provides a comprehensive guide on how to use the database.

The major functionalities of the database can be navigated through the top navigation bar.
It is recommended to start with the Data module to explore the available datasets and see which modules are supported for each dataset.

Modules

To use a module, it is recommended to read the module specific guide first. This can be accessed by clicking on the icon on the module page.

Cancer Type and Gene Selection

The cancer type and gene selection is a common feature in most modules.

Select a cancer type of interest.
Some additional selections may be available depending on the module.
Cancer statistics: please wait for a few seconds for data and statistics to load. A panel of statistics will be shown. Different modules may have different statistics.
Select a gene of interest. You must select a cancer type first. The available genes vary depending on the selected cancer type.
Most modules allow only one gene selection. Some modules may allow multiple gene selections. Hover over the icon for more information.
By default, a gene is selected. You can remove the selected gene by:
- Clicking on the selected gene and pressing the Delete key.
- Opening the dropdown menu and pressing the Backspace key.
To add a gene:
- Open the dropdown menu and scroll down to find the gene of interest. Note that the dropdown menu only shows the first few genes.
- A better way is to type the gene into the search box. Most modules support searching by gene name or Ensembl ID.
Click this button to generate the results. After changing any option/selection above, you must click this button to update the results.

Plot Options

Many modules provide plot options to customize the plot.

The plot options are located on the left side.
After changing any plot options, you must click this button to apply the changes.

Data and download

Most modules provide at least one data table that contains the data used to generate the plots.

If there are only a few plots and a single data table, the data table is shown at the bottom of the page.
If there are multiple plots or multiple data tables to show, the data table will be shown in a separate tab. You can switch between the plot and data table by clicking on the tabs.
There is a CSV button on the bottom left corner of each data table. Clicking on this button will download the data table as a CSV file.

For bulk download, we have provided a separate Download module to download massive data files. It can be accessed from the top navigation bar.

Full Screen

All plots can be viewed in full screen. Hover over a plot and a full-screen icon will appear in its bottom right corner; click the icon to view the plot in full screen. Please wait a few seconds for the plot to re-render and adjust to the full screen.

Other

Dark Mode

A dark mode is available. You can switch between light and dark mode by clicking the mode icon in the top right corner.

Download core data

In this database, one can download the core data from different datasets. This includes:

Expression data
Patient data

Exceptions

However, there are some exceptions where the data is not available for download:

Some datasets may not have expression or patient data available.
Some datasets have access restrictions, meaning that you may need to request access through the original platform or contact the data provider.

Data Overview

Select a dataset you want and click the download button below the table.