BreastScape
Collected and Pre-Processed Breast Cancer scRNA-seq Datasets
Methods
Abstract
Breast cancer is a molecularly heterogeneous disease, with prognosis and treatment shaped by subtype and disease stage. While single-cell RNA sequencing (scRNA-seq) has provided critical insights into the tumor microenvironment (TME), a unified view across breast cancer subtypes and progression states has been lacking.
Here, we present an integrated meta-analysis of 31 scRNA-seq studies comprising over 1.2 million cells from 376 breast cancer samples. We systematically profiled immune, stromal, and tumor-intrinsic features across Luminal A (LA), Luminal B (LB), HER2-enriched, and triple-negative breast cancer (TNBC) subtypes.
Our analysis identified distinct subtype- and stage-specific TME states. Specifically, TNBC tumors exhibited an inflamed microenvironment enriched for exhausted T cells, with PD-1 and LAG3 expressed predominantly in primary tumors and CTLA4 in metastases. EMT-associated transcriptional programs in TNBC were enriched in metastatic lesions and correlated with poor survival and elevated TROP2 expression, suggesting potential sensitivity to TROP2-targeted therapies. In contrast, LA tumors transitioned from immunologically “cold” primary lesions, characterized by ER-positive tumor cells, immunosuppressive macrophages, and FAP⁺ fibroblasts, to more inflamed metastatic states marked by ER loss, MYC activation, and enhanced immune infiltration. ER-positive and ER-negative LA tumor states showed distinct transcriptional, genomic, and immunologic features, with prognostic and predictive signatures validated in bulk cohorts. Ligand–receptor analyses revealed stromal–myeloid interactions that may reinforce immune exclusion in ER-positive disease.
Altogether, our study provides a high-resolution atlas of the breast cancer ecosystem, revealing clinically relevant programs and therapeutic vulnerabilities that evolve across tumor subtype and disease stage. These findings offer a framework for precision immunotherapy and biomarker-driven intervention.
Description of the datasets on this website
This repository contains paired breast scRNA-seq datasets. Most are from patients with invasive breast cancer; some are from normal breast tissue or ductal carcinoma in situ (DCIS). Most samples are treatment-naïve; some were treated with chemotherapy, radiotherapy, immunotherapy, hormonal therapy, or a combination of these.
These datasets contain only single cells that passed our own quality control (QC), as described in our manuscript.
Each dataset provides the raw-count scRNA-seq matrix as an h5ad object (except for GSE118390, GSE190772, and GSE169246, which provide log₂-transformed normalized expression values), together with metadata that include the breast cancer subtype and the origin of the biopsy. The genes included are the 10,273 genes common to all datasets.
To analyze the full list of genes for a given dataset, please refer to the original study of that dataset.
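Because three of the datasets hold log₂-transformed normalized values rather than raw counts, it may help to separate the two groups before applying any count-based normalization. The sketch below is only an illustration, not part of our pipeline: the function names (value_type, split_by_value_type, common_genes) and the accession GSE000000 are hypothetical placeholders, and the gene lists are toy examples. It uses only the Python standard library; reading the h5ad files themselves would typically be done with a package such as anndata, which is not shown here.

```python
# Minimal sketch (hypothetical helper names, standard library only).

# Accessions that, per the description above, hold log2-transformed
# normalized expression values instead of raw counts.
LOG2_NORMALIZED = frozenset({"GSE118390", "GSE190772", "GSE169246"})


def value_type(accession):
    """Return 'log2_normalized' or 'raw_counts' for a GEO accession."""
    key = accession.strip().upper()
    return "log2_normalized" if key in LOG2_NORMALIZED else "raw_counts"


def split_by_value_type(accessions):
    """Group accessions by the kind of values their matrix holds."""
    groups = {"raw_counts": [], "log2_normalized": []}
    for accession in accessions:
        groups[value_type(accession)].append(accession)
    return groups


def common_genes(gene_lists):
    """Intersect several gene lists, keeping the order of the first list.

    This mirrors the idea of restricting every dataset to the genes shared
    by all of them; it is an assumption about how such a set can be built,
    not a description of how the 10,273 shared genes were derived.
    """
    gene_lists = [list(genes) for genes in gene_lists]
    if not gene_lists:
        return []
    shared = set(gene_lists[0]).intersection(*(set(g) for g in gene_lists[1:]))
    # dict.fromkeys drops duplicate symbols while preserving order.
    return list(dict.fromkeys(g for g in gene_lists[0] if g in shared))


if __name__ == "__main__":
    # "GSE000000" is a placeholder accession used only for illustration.
    print(split_by_value_type(["GSE118390", "GSE000000"]))
    print(common_genes([["ESR1", "ERBB2", "MKI67"], ["MKI67", "ESR1"]]))
```

Datasets in the log2_normalized group should not be passed through steps that assume integer counts; how they are brought onto a scale comparable with the other datasets is left to your own pipeline or to the approach described in our manuscript.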
Technical & study-specific information
The datasets are ready for further integration according to your own pipeline, or according to our pipeline as described in our manuscript.
Citation
If you use the datasets from this website in your research, please consider citing our paper together with the original study of each dataset.
The figures used under the "Home" and "Methods" tabs were created with BioRender.com using a paid license.