12 Public Single-Cell Databases, Compared: Which One Should You Use?
- Jun 11
If you've ever spent an afternoon jumping between GEO, SRA, CELLxGENE, and half a dozen other websites trying to find the right single-cell dataset, you're not alone.
The single-cell ecosystem has grown rapidly over the past few years, bringing with it an impressive collection of databases and data portals. But while having more data is great, finding relevant and usable data is often more challenging than it sounds!
A researcher looking for a single-cell dataset on a particular disease may encounter dozens of databases, each offering different levels of data processing, annotation quality, and accessibility. Some primarily store raw sequencing files and metadata, while others provide harmonized cell-type annotations, interactive visualization tools, or curated collections focused on specific tissues, diseases, or therapeutic areas.
To make things easier, we've put together a practical comparison of 12 public single-cell databases, covering what they offer, where they excel, and the limitations you should be aware of before diving in.

What Kinds of Single-Cell Databases Are Out There?
There are many ways to categorize single-cell databases, but two approaches are particularly useful: what biological data they focus on and what they allow researchers to do with that data.
By Biological Focus
According to Gondal et al. (2024), single-cell databases can be broadly categorized as:
General databases – broad repositories covering many tissues, diseases, and biological systems
Tissue-specific databases
Disease-specific databases
Cancer-focused databases
Cell type-focused databases
By Data Type and Functionality
From a practical standpoint, most public single-cell resources can be grouped into three broad categories:
Archives
Archives are primary repositories where researchers deposit data generated in their studies. These resources typically contain raw sequencing files, count matrices, processed matrices, and associated metadata.
Standardized Databases
These resources aggregate datasets from multiple studies and apply standardized processing pipelines, harmonized metadata, and consistent cell-type annotations. Such standardization makes it easier to compare datasets across studies and perform large-scale analyses.
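To make "harmonized metadata" concrete, here is a minimal Python sketch of one harmonization step: mapping free-text tissue labels from different studies onto a single shared vocabulary. The synonym table, the helper names, and the example records are hypothetical illustrations, not the pipeline of any database discussed here; real resources typically map labels to ontology terms and handle far more edge cases.

```python
# Hypothetical synonym table: free-text labels -> one shared term.
TISSUE_SYNONYMS = {
    "pbmc": "blood",
    "peripheral blood": "blood",
    "whole blood": "blood",
    "lung parenchyma": "lung",
    "lung": "lung",
}


def harmonize_label(raw, synonyms):
    """Return the shared term for a raw label, or None if it is unmapped."""
    key = " ".join(raw.strip().lower().split())  # normalize case and spacing
    return synonyms.get(key)


def harmonize_records(records, field, synonyms):
    """Add a '<field>_harmonized' key to each record and collect unmapped labels."""
    unmapped = set()
    harmonized = []
    for rec in records:
        term = harmonize_label(rec[field], synonyms)
        if term is None:
            unmapped.add(rec[field])
        harmonized.append({**rec, f"{field}_harmonized": term})
    return harmonized, sorted(unmapped)


# Illustrative records from three made-up studies.
records = [
    {"study": "A", "tissue": "PBMC"},
    {"study": "B", "tissue": " Peripheral  blood "},
    {"study": "C", "tissue": "Lung tumour"},
]
harmonized, unmapped = harmonize_records(records, "tissue", TISSUE_SYNONYMS)
```

Labels that do not map are kept and reported rather than guessed, which mirrors why curated resources still need manual review for unusual terms.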
Interactive Discovery Portals
These databases provide web-based interfaces that allow users to search, visualize, and explore public single-cell datasets without downloading and processing the data themselves.
It's worth noting that these categories are not mutually exclusive. For example, some resources combine standardized datasets with interactive exploration tools, allowing researchers to both access harmonized data and explore it through a user-friendly interface.
Popular Single-Cell Databases: What They Do Well (and Where They Fall Short)
Let's start with archive databases. These provide the broadest coverage of public single-cell datasets, but their metadata, file formats, annotations, and processing methods are often inconsistent across studies, which makes cross-dataset comparison and integration challenging.
Database Name | Category | Description | Pros | Cons |
GEO | Archive / General | The most widely used repository for gene expression data in general. Single-cell studies typically provide raw count matrices (MTX, CSV, TSV, TXT, H5), metadata, and occasionally processed objects such as H5AD or RDS files. | • Massive dataset collection • Often the first place new studies appear • Easy to find associated publications | • Metadata, file formats, and data structures vary widely. Some datasets are raw while others are normalized or partially processed • No interactive visualization or analysis tools |
SRA | Archive / General | Repository for raw sequencing data, primarily FASTQ files generated from sequencing experiments. | Access to original sequencing reads; ideal for custom processing pipelines. | Requires significant computational resources and bioinformatics expertise to explore. |
Zenodo | Archive / General | A general-purpose open-access repository developed by CERN and supported by the European Commission. Researchers often use Zenodo to share single-cell datasets, processed objects (e.g., H5AD, Seurat RDS), analysis code, supplementary files, and data not deposited in specialized repositories. | • Often contains processed datasets and analysis-ready files • Supports code, workflows, and supplementary materials alongside data | • Metadata and file organization vary widely between studies • Limited search and filtering capabilities for biological datasets |
EBI’s Single Cell Expression Atlas | Archive / Interactive Discovery Portal / General | A repository maintained by EMBL-EBI and, alongside GEO, a commonly used resource for depositing public single-cell data. It includes a searchable browser that allows researchers to discover datasets by species, project, assay,… as well as basic interactive tools for exploring gene expression patterns. | • Easy to navigate and find relevant datasets with search tools and filters (although the filters are quite limited) • Provides basic interactive visualization | • Not fully harmonized across studies (primarily standardized to assay ontologies and species) • Limited data coverage (~10 million cells as of June 2026) • Limited visualization and downstream analysis capabilities |
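Because archive records mix so many file types, a quick first step is often to triage what a deposited record actually contains. The sketch below groups file names by suffix into rough content types, using the formats named in the table above. The suffix grouping and the file names are assumptions for illustration only; a CSV or TXT file, for example, may hold either a count matrix or sample metadata, so the result is a hint rather than a verdict.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed grouping of file suffixes into rough content types (illustrative only).
FORMAT_KINDS = {
    ".fastq": "raw reads",
    ".fq": "raw reads",
    ".mtx": "count matrix",
    ".h5": "count matrix",
    ".csv": "tabular text",  # may hold a matrix or metadata
    ".tsv": "tabular text",
    ".txt": "tabular text",
    ".h5ad": "processed object",
    ".rds": "processed object",
}
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".zip"}


def classify_file(name):
    """Guess a file's content type from its last non-compression suffix."""
    suffixes = [s.lower() for s in PurePosixPath(name).suffixes]
    # Strip trailing compression suffixes such as ".gz".
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        return "unknown"
    return FORMAT_KINDS.get(suffixes[-1], "unknown")


# Hypothetical file listing for a single deposited study.
files = [
    "sampleA_matrix.mtx.gz",
    "sampleA_barcodes.tsv.gz",
    "study_annotated.h5ad",
    "study_seurat.RDS",
    "sampleB_R1.fastq.gz",
    "README",
]
summary = Counter(classify_file(f) for f in files)
```

A summary like this shows at a glance whether a record offers analysis-ready objects or only raw inputs that need a full processing pipeline.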
Topic-focused databases are great for finding data relevant to a particular tissue, disease, or therapeutic area, but they offer much less data coverage. Some examples:
Database Name | Category | Description | Pros | Cons |
Allen Brain Cell Types Atlas | Tissue Specific | Comprehensive atlas of brain cell types generated using transcriptomic and multimodal profiling. | Excellent resource for neuroscience research. | Limited data coverage |
TISCH2 | Cancer-Focused Database | Specialized resource for tumor microenvironment single-cell datasets with curated annotations. | Excellent resource for cancer and immuno-oncology research. | Limited data coverage |
Standardized databases aim to solve one of the biggest challenges in public single-cell research: data heterogeneity. These single-cell databases harmonize metadata, annotations, and data formats across studies, making it much easier to discover relevant datasets, compare cohorts, and perform cross-study analyses. Most also provide interactive visualization tools that allow researchers to explore data without extensive bioinformatics expertise.
Here are some available standardized single-cell databases to consider:
Database Name | Category | Description | Pros | Cons |
Human Cell Atlas (HCA) | Standardized Database / General | A repository of datasets generated by the Human Cell Atlas initiative, an international effort to create comprehensive reference maps of all human cell types. | | • Not all terminologies are standardized; harmonized metadata for treatments, animal strains, comorbidities, etc. is missing • Coverage is strongest for Human Cell Atlas projects and participating consortia rather than all published single-cell studies |
HuBMAP | Standardized Database / Interactive Discovery Portal / General | A consortium-led repository and data portal that aims to map the human body at cellular resolution. HuBMAP hosts a wide range of single-cell, spatial transcriptomics, imaging, and other multimodal datasets across multiple human tissues and organs. | | • Not all terminologies are standardized; important harmonized metadata for treatments, cell lines, sample types/morphology,... is missing • Primarily designed as a multimodal human atlas rather than a dedicated single-cell repository, so single-cell dataset coverage is smaller |
DISCO | Standardized Database / Interactive Discovery Portal / General | A curated, harmonized data repository and interactive platform for visualization. | • Includes a function to integrate samples across studies • Integrated atlases are available for instant access | • Focused primarily on human datasets • Limited downstream analysis capabilities compared with dedicated analysis platforms |
CELLxGENE | Standardized Database / Interactive Discovery Portal / General | Collection of harmonized single-cell datasets with search and visualization capabilities. | • Standardized terminologies across datasets for key variables such as cell types, assays, diseases, and tissues • Standardized data formats (downloadable in H5AD) • Easy to navigate and find relevant datasets with search tools and filters • User-friendly; supports visualization without coding | • Not all terminologies are standardized; important harmonized metadata for treatments, ages, cell lines, and sample types/morphology is missing (‘normal’ labels can mean either healthy controls or normal tissue from diseased patients) • Limited visualization and downstream analysis functions |
Broad Institute Single Cell Portal | Standardized Database / Interactive Discovery Portal / General | One of the most widely used repositories for single-cell data with interactive exploration tools. | • Standardized terminologies across datasets for key variables such as cell types, assays, diseases, and tissues • User-friendly; supports visualization without coding • Easy to navigate and find relevant datasets with search tools and filters | • Not all terminologies are standardized; important harmonized metadata for treatments, cell lines, sequencing platforms, animal strains, etc. is missing • Limited visualization and downstream analysis functions |
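The ‘normal’ label ambiguity noted in the CELLxGENE row is a good example of why extra metadata fields matter. As a rough sketch, assuming each sample record carries a hypothetical `sample_label` plus an optional `donor_disease` field (neither name is taken from any database's schema), the two meanings of ‘normal’ could be separated like this:

```python
def resolve_normal(sample):
    """Split an ambiguous 'normal' sample label using donor-level disease status.

    Both field names ('sample_label', 'donor_disease') are hypothetical.
    """
    label = sample["sample_label"].strip().lower()
    if label != "normal":
        return label  # nothing to disambiguate
    donor = sample.get("donor_disease")
    if donor is None:
        # Without donor information the ambiguity cannot be resolved.
        return "normal (donor status unknown)"
    if donor.strip().lower() in {"healthy", "none"}:
        return "healthy control"
    return "normal tissue from diseased donor"


# Made-up sample records for illustration.
samples = [
    {"sample_label": "normal", "donor_disease": "healthy"},
    {"sample_label": "Normal", "donor_disease": "lung adenocarcinoma"},
    {"sample_label": "normal"},
    {"sample_label": "tumor", "donor_disease": "lung adenocarcinoma"},
]
resolved = [resolve_normal(s) for s in samples]
```

When the donor-level field is missing, the sketch flags the sample instead of guessing, since treating adjacent normal tissue as a healthy control can bias a cohort comparison.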
How Pythiomics Takes a Different Approach
As discussed above, existing public single-cell databases each address part of the challenge but leave important gaps:
Data from large archive repositories are often difficult to reuse, especially for AI/ML modeling, because metadata, annotations, and file formats are not standardized. Raw data also requires significant processing effort before it is ready to explore.
Standardized databases improve consistency, but harmonization is typically limited to a relatively small number of metadata categories, making it difficult to compare cohorts across studies using variables such as treatments, treatment history, demographics, or clinical characteristics.
In addition, many databases focus primarily on data discovery and basic visualization, offering limited downstream analysis capabilities.
We developed the Pythiomics database with these challenges in mind. It is designed as a centralized repository that provides ready-to-use data for AI/ML modeling, target discovery, and biomarker discovery. Pythiomics combines extensive data coverage, careful manual curation and documentation, thorough metadata harmonization, and rich analytical functionality within a single platform.

Pythiomics | |
Category | Standardized Database / Interactive Discovery Portal / General |
Data source | Manually curated from a wide range of repositories such as GEO, CELLxGENE, Zenodo, HCA, Broad Institute Single Cell Portal, EBI Single Cell Expression Atlas, lab websites,… |
Data curation pipeline |
|
Data coverage |
As of June 2026 |
Data accessibility |
|
Data harmonization | 50+ harmonized metadata fields covering patient IDs, sample IDs, diseases, tissues, cell types, assays, treatments, treatment history, sample types/morphology, genders, sequencing platforms, animal strains, comorbidities, cell lines, and many more. |
Wrapping Up
The best database ultimately depends on your research goals. If you need raw sequencing data, archive repositories may be the right choice. If you are focused on a particular disease or tissue, specialized resources can provide valuable domain-specific insights. For researchers looking to rapidly discover, compare, and analyze datasets across studies, standardized databases offer the most efficient workflow.
Pythiomics builds on this foundation by combining large-scale public data coverage, deep metadata harmonization, and integrated analytics in a single platform. By reducing the time spent searching, cleaning, and integrating datasets, researchers can focus on what matters most: generating biological insights and accelerating discovery.




