
The Beginner’s Guide to Single-Cell RNA-seq Data Analysis: Essential Plot Types

Updated: Aug 11

Exploring single-cell RNA-seq (scRNA-seq) data can be both exciting and overwhelming, especially when you're met with a flood of unfamiliar plots: UMAPs, violin plots, QC plots, heatmaps and more. But here’s the good news: each plot serves a specific purpose and helps answer a specific question about your dataset. Knowing which question you want to answer is the first step to choosing the right plot.


Whether you're just getting started or need a refresher, this beginner-friendly guide walks you through key plot types in single-cell RNA-seq data analysis and the biological questions they unlock.


1. Quality Control (QC) plot

Key question it addresses: Are the cells of good quality?


Before diving into your single-cell RNA-seq data, it’s essential to assess its quality. Quality Control (QC) is one of the most important steps in the scRNA-seq workflow. These plots help you spot any unusual or low-quality cells and decide what to filter out. That way, you can move forward with cleaner, more reliable data for the next steps like clustering, finding important genes and making clear visualizations.


Most single-cell QC plots are violin plots, density plots or histograms, all of which show how a metric is distributed across cells. They often show 3 key metrics:

  • number of genes per cell (how many unique genes)

  • UMI counts per cell (total number of unique RNA molecules detected)

  • percentage of mitochondrial genes (% of a cell's counts that come from mitochondrial genes).


QC plots created by the C-DIAM Multi-omics Studio, showing gene count, total UMI count, and mitochondrial gene percentage per cell. These metrics help filter out low-quality or stressed cells before downstream analysis.

Looking at the QC plots, we can tell the overall quality of the data, spot the outliers and determine the cut-off thresholds. For example: 


  • For genes and UMIs, a value that is too low may indicate an empty droplet or a damaged cell, and a value that is too high may indicate a doublet (two cells captured together).

  • For percentage of mitochondrial genes, a high value suggests a stressed or dying cell (for example, due to apoptosis).


There is no absolute standard for setting filter thresholds, as they depend on the tissue type, disease, and other experimental factors, which you should consider carefully before defining the cut-offs. A commonly cited recommendation is to filter out cells with ≤ 100 or ≥ 6000 expressed genes, ≤ 200 UMIs, or ≥ 10% mitochondrial genes [1].
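To make the three metrics and the cut-offs concrete, here is a minimal sketch in plain Python. It assumes each cell is stored as a dictionary of gene name to UMI count and that mitochondrial genes carry the human "MT-" prefix. The function names and the toy cells are made up for illustration, and the default thresholds simply mirror the values quoted from [1], so adjust them for your own tissue and protocol.

```python
# Minimal QC sketch. Assumptions: each cell is a dict of gene -> UMI count,
# and mitochondrial genes carry the human "MT-" prefix.

def qc_metrics(counts):
    """Return (genes detected, total UMIs, % mitochondrial UMIs) for one cell."""
    n_genes = sum(1 for c in counts.values() if c > 0)
    total_umis = sum(counts.values())
    mito_umis = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito_umis / total_umis if total_umis else 0.0
    return n_genes, total_umis, pct_mito


def passes_qc(counts, min_genes=100, max_genes=6000, min_umis=200, max_pct_mito=10.0):
    """Keep a cell only if it falls strictly inside every threshold."""
    n_genes, total_umis, pct_mito = qc_metrics(counts)
    return (min_genes < n_genes < max_genes
            and total_umis > min_umis
            and pct_mito < max_pct_mito)


# Hypothetical toy cells (gene names other than MT-CO1 are placeholders).
healthy = {f"GENE{i}": 3 for i in range(500)}
healthy["MT-CO1"] = 50
stressed = {f"GENE{i}": 2 for i in range(300)}
stressed["MT-CO1"] = 200
empty_droplet = {"GENE0": 5, "GENE1": 3}

for name, cell in [("healthy", healthy), ("stressed", stressed), ("empty", empty_droplet)]:
    print(name, qc_metrics(cell), "keep" if passes_qc(cell) else "filter out")
```

In a real analysis these per-cell values are what the QC violin plots or histograms display, and the thresholds are usually chosen after looking at those distributions rather than before.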


2. UMAP

Key question it addresses: Do my cells group into distinct types or states?


UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction method that helps us see and understand complex data more easily. In single-cell RNA-seq data analysis, we measure the activity (expression) of thousands of genes in each cell. That’s a lot of data: each cell is like a point in a space with thousands of dimensions, which is impossible to plot or interpret directly. UMAP projects those points into a low-dimensional space that we can actually look at.


UMAP is frequently used in single-cell RNA-seq data analysis. For example, the above UMAP visualization shows clusters of cells labeled 1 to 20. Each cluster represents a group of similar cells. (Source: C-DIAM Multi-omics Studio)

So in short, UMAP plots show complex data more simply by reducing thousands of gene dimensions down to just 2 or 3 (visible 2D & 3D space). They let us see similar cells grouped close together and different cells spread farther apart. They are also one of the most common plots you'll see in single-cell RNA-seq data analysis.



3. Violin, Box, Feature, and Dot Plots

Key question they address: How are genes expressed across the clusters? 


After identifying clusters, the next question is: how are your genes of interest expressed across them? And how does expression vary across the samples, tissues and conditions in your data? To answer this, several types of plots are used in single-cell RNA-seq data analysis, most commonly heatmaps, violin plots, box plots, feature plots, and dot plots.


Violin plots & box plots: Both violin plots and box plots are used to visualize the expression of a single gene across multiple groups (such as cell clusters, tissue types, conditions or samples). However:

  • Box plots focus on summary statistics: median, quartiles and potential outliers. They are useful for comparing the typical (median) expression level between different groups.


A box plot showing the expression of the marker gene PRKN in a Parkinson’s disease study. The plot summarizes the distribution, median, and variability of expression levels. (Source: C-DIAM Multi-omics Studio)
  • Violin plots include all the information in a box plot but also show the distribution shape of the data (like a smoothed histogram turned vertically). This helps you see whether the expression is bimodal, skewed or tightly clustered.


A violin plot showing the expression of the marker gene PRKN in a Parkinson’s disease study. In single-cell RNA-seq data analysis, it is preferred due to the combination of statistical summary and distribution. (Source: C-DIAM Multi-omics Studio)

Because violin plots combine statistical summary and distribution, they are often preferred in single-cell RNA-seq data analysis.
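As a rough illustration of what a box plot actually summarizes, the sketch below computes the median, quartiles and outliers for one gene in two clusters using only Python's standard library. The expression values and cluster names are invented, and the 1.5 × IQR outlier rule is the common box plot convention rather than something specific to any scRNA-seq tool.

```python
import statistics


def box_summary(values):
    """Median, quartiles and 1.5*IQR outliers, i.e. what a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical normalized expression of one gene, grouped by cluster.
expression_by_cluster = {
    "cluster_1": [0.0, 1.0, 2.0, 3.0, 10.0],
    "cluster_2": [4.0, 4.5, 5.0, 5.5, 6.0],
}

summaries = {name: box_summary(vals) for name, vals in expression_by_cluster.items()}
for name, summary in summaries.items():
    print(name, summary)
```

A violin plot starts from the same per-cell values but also draws a smoothed density around them, which is why it can reveal a bimodal or skewed distribution that these few summary numbers hide.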


Feature plots: A feature plot colors each cell on a dimensionality reduction plot (typically UMAP or t-SNE) by the expression of a gene. The version shown here, also called a joint feature plot or dual feature plot, displays the expression patterns of two genes at the same time. This helps you see co-expression or mutual exclusivity between two genes in different clusters or cell types. In single-cell RNA-seq data analysis, it is useful when we want to identify cell populations that express both genes.


Dual feature plot showing co-expression of OLIG1 and OLIG2 in oligodendrocytes. The strong diagonal distribution confirms their coordinated expression. The plot is created by C-DIAM Multi-omics Studio.
Dual feature plot showing mutually exclusive expression of EPCAM and a CD8+ T cell marker. EPCAM is highly expressed in CMS2 epithelial tumor cells, while CD8+ T cells show little to no EPCAM expression. (Source: C-DIAM Multi-omics Studio)
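The idea behind a dual feature plot can be sketched by sorting cells into "both", "one gene only" or "neither". This is a simplified illustration, not how any particular tool colors its plot: it assumes a gene counts as expressed when its value is above a threshold of 0, and the function name and expression pairs are hypothetical.

```python
from collections import Counter


def coexpression_category(expr_a, expr_b, threshold=0.0):
    """Label one cell by which of two genes it expresses (assumed cut-off: > threshold)."""
    a_on, b_on = expr_a > threshold, expr_b > threshold
    if a_on and b_on:
        return "both"
    if a_on:
        return "gene A only"
    if b_on:
        return "gene B only"
    return "neither"


# Hypothetical (gene A, gene B) expression for six cells.
cells = [(2.1, 1.8), (1.5, 2.2), (0.0, 1.9), (3.0, 0.0), (0.0, 0.0), (1.2, 0.9)]

summary = Counter(coexpression_category(a, b) for a, b in cells)
print(summary)
```

Many "both" cells point to co-expression, while a split between "gene A only" and "gene B only" with few "both" cells points to mutually exclusive expression.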

Dot plots & heatmaps: Both are used to analyze many genes across multiple clusters (or cell types). However:

  • Dot plots show a quick summary with dot size (% of cells expressing that gene) and dot color (average expression level of that gene).

Dot plot showing the expression levels of selected marker genes across various cell types in a Parkinson’s disease study. Dot size represents the percentage of cells expressing each gene, while color indicates average expression level. (Source: C-DIAM Multi-omics Studio)
  • Meanwhile, heatmaps display gene expression in more detail (each small square represents the expression value of a gene in a cell) and reveal more about patterns or trends in large datasets.


One use of heatmaps in single-cell RNA-seq data analysis is to visualize the expression levels of selected marker genes across different cell types, as shown here in this Parkinson’s disease study. Warmer colors indicate higher expression, highlighting cell type–specific gene activity. (Source: C-DIAM Multi-omics Studio)
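The two numbers behind each dot in a dot plot can be sketched as follows. This assumes a cell "expresses" a gene when its value is greater than 0 and that the average is taken over all cells in the cluster. Real tools may scale or average differently, and the function name and data here are made up.

```python
def dot_plot_stats(expression, clusters):
    """For one gene, return {cluster: (% of cells expressing, mean expression)}."""
    stats = {}
    for label in sorted(set(clusters)):
        values = [e for e, c in zip(expression, clusters) if c == label]
        pct_expressing = 100.0 * sum(1 for v in values if v > 0) / len(values)
        mean_expression = sum(values) / len(values)
        stats[label] = (pct_expressing, mean_expression)  # (dot size, dot color)
    return stats


# Hypothetical expression of one marker gene in eight cells from two clusters.
expression = [2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0]
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]

stats = dot_plot_stats(expression, clusters)
for label, (pct, mean) in stats.items():
    print(f"cluster {label}: {pct:.0f}% of cells expressing, mean expression {mean:.2f}")
```

A heatmap, by contrast, would keep all eight per-cell values instead of collapsing each cluster into two numbers.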

4. Composition Plots

Key question it addresses: How do cell types change between conditions?


To study how cell populations change across treatments, stages or time points, single-cell RNA-seq data analysis most often relies on composition plots, which are usually stacked bar charts.


These plots help track population shifts, such as immune infiltration or cell death. For example, they can reveal changes in T cell proportions in treated vs. control groups, or help measure cluster-based cell type distributions per sample. That is why these plots are especially valuable and essential for biologists applying scRNA-seq techniques in immunology, cancer or drug studies.


In single-cell RNA-seq data analysis, a composition plot helps show changes in cell proportions across different treatment groups. Each bar represents a group, and the segments indicate the relative abundance of each cell type, highlighting treatment-induced shifts in cellular composition. (Source: C-DIAM Multi-omics Studio)
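The numbers behind a stacked bar chart are simply the fraction of each cell type within each group. A minimal sketch, with invented cell type labels and counts:

```python
from collections import Counter


def composition(cell_types):
    """Return {cell type: fraction of cells} for one sample or condition."""
    counts = Counter(cell_types)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical per-cell annotations for two groups.
samples = {
    "control": ["T cell"] * 2 + ["B cell"] * 6 + ["Monocyte"] * 2,
    "treated": ["T cell"] * 5 + ["B cell"] * 3 + ["Monocyte"] * 2,
}

proportions = {name: composition(cells) for name, cells in samples.items()}
for name, fractions in proportions.items():
    print(name, {t: round(f, 2) for t, f in fractions.items()})
```

Each dictionary corresponds to one bar, and each fraction to one segment of that bar. Keep in mind that proportions are relative: a rise in one cell type's share forces the others down even if their absolute numbers did not change.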

5. Intercellular Signaling Heatmaps and Circos Plots

Key question they address: How are the cell types communicating with each other? 


In some areas of research, cell-cell communication plots become very important. For example, these plots are particularly valuable in studies of the tumour microenvironment, or in regenerative medicine development, where researchers use them to visualize how niche signaling regulates stem cell differentiation. By highlighting key signalling pathways and dominant cell communicators, they answer the question of who’s signaling to whom and how - helping to identify therapeutic targets or understand disease mechanisms.


The two most common intercellular signaling plots used in single-cell RNA-seq data analysis are circos plots and heatmaps. Both show how cells interact through ligands and receptors, with a slight difference in emphasis.


In single-cell RNA-seq data analysis, both plots are useful. While circos plots highlight the direction and flow of signaling between cell types, heatmaps are better suited for quantitatively comparing the strength of interactions across cell types. (Source: C-DIAM Multi-omics Studio)

Circos plots focus on visually showing the direction and flow of signalling, while heatmaps are less visual for direction and work best for quantitatively comparing how much each cell type interacts with the others.
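To give a feel for the numbers such a heatmap might contain, here is a deliberately simple sketch for a single ligand-receptor pair. It assumes the interaction score is the product of the sender's mean ligand expression and the receiver's mean receptor expression. That scoring rule is an illustrative assumption, not the method of any specific tool, and the cell types and values are hypothetical.

```python
def interaction_matrix(ligand_mean, receptor_mean):
    """Rows are sender cell types, columns are receiver cell types.

    Assumed score: mean ligand expression in sender * mean receptor expression in receiver.
    """
    return {
        sender: {receiver: ligand_mean[sender] * receptor_mean[receiver]
                 for receiver in receptor_mean}
        for sender in ligand_mean
    }


# Hypothetical mean expression of one ligand and its receptor per cell type.
ligand_mean = {"Tumor": 2.0, "T cell": 0.5, "Fibroblast": 0.0}
receptor_mean = {"Tumor": 0.0, "T cell": 3.0, "Fibroblast": 1.0}

matrix = interaction_matrix(ligand_mean, receptor_mean)
for sender, row in matrix.items():
    print(sender, row)
```

The matrix is not symmetric: here "Tumor" to "T cell" scores 6.0 while "T cell" to "Tumor" scores 0.0. A heatmap shows these magnitudes side by side, whereas a circos plot would draw the same values as directed ribbons between cell types.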


6. Volcano Plot

Key question it addresses: How can I see the differentially expressed genes?


A volcano plot is a type of scatter plot commonly used to visualize differentially expressed genes (DEGs). It’s a classic and powerful tool not only in single-cell RNA-seq data analysis but also in bulk RNA-seq and other bioinformatics workflows. The name "volcano" comes from the plot's shape, with most genes clustered near the center, and significantly upregulated or downregulated genes spreading outward. 


In single-cell RNA-seq data analysis, this plot helps pinpoint biologically relevant genes for follow-up or genes with the strongest expression changes between conditions.


A volcano plot has an x-axis showing how much a gene's expression has changed (log₂ fold change) and a y-axis showing how statistically significant the change is (–log₁₀(p-value)). That means a gene (a dot) is downregulated if it’s far left & high up, upregulated if it’s far right & high up, and not significant if it’s near the center and low on the y-axis.


A volcano plot is a classic and powerful tool not only in single-cell RNA-seq data analysis but also in bulk RNA-seq and other bioinformatics workflows. A gene (a dot) is downregulated if it’s far left & high up, upregulated if it’s far right & high up, and not significant if it’s near the center and low on the y-axis. (Source: C-DIAM Multi-omics Studio)
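The way a volcano plot is read can be written down as a small sketch. The cut-offs used here (an absolute log₂ fold change of at least 1 and a p-value below 0.05) are common illustrative choices rather than fixed rules, many analyses use an adjusted p-value instead, and the gene names and values are placeholders.

```python
import math


def volcano_point(log2_fc, p_value):
    """Return the (x, y) position of a gene on a volcano plot."""
    return log2_fc, -math.log10(p_value)


def classify_gene(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene using assumed cut-offs on fold change and p-value."""
    significant = p_value < p_cutoff
    if significant and log2_fc >= fc_cutoff:
        return "up"        # far right and high up
    if significant and log2_fc <= -fc_cutoff:
        return "down"      # far left and high up
    return "not significant"  # near the center or low on the y-axis


# Hypothetical differential expression results: gene -> (log2 fold change, p-value).
results = {"GENE_A": (2.5, 1e-8), "GENE_B": (-3.0, 1e-6), "GENE_C": (0.1, 0.6)}

labels = {gene: classify_gene(fc, p) for gene, (fc, p) in results.items()}
for gene, (fc, p) in results.items():
    x, y = volcano_point(fc, p)
    print(f"{gene}: x={x:.1f}, y={y:.1f}, {labels[gene]}")
```

Because the y-axis is –log₁₀ of the p-value, smaller p-values land higher on the plot, which is why the most significant genes form the top of the "eruption".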

Alongside the volcano plot, a heatmap is another choice to visualize DEGs in single-cell RNA-seq data analysis. 


In single-cell RNA-seq data analysis, a heatmap shows DEGs across cell types or conditions. Rows represent genes, and columns represent samples or clusters. Color intensity reflects relative gene expression levels, highlighting patterns of upregulation and downregulation. (Source: C-DIAM Multi-omics Studio)

C-DIAM Multi-omics Studio for Single-Cell RNA-seq Data Analysis and Visualization

All the essential plots shown above are fully supported by our C-DIAM Multi-Omics Studio — and yes, they were captured directly from our platform. Built with beginners in mind and powerful enough for experts, C-DIAM eliminates common barriers in multi-omics and single-cell data analysis. Its intuitive interface and zero coding requirement make it easy for researchers at any level to explore, visualize and interpret complex datasets with confidence.

Whether you're just getting started or looking to streamline high-throughput workflows, would you like to try C-DIAM and see if it can help you work smarter and faster?



References 

[1] Jovic, D., Liang, X., Zeng, H., Lin, L., Xu, F., & Luo, Y. (2022). Single‐cell RNA sequencing technologies and applications: A brief overview. Clinical and Translational Medicine, 12(3), e694.

