proteopy.pl.volcano_plot

proteopy.pl.volcano_plot(adata, varm_slot, fc_col='logfc', pval_col='pval_adj', fc_thresh=1.0, pval_thresh=0.05, top_labels=None, label_col=None, figsize=(6.0, 5.0), xlabel=None, alt_color=None, ylabel_logscale=True, title=None, show=True, save=None, ax=None)

Visualize differential abundance results as a volcano plot.

Creates a scatter plot of log fold change (x-axis) against p-value (y-axis) for each protein, based on statistical test results stored in adata.varm. By default, points are colored by significance, i.e. whether they pass both the fold change and the p-value thresholds. Options are available for custom coloring and for automatic labeling of top hits.

Parameters:
  • adata (ad.AnnData) – AnnData containing differential abundance test results in .varm.

  • varm_slot (str) – Key in adata.varm containing the differential abundance test results as a DataFrame. The expected format is the one produced by copro.tl.differential_abundance.

  • fc_col (str, optional) – Column name in the varm DataFrame containing log fold change values. Log base depends on the test method used.

  • pval_col (str, optional) – Column name in the varm DataFrame containing adjusted p-values. If this column is not found, the function falls back to "pval" (unadjusted p-values).

  • fc_thresh (float, optional) – Absolute log fold change threshold for significance. Proteins with |logfc| >= fc_thresh and pval <= pval_thresh are highlighted as significant.

  • pval_thresh (float, optional) – P-value threshold for significance. Used in conjunction with fc_thresh to identify significant proteins.

  • top_labels (int | None, optional) – Number of top proteins to label on each side of the volcano plot (up to 2N labels total). For each direction (positive and negative fold change), selects the top N proteins that meet BOTH significance thresholds (pval <= pval_thresh AND |logfc| >= fc_thresh). Proteins are ranked first by smallest p-value, then by largest absolute fold change. None disables automatic labeling.

  • label_col (str | None, optional) – Column in adata.var to use for labeling proteins. Defaults to adata.var_names if None.

  • figsize (tuple[float, float], optional) – Figure dimensions (width, height) in inches.

  • xlabel (str | None, optional) – Label for the x-axis. Defaults to the value of fc_col if None.

  • alt_color (pd.Series | list[bool] | np.ndarray | None, optional) – Boolean mask (length n_vars) for alternative coloring scheme. When provided, this COMPLETELY OVERRIDES the default significance-based coloring: proteins with True are colored light purple (#8E54E5), proteins with False are colored gray (#808080). Significance thresholds (fc_thresh and pval_thresh) are still visualized as dashed lines but do not influence point colors. None uses the default significance-based coloring (gray, blue, red).

  • ylabel_logscale (bool, optional) – Controls how p-values are represented on the y-axis. When True, raw p-values are plotted on a log10-scaled y-axis, inverted so that smaller p-values appear higher; the y-axis label is the pval_col name. When False, p-values are -log10 transformed and plotted on a linear y-axis; the y-axis label is -log10(pval_col). Both representations place each point at the same relative height and have the same interpretation; they differ mainly in tick and axis labels.

  • title (str | None, optional) – Plot title. If None, a title is generated from the varm_slot name using the statistical test metadata (test type, group comparison, layer).

  • show (bool, optional) – If True, call matplotlib.pyplot.show() to display the plot.

  • save (str | Path | None, optional) – File path to save the figure to. The figure is saved at 300 DPI with a tight bounding box. None skips saving.

  • ax (bool | None, optional) – Whether to return the matplotlib.axes.Axes object. If True, the Axes is returned for further customization; if None or False, the function returns None.
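The two ylabel_logscale modes can be illustrated with plain Matplotlib. This is a minimal sketch, assuming a 1-D array of already-filtered, strictly positive p-values; draw_pvalue_axis is a hypothetical helper written for this page, not part of proteopy, and the actual implementation may differ.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def draw_pvalue_axis(ax, logfc, pvals, pval_col="pval_adj", ylabel_logscale=True):
    """Hypothetical helper: scatter logfc against p-values in either y-axis mode."""
    logfc = np.asarray(logfc, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    if ylabel_logscale:
        # Raw p-values on a log10 axis, inverted so small p-values sit at the top.
        ax.scatter(logfc, pvals, s=8)
        ax.set_yscale("log")
        ax.invert_yaxis()
        ax.set_ylabel(pval_col)
    else:
        # -log10-transformed p-values on an ordinary linear axis.
        ax.scatter(logfc, -np.log10(pvals), s=8)
        ax.set_ylabel(f"-log10({pval_col})")
    return ax


fig, (ax_log, ax_lin) = plt.subplots(1, 2, figsize=(10, 4))
draw_pvalue_axis(ax_log, [-2.0, 0.1, 1.5], [1e-4, 0.5, 1e-3])
draw_pvalue_axis(ax_lin, [-2.0, 0.1, 1.5], [1e-4, 0.5, 1e-3], ylabel_logscale=False)
```

In both panels the point with p = 1e-4 ends up highest, which is why the two modes read the same way.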

Returns:

The Matplotlib Axes object if ax=True, otherwise None.

Return type:

Axes | None
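The interplay of show, save, and ax can be sketched as a small finalizing step. This is an illustration under assumptions: finalize_plot is a hypothetical helper, the Axes object is passed separately as axes_obj, and the ValueError mirrors the documented "show, save, and ax all unset" case; proteopy's own code may be organized differently.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt


def finalize_plot(fig, axes_obj, show=True, save=None, ax=None):
    """Hypothetical helper applying the documented show/save/ax output options."""
    if not (show or save or ax):
        # Nothing would be displayed, written to disk, or returned.
        raise ValueError("At least one of show, save, or ax must be set.")
    if save:
        # Documented save settings: 300 DPI and a tight bounding box.
        fig.savefig(save, dpi=300, bbox_inches="tight")
    if show:
        plt.show()
    # Return the Axes only when the caller asked for it.
    return axes_obj if ax else None
```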

Raises:
  • KeyError – If varm_slot is not in adata.varm, if fc_col or the p-value column (neither pval_col nor the "pval" fallback) is missing from the varm DataFrame, or if label_col is not a column of adata.var.

  • TypeError – If adata.varm[varm_slot] is not a pandas DataFrame.

  • ValueError – If no valid results (finite fold change and finite, positive p-value) remain after filtering, if top_labels is not a positive integer, or if show, save, and ax are all unset, so that nothing would be displayed, saved, or returned.
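The lookups behind the KeyError and TypeError cases, including the documented fallback from pval_col to "pval", can be sketched as follows. This assumes a plain dict standing in for adata.varm; resolve_result_columns and check_top_labels are hypothetical names, and the error messages are illustrative only.

```python
import pandas as pd


def resolve_result_columns(varm, varm_slot, fc_col="logfc", pval_col="pval_adj"):
    """Hypothetical sketch: find the results table and the columns to plot.

    `varm` is any mapping from slot names to tables (a stand-in for adata.varm).
    """
    if varm_slot not in varm:
        raise KeyError(f"{varm_slot!r} not found in adata.varm")
    df = varm[varm_slot]
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"adata.varm[{varm_slot!r}] is not a pandas DataFrame")
    if fc_col not in df.columns:
        raise KeyError(f"fold change column {fc_col!r} not found")
    if pval_col not in df.columns:
        # Documented fallback: use the unadjusted "pval" column instead.
        if "pval" not in df.columns:
            raise KeyError(f"neither {pval_col!r} nor 'pval' found")
        pval_col = "pval"
    return df, fc_col, pval_col


def check_top_labels(top_labels):
    """Accept None or a positive integer; anything else raises ValueError."""
    # bool is a subclass of int, so it is rejected explicitly.
    if top_labels is not None and (
        isinstance(top_labels, bool)
        or not isinstance(top_labels, int)
        or top_labels <= 0
    ):
        raise ValueError("top_labels must be a positive integer or None")
```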

Notes

Data Filtering: Proteins are filtered before plotting to remove:

  • Missing values in the fold change or p-value columns
  • Non-finite values (inf, -inf, nan)
  • Non-positive p-values (they cannot be log-transformed)
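The filtering rules above amount to a single boolean mask. A minimal sketch, assuming the results live in a pandas DataFrame with the default column names; filter_plottable is a hypothetical helper, not proteopy's internal function.

```python
import numpy as np
import pandas as pd


def filter_plottable(df, fc_col="logfc", pval_col="pval_adj"):
    """Keep rows with a finite fold change and a finite, strictly positive p-value."""
    fc = pd.to_numeric(df[fc_col], errors="coerce").to_numpy(dtype=float)
    pv = pd.to_numeric(df[pval_col], errors="coerce").to_numpy(dtype=float)
    # np.isfinite is False for NaN, inf and -inf, so missing values are dropped too.
    mask = np.isfinite(fc) & np.isfinite(pv) & (pv > 0)
    kept = df.loc[mask]
    if kept.empty:
        raise ValueError("No valid results remain after filtering.")
    return kept
```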

Color Schemes:

Default coloring (when alt_color=None):

  • Gray: non-significant proteins (fail one or both thresholds)
  • Blue (#1f77b4): significantly downregulated (logfc <= -fc_thresh AND pval <= pval_thresh)
  • Red (#d62728): significantly upregulated (logfc >= fc_thresh AND pval <= pval_thresh)

Alternative coloring (when alt_color is provided):

  • Gray (#808080): proteins with alt_color=False
  • Light purple (#8E54E5): proteins with alt_color=True
  • Significance thresholds do NOT affect colors; only the threshold lines are drawn
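Both color schemes can be expressed as a vectorized assignment. This is a sketch, not proteopy's code: point_colors is a hypothetical helper, and because the exact gray for non-significant points in the default scheme is not specified above, #808080 is assumed for it as well.

```python
import numpy as np

# Hex codes from the Notes; the default-scheme gray is an assumption.
GRAY, BLUE, RED, PURPLE = "#808080", "#1f77b4", "#d62728", "#8E54E5"


def point_colors(logfc, pvals, fc_thresh=1.0, pval_thresh=0.05, alt_color=None):
    """Hypothetical sketch of the default and alternative color schemes."""
    logfc = np.asarray(logfc, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    if alt_color is not None:
        # alt_color replaces significance coloring entirely.
        mask = np.asarray(alt_color, dtype=bool)
        return np.where(mask, PURPLE, GRAY)
    significant_p = pvals <= pval_thresh
    colors = np.full(logfc.shape, GRAY, dtype=object)
    colors[significant_p & (logfc >= fc_thresh)] = RED
    colors[significant_p & (logfc <= -fc_thresh)] = BLUE
    return colors
```

A protein with logfc exactly equal to fc_thresh counts as significant, matching the >= comparison in the Notes.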

Label Selection Algorithm (when top_labels is set):

  1. Filter proteins to those meeting BOTH significance thresholds
  2. Separate them into positive (logfc > 0) and negative (logfc < 0) groups
  3. Within each group, rank by (1) smallest p-value, then (2) largest absolute fold change
  4. Select the top N from each group (up to 2N labels in total)
  5. Use the adjustText library to prevent label overlap
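Steps 1 to 4 can be sketched with pandas; label placement with adjustText (step 5) is not shown. select_top_labels is a hypothetical helper that returns index labels; mapping them to a label_col in adata.var is left out.

```python
import pandas as pd


def select_top_labels(df, n, fc_col="logfc", pval_col="pval_adj",
                      fc_thresh=1.0, pval_thresh=0.05):
    """Return index labels of up to n significant hits per direction."""
    # Step 1: keep proteins passing BOTH thresholds.
    sig = df[(df[pval_col] <= pval_thresh) & (df[fc_col].abs() >= fc_thresh)]
    # Step 3: smallest p-value first, ties broken by largest |logfc|.
    ranked = sig.assign(_absfc=sig[fc_col].abs()).sort_values(
        [pval_col, "_absfc"], ascending=[True, False]
    )
    # Steps 2 and 4: split by direction and take the top n of each.
    up = ranked[ranked[fc_col] > 0].head(n)
    down = ranked[ranked[fc_col] < 0].head(n)
    return list(up.index) + list(down.index)


df = pd.DataFrame(
    {"logfc": [3.0, 1.2, 2.0, -1.1, -4.0, 0.5, 2.5],
     "pval_adj": [0.01, 0.001, 0.001, 0.04, 0.04, 0.0001, 0.2]},
    index=["A", "B", "C", "D", "E", "F", "G"],
)
print(select_top_labels(df, 2))  # ['C', 'B', 'E', 'D']
```

F is excluded by the fold change threshold and G by the p-value threshold; among B and C, which tie on p-value, C wins on larger absolute fold change.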

Examples

Plot differential abundance results with default settings:

>>> import proteopy as pp
>>> pp.pl.volcano_plot(adata, varm_slot="welch;condition;treatment_vs_ctrl")

Label top 10 proteins per side and save to file:

>>> pp.pl.volcano_plot(
...     adata,
...     varm_slot="welch;condition;treatment_vs_ctrl",
...     top_labels=10,
...     save="volcano.png",
... )

Use custom coloring to highlight proteins of interest:

>>> proteins_of_interest = adata.var["protein_id"].isin(
...     ["P12345", "Q67890"]
... )
>>> pp.pl.volcano_plot(
...     adata,
...     varm_slot="welch;condition;treatment_vs_ctrl",
...     alt_color=proteins_of_interest,
... )