.volcano_plot
- proteopy.pl.volcano_plot(adata, varm_slot, fc_col='logfc', pval_col='pval_adj', fc_thresh=1.0, pval_thresh=0.05, top_labels=None, label_col=None, figsize=(6.0, 5.0), xlabel=None, alt_color=None, ylabel_logscale=True, title=None, show=True, save=None, ax=None)
Visualize differential abundance results as a volcano plot.
Creates a scatter plot of log fold change (x-axis) versus p-value (y-axis) for proteins from a statistical test stored in adata.varm. Points are colored by significance (meeting both the fold change and p-value thresholds), with options for custom coloring and automatic labeling of top hits.
- Parameters:
adata (ad.AnnData) – AnnData containing differential abundance test results in .varm.
varm_slot (str) – Key in adata.varm containing the differential abundance test results as a DataFrame. The expected format is the one produced by copro.tl.differential_abundance.
fc_col (str, optional) – Column name in the varm DataFrame containing log fold change values. Log base depends on the test method used.
pval_col (str, optional) – Column name in the varm DataFrame containing adjusted p-values. If this column is not found, the function falls back to "pval" (unadjusted p-values).
fc_thresh (float, optional) – Absolute log fold change threshold for significance. Proteins with |logfc| >= fc_thresh and pval <= pval_thresh are highlighted as significant.
pval_thresh (float, optional) – P-value threshold for significance. Used in conjunction with fc_thresh to identify significant proteins.
top_labels (int | None, optional) – Number of top proteins to label on each side of the volcano plot (up to 2N labels total). For each direction (positive and negative fold change), selects the top N proteins that meet BOTH significance thresholds (pval <= pval_thresh AND |logfc| >= fc_thresh). Proteins are ranked first by smallest p-value, then by largest absolute fold change. None disables automatic labeling.
label_col (str | None, optional) – Column in adata.var to use for labeling proteins. Defaults to adata.var_names if None.
figsize (tuple[float, float], optional) – Figure dimensions (width, height) in inches.
xlabel (str | None, optional) – Label for the x-axis. Defaults to the value of fc_col if None.
alt_color (pd.Series | list[bool] | np.ndarray | None, optional) – Boolean mask (length n_vars) for an alternative coloring scheme. When provided, this COMPLETELY OVERRIDES the default significance-based coloring: proteins with True are colored light purple (#8E54E5), proteins with False are colored gray (#808080). Significance thresholds (fc_thresh and pval_thresh) are still visualized as dashed lines but do not influence point colors. None uses the default significance-based coloring (gray, blue, red).
ylabel_logscale (bool, optional) – Controls y-axis representation of p-values. When True, plot raw p-values on a log10-scaled y-axis (inverted so smaller p-values appear higher); the y-axis label shows the pval_col name. When False, apply a -log10 transform to p-values and plot on a linear y-axis; the y-axis label shows -log10(pval_col). Both representations produce visually similar plots with the same interpretation.
title (str | None, optional) – Plot title. If None, generates a title from the varm_slot name using the stat test metadata (test type, group comparison, layer).
show (bool, optional) – Call matplotlib.pyplot.show() to display the plot.
save (str | Path | None, optional) – File path to save the figure. Saved at 300 DPI with a tight bounding box. None skips saving.
ax (bool | None, optional) – Return the matplotlib.axes.Axes object. When None or False, returns None. When True, returns the Axes object for further customization.
- Returns:
The Matplotlib Axes object if ax=True, otherwise None.
- Return type:
Axes | None
- Raises:
KeyError – If varm_slot is not in adata.varm, fc_col or the p-value columns are not in the varm DataFrame, or label_col is not in adata.var.
TypeError – If adata.varm[varm_slot] is not a pandas DataFrame.
ValueError – If no valid (finite, positive p-value) results remain after filtering, if top_labels is not a positive integer, or if all of show, save, and ax are False.
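The claim under ylabel_logscale that both y-axis representations carry the same interpretation can be checked with a few numbers. The following is a minimal plain-Python sketch, not proteopy code; the p-values are made up for illustration, and the only assumption is that an inverted log10 axis positions a value p at a height proportional to -log10(p).

```python
import math

# Illustrative adjusted p-values (hypothetical, not from a real test).
pvals = [0.5, 0.05, 0.001]

# ylabel_logscale=False style: transform the values, plot on a linear axis.
neg_log10 = [-math.log10(p) for p in pvals]

# ylabel_logscale=True style: keep raw p-values on an inverted log10 axis.
# Bottom-to-top order of points is largest p-value first.
order_raw = sorted(range(len(pvals)), key=lambda i: pvals[i], reverse=True)

# On the linear axis, bottom-to-top order is smallest -log10(p) first.
order_transformed = sorted(range(len(pvals)), key=lambda i: neg_log10[i])

# The pval_thresh line moves with the points: 0.05 sits at about 1.30.
thresh_height = -math.log10(0.05)

print(order_raw == order_transformed)
print(round(thresh_height, 3))
```

Because the transform is monotone, the vertical ordering of points and their position relative to the threshold line are unchanged; only the tick labels differ.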
Notes
Data Filtering: Proteins are filtered before plotting to remove:
- Missing values in fold change or p-value columns
- Non-finite values (inf, -inf, nan)
- Non-positive p-values (cannot be log-transformed)
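The filtering rules above can be sketched in plain Python. This is an illustration of the stated rules under simplifying assumptions, not the library's implementation: rows are (name, logfc, pval) tuples rather than a DataFrame, missing values are modeled as None, and the helper name is_plottable is hypothetical.

```python
import math


def is_plottable(logfc, pval):
    """Return True if a (logfc, pval) pair survives the filtering rules."""
    # Missing values are modeled as None in this sketch.
    if logfc is None or pval is None:
        return False
    # Reject inf, -inf, and nan in either column.
    if not (math.isfinite(logfc) and math.isfinite(pval)):
        return False
    # A p-value of zero or below has no finite log10.
    return pval > 0


# Hypothetical rows: (protein name, log fold change, p-value).
rows = [
    ("P1", 1.8, 0.001),
    ("P2", None, 0.02),          # missing fold change
    ("P3", float("inf"), 0.01),  # non-finite fold change
    ("P4", -2.1, 0.0),           # non-positive p-value
    ("P5", 0.3, float("nan")),   # nan p-value
    ("P6", -1.2, 0.04),
]

kept = [name for name, logfc, pval in rows if is_plottable(logfc, pval)]
print(kept)
```

If every row is rejected this way, the documented behavior is a ValueError, since nothing remains to plot.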
Color Schemes: Default coloring (when alt_color=None):
- Gray: non-significant proteins (fail one or both thresholds)
- Blue (#1f77b4): significantly downregulated (logfc <= -fc_thresh AND pval <= pval_thresh)
- Red (#d62728): significantly upregulated (logfc >= fc_thresh AND pval <= pval_thresh)
Alternative coloring (when alt_color is provided):
- Gray (#808080): proteins with alt_color=False
- Light purple (#8E54E5): proteins with alt_color=True
- Significance thresholds do NOT affect colors; only threshold lines are drawn
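The two color schemes can be summarized in one small function. This is a plain-Python sketch of the documented rules, not the library's code: point_colors is a hypothetical helper, rows are (logfc, pval) tuples, and because the notes give no hex code for the default gray, the named color "gray" is used as a placeholder.

```python
def point_colors(rows, fc_thresh=1.0, pval_thresh=0.05, alt_color=None):
    """Return one color per (logfc, pval) row.

    alt_color is an optional list of booleans, one per row.
    """
    # A mask, when given, replaces significance-based coloring entirely.
    if alt_color is not None:
        return ["#8E54E5" if flag else "#808080" for flag in alt_color]

    colors = []
    for logfc, pval in rows:
        if pval <= pval_thresh and logfc >= fc_thresh:
            colors.append("#d62728")  # red: significantly upregulated
        elif pval <= pval_thresh and logfc <= -fc_thresh:
            colors.append("#1f77b4")  # blue: significantly downregulated
        else:
            colors.append("gray")     # placeholder: fails one or both thresholds
    return colors


# Hypothetical rows: (log fold change, p-value).
rows = [(1.5, 0.01), (-1.0, 0.05), (2.0, 0.2), (0.5, 0.001)]

default_colors = point_colors(rows)
masked_colors = point_colors(rows, alt_color=[True, False, False, True])
print(default_colors)
print(masked_colors)
```

Note that the thresholds are inclusive, so a protein sitting exactly on both cutoffs (second row) counts as significant.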
Label Selection Algorithm (when top_labels is set):
1. Filter proteins to those meeting BOTH significance thresholds
2. Separate into positive (logfc > 0) and negative (logfc < 0) groups
3. Within each group, rank by: (1) smallest p-value, then (2) largest absolute fold change
4. Select top N from each group (up to 2N total labels)
5. Use adjustText library to prevent label overlap
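Steps 1 to 4 of the label selection can be sketched as follows. This is an illustrative plain-Python version under stated assumptions, not the library's implementation: select_labels is a hypothetical helper, rows are (name, logfc, pval) tuples, and the adjustText placement in step 5 is omitted because it only affects where labels are drawn.

```python
def select_labels(rows, top_n, fc_thresh=1.0, pval_thresh=0.05):
    """Return names to label: up to top_n per direction, positive first."""
    # Step 1: keep rows meeting BOTH significance thresholds.
    significant = [
        r for r in rows if r[2] <= pval_thresh and abs(r[1]) >= fc_thresh
    ]

    # Step 3 ranking: smallest p-value first, ties broken by largest |logfc|.
    def rank_key(r):
        return (r[2], -abs(r[1]))

    # Steps 2 and 4: split by sign of logfc, then take the top N of each side.
    up = sorted((r for r in significant if r[1] > 0), key=rank_key)[:top_n]
    down = sorted((r for r in significant if r[1] < 0), key=rank_key)[:top_n]
    return [r[0] for r in up] + [r[0] for r in down]


# Hypothetical rows: (protein name, log fold change, p-value).
rows = [
    ("A", 2.0, 0.001),
    ("B", 3.0, 0.001),   # ties with A on p-value, wins on |logfc|
    ("C", 1.5, 0.01),
    ("D", -1.2, 0.03),
    ("E", -4.0, 0.2),    # fails the p-value threshold
    ("F", 0.4, 0.0001),  # fails the fold change threshold
]

labels = select_labels(rows, top_n=2)
print(labels)
```

With top_n=2 only three labels are produced here, which is why the documentation says "up to 2N": a side with fewer than N significant proteins contributes fewer labels.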
Examples
Plot differential abundance results with default settings:
>>> pp.pl.volcano_plot(adata, varm_slot="welch;condition;treatment_vs_ctrl")
Label top 10 proteins per side and save to file:
>>> pp.pl.volcano_plot(
...     adata,
...     varm_slot="welch;condition;treatment_vs_ctrl",
...     top_labels=10,
...     save="volcano.png",
... )
Use custom coloring to highlight proteins of interest:
>>> proteins_of_interest = adata.var["protein_id"].isin(
...     ["P12345", "Q67890"]
... )
>>> pp.pl.volcano_plot(
...     adata,
...     varm_slot="welch;condition;treatment_vs_ctrl",
...     alt_color=proteins_of_interest,
... )