.differential_abundance

proteopy.tl.differential_abundance(adata, method='ttest_two_sample', group_by=None, setup=None, layer=None, multitest_correction='fdr_bh', alpha=0.05, space='auto', force=False, fill_na=None, inplace=True)

Perform differential abundance analysis between sample groups.

Compares expression values between groups using statistical tests. Computes log fold changes, p-values, and applies multiple testing correction. Results are stored in adata.varm as DataFrames.

Parameters:
  • adata (ad.AnnData) – AnnData object with expression data in .X or a specified layer.

  • method (str, optional) –

    Statistical test for differential abundance. Supported methods:

    • "ttest_two_sample": Independent two-sample Student’s t-test assuming equal variances.

    • "welch": Welch’s t-test without equal variance assumption. More robust when group variances differ.

  • group_by (str) – Column in adata.obs containing group labels for comparison.

  • setup (dict | None, optional) –

    Dictionary specifying comparison mode. Two modes available:

    Two-group mode (keys "group1" and "group2" present):

    Compare two specific groups. Required keys:

    • "group1": First group label (numerator in log fold change).

    • "group2": Second group label (denominator in log fold change).

    Example: {"group1": "treated", "group2": "control"}

    One-vs-rest mode (default when setup is None or {}):

    Compare each group against all other groups combined. Optional key:

    • "groups": "all" (default) to test all groups, or list of specific group labels to test.

    Examples: None, {}, or {"groups": ["A", "B"]}.

    Each group and the combined “rest” must have at least 3 samples.

  • layer (str | None, optional) – Key in adata.layers to use. If None, uses adata.X.

  • multitest_correction (str, optional) –

    Multiple testing correction method. Supported values:

    • "bonferroni": Bonferroni correction (family-wise error rate control).

    • "fdr_bh": Benjamini-Hochberg FDR correction (false discovery rate control).

    • "fdr", "bh", "benjamini_hochberg": Aliases for "fdr_bh".

  • alpha (float | int, optional) – Significance threshold for labeling differential abundance. Must satisfy 0 < alpha <= 1.

  • space ({'auto', 'log', 'linear'}, optional) – Intensity space of the input data. When "auto", the space is inferred via is_log_transformed(). Two-sample methods require log space, so linear data are converted to log2 before testing. When "log" or "linear" is given explicitly, a mismatch with the inferred space raises a ValueError unless force=True.

  • force (bool, optional) – Skip space-mismatch validation and use declared space. Use with caution when automatic detection is incorrect.

  • fill_na (float | int | None, optional) – Replace np.nan values in expression matrix with this value before analysis. If None, no replacement occurs.

  • inplace (bool, optional) – If True, modify adata in place and return None. If False, return modified copy of adata.
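The two setup modes described above can be sketched as a small resolver. This is an illustration only, not proteopy's implementation: the helper name resolve_comparisons and its return shape are hypothetical, and applying the 3-sample minimum to two-group mode as well is an assumption.

```python
from collections import Counter


def resolve_comparisons(labels, setup=None, min_samples=3):
    """Hypothetical helper: turn a `setup` dict into (focal, reference) pairs.

    `reference` is None in one-vs-rest mode, meaning "all other samples".
    """
    setup = setup or {}  # None and {} both select one-vs-rest mode
    counts = Counter(labels)
    total = len(labels)

    if "group1" in setup and "group2" in setup:
        # Two-group mode: exactly one comparison, group1 is the numerator.
        pairs = [(setup["group1"], setup["group2"])]
    else:
        # One-vs-rest mode: "all" expands to every label, in first-seen order.
        groups = setup.get("groups", "all")
        if groups == "all":
            groups = list(counts)
        pairs = [(group, None) for group in groups]

    for focal, reference in pairs:
        n_focal = counts.get(focal, 0)
        n_ref = total - n_focal if reference is None else counts.get(reference, 0)
        # Assumption: the minimum size is enforced in both modes.
        if n_focal < min_samples or n_ref < min_samples:
            raise ValueError(
                f"{focal!r} vs {reference or 'rest'!r}: "
                f"need at least {min_samples} samples per side"
            )
    return pairs


labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
print(resolve_comparisons(labels))
print(resolve_comparisons(labels, {"group1": "A", "group2": "B"}))
```

Returning None for the reference side, rather than the string "rest", avoids a clash with a real group that happens to be labeled "rest".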

Returns:

When inplace=False, returns a copy of the AnnData object with results in .varm. When inplace=True, returns None and modifies adata in place.

Storage format in adata.varm:

Results are stored as DataFrames under keys of the form {method};{group_by};{design}, or {method};{group_by};{design};{layer} when a layer is used:

  • Two-group mode: "{method};{group_by};{group1}_vs_{group2}" (e.g., "welch;condition;treated_vs_control").

  • One-vs-rest mode: "{method};{group_by};{group}_vs_rest" for each tested group (e.g., "ttest_two_sample;cell_type;A_vs_rest").

  • When a layer is used, it is appended as the fourth component (e.g., "welch;condition;treated_vs_control;raw_intensities").
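The key format above can be reproduced with a few lines of string handling, which is handy when looking up results programmatically. The helper name build_varm_key is hypothetical and not part of proteopy. The sketch also assumes the group labels are used as given, whereas the library may apply label sanitization first.

```python
def build_varm_key(method, group_by, group1, group2=None, layer=None):
    """Hypothetical helper: assemble a results key in the documented format.

    group2=None selects the one-vs-rest design "{group1}_vs_rest".
    """
    design = f"{group1}_vs_rest" if group2 is None else f"{group1}_vs_{group2}"
    parts = [method, group_by, design]
    if layer is not None:
        parts.append(layer)  # the layer is the optional fourth component
    return ";".join(parts)


print(build_varm_key("welch", "condition", "treated", "control"))
print(build_varm_key("ttest_two_sample", "cell_type", "A"))
print(build_varm_key("welch", "condition", "treated", "control", layer="raw_intensities"))
```

Because ";" is the separator, a key can be split back into its components with key.split(";"), provided none of the components contains a semicolon.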

Additionally, a column containing sanitized versions of the group_by labels is added to adata.obs if not already present.

DataFrame columns:

Each results DataFrame contains the following columns indexed by variable names (matching adata.var_names):

  • mean1: Mean expression in group1 (focal group in one-vs-rest).

  • mean2: Mean expression in group2 (rest in one-vs-rest).

  • logfc: Log fold change (mean1 - mean2 in log space). Computed in log2 space for linear input data, otherwise in the data’s existing log base.

  • tstat: t-statistic from the statistical test.

  • pval: Raw p-value from the test.

  • pval_adj: Adjusted p-value using the specified multitest_correction method.

  • is_diff_abundant: Boolean indicating pval_adj <= alpha.
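To make the columns above concrete, the following standard-library sketch computes mean1, mean2, logfc and tstat for a single variable, then applies Benjamini-Hochberg adjustment and the alpha threshold to a list of raw p-values. It illustrates the textbook formulas rather than proteopy's code: the function names and toy numbers are made up, missing values are not handled, and the step from t-statistic to p-value is omitted because it needs a t-distribution CDF (for example from SciPy).

```python
import math
from statistics import mean, variance


def t_statistic(x, y, equal_var=False):
    """Two-sample t-statistic: Student (pooled variance) or Welch."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    if equal_var:
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(v1 / n1 + v2 / n2)
    return (mean(x) - mean(y)) / se


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy linear-space intensities for one variable, three samples per group.
group1 = [2.0, 4.0, 8.0]
group2 = [4.0, 16.0, 64.0]

# Linear input is moved to log2 space before any statistics are computed.
log_g1 = [math.log2(v) for v in group1]
log_g2 = [math.log2(v) for v in group2]

mean1, mean2 = mean(log_g1), mean(log_g2)
logfc = mean1 - mean2  # a difference of log means, not a ratio
tstat = t_statistic(log_g1, log_g2, equal_var=False)  # Welch

# Multiple-testing step on made-up raw p-values for four variables.
alpha = 0.03
pval = [0.01, 0.04, 0.03, 0.005]
pval_adj = bh_adjust(pval)
is_diff_abundant = [p <= alpha for p in pval_adj]

print(mean1, mean2, logfc, round(tstat, 4))
print(is_diff_abundant)
```

With equal group sizes, as here, the Student and Welch statistics coincide; they differ only when both the group sizes and the variances differ.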

Return type:

ad.AnnData | None

Raises:
  • ValueError – If group_by is None or not in adata.obs.

  • ValueError – If layer is not in adata.layers.

  • ValueError – If method is not supported.

  • ValueError – If multitest_correction is not supported.

  • ValueError – If alpha is not in range (0, 1].

  • ValueError – If space mismatches inferred space and force=False.

  • ValueError – If groups have fewer than 3 samples.
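The checks on method, multitest_correction and alpha listed above could look roughly like this. It is a sketch under assumptions: the helper name check_arguments and the error messages are hypothetical, and the order in which the library performs its checks is not documented here.

```python
SUPPORTED_METHODS = {"ttest_two_sample", "welch"}

# Documented aliases all resolve to the canonical "fdr_bh".
CORRECTION_ALIASES = {
    "bonferroni": "bonferroni",
    "fdr_bh": "fdr_bh",
    "fdr": "fdr_bh",
    "bh": "fdr_bh",
    "benjamini_hochberg": "fdr_bh",
}


def check_arguments(method, multitest_correction, alpha):
    """Hypothetical helper: validate inputs, return the canonical correction name."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported method: {method!r}")
    if multitest_correction not in CORRECTION_ALIASES:
        raise ValueError(f"Unsupported correction: {multitest_correction!r}")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return CORRECTION_ALIASES[multitest_correction]


print(check_arguments("welch", "bh", 0.05))
```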

Examples

Two-group comparison between treated and control samples:

>>> import proteopy as pp
>>> adata = pp.datasets.karayel_2020()
>>> pp.tl.differential_abundance(
...     adata,
...     method="welch",
...     group_by="condition",
...     setup={"group1": "treated", "group2": "control"},
... )
>>> results = adata.varm["welch;condition;treated_vs_control"]
>>> sig_proteins = results[results["is_diff_abundant"]]

One-vs-rest comparison for all cell types:

>>> pp.tl.differential_abundance(
...     adata,
...     method="ttest_two_sample",
...     group_by="cell_type",
...     setup=None,
... )
>>> # Results stored as "ttest_two_sample;cell_type;{celltype}_vs_rest"
>>> for key in adata.varm.keys():
...     if key.startswith("ttest_two_sample;cell_type;"):
...         print(key, adata.varm[key]["is_diff_abundant"].sum())