`.differential_abundance`

proteopy.tl.differential_abundance(adata, method='ttest_two_sample', group_by=None, setup=None, layer=None, multitest_correction='fdr_bh', alpha=0.05, space='auto', force=False, fill_na=None, inplace=True)

Perform differential abundance analysis between sample groups.

Compares expression values between groups using statistical tests. Computes log fold changes and p-values, then applies multiple testing correction. Results are stored in adata.varm as DataFrames.

Parameters:

adata (ad.AnnData) – AnnData object with expression data in .X or a specified layer.
method (str, optional) –
Statistical test for differential abundance. Supported methods:
- "ttest_two_sample": Independent two-sample Student’s t-test assuming equal variances.
- "welch": Welch’s t-test without equal variance assumption. More robust when group variances differ.
group_by (str) – Column in adata.obs containing group labels for comparison. Required: leaving it at the signature default of None raises ValueError.
setup (dict | None, optional) –
Dictionary specifying comparison mode. Two modes available:

Two-group mode (keys "group1" and "group2" present):
Compare two specific groups. Required keys:
- "group1": First group label (numerator in log fold change).
- "group2": Second group label (denominator in log fold change).
Example: {"group1": "treated", "group2": "control"}
One-vs-rest mode (default when setup is None or {}):
Compare each group against all other groups combined. Optional key:
- "groups": "all" (default) to test all groups, or a list of specific group labels to test.
Examples: None, {}, or {"groups": ["A", "B"]}.
Each group and the combined “rest” must have at least 3 samples.
layer (str | None, optional) – Key in adata.layers to use. If None, uses adata.X.
multitest_correction (str, optional) –
Multiple testing correction method. Supported values:
- "bonferroni": Bonferroni correction (family-wise error rate control).
- "fdr_bh": Benjamini-Hochberg FDR correction (false discovery rate control).
- "fdr", "bh", "benjamini_hochberg": Aliases for "fdr_bh".
alpha (float | int, optional) – Significance threshold for labeling differential abundance. Must satisfy 0 < alpha <= 1.
space ({'auto', 'log', 'linear'}, optional) – Intensity space of the input data. When "auto", the space is inferred via is_log_transformed(). Two-sample methods require log space; linear data are converted to log2. When "log" or "linear", a mismatch with the inferred space raises ValueError unless force=True.
force (bool, optional) – Skip space-mismatch validation and use declared space. Use with caution when automatic detection is incorrect.
fill_na (float | int | None, optional) – Replace np.nan values in expression matrix with this value before analysis. If None, no replacement occurs.
inplace (bool, optional) – If True, modify adata in place and return None. If False, return modified copy of adata.
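
To make the multitest_correction and alpha parameters concrete, the sketch below implements the two textbook adjustments in plain Python and applies the pval_adj <= alpha rule. It is illustrative only and is not proteopy's implementation; the function names bonferroni and benjamini_hochberg and the example p-values are assumptions made for this example.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values for four features
alpha = 0.05

# Same labeling rule as the is_diff_abundant column: pval_adj <= alpha.
bonf_flags = [p <= alpha for p in bonferroni(pvals)]
bh_flags = [p <= alpha for p in benjamini_hochberg(pvals)]
print(bonf_flags)  # [True, False, False, True]
print(bh_flags)    # [True, True, True, True]
```

With these values Bonferroni keeps two features and Benjamini-Hochberg keeps all four, which reflects the difference between family-wise error rate control and false discovery rate control.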

Returns:

When inplace=False, returns a copy of the AnnData object with results in .varm. When inplace=True, returns None and modifies adata in place.

Storage format in adata.varm:

Each comparison is stored as a DataFrame under a key of the form {method};{group_by};{design}, or {method};{group_by};{design};{layer} when a layer is used:

Two-group mode: "{method};{group_by};{group1}_vs_{group2}" (e.g., "welch;condition;treated_vs_control").
One-vs-rest mode: "{method};{group_by};{group}_vs_rest" for each tested group (e.g., "ttest_two_sample;cell_type;A_vs_rest").
When a layer is used, it is appended as the fourth component (e.g., "welch;condition;treated_vs_control;raw_intensities").
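
The key patterns above can be reproduced with a short sketch, which is handy for predicting which .varm entries a call should create. The helper expected_varm_keys is hypothetical and not part of proteopy. It assumes group labels are already in their sanitized form, that "all" groups are listed in sorted order, and that the 3-sample minimum applies to both sides of every comparison.

```python
from collections import Counter

MIN_SAMPLES = 3  # minimum per side of a comparison, as stated in the documentation


def expected_varm_keys(method, group_by, labels, setup=None, layer=None):
    """Return the .varm keys a call would be expected to produce (hypothetical helper)."""
    setup = setup or {}
    counts = Counter(labels)
    total = len(labels)

    if "group1" in setup and "group2" in setup:
        # Two-group mode: a single group1-vs-group2 design.
        g1, g2 = setup["group1"], setup["group2"]
        designs = [(f"{g1}_vs_{g2}", counts[g1], counts[g2])]
    else:
        # One-vs-rest mode: one design per tested group.
        groups = setup.get("groups", "all")
        if groups == "all":
            groups = sorted(counts)  # assumption: order is not documented
        designs = [(f"{g}_vs_rest", counts[g], total - counts[g]) for g in groups]

    for design, n1, n2 in designs:
        if min(n1, n2) < MIN_SAMPLES:
            raise ValueError(f"{design}: each side needs at least {MIN_SAMPLES} samples")

    suffix = f";{layer}" if layer is not None else ""
    return [f"{method};{group_by};{design}{suffix}" for design, _, _ in designs]


two_group = expected_varm_keys(
    "welch",
    "condition",
    labels=["treated"] * 3 + ["control"] * 3,
    setup={"group1": "treated", "group2": "control"},
    layer="raw_intensities",
)
one_vs_rest = expected_varm_keys(
    "ttest_two_sample",
    "cell_type",
    labels=["A"] * 3 + ["B"] * 3 + ["C"] * 4,
)
print(two_group)
print(one_vs_rest)
```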

Additionally, a column holding sanitized versions of the group_by labels is added to adata.obs if not already present.

DataFrame columns:

Each results DataFrame contains the following columns indexed by variable names (matching adata.var_names):

mean1: Mean expression in group1 (focal group in one-vs-rest).
mean2: Mean expression in group2 (rest in one-vs-rest).
logfc: Log fold change (mean1 - mean2 in log space). Computed in log2 space for linear input data, otherwise in the data’s existing log base.
tstat: t-statistic from the statistical test.
pval: Raw p-value from the test.
pval_adj: Adjusted p-value using the specified multitest_correction method.
is_diff_abundant: Boolean indicating pval_adj <= alpha.
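
As a rough illustration of how mean1, mean2, logfc, and tstat relate for a single variable, the sketch below computes them with the standard library only. It is not proteopy's code: the function two_sample_columns and the intensity values are invented for this example, and the p-value is omitted because it requires the t-distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def two_sample_columns(group1, group2, equal_var=True):
    """Compute mean1, mean2, logfc, and tstat for log-space values (hypothetical helper)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1 denominator)
    if equal_var:
        # Student's t-test: pool the two variances.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test: keep the variances separate.
        se = math.sqrt(v1 / n1 + v2 / n2)
    return {"mean1": m1, "mean2": m2, "logfc": m1 - m2, "tstat": (m1 - m2) / se}


# Made-up linear intensities, converted to log2 before testing.
treated = [math.log2(x) for x in [8.0, 16.0, 32.0]]
control = [math.log2(x) for x in [2.0, 4.0, 8.0, 4.0]]

student = two_sample_columns(treated, control, equal_var=True)
welch = two_sample_columns(treated, control, equal_var=False)
print(student["logfc"], round(student["tstat"], 4))
print(welch["logfc"], round(welch["tstat"], 4))
```

Both variants report the same logfc because it depends only on the group means. The t-statistics differ here because the groups have unequal sizes and variances; with equal group sizes the two statistics coincide and only the degrees of freedom differ.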

Return type:

ad.AnnData | None

Raises:

ValueError – If group_by is None or not in adata.obs.
ValueError – If layer is not in adata.layers.
ValueError – If method is not supported.
ValueError – If multitest_correction is not supported.
ValueError – If alpha is not in range (0, 1].
ValueError – If space mismatches inferred space and force=False.
ValueError – If a group, or the combined "rest" in one-vs-rest mode, has fewer than 3 samples.

Examples

Two-group comparison between treated and control samples:

>>> import proteopy as pp
>>> adata = pp.datasets.karayel_2020()
>>> pp.tl.differential_abundance(
...     adata,
...     method="welch",
...     group_by="condition",
...     setup={"group1": "treated", "group2": "control"},
... )
>>> results = adata.varm["welch;condition;treated_vs_control"]
>>> sig_proteins = results[results["is_diff_abundant"]]

One-vs-rest comparison for all cell types:

>>> pp.tl.differential_abundance(
...     adata,
...     method="ttest_two_sample",
...     group_by="cell_type",
...     setup=None,
... )
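>>> # Results stored as "ttest_two_sample;cell_type;{celltype}_vs_rest"
>>> for key in adata.varm.keys():
...     # Skip unrelated .varm entries that may lack the results columns.
...     if key.startswith("ttest_two_sample;cell_type;"):
...         print(key, adata.varm[key]["is_diff_abundant"].sum())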
