.differential_abundance
- proteopy.tl.differential_abundance(adata, method='ttest_two_sample', group_by=None, setup=None, layer=None, multitest_correction='fdr_bh', alpha=0.05, space='auto', force=False, fill_na=None, inplace=True)
Perform differential abundance analysis between sample groups.
Compares expression values between groups using statistical tests. Computes log fold changes and p-values, and applies multiple testing correction. Results are stored in adata.varm as DataFrames.
- Parameters:
adata (ad.AnnData) – AnnData object with expression data in .X or a specified layer.
method (str, optional) – Statistical test for differential abundance. Supported methods:
  - "ttest_two_sample": Independent two-sample Student’s t-test assuming equal variances.
  - "welch": Welch’s t-test without the equal-variance assumption. More robust when group variances differ.
group_by (str) – Column in adata.obs containing group labels for comparison.
setup (dict | None, optional) – Dictionary specifying the comparison mode. Two modes are available:
  - Two-group mode (keys "group1" and "group2" present): Compare two specific groups. Required keys:
    - "group1": First group label (numerator in log fold change).
    - "group2": Second group label (denominator in log fold change).
    Example: {"group1": "treated", "group2": "control"}
  - One-vs-rest mode (default when setup is None or {}): Compare each group against all other groups combined. Optional key:
    - "groups": "all" (default) to test all groups, or a list of specific group labels to test.
    Examples: None, {}, or {"groups": ["A", "B"]}.
    Each group and the combined “rest” must have at least 3 samples.
layer (str | None, optional) – Key in adata.layers to use. If None, uses adata.X.
multitest_correction (str, optional) – Multiple testing correction method. Supported values:
  - "bonferroni": Bonferroni correction (family-wise error rate control).
  - "fdr_bh": Benjamini-Hochberg FDR correction (false discovery rate control).
  - "fdr", "bh", "benjamini_hochberg": Aliases for "fdr_bh".
alpha (float | int, optional) – Significance threshold for labeling differential abundance. Must satisfy 0 < alpha <= 1.
space ({'auto', 'log', 'linear'}, optional) – Intensity space of the input data. When "auto", the space is inferred via is_log_transformed(). Two-sample methods require log space; linear data are converted to log2. When "log" or "linear", a mismatch with the inferred space raises an error unless force=True.
force (bool, optional) – Skip space-mismatch validation and use the declared space. Use with caution when automatic detection is incorrect.
fill_na (float | int | None, optional) – Replace np.nan values in the expression matrix with this value before analysis. If None, no replacement occurs.
inplace (bool, optional) – If True, modify adata in place and return None. If False, return a modified copy of adata.
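The two supported methods differ in how they estimate the variance of the difference in group means and in the degrees of freedom they use. The following is a minimal standard-library sketch of the textbook formulas, not proteopy's implementation; the helper names student_t and welch_t are hypothetical and exist only for this illustration.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1)


def student_t(a, b):
    """Equal-variance (pooled) two-sample t-statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch t-statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical log2 intensities for one protein in two groups.
# Group sizes and variances are equal here, so both tests agree.
group1 = [1.0, 2.0, 3.0]
group2 = [2.0, 3.0, 4.0]
t_student, df_student = student_t(group1, group2)
t_welch, df_welch = welch_t(group1, group2)

# With very different variances, Welch uses fewer degrees of freedom.
tight = [10.0, 10.1, 9.9, 10.0]
wide = [8.0, 12.0, 6.0, 14.0, 10.0]
df_pooled = student_t(tight, wide)[1]
df_adjusted = welch_t(tight, wide)[1]
```

On log-space data the numerator of both statistics, mean(a) - mean(b), is the same quantity reported as the log fold change. Converting a statistic to a p-value additionally requires the t-distribution CDF, which is omitted here.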
- Returns:
When inplace=False, returns a copy of the AnnData object with results in .varm. When inplace=True, returns None and modifies adata in place.
Storage format in adata.varm:
Results are stored as DataFrames under keys of the form {method};{group_by};{design}, or {method};{group_by};{design};{layer} when a layer is used:
  - Two-group mode: "{method};{group_by};{group1}_vs_{group2}" (e.g., "welch;condition;treated_vs_control").
  - One-vs-rest mode: "{method};{group_by};{group}_vs_rest" for each tested group (e.g., "ttest_two_sample;cell_type;A_vs_rest").
  - When a layer is used, it is appended as the fourth component (e.g., "welch;condition;treated_vs_control;raw_intensities").
Additionally, a column containing sanitized versions of the group_by labels is added to adata.obs if not already present.
DataFrame columns:
Each results DataFrame is indexed by variable names (matching adata.var_names) and contains the following columns:
  - mean1: Mean expression in group1 (focal group in one-vs-rest).
  - mean2: Mean expression in group2 (rest in one-vs-rest).
  - logfc: Log fold change (mean1 - mean2 in log space). Computed in log2 space for linear input data, otherwise in the data’s existing log base.
  - tstat: t-statistic from the statistical test.
  - pval: Raw p-value from the test.
  - pval_adj: Adjusted p-value using the specified multitest_correction method.
  - is_diff_abundant: Boolean indicating pval_adj <= alpha.
- Return type:
ad.AnnData | None
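To make the relationship between pval, pval_adj, and is_diff_abundant concrete, the sketch below applies the standard Bonferroni and Benjamini-Hochberg adjustments to a small list of p-values and then thresholds at alpha. It uses only the standard library, and the function names are hypothetical; how proteopy computes the adjustment internally (including its handling of NaN p-values) is not documented here and may differ.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins.
pvals = [0.01, 0.04, 0.03, 0.20]
alpha = 0.05

pval_adj = benjamini_hochberg(pvals)
is_diff_abundant = [p <= alpha for p in pval_adj]
```

In this example only the first protein remains at or below alpha after adjustment, even though three raw p-values are below 0.05.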
- Raises:
ValueError – If group_by is None or not in adata.obs.
ValueError – If layer is not in adata.layers.
ValueError – If method is not supported.
ValueError – If multitest_correction is not supported.
ValueError – If alpha is not in the range (0, 1].
ValueError – If space mismatches the inferred space and force=False.
ValueError – If any compared group has fewer than 3 samples.
Examples
Two-group comparison between treated and control samples:
>>> import proteopy as pp
>>> adata = pp.datasets.karayel_2020()
>>> pp.tl.differential_abundance(
...     adata,
...     method="welch",
...     group_by="condition",
...     setup={"group1": "treated", "group2": "control"},
... )
>>> results = adata.varm["welch;condition;treated_vs_control"]
>>> sig_proteins = results[results["is_diff_abundant"]]
One-vs-rest comparison for all cell types:
>>> pp.tl.differential_abundance(
...     adata,
...     method="ttest_two_sample",
...     group_by="cell_type",
...     setup=None,
... )
>>> # Results stored as "ttest_two_sample;cell_type;{celltype}_vs_rest"
>>> for key in adata.varm.keys():
...     print(key, adata.varm[key]["is_diff_abundant"].sum())
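The loop above visits every entry in adata.varm, which may also hold results from other runs or unrelated arrays. A small sketch of parsing the documented key format to select only the relevant results follows. The helpers parse_result_key and is_result_key are hypothetical and not part of proteopy, and the sketch assumes that no key component itself contains a semicolon.

```python
from typing import NamedTuple, Optional


class ResultKey(NamedTuple):
    method: str
    group_by: str
    design: str
    layer: Optional[str]


def parse_result_key(key: str) -> ResultKey:
    """Split '{method};{group_by};{design}[;{layer}]' into its components."""
    parts = key.split(";")
    if len(parts) == 3:
        return ResultKey(parts[0], parts[1], parts[2], None)
    if len(parts) == 4:
        return ResultKey(parts[0], parts[1], parts[2], parts[3])
    raise ValueError(f"not a differential abundance key: {key!r}")


def is_result_key(key: str, method: str, group_by: str) -> bool:
    """True if key parses and matches the given method and group_by."""
    try:
        parsed = parse_result_key(key)
    except ValueError:
        return False
    return parsed.method == method and parsed.group_by == group_by


# Stand-in for list(adata.varm.keys()); the last entry is an unrelated,
# made-up key used only to show that it is skipped.
keys = [
    "welch;condition;treated_vs_control",
    "ttest_two_sample;cell_type;A_vs_rest",
    "welch;condition;treated_vs_control;raw_intensities",
    "unrelated_entry",
]
selected = [k for k in keys if is_result_key(k, "welch", "condition")]
```

With a real AnnData object, the same filter could be applied to adata.varm.keys() before indexing each DataFrame.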