.pp

The proteopy.pp module provides preprocessing functions for proteomics data: quality control, filtering, normalization, imputation, quantification (aggregation of peptides to groups, proteins, and proteoforms), and summary statistics.

Filtering

proteopy.pp.filter_samples

Filter observations based on non-missing value content.

proteopy.pp.filter_samples_completeness

Filter observations based on data completeness.

proteopy.pp.filter_var

Filter variables based on non-missing value content.

proteopy.pp.filter_var_completeness

Filter variables based on data completeness.
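
The idea behind completeness-based filtering can be sketched with plain NumPy. This is an illustration only, not the proteopy implementation: the helper name completeness_mask, its min_fraction parameter, and the use of a fraction threshold are assumptions, and the summaries above do not say how the count-based and completeness-based variants differ.

```python
import numpy as np

def completeness_mask(X, min_fraction, axis):
    """Hypothetical helper: True where the fraction of non-NaN values
    along `axis` is at least `min_fraction`."""
    observed = ~np.isnan(X)
    return observed.mean(axis=axis) >= min_fraction

# Rows are samples (observations), columns are variables.
X = np.array([
    [1.0, np.nan, 3.0],
    [4.0, np.nan, np.nan],
    [7.0, 8.0, 9.0],
])

keep_samples = completeness_mask(X, min_fraction=0.5, axis=1)  # per row
keep_vars = completeness_mask(X, min_fraction=0.5, axis=0)     # per column
X_filtered = X[keep_samples][:, keep_vars]
```

Both masks are computed on the unfiltered matrix here; applying the filters one after the other can give a different result because the second completeness is then computed on fewer rows or columns.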

proteopy.pp.filter_proteins_by_peptide_count

Filter proteins by their peptide count.

proteopy.pp.filter_samples_by_category_count

Filter observations by the frequency of their category value in a .obs metadata column.
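
A minimal pandas sketch of filtering by category frequency, assuming the goal is to drop samples whose category occurs fewer than a minimum number of times. The column name condition, the threshold, and the sample names are made up for illustration; the actual parameters of the proteopy function are not shown on this page.

```python
import pandas as pd

# Hypothetical sample metadata with one categorical column.
metadata = pd.DataFrame(
    {"condition": ["ctrl", "ctrl", "ctrl", "treated", "treated", "pilot"]},
    index=["s1", "s2", "s3", "s4", "s5", "s6"],
)

# Count how often each category occurs, then map the count back to each sample.
category_counts = metadata["condition"].value_counts()
count_per_sample = metadata["condition"].map(category_counts)

# Keep samples whose category is represented at least `min_count` times.
min_count = 2
filtered = metadata[count_per_sample >= min_count]
```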

proteopy.pp.remove_zero_variance_vars

Remove variables (columns) with near-zero variance, skipping NaN values.
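
The following NumPy sketch shows one way to detect near-zero variance while skipping NaN values. The tolerance atol and the helper name are assumptions, not the library's defaults.

```python
import numpy as np

def variable_mask_nonzero_variance(X, atol=1e-8):
    """Hypothetical helper: True for columns whose NaN-ignoring
    variance exceeds `atol`."""
    variances = np.nanvar(X, axis=0)
    return variances > atol

X = np.array([
    [1.0, 5.0, np.nan],
    [1.0, 6.0, 2.0],
    [1.0, np.nan, 2.0],
])

keep = variable_mask_nonzero_variance(X)
X_kept = X[:, keep]
```

Columns 0 and 2 are constant once NaN values are ignored, so only column 1 remains.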

proteopy.pp.remove_contaminants

Remove variables whose protein identifier matches a contaminant FASTA entry.
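
As a rough illustration of matching identifiers against a contaminant FASTA file, the sketch below parses accessions from FASTA headers and drops matching protein identifiers. The header layout (db|accession|name), the accessions, and the sequences are all hypothetical; how proteopy parses headers and matches identifiers is not described here.

```python
def fasta_accessions(fasta_text):
    """Collect one identifier per FASTA header line (hypothetical helper).

    Headers of the form '>db|ACCESSION|NAME ...' yield ACCESSION;
    any other header yields its first whitespace-delimited token.
    """
    accessions = set()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        token = line[1:].split()[0]
        parts = token.split("|")
        accessions.add(parts[1] if len(parts) >= 3 else token)
    return accessions

# Made-up contaminant entries with placeholder sequences.
contaminant_fasta = """>sp|X00001|CONTA_TEST Example contaminant A
AAAAKAAAAR
>sp|X00002|CONTB_TEST Example contaminant B
CCCCKCCCCR
"""

protein_ids = ["P12345", "X00001", "Q00002", "X00002"]
contaminants = fasta_accessions(contaminant_fasta)
kept = [pid for pid in protein_ids if pid not in contaminants]
```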

Normalization

proteopy.pp.normalize_median

Median normalization of intensities.
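
One common form of median normalization can be sketched as follows. It assumes log-transformed intensities with samples in rows, and shifts every sample so that its median equals the median of all sample medians. Whether proteopy uses this target, or works in log space at all, is an assumption of this example.

```python
import numpy as np

def median_normalize_log(X_log):
    """Hypothetical helper: align per-sample medians of log intensities."""
    # NaN-ignoring median of each sample (row), kept as a column vector.
    sample_medians = np.nanmedian(X_log, axis=1, keepdims=True)
    # Common target: the median of the sample medians.
    target = np.nanmedian(sample_medians)
    return X_log - sample_medians + target

X_log = np.array([
    [10.0, 12.0, 14.0],
    [11.0, 13.0, np.nan],
    [9.0, 11.0, 13.0],
])

normalized = median_normalize_log(X_log)
```

After the shift, every row has the same median and missing values stay missing.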

Imputation

proteopy.pp.impute_downshift

Left-censored imputation in log space with downshifted normal sampling.
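
The general technique, drawing replacements for missing values from a normal distribution that is shifted down and narrowed relative to the observed values, can be sketched like this. The per-sample scope, the parameter names shift and width, and their values are assumptions for illustration, not the documented defaults of proteopy.pp.impute_downshift.

```python
import numpy as np

def impute_downshift_sketch(X_log, shift=1.8, width=0.3, seed=0):
    """Hypothetical helper: fill NaNs per sample (row) with draws from
    N(mean - shift * sd, (width * sd)^2) of that sample's observed values."""
    rng = np.random.default_rng(seed)
    X_imputed = X_log.copy()
    for i in range(X_log.shape[0]):
        row = X_log[i]
        missing = np.isnan(row)
        if not missing.any() or missing.all():
            continue  # nothing to impute, or nothing to estimate from
        mu = np.nanmean(row)
        sd = np.nanstd(row)
        X_imputed[i, missing] = rng.normal(
            loc=mu - shift * sd, scale=width * sd, size=int(missing.sum())
        )
    return X_imputed

# Log-transformed intensities, samples in rows.
X_log = np.array([
    [20.0, 22.0, np.nan, 24.0],
    [18.0, np.nan, np.nan, 21.0],
])

imputed = impute_downshift_sketch(X_log)
```

Observed values are left untouched; only the NaN positions receive sampled values.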

Quantification

proteopy.pp.extract_peptide_groups

Add a column adata.var['peptide_group'] that lists, for each row in adata.var, all overlapping (substring) peptide_ids joined by ';'.
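
To make the substring-overlap idea concrete, here is a small pure-Python sketch. It treats two peptides as overlapping when one sequence contains the other, and it sorts the members of each group. Both choices are assumptions; the exact grouping rule used by proteopy (for example, whether groups are merged transitively) is not stated here.

```python
def substring_groups(peptides):
    """Hypothetical helper: for each peptide, join every peptide that
    contains it or is contained in it (itself included) with ';'."""
    groups = []
    for peptide in peptides:
        related = sorted(
            other for other in peptides
            if peptide in other or other in peptide
        )
        groups.append(";".join(related))
    return groups

peptides = ["PEPTIDE", "PEPTIDEK", "EPTID", "LVNELTEFAK"]
groups = substring_groups(peptides)
```

The first three peptides share one group string, while the last peptide forms a group of its own.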

proteopy.pp.summarize_modifications

Aggregate modified peptides by their stripped sequence.
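
A pandas sketch of collapsing modified peptides onto their stripped sequence. It assumes modifications are written in square brackets or parentheses and that intensities are summed; both the notation and the aggregation function are assumptions of this example, and the table has peptides in rows for readability.

```python
import re
import pandas as pd

def strip_modifications(peptide):
    """Hypothetical helper: remove '[...]' and '(...)' annotations."""
    return re.sub(r"\[[^\]]*\]|\([^)]*\)", "", peptide)

# Peptides in rows, samples in columns (illustrative values).
intensities = pd.DataFrame(
    {
        "sample_1": [10.0, 5.0, 7.0],
        "sample_2": [20.0, float("nan"), 3.0],
    },
    index=["PEPM[Oxidation]TIDE", "PEPMTIDE", "AC(Carbamidomethyl)DK"],
)

# A function passed to groupby is applied to each index label.
# min_count=1 keeps a group NaN only if all of its values are NaN.
summed = intensities.groupby(strip_modifications).sum(min_count=1)
```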

proteopy.pp.summarize_overlapping_peptides

Aggregate intensities across peptides sharing the same group_col.

proteopy.pp.quantify_proteins

Aggregate intensities in adata.X (or selected layer) by .var[group_col], aggregate annotations in adata.var by concatenating unique values with ';', and set group_col as the new index (var_names).

proteopy.pp.quantify_proteoforms

Aggregate intensities in adata.X (or selected layer) by .var[group_col], aggregate annotations in adata.var by concatenating unique values with ';', and set group_col as the new index (var_names).
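
The aggregation described for both quantification functions can be approximated with pandas, using a samples-by-variables table in place of adata.X and a separate annotation table in place of adata.var. The column names, the values, and the choice of sum as the aggregation function are assumptions made for this sketch.

```python
import pandas as pd

# Stand-in for adata.var: one row per peptide.
var = pd.DataFrame(
    {
        "protein_id": ["P1", "P1", "P2"],
        "gene": ["GENE_A", "GENE_A", "GENE_B"],
        "peptide": ["AAK", "CCR", "DDK"],
    },
    index=["pep1", "pep2", "pep3"],
)

# Stand-in for adata.X: samples in rows, peptides in columns.
X = pd.DataFrame(
    [[1.0, 2.0, 4.0], [3.0, 5.0, 6.0]],
    index=["s1", "s2"],
    columns=var.index,
)

def join_unique(values):
    """Concatenate the unique values of one annotation column with ';'."""
    return ";".join(values.astype(str).unique())

# Aggregate intensities by group: transpose so peptides are rows,
# group by the aligned protein_id labels, then transpose back.
protein_X = X.T.groupby(var["protein_id"]).sum().T

# Aggregate annotations; the group column becomes the new index.
protein_var = var.groupby("protein_id").agg(join_unique)
```

The result has one column per protein in protein_X and one row per protein in protein_var, with protein_id as the index of the latter.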

Statistics

proteopy.pp.calculate_cv

Compute the coefficient of variation (CV = std / mean) for each variable.
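
The formula CV = std / mean can be evaluated per variable with NumPy as sketched below. Ignoring NaN values and using the sample standard deviation (ddof=1) are assumptions of this example; the function's actual handling of missing values and degrees of freedom is not stated here.

```python
import numpy as np

def coefficient_of_variation(X, ddof=1):
    """Hypothetical helper: per-column CV = std / mean, ignoring NaNs."""
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=ddof)
    return std / mean

# Samples in rows, variables in columns (illustrative values).
X = np.array([
    [10.0, 100.0],
    [12.0, 100.0],
    [14.0, np.nan],
])

cv = coefficient_of_variation(X)
```

For the first column the mean is 12 and the sample standard deviation is 2, so the CV is 2/12; the second column is constant, so its CV is 0. The CV is only meaningful on a ratio scale, so it is usually computed on untransformed rather than log-transformed intensities.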