.remove_contaminants

proteopy.pp.remove_contaminants(adata, contaminant_path, protein_key='protein_id', header_parser=None, inplace=False)

Remove variables whose protein identifier matches an entry in a contaminant list (FASTA or tabular).

Parameters:
  • adata (anndata.AnnData) – Annotated data.

  • contaminant_path (str | Path) – Path to the contaminant list. If the file is in FASTA format, the contaminant IDs are parsed from the headers (see header_parser). If it is a tabular TSV/CSV file, the first column is used as the contaminant IDs.

  • protein_key (str, optional (default: "protein_id")) – Column in adata.var containing protein identifiers to match.

  • header_parser (callable, optional) – Function that takes a FASTA header and returns the protein ID. Defaults to splitting the header on "|" and returning the second element, falling back to the full header if there is no second element.

  • inplace (bool, optional (default: False)) – If True, modify adata in place. Otherwise, return a filtered view.
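
The parsing rules described for contaminant_path and header_parser can be sketched in plain Python. This is only an illustration of the documented behavior, not proteopy's actual implementation: the names default_header_parser and read_contaminant_ids are hypothetical, and detecting the format from the file extension, stripping the leading ">" before parsing, and treating every tabular row as data (no header row skipped) are all assumptions.

```python
import csv
from pathlib import Path


def default_header_parser(header: str) -> str:
    """Hypothetical stand-in for the documented default parser."""
    parts = header.split("|")
    # Second "|"-separated element if present, otherwise the full header.
    return parts[1] if len(parts) > 1 else header


def read_contaminant_ids(path, header_parser=default_header_parser) -> set:
    """Collect contaminant IDs from a FASTA or TSV/CSV file (illustrative only)."""
    path = Path(path)
    suffix = path.suffix.lower()
    ids = set()
    with path.open(newline="") as handle:
        if suffix in {".tsv", ".csv"}:
            # Assumption: the format is inferred from the file extension.
            delimiter = "\t" if suffix == ".tsv" else ","
            for row in csv.reader(handle, delimiter=delimiter):
                if row:
                    ids.add(row[0])  # first column holds the contaminant ID
        else:
            for line in handle:
                # Assumption: the ">" marker is removed before parsing.
                if line.startswith(">"):
                    ids.add(header_parser(line[1:].strip()))
    return ids
```

With a UniProt-style header such as "sp|P02768|ALBU_HUMAN Serum albumin", the default rule yields "P02768". For headers that follow a different convention, a custom callable can be passed as header_parser, for example lambda h: h.split()[0] to keep the first whitespace-delimited token.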

Returns:

None if inplace=True; otherwise the filtered AnnData view.

Return type:

None or anndata.AnnData