Frankenfield 2022 – Universal Protein Contaminant Library

Overview

Mass spectrometry-based proteomics is challenged by the presence of contaminant protein signals originating from reagents, sample handling, and the laboratory environment. These contaminants are difficult to avoid and, if unaccounted for, can lead to false protein identifications and reduced sensitivity.

Frankenfield et al. (2022) systematically characterized common sources of protein contamination and compiled a universal contaminant FASTA library containing 381 protein entries. The library was designed to be applicable across all proteomics workflows, including both data-dependent acquisition (DDA) and data-independent acquisition (DIA).

The authors demonstrated that including the contaminant library during database searching reduces false discoveries and increases true protein identifications without affecting quantification accuracy. In library-based DIA analysis of a HepG2 human cell dataset, including the contaminant libraries yielded more than 5% additional protein and peptide identifications.
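One common way to include a contaminant library in a search is to append its entries to the target proteome FASTA before running the search engine. The sketch below does this with the Python standard library only. It is an illustration, not the authors' procedure: the `Cont_` header prefix, the function names, and the choice to skip contaminant entries whose sequence is already in the target database are all assumptions.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def combine_fasta(target_lines, contaminant_lines, prefix="Cont_"):
    """Return one FASTA string: all target entries, then contaminant entries.

    Contaminant headers get `prefix` (a hypothetical convention) so hits can be
    recognized after the search. Contaminant entries whose sequence already
    appears in the output are skipped to avoid redundant database entries.
    """
    seen, records = set(), []
    for header, seq in read_fasta(target_lines):
        seen.add(seq)
        records.append(f">{header}\n{seq}")
    for header, seq in read_fasta(contaminant_lines):
        if seq in seen:
            continue
        seen.add(seq)
        records.append(f">{prefix}{header}\n{seq}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Toy inputs; accessions and sequences are hypothetical.
    target = [">tr|TGT001|TGT001_HUMAN Example target", "MKTAYIAK", "QRQISFVK"]
    contaminants = [
        ">tr|CON001|CON001_PIG Example protease",
        "ACDEFGHIKLMNPQRSTVWY",
        ">tr|CON002|CON002_HUMAN Same sequence as the target",
        "MKTAYIAKQRQISFVK",
    ]
    print(combine_fasta(target, contaminants), end="")
```

Skipping identical sequences is a trade-off: for a human sample, human keratins from the library are already in the human proteome, so they would not receive the contaminant prefix. Keeping both copies instead would make every peptide from those proteins shared between two entries.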

Library Composition

The 381 contaminant proteins are classified into the following source categories:

Source of Contamination                                           Proteins
Human skin and hair (keratins and keratin-associated proteins)         151
Residual cell culture medium containing fetal bovine serum (FBS)       120
FBS / affinity bead background                                          39
Mouse skin and hair                                                     26
Sheep keratin (wool clothing)                                           16
Proteolytic enzymes (trypsin, pepsin, Lys-C, and others)                11
Other contaminants                                                       7
Fluorescent proteins (GFP, YFP)                                          3
Latex gloves (Hevea brasiliensis)                                        2
Affinity purification reagents (FLAG, HA, streptavidin beads)            3
Bacterial (Escherichia coli)                                             1
Lys-C protease enzyme                                                    1
Trypsin protease enzyme                                                  1
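A quick arithmetic check on the composition table: the category counts sum to 381, and the skin, hair, and wool categories together make up roughly half of the library. The two serum-related categories sum to 159, the same as the Bos taurus count in the organism breakdown; that agreement is consistent with, but does not by itself prove, those categories being the bovine entries. The dictionary keys below are shortened versions of the table labels.

```python
# Counts transcribed from the composition table (labels shortened).
SOURCE_COUNTS = {
    "Human skin and hair": 151,
    "Residual cell culture medium (FBS)": 120,
    "FBS / affinity bead background": 39,
    "Mouse skin and hair": 26,
    "Sheep keratin (wool clothing)": 16,
    "Proteolytic enzymes": 11,
    "Other contaminants": 7,
    "Fluorescent proteins": 3,
    "Latex gloves": 2,
    "Affinity purification reagents": 3,
    "Bacterial (E. coli)": 1,
    "Lys-C protease enzyme": 1,
    "Trypsin protease enzyme": 1,
}

total = sum(SOURCE_COUNTS.values())

# Categories that come from skin, hair, or wool of humans, mice, and sheep.
skin_hair_wool = sum(
    SOURCE_COUNTS[k]
    for k in ("Human skin and hair", "Mouse skin and hair", "Sheep keratin (wool clothing)")
)

# Categories attributed to fetal bovine serum.
serum_related = (
    SOURCE_COUNTS["Residual cell culture medium (FBS)"]
    + SOURCE_COUNTS["FBS / affinity bead background"]
)

print(total)                                            # 381
print(skin_hair_wool, f"{skin_hair_wool / total:.1%}")  # 193 50.7%
print(serum_related, f"{serum_related / total:.1%}")    # 159 41.7%
```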

Organism Breakdown

Entries in the FASTA file originate from the following organisms:

Organism                                         Proteins
Bos taurus (bovine, predominantly FBS-derived)        159
Homo sapiens (human keratins and skin proteins)       151
Mus musculus (mouse keratins)                          26
Ovis aries (sheep wool keratins)                       16
Sus scrofa (porcine enzymes)                            4
Other organisms (15 species, 1–2 entries each)         25
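An organism breakdown like this one can be reproduced from a FASTA file by tallying the organism field in each header. The sketch assumes UniProt-style headers with an `OS=` field; the actual header format of the Frankenfield 2022 file is not described here, so entries with other header styles are counted as "unknown". The function and pattern names are hypothetical.

```python
import re
from collections import Counter

# Assumes UniProt-style headers such as
# ">sp|ACC|NAME_SPECIES Protein name OS=Bos taurus OX=9913 GN=... PE=1 SV=1".
# The organism name is the text after "OS=" up to the next " XX=" field
# or the end of the line.
OS_FIELD = re.compile(r"\bOS=(.+?)(?=\s[A-Z]{2}=|$)")


def count_organisms(fasta_lines):
    """Count FASTA entries per organism; headers without OS= count as 'unknown'."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        match = OS_FIELD.search(line)
        counts[match.group(1).strip() if match else "unknown"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical headers for illustration only.
    headers = [
        ">sp|EX0001|EX1_BOVIN Example 1 OS=Bos taurus OX=9913 PE=1 SV=1",
        ">sp|EX0002|EX2_BOVIN Example 2 OS=Bos taurus OX=9913 PE=1 SV=1",
        ">sp|EX0003|EX3_PIG Example 3 OS=Sus scrofa OX=9823 GN=EX3 PE=1 SV=1",
        ">custom_entry_without_organism",
    ]
    for organism, n in count_organisms(headers).most_common():
        print(f"{organism}\t{n}")
```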

Methodology

The authors generated contamination-only samples by introducing known contaminant sources into lysis buffer:

  • Enzyme contamination: trypsin, Lys-C, and trypsin/Lys-C mixtures

  • Affinity purification contamination: streptavidin, FLAG, and HA beads

  • Serum contamination: fetal bovine serum

  • Keratin contamination: intentional handling of samples with ungloved hands

These contamination-only samples were analyzed by LC-MS/MS. The resulting identifications were combined with proteins from existing contaminant databases (cRAP and the MaxQuant contaminant list) to build the universal library, which adds 166 contaminant entries not previously characterized in those databases.
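The number of new entries can be read as a set difference: entries in the universal library minus the union of the older databases. The sketch works at the level of accession identifiers; whether the authors matched entries by accession, sequence, or gene name is not stated here, so the matching key is an assumption, and the accession sets are hypothetical stand-ins.

```python
def novel_entries(library, existing_databases):
    """Return identifiers in `library` that appear in none of `existing_databases`."""
    known = set().union(*existing_databases)  # everything already catalogued
    return set(library) - known


if __name__ == "__main__":
    # Hypothetical accession sets standing in for the real databases.
    crap = {"ACC001", "ACC002", "ACC003"}
    maxquant = {"ACC002", "ACC004"}
    universal = {"ACC001", "ACC002", "ACC004", "NEW010", "NEW011"}
    new = novel_entries(universal, [crap, maxquant])
    print(sorted(new), len(new))  # ['NEW010', 'NEW011'] 2
```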

The library was validated using HEK293 cell lysates and mouse brain tissue across multiple software platforms (MaxQuant, Proteome Discoverer, Spectronaut, DIA-NN).

Sample-Type Specific Libraries

In addition to the universal library, the authors provide sample-type specific contaminant FASTA files for:

  • Cell culture

  • Mouse tissue

  • Rat tissue

  • Neuron culture

  • Stem cell culture

These are available from the authors' GitHub repository.

Note

ProteoPy downloads the universal contaminant library via pr.download.contaminants(source="frankenfield2022").
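After a search, contaminant hits are usually flagged or removed before downstream analysis. The sketch below does not call ProteoPy; where pr.download.contaminants stores the file and what it returns are not described in this section. It assumes the library is available as FASTA lines (for example, read from the downloaded file) and that protein groups are semicolon-separated accession strings, as in MaxQuant's protein group tables. Flagging a group when any member is a contaminant is one possible rule; a stricter alternative flags only groups made up entirely of contaminants. All names are hypothetical.

```python
def contaminant_accessions(fasta_lines):
    """Collect accessions from FASTA headers.

    For UniProt-style IDs such as 'sp|P12345|NAME' the middle field is used;
    otherwise the first whitespace-delimited token of the header is kept.
    """
    accessions = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue  # bare ">" line with no identifier
        parts = fields[0].split("|")
        accessions.add(parts[1] if len(parts) >= 3 else fields[0])
    return accessions


def split_protein_groups(groups, contaminants):
    """Split semicolon-separated protein groups into (kept, flagged).

    A group is flagged if any member accession is a known contaminant.
    """
    kept, flagged = [], []
    for group in groups:
        members = [m.strip() for m in group.split(";") if m.strip()]
        if any(m in contaminants for m in members):
            flagged.append(group)
        else:
            kept.append(group)
    return kept, flagged


if __name__ == "__main__":
    # Hypothetical library entries and search results.
    library = [">sp|CON001|CON1_PIG Example", "ACDEFGHIK", ">CON002 Example", "LMNPQRST"]
    cont = contaminant_accessions(library)
    kept, flagged = split_protein_groups(["TGT001", "TGT002;CON002", "CON001"], cont)
    print(kept)     # ['TGT001']
    print(flagged)  # ['TGT002;CON002', 'CON001']
```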

Resources