proteopy.datasets.karayel_2020

proteopy.datasets.karayel_2020()

Load Karayel 2020 erythropoiesis proteomics dataset.

Download and process the protein-level DIA-MS dataset from Karayel et al. (2020), a study of CD34+ hematopoietic stem cell differentiation during erythropoiesis. The dataset contains quantitative proteomics measurements across five cell types that represent sequential stages of erythroid development.

The function downloads data from the PRIDE archive (PXD017276), processes sample identifiers, maps technical names to biological cell types, and excludes day 7 samples. Protein quantities marked as ‘Filtered’ in the original data are converted to np.nan.
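The processing steps described above can be pictured with a small pandas sketch. This is not the function's actual implementation: the toy table, its column names, and the way day 7 columns are detected (a "_D7" substring) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of a raw protein-group table. The real column
# names in the PRIDE files are not given in this documentation.
raw = pd.DataFrame(
    {
        "protein_id": ["P1", "P2", "P3"],
        "Progenitor_1": ["1200.5", "Filtered", "88.0"],
        "Ortho_1": ["Filtered", "430.2", "91.7"],
        "Poly_D7_1": ["15.0", "Filtered", "20.1"],
    }
).set_index("protein_id")

# Drop day 7 samples (assumed here to carry "_D7" in the column name).
is_day7 = raw.columns.str.contains("_D7", regex=False)
kept = raw.loc[:, ~is_day7]

# Turn the 'Filtered' marker into NaN, then cast the remaining strings to float.
quant = kept.mask(kept == "Filtered").astype(float)

# Transpose so rows are samples and columns are proteins, matching .X.
X = quant.T.to_numpy()
print(X.shape)  # (2, 3)
```

The transpose at the end reflects the samples × proteins orientation that the returned AnnData uses.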

Sample annotation (.obs) includes:
  • sample_id: Unique sample identifier (cell_type_replicate)

  • cell_type: Differentiation stage (Progenitor, ProE&EBaso, LBaso, Poly, Ortho)

  • replicate: Technical replicate identifier

Variable annotation (.var) includes:
  • protein_id: Protein group identifier (matches .var_names)

  • gene_name: Associated gene name(s)

Returns:

AnnData object containing protein-level quantification data. .X contains protein intensities (samples × proteins) with missing values as np.nan. Day 7 samples are excluded from the dataset.

Return type:

ad.AnnData

Raises:

urllib.error.URLError – If download from PRIDE archive fails.
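Because the download can fail with urllib.error.URLError, callers may want to retry. The helper below is a minimal sketch, not part of proteopy: load_with_retry and its parameters are hypothetical names, and it accepts any zero-argument loader so it can be tried without network access.

```python
import time
import urllib.error


def load_with_retry(loader, attempts=3, delay=0.0):
    """Call loader() and retry when it raises urllib.error.URLError.

    `loader` is any zero-argument callable; `attempts` and `delay`
    (seconds between tries) are illustrative parameters.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except urllib.error.URLError as err:
            last_error = err
            # Wait before the next try, but not after the final one.
            if attempt < attempts:
                time.sleep(delay)
    # All attempts failed: re-raise the last download error.
    raise last_error
```

With proteopy installed, this could be used as load_with_retry(pp.datasets.karayel_2020, delay=5.0).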

Examples

>>> import proteopy as pp
>>> adata = pp.datasets.karayel_2020()
>>> adata
AnnData object with n_obs × n_vars
    obs: 'sample_id', 'cell_type', 'replicate'
    var: 'protein_id', 'gene_name'
>>> adata.obs['cell_type'].unique()
['Progenitor', 'ProE&EBaso', 'LBaso', 'Poly', 'Ortho']
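Since missing values in .X are stored as np.nan, summaries need NaN-aware functions. The sketch below uses a small stand-in array rather than the real dataset. The values and the 50% missingness threshold are arbitrary choices for illustration.

```python
import numpy as np

# Stand-in for adata.X: 4 samples x 3 proteins, NaN marks a missing value.
X = np.array(
    [
        [10.0, np.nan, 3.0],
        [12.0, np.nan, np.nan],
        [11.0, np.nan, 2.5],
        [np.nan, 6.0, 2.0],
    ]
)

# Fraction of samples in which each protein is missing.
missing_per_protein = np.isnan(X).mean(axis=0)

# Mean intensity per protein, ignoring missing values.
mean_intensity = np.nanmean(X, axis=0)

# Keep proteins that are missing in at most half of the samples.
keep = missing_per_protein <= 0.5
X_filtered = X[:, keep]
```

The same expressions should apply to the loaded object by substituting adata.X for X, assuming .X is a dense NumPy array.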

Notes

The dataset represents five stages of erythroid differentiation:

  1. Progenitor: CD34+ hematopoietic stem cells

  2. ProE&EBaso: Proerythroblasts and early basophilic erythroblasts

  3. LBaso: Late basophilic erythroblasts

  4. Poly: Polychromatic erythroblasts

  5. Ortho: Orthochromatic erythroblasts

Samples collected at day 7 (_D7) are filtered out during processing.
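For plotting or sorting along the differentiation trajectory, it can help to encode the stage order explicitly. The following is a sketch under the assumption that cell_type is stored as plain labels. The toy .obs table and the STAGE_ORDER name are illustrative and not part of proteopy.

```python
import pandas as pd

# Stage order as listed in the notes above.
STAGE_ORDER = ["Progenitor", "ProE&EBaso", "LBaso", "Poly", "Ortho"]

# Toy stand-in for adata.obs with one sample per stage, in shuffled order.
obs = pd.DataFrame(
    {"cell_type": ["Ortho", "Progenitor", "Poly", "LBaso", "ProE&EBaso"]},
    index=["Ortho_1", "Progenitor_1", "Poly_1", "LBaso_1", "ProE&EBaso_1"],
)

# An ordered categorical makes sorting and comparisons follow the biology
# rather than alphabetical order.
obs["cell_type"] = pd.Categorical(
    obs["cell_type"], categories=STAGE_ORDER, ordered=True
)
obs_sorted = obs.sort_values("cell_type")
```

After this step, comparisons such as obs["cell_type"] >= "LBaso" select the later stages.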

Reference

Karayel Ö, Xu P, Bludau I, Velan Bhoopalan S, Yao Y, Ana Rita FC, Santos A, Schulman BA, Alpi AF, Weiss MJ, and Mann M. Integrative proteomics reveals principles of dynamic phosphosignaling networks in human erythropoiesis. Molecular Systems Biology, 2020. URL: https://doi.org/10.15252/msb.20209813, doi:10.15252/msb.20209813.