.karayel_2020
- proteopy.datasets.karayel_2020()
Load Karayel 2020 erythropoiesis proteomics dataset.
Download and process the protein-level DIA-MS dataset from Karayel et al. (2020) studying CD34+ hematopoietic stem cell differentiation during erythropoiesis. The dataset contains quantitative proteomics measurements across five cell types representing sequential stages of erythroid development.
The function downloads data from the PRIDE archive (PXD017276), processes sample identifiers, maps technical names to biological cell types, and excludes day 7 samples. Protein quantities marked as ‘Filtered’ in the original data are converted to np.nan.
- Sample annotation (.obs) includes:
  - sample_id: Unique sample identifier (cell_type_replicate)
  - cell_type: Differentiation stage (Progenitor, ProE&EBaso, LBaso, Poly, Ortho)
  - replicate: Technical replicate identifier
- Variable annotation (.var) includes:
  - protein_id: Protein group identifier (matches .var_names)
  - gene_name: Associated gene name(s)
- Returns:
AnnData object containing protein-level quantification data. .X contains protein intensities (samples × proteins) with missing values as np.nan. Day 7 samples are excluded from the dataset.
- Return type:
ad.AnnData
- Raises:
urllib.error.URLError – If download from PRIDE archive fails.
Examples
>>> import proteopy as pp
>>> adata = pp.datasets.karayel_2020()
>>> adata
AnnData object with n_obs × n_vars
    obs: 'sample_id', 'cell_type', 'replicate'
    var: 'protein_id', 'gene_name'
>>> adata.obs['cell_type'].unique()
['Progenitor', 'ProE&EBaso', 'LBaso', 'Poly', 'Ortho']
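Because missing values are stored as np.nan, a common first check is how much of the matrix is missing at each differentiation stage. The following is a minimal sketch, not part of proteopy: the helper name missing_fraction_by_cell_type is hypothetical, and the toy matrix and sample names are invented stand-ins so the example runs without downloading anything.

```python
import numpy as np
import pandas as pd


def missing_fraction_by_cell_type(X, obs):
    """Return the mean fraction of np.nan entries per cell type.

    X is a dense samples x proteins array; obs is a DataFrame with a
    'cell_type' column whose rows are in the same order as X.
    (Hypothetical helper for illustration; not a proteopy function.)
    """
    # Fraction of missing proteins in each sample (row of X).
    missing_per_sample = pd.Series(np.isnan(X).mean(axis=1), index=obs.index)
    # Average the per-sample fractions within each cell type.
    return missing_per_sample.groupby(obs["cell_type"], observed=True).mean()


# Toy stand-in for adata.obs and adata.X (values are made up).
obs = pd.DataFrame(
    {"cell_type": ["Progenitor", "Progenitor", "Ortho", "Ortho"]},
    index=["Progenitor_1", "Progenitor_2", "Ortho_1", "Ortho_2"],
)
X = np.array(
    [
        [1.0, np.nan, 3.0, 4.0],
        [1.5, 2.5, 3.5, 4.5],
        [np.nan, np.nan, 2.0, 1.0],
        [np.nan, 5.0, 6.0, 7.0],
    ]
)

fractions = missing_fraction_by_cell_type(X, obs)
print(fractions)
```

With the real dataset the call would be missing_fraction_by_cell_type(adata.X, adata.obs), assuming adata.X is a dense NumPy array; a sparse matrix would need to be converted first.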
Notes
The dataset represents five stages of erythroid differentiation:
Progenitor: CD34+ hematopoietic stem cells
ProE&EBaso: Proerythroblasts and early basophilic erythroblasts
LBaso: Late basophilic erythroblasts
Poly: Polychromatic erythroblasts
Ortho: Orthochromatic erythroblasts
Samples collected at day 7 (_D7) are filtered out during processing.
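The two cleaning steps described for this dataset (dropping day 7 samples and converting ‘Filtered’ entries to np.nan) can be illustrated with pandas. This is a sketch under stated assumptions and not the function's actual implementation: the column names, protein identifiers, and values below are invented, and the real PRIDE file layout may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical raw table: rows are protein groups, columns are samples.
# All names and values here are made up for illustration only.
raw = pd.DataFrame(
    {
        "Progenitor_D0_1": ["1200.5", "Filtered", "310.0"],
        "Progenitor_D7_1": ["900.0", "450.2", "Filtered"],
        "Ortho_D14_1": ["Filtered", "75.1", "2048.0"],
    },
    index=["P1", "P2", "P3"],
)

# Drop day 7 samples, identified here by "_D7" followed by "_" in the name
# (an assumed naming pattern for this toy table).
is_day7 = raw.columns.str.contains("_D7_", regex=False)
kept = raw.loc[:, ~is_day7]

# Replace 'Filtered' markers with missing values, then parse the
# remaining strings as floats.
quant = kept.mask(kept == "Filtered").apply(pd.to_numeric)

# Transpose to the samples x proteins orientation described for .X.
X = quant.T.to_numpy()
print(X.shape)
```

Masking the marker explicitly, rather than coercing every non-numeric string to NaN, means an unexpected value in the table raises an error instead of silently becoming a missing value.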
Reference
Karayel Ö, Xu P, Bludau I, Velan Bhoopalan S, Yao Y, Ana Rita FC, Santos A, Schulman BA, Alpi AF, Weiss MJ, and Mann M. Integrative proteomics reveals principles of dynamic phosphosignaling networks in human erythropoiesis. Molecular Systems Biology, 2020. URL: https://doi.org/10.15252/msb.20209813, doi:10.15252/msb.20209813.