{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# `proteodata`: the ProteoPy data format\n", "\n", "This tutorial explains the **proteodata** convention — the set of assumptions that every ProteoPy function relies on when working with an `AnnData` object. You will learn:\n", "\n", "- What constitutes a valid proteodata object at the protein level and at the peptide level\n", "- How to construct proteodata from scratch\n", "- How to use `is_proteodata()` and `check_proteodata()` to validate your data\n", "- Common pitfalls that break the format, and how to avoid them\n", "\n", "**Prerequisites:** Basic familiarity with `AnnData` (observations, variables, `.obs`, `.var`, `.X`)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from anndata import AnnData\n", "import proteopy as pr\n", "\n", "from proteopy.utils.anndata import is_proteodata, check_proteodata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 1. Anatomy of a proteodata object\n", "\n", "ProteoPy stores proteomics data in the [AnnData](https://anndata.readthedocs.io/) format, where:\n", "\n", "| Slot | Content |\n", "|------|---------|\n", "| `.X` | Intensity matrix (samples x proteins/peptides), containing `float`, `int`, or `np.nan` |\n", "| `.obs` | Sample metadata — **must** include a `sample_id` column |\n", "| `.var` | Protein / peptide metadata — **must** include `protein_id` (and `peptide_id` for peptide-level data) |\n", "| `.obs_names` | Sample index — must be unique |\n", "| `.var_names` | Protein/peptide index — must be unique |\n", "\n", "The `is_proteodata()` function checks all of these assumptions and returns a tuple: `(True, \"protein\")`, `(True, \"peptide\")`, or `(False, None)`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 2. Building a valid protein-level proteodata" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 3 × 4\n", " obs: 'sample_id'\n", " var: 'protein_id'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# -- Sample metadata --\n", "sample_ids = [\"sample_A\", \"sample_B\", \"sample_C\"]\n", "obs = pd.DataFrame({\"sample_id\": sample_ids}, index=sample_ids)\n", "\n", "# -- Protein metadata --\n", "protein_ids = [\"P12345\", \"Q67890\", \"O11223\", \"P44556\"]\n", "var = pd.DataFrame({\"protein_id\": protein_ids}, index=protein_ids)\n", "\n", "# -- Intensity matrix (3 samples x 4 proteins) --\n", "X = np.array([\n", " [100.0, 200.0, 50.0, 300.0],\n", " [110.0, np.nan, 55.0, 280.0],\n", " [ 95.0, 210.0, 48.0, 310.0],\n", "])\n", "\n", "adata_protein = AnnData(X=X, obs=obs, var=var)\n", "adata_protein" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, 'protein')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "is_proteodata(adata_protein)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The key rules for protein-level proteodata:\n", "\n", "1. `.var[\"protein_id\"]` must exist\n", "2. `.var[\"protein_id\"]` values must exactly match `.var_names` (same values, same order)\n", "3. `.obs[\"sample_id\"]` must exist\n", "4. All indices must be unique\n", "5. `.X` must not contain `np.inf` or `-np.inf`\n", "6. `protein_id` must not contain `NaN`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 3. Building a valid peptide-level proteodata\n", "\n", "Peptide-level data requires an additional `peptide_id` column in `.var`. Each peptide must map to exactly **one** protein." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, 'peptide')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# -- Sample metadata --\n", "sample_ids = [\"sample_A\", \"sample_B\"]\n", "obs = pd.DataFrame({\"sample_id\": sample_ids}, index=sample_ids)\n", "\n", "# -- Peptide metadata --\n", "# Two peptides from PROT_X and one from PROT_Y\n", "peptide_ids = [\"PEPTIDE_1\", \"PEPTIDE_2\", \"PEPTIDE_3\"]\n", "protein_ids = [\"PROT_X\", \"PROT_X\", \"PROT_Y\"]\n", "var = pd.DataFrame(\n", " {\"peptide_id\": peptide_ids, \"protein_id\": protein_ids},\n", " index=peptide_ids,\n", ")\n", "\n", "# -- Intensity matrix (2 samples x 3 peptides) --\n", "X = np.array([\n", " [500.0, 300.0, 800.0],\n", " [520.0, 310.0, 790.0],\n", "])\n", "\n", "adata_peptide = AnnData(X=X, obs=obs, var=var)\n", "is_proteodata(adata_peptide)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The additional rules for peptide-level proteodata:\n", "\n", "1. `.var[\"peptide_id\"]` must exist and match `.var_names` exactly\n", "2. `.var[\"protein_id\"]` must exist — every peptide needs a parent protein\n", "3. Each peptide maps to **exactly one** protein (no multi-mapping like `\"PROT_A;PROT_B\"`)\n", "4. Neither `peptide_id` nor `protein_id` may contain `NaN`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 4. `is_proteodata` vs `check_proteodata`\n", "\n", "ProteoPy provides two validation functions:\n", "\n", "| Function | On failure | Use case |\n", "|----------|------------|----------|\n", "| `is_proteodata(adata)` | Returns `(False, None)` | Conditional logic — \"is this proteodata?\" |\n", "| `check_proteodata(adata)` | Raises `ValueError` | Guard clauses — \"this *must* be proteodata\" |\n", "\n", "`is_proteodata` also accepts `raise_error=True` to behave like `check_proteodata`.\n", "\n", "Both accept a `layers` parameter to additionally validate layer matrices for infinite values." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Valid: True, Level: protein\n" ] } ], "source": [ "# is_proteodata: soft check — returns a tuple\n", "result = is_proteodata(adata_protein)\n", "print(f\"Valid: {result[0]}, Level: {result[1]}\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Validation passed!\n" ] } ], "source": [ "# check_proteodata: hard check — raises on failure\n", "try:\n", " check_proteodata(adata_protein)\n", " print(\"Validation passed!\")\n", "except ValueError as e:\n", " print(f\"Validation failed: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 5. Pitfalls: how valid proteodata can break\n", "\n", "Even if you start with a valid proteodata object, common operations can silently break the format. We define a single `adata` below and work with deep copies (`adata.copy()`) in each example so the original remains untouched." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting point: (True, 'protein')\n" ] }, { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 3 × 4\n", " obs: 'sample_id'\n", " var: 'protein_id'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# One adata for all pitfall demonstrations (3 samples x 4 proteins)\n", "obs = pd.DataFrame(\n", " {\"sample_id\": [\"s1\", \"s2\", \"s3\"]},\n", " index=[\"s1\", \"s2\", \"s3\"],\n", ")\n", "proteins = [\"PROT_A\", \"PROT_B\", \"PROT_C\", \"PROT_D\"]\n", "var = pd.DataFrame(\n", " {\"protein_id\": proteins},\n", " index=proteins,\n", ")\n", "X = np.array([\n", " [100.0, 0.0, 50.0, 200.0],\n", " [200.0, 50.0, 50.0, 300.0],\n", " [150.0, 80.0, 50.0, 250.0],\n", "])\n", "\n", "adata = AnnData(X=X, obs=obs, var=var)\n", "print(\"Starting point:\", is_proteodata(adata))\n", "adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1 Renaming an index without updating the corresponding metadata column\n", "\n", "If you rename an index such as `.var_names` in a protein-level proteodata object, the `protein_id` column no longer matches (a proteodata format requirement)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "After: (False, None)\n" ] } ], "source": [ "adata_mod = adata.copy()\n", "\n", "# Rename the index to gene names\n", "adata_mod.var_names = [\"GeneA\", \"GeneB\", \"GeneC\", \"GeneD\"]\n", "\n", "print(\"After:\", is_proteodata(adata_mod))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `protein_id` column still contains `[\"PROT_A\", \"PROT_B\", \"PROT_C\", \"PROT_D\"]`, but the index now says `[\"GeneA\", \"GeneB\", \"GeneC\", \"GeneD\"]`. To fix this, always update **both** the index and the ID column together." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Repaired: (True, 'protein')\n" ] } ], "source": [ "# Update protein_id to match the new index\n", "adata_mod.var[\"protein_id\"] = adata_mod.var_names\n", "\n", "print(\"Repaired:\", is_proteodata(adata_mod))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same applies to renaming:\n", "- `.var_names` of a peptide-level proteodata object, where `.var[\"peptide_id\"]` must be kept in sync.\n", "- `.obs_names`, where `.obs[\"sample_id\"]` must be kept in sync." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.2 Renaming `protein_id`, `peptide_id`, or `sample_id` without updating the corresponding index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conversely, if `protein_id` or `peptide_id` are modified, `.var_names` must be updated to match. The same holds for `sample_id` and `.obs_names`. Here we demonstrate with `sample_id`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "obs_names: ['s1', 's2', 's3']\n", "sample_id: ['new_s1', 'new_s2', 'new_s3']\n", "After: (False, None)\n" ] } ], "source": [ "adata_mod = adata.copy()\n", "\n", "# Rename sample_id — obs_names is now stale\n", "adata_mod.obs[\"sample_id\"] = [\"new_s1\", \"new_s2\", \"new_s3\"]\n", "\n", "print(\"obs_names:\", list(adata_mod.obs_names))\n", "print(\"sample_id:\", list(adata_mod.obs[\"sample_id\"]))\n", "\n", "print(\"After:\", is_proteodata(adata_mod))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "obs_names: ['new_s1', 'new_s2', 'new_s3']\n", "sample_id: ['new_s1', 'new_s2', 'new_s3']\n", "Repaired: (True, 'protein')\n" ] } ], "source": [ "# Update obs_names to match\n", "adata_mod.obs_names = [\"new_s1\", \"new_s2\", \"new_s3\"]\n", "\n", "print(\"obs_names:\", list(adata_mod.obs_names))\n", "print(\"sample_id:\", list(adata_mod.obs[\"sample_id\"]))\n", "\n", "print(\"Repaired:\", is_proteodata(adata_mod))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.3 Infinite values from mathematical operations\n", "\n", "Infinite values can easily arise from common data transformations. A typical example is log-transforming data that contains zeros:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Current X:\n", "[[100. 0. 50. 200.]\n", " [200. 50. 50. 300.]\n", " [150. 80. 50. 250.]]\n" ] } ], "source": [ "adata_mod = adata.copy()\n", "\n", "# Our matrix contains a zero in PROT_B (from the initial setup)\n", "print(\"Current X:\")\n", "print(adata_mod.X)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Before log2: (True, 'protein')\n" ] } ], "source": [ "print(\"\\nBefore log2:\", is_proteodata(adata_mod))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Matrix after log2:\n", "[[6.64385619 -inf 5.64385619 7.64385619]\n", " [7.64385619 5.64385619 5.64385619 8.22881869]\n", " [7.22881869 6.32192809 5.64385619 7.96578428]]\n", "\n", "After log2: (False, None)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_2256571/3414031841.py:2: RuntimeWarning: divide by zero encountered in log2\n", " adata_mod.X = np.log2(adata_mod.X)\n" ] } ], "source": [ "# log2(0) = -inf!\n", "adata_mod.X = np.log2(adata_mod.X)\n", "print(\"Matrix after log2:\")\n", "print(adata_mod.X)\n", "\n", "print(\"\\nAfter log2:\", is_proteodata(adata_mod))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AnnData.X contains infinite values (np.inf or -np.inf). Please remove or replace infinite values before proceeding.\n" ] } ], "source": [ "# The detailed error message tells you what went wrong\n", "try:\n", " is_proteodata(adata_mod, raise_error=True)\n", "except ValueError as e:\n", " print(e)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Valid again: (True, 'protein')\n" ] } ], "source": [ "# Fix: replace zeros with nan before log-transforming\n", "adata_mod = adata.copy()\n", "adata_mod.X[adata_mod.X == 0] = np.nan\n", "adata_mod.X = np.log2(adata_mod.X) # log2(NaN) = NaN — safe\n", "\n", "print(\"\\nValid again:\", is_proteodata(adata_mod))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another source of division by zero is normalization. For example, variance scaling divides each protein's intensities by its standard deviation. If a protein has identical intensities across all samples, its standard deviation is zero:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Per-protein std: [40.82482905 32.99831646 0. 40.82482905]\n", "PROT_C has std = 0 — division by zero!\n", "\n" ] } ], "source": [ "adata_mod = adata.copy()\n", "\n", "# Variance scaling: x / std (per protein)\n", "stds = np.std(adata_mod.X, axis=0)\n", "print(\"Per-protein std:\", stds)\n", "print(\"PROT_C has std = 0 — division by zero!\\n\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Scaled X:\n", "[[2.44948974 0. inf 4.89897949]\n", " [4.89897949 1.51522882 inf 7.34846923]\n", " [3.67423461 2.42436611 inf 6.12372436]]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_2256571/3963694927.py:1: RuntimeWarning: divide by zero encountered in divide\n", " adata_mod.X = adata_mod.X / stds\n" ] } ], "source": [ "adata_mod.X = adata_mod.X / stds\n", "\n", "print(\"Scaled X:\")\n", "print(adata_mod.X)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "After var-scaling: (False, None)\n" ] } ], "source": [ "print(\"After var-scaling:\", is_proteodata(adata_mod))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One fix is to remove zero-variance proteins before scaling:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Removed 1 variables.\n", "\n", "Subset and scaled X:\n", "[[2.44948974 0. 4.89897949]\n", " [4.89897949 1.51522882 7.34846923]\n", " [3.67423461 2.42436611 6.12372436]]\n" ] } ], "source": [ "adata_mod = adata.copy()\n", "\n", "pr.pp.remove_zero_variance_vars(adata_mod)\n", "\n", "stds = np.std(adata_mod.X, axis=0)\n", "adata_mod.X = adata_mod.X / stds\n", "\n", "print(\"\\nSubset and scaled X:\")\n", "print(adata_mod.X)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Valid again: (True, 'protein')\n" ] } ], "source": [ "print(\"Valid again:\", is_proteodata(adata_mod))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **Note** \n", "> Z-score normalization with zero-variance variables yields `np.nan`, which is allowed in the proteodata format." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Normalized X:\n", "[[-1.22474487 -1.31319831 nan -1.22474487]\n", " [ 1.22474487 0.20203051 nan 1.22474487]\n", " [ 0. 1.1111678 nan 0. ]]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_2256571/1869068870.py:6: RuntimeWarning: invalid value encountered in divide\n", " adata_mod.X = (adata_mod.X - means) / stds\n" ] } ], "source": [ "adata_mod = adata.copy()\n", "\n", "# Z-score normalization: x / std (per protein)\n", "stds = np.std(adata_mod.X, axis=0)\n", "means = np.mean(adata_mod.X, axis=0)\n", "adata_mod.X = (adata_mod.X - means) / stds\n", "\n", "print(\"Normalized X:\")\n", "print(adata_mod.X)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "is_proteodata: (True, 'protein')\n" ] } ], "source": [ "print(\"\\nis_proteodata:\", is_proteodata(adata_mod))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROT_C is now entirely NaN (0 / 0). While NaN passes proteodata validation, the values are meaningless.\n", "With a nonzero numerator, division by zero would produce inf and fail validation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 6. Validating layers\n", "\n", "ProteoPy functions sometimes store intermediate results in `adata.layers` (e.g., raw intensities before transformation). Both `is_proteodata` and `check_proteodata` accept a `layers` parameter to validate those matrices as well.\n", "\n", "We continue with a fresh copy of the same `adata` from section 5." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Without layers check: (True, 'protein')\n" ] } ], "source": [ "adata_mod = adata.copy()\n", "\n", "print(\"Without layers check:\", is_proteodata(adata_mod))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ignoring layers: (True, 'protein')\n" ] } ], "source": [ "# Add a layer with an infinite value\n", "bad_layer = adata_mod.X.copy()\n", "bad_layer[0, 0] = np.inf\n", "adata_mod.layers[\"transformed\"] = bad_layer\n", "\n", "# Still passes without the layers parameter\n", "print(\"Ignoring layers: \", is_proteodata(adata_mod))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Checking layer: (False, None)\n", "adata.layers['transformed'] contains infinite values (np.inf or -np.inf). Please remove or replace infinite values before proceeding.\n" ] } ], "source": [ "# Fails when we explicitly check that layer\n", "print(\"Checking layer: \", is_proteodata(adata_mod, layers=\"transformed\"))\n", "\n", "try:\n", " check_proteodata(adata_mod, layers=\"transformed\")\n", "except ValueError as e:\n", " print(e)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "adata.layers['transformed'] contains infinite values (np.inf or -np.inf). Please remove or replace infinite values before proceeding.\n" ] } ], "source": [ "# You can check multiple layers at once\n", "adata_mod.layers[\"raw\"] = adata_mod.X.copy() # this one is fine\n", "\n", "try:\n", " check_proteodata(adata_mod, layers=[\"raw\", \"transformed\"])\n", "except ValueError as e:\n", " print(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 7. Quick-reference checklist\n", "\n", "Use this checklist when constructing or debugging a proteodata object:\n", "\n", "| # | Check | Applies to |\n", "|---|-------|------------|\n", "| 1 | `.obs[\"sample_id\"]` exists | Both |\n", "| 2 | `.var[\"protein_id\"]` exists | Both |\n", "| 3 | `.var[\"peptide_id\"]` exists and matches `.var_names` | Peptide only |\n", "| 4 | `.var[\"protein_id\"]` matches `.var_names` | Protein only |\n", "| 5 | Each peptide maps to exactly one protein | Peptide only |\n", "| 6 | No `NaN` in `protein_id` or `peptide_id` | Both |\n", "| 7 | No `np.inf` or `-np.inf` in `.X` or checked layers | Both |\n", "| 8 | All indices (obs, var) are unique | Both |\n", "| 9 | `.X` is 2-dimensional | Both |\n", "| 10 | `protein_id`/`peptide_id` not in `.obs`; `sample_id` not in `.var` | Both |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Summary\n", "\n", "The proteodata format is a thin but strict contract on top of AnnData that ensures ProteoPy functions can safely assume:\n", "\n", "- **Who are the samples?** — defined by `.obs[\"sample_id\"]`\n", "- **What are the variables?** — defined by `.var[\"protein_id\"]` (protein-level) or `.var[\"peptide_id\"]` + `.var[\"protein_id\"]` (peptide-level)\n", "- **Is the data clean?** — no infinite values, no NaN identifiers, no duplicates\n", "\n", "Always validate with `check_proteodata()` after constructing or modifying an AnnData. ProteoPy does this automatically in every public function, so you will get a clear error message if something is wrong." ] } ], "metadata": { "kernelspec": { "display_name": "proteopy-usage1", "language": "python", "name": "proteopy-usage1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.9" } }, "nbformat": 4, "nbformat_minor": 4 }