.hclustv_tree

proteopy.tl.hclustv_tree(adata, selected_vars=None, group_by=None, summary_method='median', linkage_method='average', distance_metric='euclidean', layer=None, zero_to_na=False, fill_na=None, z_transform=True, inplace=True, key_added=None, verbose=True)

Perform hierarchical clustering on variables (peptides or proteins).

Computes a linkage matrix from variable profiles across samples or groups, storing the result in adata.uns for downstream visualization or analysis.

Parameters:
  • adata (AnnData) – AnnData with proteomics annotations.

  • selected_vars (list[str] | None) – Explicit list of variables to include. When None, all variables are used.

  • group_by (str | None) – Column in adata.obs used to group observations. When provided, a summary statistic is computed for each group instead of using individual samples. Grouping can resolve NaN values through aggregation, because missing entries are skipped when summarizing (e.g., the median of [1, NaN, 3] is 2).

  • summary_method (str) – Method for computing group summaries when group_by is specified. One of "median" or "mean" (alias "average").

  • linkage_method (str) – Linkage criterion passed to scipy.cluster.hierarchy.linkage(). Common options include "average", "complete", "single", and "ward".

  • distance_metric (str) – Distance metric for clustering. One of "euclidean", "manhattan", or "cosine".

  • layer (str | None) – Optional adata.layers key to draw quantification values from. When None, the primary matrix adata.X is used.

  • zero_to_na (bool) – Replace zeros with NaN before computing profiles.

  • fill_na (float | int | None) – Replace NaN values with the specified constant before summary computation.

  • z_transform (bool) – Standardize values to mean 0 and variance 1 per variable before clustering. Variables with zero variance will be set to 0 (the mean) after transformation.

  • inplace (bool) – If True, store results in adata.uns and return None. If False, return a modified copy of adata.

  • key_added (str | None) – Custom key prefix for storing results in adata.uns. When None, uses the default format 'hclustv_linkage;<group_by>;<var_hash>;<layer>'.

  • verbose (bool) – Print the adata.uns keys under which the results are stored after computation.
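The steps these parameters control (group summary, per-variable z-transform, linkage) can be sketched with pandas and SciPy alone. This is a minimal sketch under stated assumptions, not the proteopy implementation: the toy data, the variable names, and the choice of population standard deviation (ddof=0) are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage

# Toy matrix: rows are samples, columns are variables (hypothetical names).
values = pd.DataFrame(
    {
        "pepA": [1.0, np.nan, 3.0, 6.0, 7.0, 8.0],
        "pepB": [2.0, 2.5, 3.0, 9.0, 9.5, 10.0],
        "pepC": [8.0, 7.0, 9.0, 2.0, 1.0, 3.0],
        "pepD": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # constant, zero variance
    },
    index=[f"s{i}" for i in range(6)],
)
condition = pd.Series(
    ["ctrl"] * 3 + ["treat"] * 3, index=values.index, name="condition"
)

# Group summary: pandas' median skips NaN, so [1, NaN, 3] gives 2.
profiles = values.groupby(condition).median()

# Z-transform per variable (column). ddof=0 is an assumption of this sketch.
centered = profiles - profiles.mean(axis=0)
std = profiles.std(axis=0, ddof=0)
# Zero-variance variables would divide by zero; they are set to 0 instead.
z_scores = centered.div(std.replace(0.0, np.nan), axis=1).fillna(0.0)

# Cluster variables: linkage expects one row per item, so transpose.
linkage_matrix = linkage(
    z_scores.T.to_numpy(), method="average", metric="euclidean"
)
print(linkage_matrix.shape)  # (n_variables - 1, 4)
```

With only two groups every non-constant variable standardizes to plus or minus 1, so this toy input is useful for checking the mechanics rather than for producing an informative tree.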

Returns:

If inplace=True, returns None. If inplace=False, returns a copy of adata with clustering results stored in .uns.

Return type:

AnnData | None

Notes

The linkage matrix is stored at adata.uns['hclustv_linkage;<group_by>;<var_hash>;<layer>']. The profile values DataFrame (after all transformations) is stored at adata.uns['hclustv_values;<group_by>;<var_hash>;<layer>'].

The var_hash is the first 8 characters of the MD5 hash of the sorted, semicolon-joined variable names used for clustering. When group_by is None, the group_by field is left empty in the key. When layer is None, 'X' is used in the key.
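The key scheme described above can be reproduced with the standard library. This is a sketch only: the helper names sketch_var_hash and sketch_uns_key are hypothetical, and the use of UTF-8 encoding and the hexadecimal digest are assumptions that the description does not state.

```python
import hashlib


def sketch_var_hash(var_names):
    """First 8 characters of the MD5 of the sorted, ';'-joined names."""
    joined = ";".join(sorted(var_names))
    # Assumption: names are UTF-8 encoded and the hex digest is truncated.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()[:8]


def sketch_uns_key(prefix, var_names, group_by=None, layer=None):
    """Assemble '<prefix>;<group_by>;<var_hash>;<layer>'."""
    group_field = "" if group_by is None else group_by  # empty when None
    layer_field = "X" if layer is None else layer       # 'X' when None
    return f"{prefix};{group_field};{sketch_var_hash(var_names)};{layer_field}"


names = ["pepB", "pepA", "pepC"]  # hypothetical variable names
print(sketch_uns_key("hclustv_linkage", names, group_by="condition"))
print(sketch_uns_key("hclustv_values", names))
```

Because the names are sorted before hashing, the same set of variables yields the same hash regardless of the order in which it is passed.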

Examples

>>> import proteopy as pp
>>> adata = pp.datasets.example_peptide_data()
>>> pp.tl.hclustv_tree(adata, group_by="condition")
>>> # Linkage matrix stored in adata.uns['hclustv_linkage;condition;a1b2c3d4;X']
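Since the stored key contains a hash, it can be convenient to look the key up by its prefix rather than typing it out. The sketch below uses a plain dict as a stand-in for adata.uns and a small synthetic linkage matrix, so it runs without proteopy. The key string is copied from the example above, and everything else is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# Stand-in for adata.uns after a clustering run (synthetic content).
points = np.array([[0.0], [0.1], [5.0], [5.2]])
uns = {
    "hclustv_linkage;condition;a1b2c3d4;X": linkage(points, method="average"),
    "unrelated_entry": 1,
}

# Find linkage results by prefix instead of hard-coding the hash.
linkage_keys = [k for k in uns if k.startswith("hclustv_linkage;")]
linkage_matrix = uns[linkage_keys[0]]

# Leaf order of the tree, e.g. for ordering heatmap rows.
order = leaves_list(linkage_matrix)
print(linkage_keys, order)
```

The resulting matrix follows the SciPy linkage format, so it can be passed to other functions in scipy.cluster.hierarchy, such as dendrogram or fcluster.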