.hclustv_cluster_ann

proteopy.tl.hclustv_cluster_ann(adata, k, linkage_key='auto', values_key='auto', inplace=True, key_added=None, verbose=True)

Annotate variables with cluster assignments from hierarchical clustering.

Uses scipy.cluster.hierarchy.fcluster() to cut the dendrogram at k clusters and stores cluster assignments in .var.
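The cutting step can be illustrated with SciPy alone. This is a minimal sketch rather than the function's actual implementation: the toy data, the "average" linkage method, and the use of criterion="maxclust" to request k clusters are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data (hypothetical): 6 variables (rows) profiled across 4 groups (columns).
rng = np.random.default_rng(0)
profiles = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 4)),  # three variables with values near 0
    rng.normal(5.0, 0.1, size=(3, 4)),  # three variables with values near 5
])

# Build a linkage matrix of shape (n - 1, 4).
Z = linkage(profiles, method="average", metric="euclidean")

# Assumption: "cut at k clusters" corresponds to criterion="maxclust",
# which returns flat cluster labels with at most k distinct values.
k = 2
labels = fcluster(Z, t=k, criterion="maxclust")
```

Here labels holds one integer per clustered variable, in the same order as the rows passed to linkage().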

Parameters:
  • adata (AnnData) – AnnData with hierarchical clustering results stored in .uns (from hclustv_tree()).

  • k (int) – Number of clusters to generate (required).

  • linkage_key (str) – Key in adata.uns containing the linkage matrix. When 'auto', the key is detected automatically, provided exactly one 'hclustv_linkage;...' key exists. If several such keys exist, linkage_key must be specified explicitly.

  • values_key (str) – Key in adata.uns containing the values DataFrame. When 'auto', the key is detected automatically, provided exactly one 'hclustv_values;...' key exists. If several such keys exist, values_key must be specified explicitly.

  • inplace (bool) – If True, store results in adata.var and return None. If False, return a modified copy of adata.

  • key_added (str | None) – Custom key for storing results in adata.var. When None, uses the default format 'hclustv_cluster;<group_by>;<var_hash>;<layer>' derived from the linkage key components.

  • verbose (bool) – If True, print the storage key after computation.

Returns:

If inplace=True, returns None. If inplace=False, returns a copy of adata with cluster annotations stored in .var.

Return type:

AnnData | None

Raises:
  • ValueError – If no hierarchical clustering results are found in adata.uns; if multiple clustering results exist and linkage_key is not specified; if the linkage matrix has an invalid shape; if k < 2 (a single cluster is semantically meaningless); or if the auto-generated storage key cannot be derived from a custom linkage key.

  • TypeError – If the linkage matrix is not a NumPy array.

  • KeyError – If the specified linkage_key is not found in adata.uns.
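The 'auto' key resolution and the errors it can raise might look roughly like the sketch below. The helper resolve_key and the key "hclustv_linkage;example" are hypothetical and exist only for illustration. The library's internal logic and error messages may differ.

```python
def resolve_key(uns, key="auto", prefix="hclustv_linkage;"):
    """Hypothetical helper: pick the single key in `uns` that starts with `prefix`."""
    if key != "auto":
        # An explicit key must exist in the mapping.
        if key not in uns:
            raise KeyError(f"{key!r} not found in .uns")
        return key
    candidates = [name for name in uns if name.startswith(prefix)]
    if not candidates:
        raise ValueError("no hierarchical clustering results found")
    if len(candidates) > 1:
        raise ValueError(f"multiple keys found, specify one explicitly: {candidates}")
    return candidates[0]


# A plain dict stands in for adata.uns. The key name is made up.
uns = {"hclustv_linkage;example": "linkage matrix placeholder"}
resolved = resolve_key(uns)
```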

Notes

Cluster assignments are stored at adata.var['hclustv_cluster;<group_by>;<var_hash>;<layer>']. Variables not included in the clustering (e.g., filtered out due to NaN values) will have NaN in this column.
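The NaN behaviour for unclustered variables can be reproduced with pandas. This is a sketch under assumed names (the variable IDs, the column name "clusters", and the use of reindex are illustrative, not taken from the library):

```python
import pandas as pd

# Stand-in for adata.var: five variables, of which only three were clustered.
all_vars = pd.Index(["P1", "P2", "P3", "P4", "P5"])
var = pd.DataFrame(index=all_vars)

# Hypothetical cluster labels for the clustered subset only.
clustered = pd.Series([1, 2, 1], index=["P1", "P3", "P4"])

# Aligning on the full index leaves NaN for variables without a label.
var["clusters"] = clustered.reindex(var.index)

n_missing = int(var["clusters"].isna().sum())
```

Because NaN is a float, a column built this way holds floats rather than integers. Calling var["clusters"].dropna() recovers only the clustered variables.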

Examples

>>> import proteopy as pr
>>> adata = pr.datasets.karayel_2020()
>>> pr.tl.hclustv_tree(
...     adata, group_by="condition", selected_vars=adata.var_names[0:1000]
... )
>>> pr.tl.hclustv_cluster_ann(adata, 5)

Access cluster assignments:

>>> adata.var['hclustv_cluster;condition;a1b2c3d4;X']