proteopy.tl.hclustv_tree
proteopy.tl.hclustv_tree(adata, selected_vars=None, group_by=None, summary_method='median', linkage_method='average', distance_metric='euclidean', layer=None, zero_to_na=False, fill_na=None, z_transform=True, inplace=True, key_added=None, verbose=True)
Perform hierarchical clustering on variables (peptides or proteins).
Computes a linkage matrix from variable profiles across samples or groups and stores the result in adata.uns for downstream visualization or analysis.
- Parameters:
  adata (AnnData) – AnnData with proteomics annotations.
  selected_vars (list[str] | None) – Explicit list of variables to include. When None, all variables are used.
  group_by (str | None) – Column in adata.obs used to group observations. When provided, a summary statistic is computed for each group instead of using individual samples. Grouping can resolve NaN values through aggregation (e.g., the median of [1, NaN, 3] is 2).
  summary_method (str) – Method for computing group summaries when group_by is specified. One of "median" or "mean" (alias "average").
  linkage_method (str) – Linkage criterion passed to scipy.cluster.hierarchy.linkage(). Common options include "average", "complete", "single", and "ward".
  distance_metric (str) – Distance metric for clustering. One of "euclidean", "manhattan", or "cosine".
  layer (str | None) – Optional adata.layers key to draw quantification values from. When None, the primary matrix adata.X is used.
  zero_to_na (bool) – Replace zeros with NaN before computing profiles.
  fill_na (float | int | None) – Replace NaN values with the specified constant before summary computation.
  z_transform (bool) – Standardize values to mean 0 and variance 1 per variable before clustering. Variables with zero variance are set to 0 (the mean) after transformation.
  inplace (bool) – If True, store results in adata.uns and return None. If False, return a modified copy of adata.
  key_added (str | None) – Custom key prefix for storing results in adata.uns. When None, the default format 'hclustv_linkage;<group_by>;<var_hash>;<layer>' is used.
  verbose (bool) – Print storage location keys after computation.
- Returns:
  If inplace=True, returns None. If inplace=False, returns a copy of adata with the clustering results stored in .uns.
- Return type:
AnnData | None
Notes
The linkage matrix is stored at adata.uns['hclustv_linkage;<group_by>;<var_hash>;<layer>']. The profile values DataFrame (after all transformations) is stored at adata.uns['hclustv_values;<group_by>;<var_hash>;<layer>'].
The var_hash is the first 8 characters of the MD5 hash of the sorted, semicolon-joined variable names used for clustering. When group_by is None, the group_by field is left empty in the key. When layer is None, 'X' is used in the key.
Examples
>>> import proteopy as pp
>>> adata = pp.datasets.example_peptide_data()
>>> pp.tl.hclustv_tree(adata, group_by="condition")
>>> # Linkage matrix stored in, e.g., adata.uns['hclustv_linkage;condition;a1b2c3d4;X']
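The default key naming described in the Notes can be reproduced with Python's standard library, which is handy for looking up results in adata.uns. This is a sketch, not proteopy's code: it assumes the names are encoded as UTF-8 before hashing and that "first 8 characters" refers to the hexadecimal MD5 digest. The helpers sketch_var_hash and sketch_uns_keys are hypothetical, and the key_added prefix is not modelled.

```python
import hashlib


def sketch_var_hash(var_names):
    """First 8 hex characters of MD5 over the sorted, ';'-joined names (assumed UTF-8)."""
    joined = ";".join(sorted(var_names))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()[:8]


def sketch_uns_keys(var_names, group_by=None, layer=None):
    """Default linkage and values keys; key_added is not modelled here."""
    var_hash = sketch_var_hash(var_names)
    group_field = "" if group_by is None else group_by  # empty field when ungrouped
    layer_field = "X" if layer is None else layer       # 'X' when no layer is given
    suffix = f"{group_field};{var_hash};{layer_field}"
    return f"hclustv_linkage;{suffix}", f"hclustv_values;{suffix}"


linkage_key, values_key = sketch_uns_keys(["pepB", "pepA"], group_by="condition")
```

Because the names are sorted before hashing, the same set of variables yields the same key regardless of the order in which they are passed.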