Clustering methods

Functions for spectral clustering are in motifcluster.clustering.

cluster_spectrum(spectrum, num_clusts)

Get cluster assignments from spectrum using k-means++.

Run k-means++ with num_clusts clusters on the eigenvectors stored in a spectrum, and return the resulting cluster assignments as a list.

Parameters:
  • spectrum (dict) – A dictionary with a “vects” entry: the matrix of eigenvectors to pass to k-means++.
  • num_clusts (int) – The number of clusters to find.
Returns:

cluster_assigns – A list of integers from 1 to num_clusts, representing cluster assignments.

Return type:

list of int
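To make the k-means++ step concrete, here is a minimal NumPy sketch. It is not the library's implementation: the names squared_dists, kmeanspp_init and cluster_spectrum_sketch, and the seed and max_iter arguments, are hypothetical. It assumes each row of spectrum["vects"] holds the embedding of one vertex.

```python
import numpy as np


def squared_dists(points, centres):
    """Squared Euclidean distance from every point to every centre."""
    diffs = points[:, None, :] - centres[None, :, :]
    return (diffs ** 2).sum(axis=2)


def kmeanspp_init(points, num_clusts, rng):
    """Choose initial centres with the k-means++ seeding rule."""
    n = points.shape[0]
    # First centre: a point chosen uniformly at random.
    centres = [points[rng.integers(n)]]
    for _ in range(1, num_clusts):
        # Distance from each point to its nearest already-chosen centre.
        d2 = squared_dists(points, np.array(centres)).min(axis=1)
        total = d2.sum()
        # Sample the next centre with probability proportional to d2;
        # fall back to uniform sampling if all points coincide.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        centres.append(points[rng.choice(n, p=probs)])
    return np.array(centres)


def cluster_spectrum_sketch(spectrum, num_clusts, seed=0, max_iter=100):
    """Hypothetical stand-in: k-means++ seeding followed by Lloyd iterations."""
    points = np.asarray(spectrum["vects"], dtype=float)
    rng = np.random.default_rng(seed)
    centres = kmeanspp_init(points, num_clusts, rng)
    for _ in range(max_iter):
        # Assign each point to its nearest centre.
        labels = squared_dists(points, centres).argmin(axis=1)
        # Move each centre to the mean of its points (keep it if empty).
        new_centres = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
            for k in range(num_clusts)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Shift to labels in 1..num_clusts, matching the documented return value.
    return [int(label) + 1 for label in labels]
```

The sketch returns a plain list of Python integers, so its output has the same shape as the cluster_assigns value described above. Which cluster receives which label depends on the random seed.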

run_motif_clustering(adj_mat, motif_name, motif_type='struc', mam_weight_type='unweighted', mam_method='sparse', num_eigs=2, type_lap='comb', num_clusts=2, restrict=True, gr_method='sparse')

Run motif-based clustering.

Run motif-based clustering on the adjacency matrix of a (weighted directed) network, using a specified motif, motif type, weighting scheme, embedding dimension, number of clusters and Laplacian type. Optionally restrict to the largest connected component before clustering.

Parameters:
  • adj_mat (matrix) – Adjacency matrix of the network to be clustered.
  • motif_name (str) – Motif used for the motif adjacency matrix.
  • motif_type (str) – Type of motif adjacency matrix to use. One of “func” or “struc”.
  • mam_weight_type (str) – Weighting scheme for the motif adjacency matrix. One of “unweighted”, “mean” or “product”.
  • mam_method (str) – The method to use for building the motif adjacency matrix. One of “sparse” or “dense”.
  • num_eigs (int) – Number of eigenvalues and eigenvectors for the embedding.
  • type_lap (str) – Type of Laplacian for the embedding. One of “comb” or “rw”.
  • num_clusts (int) – The number of clusters to find.
  • restrict (bool) – Whether or not to restrict the motif adjacency matrix to its largest connected component before embedding.
  • gr_method (str) – Method to use for finding the largest connected component. One of “sparse” or “dense”.
Returns:

  • adj_mat (sparse matrix) – The original adjacency matrix.
  • motif_adj_mat (sparse matrix) – The motif adjacency matrix.
  • comps (list) – The indices of the largest connected component of the motif adjacency matrix (if restrict=True).
  • adj_mat_comps (matrix) – The original adjacency matrix restricted to the largest connected component of the motif adjacency matrix (if restrict=True).
  • motif_adj_mat_comps (matrix) – The motif adjacency matrix restricted to its largest connected component (if restrict=True).
  • vals (list) – A length-num_eigs list containing the eigenvalues associated with the Laplace embedding of the (restricted) motif adjacency matrix.
  • vects (matrix) – A matrix containing the eigenvectors associated with the Laplace embedding of the (restricted) motif adjacency matrix.
  • clusts – A vector containing integers representing the cluster assignment of each vertex in the (restricted) graph.
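The comps, adj_mat_comps and motif_adj_mat_comps entries above all depend on finding the largest connected component. The sketch below illustrates that step with a breadth-first search over a dense NumPy array. The function name largest_component_sketch is hypothetical, and treating an edge in either direction as a connection is an assumption made for the illustration; the library's sparse and dense methods may work differently.

```python
from collections import deque

import numpy as np


def largest_component_sketch(adj_mat):
    """Return the sorted vertex indices of the largest connected component."""
    adj = np.asarray(adj_mat)
    n = adj.shape[0]
    # Assumption: an edge in either direction connects two vertices.
    connected = (adj != 0) | (adj.T != 0)
    seen = np.zeros(n, dtype=bool)
    best = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search from an unvisited vertex.
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in np.flatnonzero(connected[u]):
                if not seen[v]:
                    seen[v] = True
                    queue.append(int(v))
        # Keep the first component found among those of maximal size.
        if len(comp) > len(best):
            best = sorted(comp)
    return best
```

Given comps from this sketch, a dense matrix can be restricted to the component with adj_mat[np.ix_(comps, comps)], which keeps only the rows and columns of the selected vertices.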

Examples

>>> import numpy as np
>>> from motifcluster.clustering import run_motif_clustering
>>> adj_mat = np.array(range(1, 10)).reshape((3, 3))
>>> run_motif_clustering(adj_mat, "M1")
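The difference between the two type_lap options can be illustrated with a small dense sketch of a Laplace embedding. This is a sketch under stated assumptions, not the library's code: the function name laplace_embedding_sketch is hypothetical, the input is assumed to be a symmetric matrix with strictly positive row sums (as for a connected motif adjacency matrix), and taking the num_eigs smallest eigenvalues is the usual spectral clustering convention rather than something confirmed by this page. It uses the combinatorial Laplacian D - A for “comb” and the random-walk Laplacian I - D^{-1} A for “rw”.

```python
import numpy as np


def laplace_embedding_sketch(adj_mat, num_eigs=2, type_lap="comb"):
    """Return the num_eigs smallest eigenpairs of a graph Laplacian."""
    adj = np.asarray(adj_mat, dtype=float)
    degs = adj.sum(axis=1)
    if type_lap == "comb":
        # Combinatorial Laplacian L = D - A (symmetric, so eigh applies).
        vals, vects = np.linalg.eigh(np.diag(degs) - adj)
    elif type_lap == "rw":
        # The random-walk Laplacian I - D^{-1} A has the same eigenvalues
        # as the symmetric I - D^{-1/2} A D^{-1/2}; if u is an eigenvector
        # of the latter, D^{-1/2} u is an eigenvector of the former.
        inv_sqrt = 1.0 / np.sqrt(degs)
        sym = np.eye(len(degs)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
        vals, sym_vects = np.linalg.eigh(sym)
        vects = inv_sqrt[:, None] * sym_vects
    else:
        raise ValueError("type_lap must be 'comb' or 'rw'")
    # eigh returns eigenvalues in ascending order, so slice the first few.
    return {"vals": vals[:num_eigs].tolist(), "vects": vects[:, :num_eigs]}
```

The returned dictionary has “vals” and “vects” entries, mirroring the vals and vects values listed under Returns, so its “vects” matrix has the layout that a k-means++ step would expect: one row per vertex and one column per eigenvector.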