Network clustering: 50 years and still going!

Maximilien Dreveton (EPFL, Switzerland), Daniel R. Figueiredo (UFRJ, Brazil)

Tutorial presented at ACM Sigmetrics / IFIP Performance on June 10, 2024

Duration: 3 hours

Abstract: Networks are arguably the most fundamental model to represent relationships between objects and have been adopted by different disciplines to represent various systems and phenomena. From social networks that encode different interactions among people, to the power grid and the Internet that represents how electrons and bits are transferred, to the connectome and protein interaction networks that encode neural and protein activity in the brain and the cell, respectively. Finding sets of similar nodes in a network, known as network clustering or community detection, is one of the most fundamental problems in network analysis. In particular, node clusters often reveal different latent information such as node function or node type with applications in various domains. Not surprisingly, network clustering has been studied for over 50 years in various scenarios and approaches.

Most advances within network clustering can be broadly categorized into either information-theoretic or algorithmic. The former is concerned with establishing conditions under which cluster detection is possible. This often requires a clean mathematical model for generating networks with clusters and provides thresholds (as a function of model parameters) for partial and exact recovery of the network clusters as the network grows large. Algorithmic advances are concerned with the design of algorithms that recover network clusters from data. To be useful, such algorithms must be efficient (in terms of computational complexity) and accurate (in terms of identifying network clusters).

Motivated by the increasing amounts of data, nodes, and edges of modern networks often have various attributes that significantly enhance the information encoded by the network (much beyond just the network structure). This relatively recent observation has led to novel advances in information-theoretic and algorithmic results further broadening the problem of network clustering.

This tutorial provides a comprehensive overview of network clustering over the past 50 years, covering fundamental information-theoretic and algorithmic results. Theoretical thresholds for partial and exact recovery of clusters in the classic Stochastic Block Model and in the more recent Contextual Stochastic Block Model (that handles attributes) will be presented. Classic algorithms based on spectral methods, statistical inference, modularity-maximization, and more recent algorithms that leverage Graph Neural Networks, will also be presented. The tutorial will also provide an outlook for topics not covered, such as geometric models and semi-supervised network clustering.

Slides: Tutorial_Network_Clustering.pdf

Videos:

Part I: Introduction to network clustering (Daniel)

Part II: How to detect network clusters? (Daniel)

Part III: When can network clusters be identified? (Maximilien)

Part IV: What was not covered in this tutorial? (Maximilien)

Playlist on YouTube