Local Community Detection

Contributing a new class of algorithm to NetworkX

February 5, 2025

Network graphs often allow relationships in data to be represented more naturally. In a graph, each node represents an object, and edges describe how these objects are related—like friendships in social networks or interactions between proteins. Finding groups of closely related nodes is essential in understanding these relationships, which is why community detection has become such an important tool. You can imagine finding friendship groups or groups of interacting proteins.

From Global to Local

Most community detection algorithms work globally across the whole network, putting every node into a community. While this works well with smaller static graphs, it becomes challenging when the graphs get large or very dynamic. The processing time and memory requirements can be significant, making finding communities globally infeasible or expensive.

Global Community Detection

However, sometimes we are focused on a single node. For example, when finding which social circle a person belongs to, there is no need to find every community in the graph. Instead, we can just look at the local graph around the node, reducing the complexity significantly. This is where Local Community Detection (LCD) algorithms come into play. They start with a chosen node and incrementally add nodes that are most connected until a predefined cut-off is reached. The processing time can be so quick that it is possible to run this in real-time as an API, even on very large dynamic graphs.

Local Community Detection

Aaron Clauset first proposed an LCD algorithm in 2005. Similarly to many global algorithms, it relies on a modified version of modularity. Modularity compares how many edges exist between a group of nodes, compared to how many you would expect if they were all randomly distributed. If there are more edges than you would expect, it suggests a community. Clauset’s algorithm expands the local community while the modularity gain is positive and stops once the benefit of adding new nodes reduces.

Use Cases

LCD is already used in a number of areas:

  • Targeted Marketing: Quickly identify the most relevant cluster of potential customers around a key influencer in a social graph.
  • Protein-Protein Interactions: Find a small set of proteins that function together without analyzing the entire proteome.
  • Fraud Detection: Spot suspicious communities around certain flagged accounts or transactions in real time.

NetworkX Implementation

Despite the benefits and existing applications, LCD algorithms are still not widely used. Very few open-source network libraries have LCD algorithms implemented, making it difficult to start using and testing them. As a step to wider usage, I recently implemented Clauset’s initial LCD algorithm in NetworkX. You will soon be able to use this as simply as:

G = nx.karate_club_graph()
community = nx.community.greedy_source_expansion(G, source=16)

In this snippet, we pick node 16 as our “source,” and the algorithm identifies the surrounding local community. While NetworkX still requires the entire graph in memory, the LCD process itself uses negligible additional memory and runs exponentially faster than a global approach. Newer LCD algorithms may provide even better accuracy, and I hope to add some of them to NetworkX soon.

Looking Ahead

As data grows larger and more dynamic, analyzing graphs globally is going to become more challenging. By focusing on the local structures around a node of interest, LCD algorithms offer a scalable, efficient approach. With more implementations and continued research, local community detection has the potential to solve a wide range of real-world problems.