local_homophily
Table of contents
- Overview
- Function Signature
- Parameters
- Returns
- Mathematical Definition
- Usage Examples
- Implementation Details
Overview
The local_homophily
function computes the local p-hop homophily for each node in a graph. Local homophily measures how similar a node’s features are to its p-hop neighbors with respect to class labels. This metric is crucial for understanding the information flow in graph neural networks and identifying homophilic bottlenecks.
Function Signature
def local_homophily(
p: int,
g: dgl.DGLGraph,
y: Optional[torch.Tensor] = None,
self_loops: bool = False,
do_hp: bool = False,
fix_d: bool = True,
sym: bool = False,
device: Union[str, torch.device] = 'cpu'
) -> torch.Tensor
Parameters
Parameter | Type | Description |
---|---|---|
p | int | The ‘order’ of homophily (number of hops) |
g | dgl.DGLGraph | Input graph |
y | Optional[torch.Tensor] | Node labels of shape (n_nodes,); if None will use g.ndata[‘label’] |
self_loops | bool | Whether to include self-loops in adjacency |
do_hp | bool | Whether to compute higher-order polynomial version (I - A) |
fix_d | bool | If True, row-normalize adjacency (D^{-1}A) |
sym | bool | Whether to symmetrize adjacency (A <- A + A^T) |
device | Union[str, torch.device] | Device to perform computation on |
Returns
Return Type | Description |
---|---|
torch.Tensor | Tensor of shape (n_nodes,) containing the local homophily scores for each node |
Mathematical Definition
For a node i, the local p-homophily is defined as:
\[h^{(p)}_i = \sum_{c=1}^{C} \left( \sum_{j: y_j = c} \hat{A}^p_{ij} \right)^2\]where:
- $\hat{A}$ is the normalized adjacency matrix
- $p$ is the number of hops
- $y_j$ is the class label of node j
- $C$ is the number of classes
This measure quantifies how much information from nodes of the same class can reach a target node through p-hop paths.
Usage Examples
Basic Usage
import torch
import dgl
from bridge.utils import local_homophily
# Load a dataset
dataset = dgl.data.CoraGraphDataset()
g = dataset[0]
# Compute 2-hop local homophily for each node
homophily_scores = local_homophily(p=2, g=g)
# Print average homophily
print(f"Average 2-hop homophily: {homophily_scores.mean().item():.4f}")
# Find nodes with lowest homophily (bottlenecks)
low_homophily_nodes = torch.argsort(homophily_scores)[:10]
print(f"Nodes with lowest homophily: {low_homophily_nodes}")
Using High-Pass Filter
# Compute high-pass filter version (I - A) homophily
hp_homophily_scores = local_homophily(p=2, g=g, do_hp=True)
# Compare with standard homophily
standard_homophily = local_homophily(p=2, g=g, do_hp=False)
# Print the difference
diff = hp_homophily_scores - standard_homophily
print(f"Mean difference (high-pass - standard): {diff.mean().item():.4f}")
Using Custom Labels
import torch
# Create custom/predicted labels
n_nodes = g.num_nodes()
custom_labels = torch.randint(0, 3, (n_nodes,)) # 3 classes
# Compute homophily with respect to these labels
custom_homophily = local_homophily(p=2, g=g, y=custom_labels)
# Compare with homophily based on true labels
true_homophily = local_homophily(p=2, g=g) # Uses g.ndata['label']
print(f"Custom labels homophily: {custom_homophily.mean().item():.4f}")
print(f"True labels homophily: {true_homophily.mean().item():.4f}")
Comparison Across Multiple Hop Distances
# Compute homophily at different hop distances
homophily_1hop = local_homophily(p=1, g=g)
homophily_2hop = local_homophily(p=2, g=g)
homophily_3hop = local_homophily(p=3, g=g)
# Print averages
print(f"1-hop homophily: {homophily_1hop.mean().item():.4f}")
print(f"2-hop homophily: {homophily_2hop.mean().item():.4f}")
print(f"3-hop homophily: {homophily_3hop.mean().item():.4f}")
Implementation Details
The local_homophily
function implements the following algorithm:
- Build Sparse Adjacency Matrix:
- Converts the DGL graph to a sparse adjacency matrix
- Optionally adds self-loops or symmetrizes the matrix
- Normalizes the adjacency matrix using D^{-1/2}AD^{-1/2}
- Apply High-Pass Filter (if
do_hp=True
):- Transforms the adjacency matrix to I - A
- This emphasizes differences between nodes rather than similarities
- Compute One-Hot Label Matrix:
- Creates a matrix M where M[i,c] = 1 if node i has class c, otherwise 0
- Compute (A^p)M:
- Raises the adjacency matrix to power p
- Multiplies by the label matrix
- This gives the influence from each class on each node through p-hop paths
- Compute Homophily Scores:
- For each node i, computes the sum of squares of influences from each class
- This measures how much influence comes from nodes of the same class
The implementation is optimized for sparse graphs and can handle large networks efficiently.