Feature Generation

Table of contents

  1. Overview
  2. Feature Generation Functions
    1. generate_features
      1. Parameters
      2. Returns
      3. Mathematical Model
      4. Example
    2. create_feature_generator
      1. Parameters
      2. Returns
      3. Returned Function Signature
      4. Example
  3. Implementation Details
    1. Feature Generation Process
    2. Covariance Matrix Structure
  4. Usage in Sensitivity Analysis
    1. Example with SNR Estimation
    2. Example with Comprehensive Experiment
  5. Relationship to SNR

Overview

The feature generation module provides functions for generating synthetic node features with controlled covariance structures between classes, global shifts, and node-specific noise. These functions are essential for sensitivity analysis and SNR estimation in graph neural networks.

Feature Generation Functions

generate_features

def generate_features(
    num_nodes: int, 
    num_features: int, 
    labels: np.ndarray, 
    inter_class_cov: np.ndarray, 
    intra_class_cov: np.ndarray, 
    global_cov: np.ndarray, 
    noise_cov: np.ndarray, 
    mu_repeats: int = 1
) -> np.ndarray

Generates synthetic node features with controlled covariance structure. This function implements the feature generation model from the paper: $X_i = \mu_{y_i} + \gamma + \varepsilon_i$.

Parameters

Parameter Type Description
num_nodes int Number of nodes
num_features int Number of feature dimensions
labels np.ndarray Node class labels, one integer per node
inter_class_cov np.ndarray Covariance matrix between different classes
intra_class_cov np.ndarray Covariance matrix within the same class
global_cov np.ndarray Covariance matrix for the global shift
noise_cov np.ndarray Covariance matrix for the node-specific noise
mu_repeats int Number of feature realizations to generate (the size of the third output dimension)

Returns

Return Type Description
np.ndarray A numpy array of shape (num_nodes, num_features, mu_repeats) containing the generated features

Mathematical Model

The feature generation model is:

\[X_i = \mu_{y_i} + \gamma + \varepsilon_i\]

where:

  • $\mu_{y_i}$ is the class-specific mean for the class of node i
  • $\gamma$ is a global random vector (same for all nodes)
  • $\varepsilon_i$ is node-specific noise

Example

import numpy as np
from bridge.sensitivity import generate_features

# Define class labels for 10 nodes (3 classes)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

# Define simple covariance matrices for a 3-feature scenario
feature_dim = 3
intra_class_cov = 0.1 * np.eye(feature_dim)    # Covariance of each class mean (intra-class)
inter_class_cov = -0.05 * np.eye(feature_dim)  # Covariance between different class means (negative)
global_cov = np.eye(feature_dim)               # Covariance of the shared global shift
noise_cov = 0.5 * np.eye(feature_dim)          # Covariance of the node-specific noise

# Generate features (10 nodes, 3 features, 5 realizations)
features = generate_features(
    num_nodes=10,
    num_features=feature_dim,
    labels=labels,
    inter_class_cov=inter_class_cov,
    intra_class_cov=intra_class_cov,
    global_cov=global_cov,
    noise_cov=noise_cov,
    mu_repeats=5
)

print(f"Generated features shape: {features.shape}")  # Should be (10, 3, 5)

# Analyze features for the first realization
first_realization = features[:, :, 0]
print("Features for the first realization:")
print(first_realization)

# Compute mean feature vector for each class
for class_id in range(3):
    class_mask = (labels == class_id)
    class_features = first_realization[class_mask]
    class_mean = np.mean(class_features, axis=0)
    print(f"Class {class_id} mean feature vector: {class_mean}")

create_feature_generator

def create_feature_generator(
    sigma_intra: torch.Tensor,
    sigma_inter: torch.Tensor,
    tau: torch.Tensor,
    eta: torch.Tensor,
    dtype: torch.dtype = torch.float64
) -> Callable

Creates a feature generator function with fixed covariance parameters. This is a factory function that returns another function with a specific signature required for SNR estimation.

Parameters

Parameter Type Description
sigma_intra torch.Tensor Intra-class covariance matrix
sigma_inter torch.Tensor Inter-class covariance matrix
tau torch.Tensor Global shift covariance matrix
eta torch.Tensor Noise covariance matrix
dtype torch.dtype Torch data type for the output tensor

Returns

Return Type Description
Callable A function that generates features with signature: feature_generator(num_nodes, in_feats, labels, num_mu_samples)

Returned Function Signature

def feature_generator_fixed(
    num_nodes: int, 
    in_feats: int, 
    labels: torch.Tensor, 
    num_mu_samples: int = 1
) -> torch.Tensor

Example

import torch
from bridge.sensitivity import create_feature_generator

# Create covariance matrices
in_feats = 5
sigma_intra = 0.1 * torch.eye(in_feats)      # Intra-class (class-mean) covariance
sigma_inter = -0.05 * torch.eye(in_feats)    # Inter-class covariance (negative)
tau = torch.eye(in_feats)                    # Global shift covariance
eta = 0.5 * torch.eye(in_feats)              # Node-specific noise covariance

# Create feature generator with these covariance matrices
feature_generator = create_feature_generator(
    sigma_intra=sigma_intra,
    sigma_inter=sigma_inter,
    tau=tau,
    eta=eta
)

# Use the generator to create features
num_nodes = 100
labels = torch.randint(0, 3, (num_nodes,))  # Random labels with 3 classes
features = feature_generator(
    num_nodes=num_nodes,
    in_feats=in_feats,
    labels=labels,
    num_mu_samples=10
)

print(f"Generated features shape: {features.shape}")  # Should be (100, 5, 10)

Implementation Details

Feature Generation Process

The feature generation follows these steps (a minimal sketch follows the list):

  1. Class Mean Generation:
    • A large covariance matrix is constructed with block structure
    • Intra-class covariance (diagonal blocks): Controls similarity between nodes of the same class
    • Inter-class covariance (off-diagonal blocks): Controls similarity between nodes of different classes
    • Class means are drawn from this multivariate normal distribution
  2. Global Shift Generation:
    • A global random vector is generated for each repeat
    • This vector is shared across all nodes
    • Controlled by the global covariance matrix (tau)
  3. Node-Specific Noise:
    • Independent random noise is generated for each node
    • Controlled by the noise covariance matrix (eta)
  4. Feature Combination:
    • The final features are the sum of class means, global shift, and node-specific noise
    • The process is repeated mu_repeats times to create multiple realizations
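
A minimal NumPy sketch of these four steps is shown below. It is only a reading aid: the function name, the per-realization loop, and the block-matrix construction via np.kron are illustrative and may differ from the library's actual generate_features implementation.

import numpy as np

def generate_features_sketch(labels, intra_cov, inter_cov, global_cov, noise_cov,
                             mu_repeats=1, rng=None):
    """Illustrative re-implementation of the four steps above (not the library code)."""
    rng = np.random.default_rng() if rng is None else rng
    num_nodes = labels.shape[0]
    num_classes = int(labels.max()) + 1
    d = intra_cov.shape[0]

    out = np.empty((num_nodes, d, mu_repeats))
    for r in range(mu_repeats):
        # 1. Class means: block covariance with intra-class blocks on the diagonal
        #    and inter-class blocks off the diagonal.
        block = (np.kron(np.eye(num_classes), intra_cov - inter_cov)
                 + np.kron(np.ones((num_classes, num_classes)), inter_cov))
        mu = rng.multivariate_normal(np.zeros(num_classes * d), block).reshape(num_classes, d)

        # 2. Global shift: a single vector shared by every node.
        gamma = rng.multivariate_normal(np.zeros(d), global_cov)

        # 3. Node-specific noise: an independent draw per node.
        eps = rng.multivariate_normal(np.zeros(d), noise_cov, size=num_nodes)

        # 4. Combine: X_i = mu_{y_i} + gamma + eps_i.
        out[:, :, r] = mu[labels] + gamma + eps
    return out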

Covariance Matrix Structure

The covariance matrices control different aspects of the feature distribution (an illustrative comparison follows the list):

  • sigma_intra (Intra-class covariance): Controls how similar nodes of the same class are to each other.
    • Higher values → stronger class signal
    • Typically positive for clear class separation
  • sigma_inter (Inter-class covariance): Controls how similar nodes of different classes are to each other.
    • Negative values → class anti-correlation
    • Typically negative or zero for clearer class boundaries
  • tau (Global covariance): Controls shared variations across all nodes.
    • Higher values → more dataset-wide noise
    • Affects all classes simultaneously
  • eta (Node-specific covariance): Controls individual node variations.
    • Higher values → more per-node noise
    • Affects class separability
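
To make these trade-offs concrete, the sketch below contrasts a high-SNR and a low-SNR parameter set. The magnitudes are arbitrary, and the trace ratio is only a crude scalar summary, not the SNR definition used by the library; either dictionary can be passed to create_feature_generator(**config).

import torch

in_feats = 5

# High-SNR configuration: class-mean variation dominates shared/noise variation
high_snr = dict(
    sigma_intra=1.0 * torch.eye(in_feats),   # large class-mean variance -> strong signal
    sigma_inter=-0.1 * torch.eye(in_feats),  # mild anti-correlation between class means
    tau=0.1 * torch.eye(in_feats),           # small global shift
    eta=0.1 * torch.eye(in_feats),           # small node-specific noise
)

# Low-SNR configuration: noise terms dominate the class signal
low_snr = dict(
    sigma_intra=0.1 * torch.eye(in_feats),
    sigma_inter=torch.zeros(in_feats, in_feats),
    tau=1.0 * torch.eye(in_feats),
    eta=1.0 * torch.eye(in_feats),
)

# Crude scalar summary: class-mean separation (intra minus inter) vs. noise traces
for name, cfg in [("high", high_snr), ("low", low_snr)]:
    signal = torch.trace(cfg["sigma_intra"] - cfg["sigma_inter"]).item()
    noise = (torch.trace(cfg["tau"]) + torch.trace(cfg["eta"])).item()
    print(f"{name}-SNR config: signal/noise trace ratio = {signal / noise:.2f}")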

Usage in Sensitivity Analysis

The feature generation functions are primarily used in sensitivity analysis to:

  1. SNR Estimation: Generate multiple feature realizations for Monte Carlo SNR estimation
  2. Model Training: Create training data with controlled properties
  3. Controllable Experiments: Investigate the effect of feature properties on model performance

Example with SNR Estimation

import torch
import dgl

from bridge.sensitivity import create_feature_generator, estimate_snr_monte_carlo
from bridge.models import LinearGCN

# Create covariance matrices
in_feats = 5
sigma_intra = 0.1 * torch.eye(in_feats)
sigma_inter = -0.05 * torch.eye(in_feats)
tau = torch.eye(in_feats)
eta = torch.eye(in_feats)

# Create feature generator
feature_generator = create_feature_generator(
    sigma_intra=sigma_intra,
    sigma_inter=sigma_inter,
    tau=tau,
    eta=eta
)

# Example graph: any DGL graph with integer node labels in g.ndata['label'] works here
g = dgl.rand_graph(100, 500)
g.ndata['label'] = torch.randint(0, 3, (g.num_nodes(),))

# Create a model
model = LinearGCN(in_feats=in_feats, hidden_feats=16, out_feats=3)

# Estimate SNR using Monte Carlo with the feature generator
snr_mc = estimate_snr_monte_carlo(
    model=model,
    graph=g,
    in_feats=in_feats,
    labels=g.ndata['label'],
    num_montecarlo_simulations=100,
    feature_generator=feature_generator,
    device="cuda" if torch.cuda.is_available() else "cpu"
)

Example with Comprehensive Experiment

import torch

from bridge.sensitivity import run_sensitivity_experiment, create_feature_generator

# Reuses `model`, `g`, and `in_feats` from the SNR estimation example above

# Create feature generator
feature_generator = create_feature_generator(
    sigma_intra=0.1 * torch.eye(in_feats),
    sigma_inter=-0.05 * torch.eye(in_feats),
    tau=torch.eye(in_feats),
    eta=torch.eye(in_feats)
)

# Run comprehensive experiment
results = run_sensitivity_experiment(
    model=model,
    graph=g,
    feature_generator=feature_generator,
    in_feats=in_feats,
    num_acc_repeats=10,
    num_monte_carlo_samples=100,
    sigma_intra=0.1 * torch.eye(in_feats),
    sigma_inter=-0.05 * torch.eye(in_feats),
    tau=torch.eye(in_feats),
    eta=torch.eye(in_feats)
)

Relationship to SNR

The feature generation model directly corresponds to the SNR theoretical framework (a small empirical check follows the list):

  • The class means (μ) represent the signal component
  • The global shift (γ) and node-specific noise (ε) represent the noise components
  • The covariance parameters control the signal-to-noise ratio:
    • Larger intra-class covariance relative to inter-class covariance → higher SNR
    • Lower global and node-specific covariance → Higher SNR
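
A rough empirical illustration of this correspondence (not the library's estimate_snr_monte_carlo): generate one feature realization and compare between-class to within-class variance. Since the global shift γ is constant within a single realization, it cancels in this proxy; the parameter values below are illustrative only.

import numpy as np
from bridge.sensitivity import generate_features

# 150 nodes, 3 balanced classes, 5 features
labels = np.repeat(np.arange(3), 50)
d = 5
feats = generate_features(
    num_nodes=labels.size,
    num_features=d,
    labels=labels,
    inter_class_cov=-0.05 * np.eye(d),
    intra_class_cov=0.5 * np.eye(d),
    global_cov=0.1 * np.eye(d),
    noise_cov=0.1 * np.eye(d),
    mu_repeats=1,
)[:, :, 0]

# Between-class spread of the empirical class means ("signal")
class_means = np.stack([feats[labels == c].mean(axis=0) for c in range(3)])
between = class_means.var(axis=0).mean()

# Average within-class spread around those means ("noise")
within = np.mean([feats[labels == c].var(axis=0).mean() for c in range(3)])

print(f"Empirical signal-to-noise proxy: {between / within:.2f}")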