Distributed SPI Verification

Verify distributed tensor-parallel inference with deterministic colors

Gay.jl extends Strong Parallelism Invariance (SPI) to distributed systems, enabling verification of:

Tensor parallelism: Vocabulary/hidden dimension sharding across GPUs
Pipeline parallelism: Layer sharding across devices
Data parallelism: Sequence/batch sharding
Exo clusters: Memory-weighted ring partitioning across MacBooks

The Problem

Distributed inference can silently corrupt data through:

Bit flips in memory or network transmission
Race conditions in AllGather/AllReduce operations
Pipeline handoff errors between stages
Floating-point non-determinism across devices

Traditional approaches require gathering all data to verify correctness, which is expensive and defeats the purpose of distribution.

The Solution: XOR Fingerprints

Gay.jl uses XOR fingerprints that are:

Composable: for disjoint shards A and B, fp(A ∪ B) = fp(A) ⊕ fp(B)
Associative and commutative: shard fingerprints can be combined in any grouping and any order
Pre-computable: Know the expected fingerprint BEFORE running inference

using Gay

# Pre-compute expected fingerprint
expected_fp = expected_fingerprint(seed, n_tokens, hidden_dim; layer=5)

# Run distributed inference...
# Each device computes its shard's fingerprint locally

# Verify without gathering: actual_fp is the XOR of the per-device shard fingerprints
@assert actual_fp == expected_fp  # Single equality check on the combined fingerprint
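
The properties above can be illustrated with a toy fingerprint. This is a minimal sketch, not Gay.jl's implementation: `mix64`, `toy_tag`, and `toy_fingerprint` are hypothetical names, the mixer is the SplitMix64 finalizer standing in for whatever color hash the package uses, and the seed is arbitrary. The sketch assumes each element's tag depends only on (seed, layer, token index, hidden index), which is what makes the expected value computable before inference runs.

```julia
# Toy 64-bit mixer (SplitMix64 finalizer); an assumption, not Gay.jl's color hash.
function mix64(x::UInt64)
    x += 0x9e3779b97f4a7c15
    x = (x ⊻ (x >> 30)) * 0xbf58476d1ce4e5b9
    x = (x ⊻ (x >> 27)) * 0x94d049bb133111eb
    return x ⊻ (x >> 31)
end

# Hypothetical per-element tag: depends only on (seed, layer, token, dim).
toy_tag(seed::UInt64, layer::Int, token::Int, dim::Int) =
    mix64(mix64(mix64(seed ⊻ UInt64(layer)) ⊻ UInt64(token)) ⊻ UInt64(dim))

# XOR of the tags over a block of token and hidden-dimension indices.
function toy_fingerprint(seed::UInt64, layer::Int, tokens, dims)
    fp = UInt64(0)
    for t in tokens, d in dims
        fp ⊻= toy_tag(seed, layer, t, d)
    end
    return fp
end

seed = 0x6761795f636f6c6f  # arbitrary illustrative seed

# Pre-computable: needs only the seed and the tensor shape.
expected = toy_fingerprint(seed, 5, 1:128, 1:16)

# Three devices fingerprint disjoint token shards locally.
fp_a = toy_fingerprint(seed, 5, 1:40, 1:16)
fp_b = toy_fingerprint(seed, 5, 41:100, 1:16)
fp_c = toy_fingerprint(seed, 5, 101:128, 1:16)

# Any grouping and any arrival order combine to the same value.
@assert (fp_a ⊻ fp_b) ⊻ fp_c == expected
@assert fp_c ⊻ (fp_b ⊻ fp_a) == expected
```

Only three 64-bit values cross the network in this sketch, regardless of how large the shards are.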

Architecture Support

Data Parallelism (Sequence Sharding)

Device 0: tokens[1:64]    → fp₀
Device 1: tokens[65:128]  → fp₁
─────────────────────────────────
Combined: fp₀ ⊕ fp₁ = expected_fp

Tensor Parallelism (Vocabulary Sharding)

using Gay: TensorPartition, verify_allgather, color_logits!, GAY_SEED

# Each device handles a shard of the vocabulary
partition = TensorPartition(dim=2, n_shards=4, shard_id=rank, global_size=vocab_size)

# Color the logits
color_logits!(logits, partition; seed=GAY_SEED)

# Verify AllGather produced correct result
@assert verify_allgather(gathered_logits, partitions; seed=GAY_SEED)
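
To make the sharding concrete, the following sketch computes which slice of the vocabulary each rank would own. `shard_range` is a hypothetical helper, not part of Gay.jl, and it assumes 0-based shard ids (as with MPI ranks) and a near-even contiguous split. `TensorPartition` may divide the dimension differently.

```julia
# Hypothetical helper: contiguous index range owned by `shard_id` (0-based).
# Shards differ in length by at most one element when the split is uneven.
function shard_range(global_size::Int, n_shards::Int, shard_id::Int)
    base, extra = divrem(global_size, n_shards)
    start = shard_id * base + min(shard_id, extra) + 1
    len = base + (shard_id < extra ? 1 : 0)
    return start:(start + len - 1)
end

vocab_size = 10  # tiny illustrative vocabulary
ranges = [shard_range(vocab_size, 4, r) for r in 0:3]

# Concatenating the shards in rank order must reproduce the full index range,
# with no gaps and no overlaps; this is what a correct AllGather preserves.
@assert reduce(vcat, ranges) == 1:vocab_size
```

An AllGather that drops, duplicates, or misorders a shard breaks this coverage, which is the kind of error a per-shard fingerprint check is meant to surface.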

Pipeline Parallelism (Layer Sharding)

using Gay: verify_pipeline_handoff, GAY_SEED

# After each pipeline stage handoff
@assert verify_pipeline_handoff(
    activations, 
    from_rank=0, 
    to_rank=1, 
    layer=16;
    seed=GAY_SEED
)
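
As an illustration of what a handoff check can catch, here is a minimal sketch of a content fingerprint over the raw bit patterns of the activations. It is an assumption about one possible approach, not a description of how `verify_pipeline_handoff` works internally: `mix64` and `handoff_fingerprint` are hypothetical names, and the seed is arbitrary.

```julia
# Toy 64-bit mixer (SplitMix64 finalizer); an assumption, not Gay.jl's hash.
function mix64(x::UInt64)
    x += 0x9e3779b97f4a7c15
    x = (x ⊻ (x >> 30)) * 0xbf58476d1ce4e5b9
    x = (x ⊻ (x >> 27)) * 0x94d049bb133111eb
    return x ⊻ (x >> 31)
end

# Hypothetical content fingerprint: XOR over elements of a hash of
# (seed, layer, index, IEEE-754 bit pattern of the value).
function handoff_fingerprint(x::AbstractVector{Float32}, layer::Int, seed::UInt64)
    base = mix64(seed ⊻ UInt64(layer))
    fp = UInt64(0)
    for (i, v) in enumerate(x)
        bits = UInt64(reinterpret(UInt32, v))
        fp ⊻= mix64(mix64(base ⊻ UInt64(i)) ⊻ bits)
    end
    return fp
end

seed = 0x6761795f636f6c6f  # arbitrary illustrative seed
sent = Float32[0.5, -1.25, 3.0, 7.75]
fp_sent = handoff_fingerprint(sent, 16, seed)  # computed on the sending stage

# An intact transfer reproduces the fingerprint on the receiving stage.
received = copy(sent)
@assert handoff_fingerprint(received, 16, seed) == fp_sent

# Simulate a single bit flip in transit: the fingerprint no longer matches.
received[3] = reinterpret(Float32, reinterpret(UInt32, received[3]) ⊻ 0x00000001)
@assert handoff_fingerprint(received, 16, seed) != fp_sent
```

Because the mixer is a bijection on 64-bit values, changing a single element always changes its tag, so a one-element corruption is always detected in this sketch.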

Exo Cluster Integration

For exo clusters running MLX:

using Gay: ExoCluster, ExoVerifier, verify_exo_inference

# Define cluster topology
cluster = ExoCluster([
    ("MacBook Pro", 18.0, "192.168.1.10"),
    ("MacBook Air", 8.0, "192.168.1.11"),
], "olmo-7b")

# Memory-weighted layer assignment:
#   MacBook Pro (18GB): layers 1-22 (69%)
#   MacBook Air (8GB):  layers 23-32 (31%)

# Pre-compute expected fingerprints
verifier = ExoVerifier(cluster, n_tokens=128)

# Verify after inference
@assert verify_exo_inference(cluster, "Hello world"; n_tokens=32)
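
The memory-weighted assignment in the comments above can be reproduced with a short sketch. `memory_weighted_layers` is a hypothetical helper, not part of Gay.jl or exo. It assumes a 32-layer model (taken from the layer ranges in the comment) and that shard boundaries are obtained by rounding cumulative memory fractions; the actual rounding rule may differ.

```julia
# Hypothetical helper: split `n_layers` into contiguous ranges whose sizes are
# proportional to each device's memory, rounding the cumulative boundaries.
function memory_weighted_layers(memory_gb::Vector{Float64}, n_layers::Int)
    ends = round.(Int, cumsum(memory_gb) ./ sum(memory_gb) .* n_layers)
    starts = [1; ends[1:end-1] .+ 1]
    return [s:e for (s, e) in zip(starts, ends)]
end

# MacBook Pro (18 GB) and MacBook Air (8 GB), 32 layers in total.
layers = memory_weighted_layers([18.0, 8.0], 32)
@assert layers == [1:22, 23:32]
```

The Pro holds 18/26 of the memory (about 69%) and receives 22 of 32 layers; the Air receives the remaining 10.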

API Reference

Gay.TensorParallel.TensorPartition — Type

Describes how a tensor is partitioned across ranks.

Gay.TensorParallel.ShardedTensor — Type

A tensor shard with its partition metadata and fingerprint.

Gay.TensorParallel.DistributedContext — Type

Context for a distributed computation across multiple ranks.

Gay.TensorParallel.verify_allgather — Function

verify_allgather(gathered, partitions, seed) -> Bool

Verify that AllGather produced the correct result. Each rank can verify independently using only its local shard and the expected fingerprint.

Gay.TensorParallel.verify_allreduce — Function

verify_allreduce(reduced, n_ranks, seed) -> Bool

Verify that AllReduce (sum) produced the correct result. Intended for tensor-parallel matrix multiplies where partial results are summed.

Gay.TensorParallel.verify_pipeline_handoff — Function

verify_pipeline_handoff(activations, from_rank, to_rank, layer, seed) -> Bool

Verify that activations passed between pipeline stages are uncorrupted.

Gay.TensorParallel.ExoPartition — Type

Describes exo's memory-weighted ring partitioning. Each device handles a contiguous range of layers proportional to its memory.

create_exo_partitions

No docstring is available for create_exo_partitions.

Gay.TensorParallel.verify_exo_ring — Function

verify_exo_ring(activations, partitions, current_device, seed) -> Bool

Verify that activations in the exo ring topology are correct.

Gay.TensorParallel.verify_distributed_inference — Function

verify_distributed_inference(; n_tokens, hidden_dim, vocab_size, 
                               n_layers, n_ranks, seed) -> Bool

End-to-end verification of distributed inference. Simulates the full pipeline and verifies each stage.
