How to Calculate Moran’s I in PySAL

To calculate Moran’s I in PySAL, construct a spatial weights matrix using libpysal, then pass your attribute array and weights object to esda.moran.Moran. The class computes the global statistic, expected value under spatial randomness, variance, and permutation-based inference in a single execution. Always row-standardize your weights matrix before calculation to ensure numerical stability and correct interpretation of the index.

This guide delivers a production-ready workflow, explains the modern modular architecture, and details how to validate results for spatial data science and environmental modeling pipelines.

Why the Modern PySAL Architecture Matters

PySAL transitioned from a monolithic package to a focused, modular ecosystem. You no longer import from a top-level pysal namespace. Instead, topology and neighbor definitions live in libpysal, while statistical inference is handled by esda (Exploratory Spatial Data Analysis). This separation reduces dependency bloat, clarifies the data pipeline, and aligns with modern Python packaging standards. Understanding this split is essential when learning how to calculate Moran’s I in PySAL without encountering ImportError or deprecated API warnings.

Prerequisites and Environment Setup

Ensure your environment meets these baseline requirements before running spatial statistics:

  • Python 3.9+
  • geopandas>=0.12 (spatial I/O and CRS transformations)
  • libpysal>=4.9.0 (spatial weights construction)
  • esda>=2.5.0 (global and local spatial autocorrelation)
  • numpy>=1.23 and scipy>=1.9 (numerical backends)

Install the stack via pip:

bash
pip install geopandas libpysal esda numpy scipy

Step-by-Step Implementation

The following workflow demonstrates a complete, reproducible pipeline using NYC borough boundaries, a synthetic environmental variable, and Queen contiguity weights.

python
import geopandas as gpd
import numpy as np
import libpysal
from esda.moran import Moran

# 1. Load spatial data
gdf = gpd.read_file(gpd.datasets.get_path('nybb'))

# 2. Attach a numeric variable (replace with your actual column)
rng = np.random.default_rng(42)
gdf['pm25_concentration'] = rng.normal(loc=12.5, scale=3.2, size=len(gdf))

# 3. Project to a metric CRS
# Geographic coordinates (EPSG:4326) distort distance and contiguity calculations
gdf = gdf.to_crs(epsg=32618)

# 4. Build spatial weights matrix
# Queen contiguity: polygons sharing at least one edge or vertex
w = libpysal.weights.Queen.from_geodataframe(gdf)

# Row-standardize weights so each row sums to 1.0
# Required for Moran's I to remain bounded between -1 and 1
w.transform = 'R'

# 5. Compute Global Moran's I with permutation inference
y = gdf['pm25_concentration'].values
moran_result = Moran(y, w, permutations=999, two_tailed=True)

# 6. Extract and format results
print(f"Moran's I:          {moran_result.I:.4f}")
print(f"Expected I (E[I]):  {moran_result.EI:.4f}")
print(f"Z-score:            {moran_result.z_norm:.4f}")
print(f"Pseudo p-value:     {moran_result.p_sim:.4f}")

Interpreting the Output

The Moran object returns diagnostic values that determine whether observed spatial patterns are statistically meaningful:

  • I (Observed Statistic): Ranges roughly from -1 to +1. Positive values indicate clustering of similar values (high-high or low-low). Negative values indicate dispersion (high-low or low-high). Values near zero suggest spatial randomness.
  • EI (Expected Value): The theoretical mean under the null hypothesis of spatial randomness. It is typically slightly negative and asymptotically approaches -1/(N-1) as sample size increases.
  • z_norm (Z-Score): Measures how many standard deviations the observed I deviates from EI. Values beyond ±1.96 typically indicate significance at α=0.05.
  • p_sim (Pseudo p-value): Derived from the empirical permutation distribution. A value <0.05 rejects the null hypothesis of no spatial autocorrelation.

Statistical significance does not equal practical significance. Always cross-reference effect size (I) with domain knowledge. For deeper mathematical context, review the theoretical foundations of Spatial Autocorrelation Metrics before drawing ecological, epidemiological, or urban planning conclusions.

Analytical vs. Permutation-Based Inference

By default, Moran uses permutation inference (permutations=999). This approach randomly shuffles the attribute vector across fixed spatial locations to build an empirical null distribution. It is robust to non-normal data and irregular spatial configurations.

Alternatively, you can request analytical approximations by setting permutations=0. The analytical method assumes normality or randomization and computes variance using closed-form equations. Use analytical inference only when:

  • Your dataset exceeds 10,000 observations and permutation runtime becomes prohibitive
  • You are running Monte Carlo simulations where deterministic variance is required
  • You have verified that your variable distribution approximates normality

For most applied spatial workflows, permutation inference remains the gold standard due to its distribution-free properties.

Critical Best Practices

1. Always Project Your Data

Contiguity and distance-based weights fail silently or produce distorted neighborhoods when calculated on unprojected latitude/longitude coordinates. Convert to a local projected CRS (e.g., UTM or state plane) before building w.

2. Handle Spatial Islands Explicitly

Disconnected polygons (islands) have no neighbors and will cause libpysal to issue warnings. Isolated features break the spatial lag operator and bias variance estimates. Resolve them by:

  • Switching to K-Nearest Neighbors (libpysal.weights.KNN.from_geodataframe)
  • Using w.remap_ids() to assign self-weights
  • Removing or merging isolated geometries before weight construction

3. Choose Weights Based on Process Theory

  • Queen/Rook Contiguity: Ideal for administrative boundaries or raster-derived zones.
  • K-Nearest Neighbors (KNN): Better for irregular point patterns or sparse networks.
  • Distance Bands: Appropriate when interaction decays predictably over space (e.g., pollution plumes, retail catchments). Refer to the official libpysal documentation for implementation details, sparse matrix optimizations, and weight transformation options.

4. Row-Standardize Before Calculation

Setting w.transform = 'R' ensures that each observation’s spatial lag is a weighted average of its neighbors. Without row-standardization, areas with many neighbors dominate the statistic, inflating variance and biasing inference.

5. Increase Permutations for Precision

The default permutations=999 yields a minimum resolvable p-value of 0.001. For publication-grade results or multiple testing corrections, increase to permutations=9999. This adds computational cost but stabilizes the empirical null distribution and reduces Monte Carlo error.

Troubleshooting Common Errors

Symptom Likely Cause Fix
ValueError: Cannot compute spatial weights with disconnected components Islands in the dataset Use w = libpysal.weights.Rook.from_geodataframe(gdf, silence_warnings=True) or switch to KNN
I outside [-1, 1] Missing row-standardization Add w.transform = 'R' before calling Moran()
p_sim exactly 0.0 Too few permutations Increase to permutations=9999
Slow execution on large datasets High neighbor counts + dense permutations Use sparse matrix representations or subsample for exploratory analysis

Next Steps in Spatial Analysis

Once global autocorrelation is confirmed, drill down into local patterns using esda.moran.Moran_Local (LISA) to identify statistically significant hotspots, coldspots, and spatial outliers. Integrate these results with regression diagnostics to validate model residuals, or visualize clusters using choropleth mapping. Mastering this workflow anchors your analytical pipeline within the broader Core Concepts of Spatial Statistics & Geostatistics, ensuring robust, reproducible spatial inference.

For advanced implementation patterns, consult the esda API reference and explore permutation inference theory in the official PySAL tutorials.