How to Calculate Moran’s I in PySAL
To calculate Moran’s I in PySAL, construct a spatial weights matrix using libpysal, then pass your attribute array and weights object to esda.moran.Moran. The class computes the global statistic, expected value under spatial randomness, variance, and permutation-based inference in a single execution. Always row-standardize your weights matrix before calculation to ensure numerical stability and correct interpretation of the index.
This guide delivers a production-ready workflow, explains the modern modular architecture, and details how to validate results for spatial data science and environmental modeling pipelines.
Why the Modern PySAL Architecture Matters
PySAL transitioned from a monolithic package to a focused, modular ecosystem. You no longer import from a top-level pysal namespace. Instead, topology and neighbor definitions live in libpysal, while statistical inference is handled by esda (Exploratory Spatial Data Analysis). This separation reduces dependency bloat, clarifies the data pipeline, and aligns with modern Python packaging standards. Understanding this split is essential when learning how to calculate Moran’s I in PySAL without encountering ImportError or deprecated API warnings.
Prerequisites and Environment Setup
Ensure your environment meets these baseline requirements before running spatial statistics:
- Python 3.9+
geopandas>=0.12(spatial I/O and CRS transformations)libpysal>=4.9.0(spatial weights construction)esda>=2.5.0(global and local spatial autocorrelation)numpy>=1.23andscipy>=1.9(numerical backends)
Install the stack via pip:
pip install geopandas libpysal esda numpy scipy
Step-by-Step Implementation
The following workflow demonstrates a complete, reproducible pipeline using NYC borough boundaries, a synthetic environmental variable, and Queen contiguity weights.
import geopandas as gpd
import numpy as np
import libpysal
from esda.moran import Moran
# 1. Load spatial data
gdf = gpd.read_file(gpd.datasets.get_path('nybb'))
# 2. Attach a numeric variable (replace with your actual column)
rng = np.random.default_rng(42)
gdf['pm25_concentration'] = rng.normal(loc=12.5, scale=3.2, size=len(gdf))
# 3. Project to a metric CRS
# Geographic coordinates (EPSG:4326) distort distance and contiguity calculations
gdf = gdf.to_crs(epsg=32618)
# 4. Build spatial weights matrix
# Queen contiguity: polygons sharing at least one edge or vertex
w = libpysal.weights.Queen.from_geodataframe(gdf)
# Row-standardize weights so each row sums to 1.0
# Required for Moran's I to remain bounded between -1 and 1
w.transform = 'R'
# 5. Compute Global Moran's I with permutation inference
y = gdf['pm25_concentration'].values
moran_result = Moran(y, w, permutations=999, two_tailed=True)
# 6. Extract and format results
print(f"Moran's I: {moran_result.I:.4f}")
print(f"Expected I (E[I]): {moran_result.EI:.4f}")
print(f"Z-score: {moran_result.z_norm:.4f}")
print(f"Pseudo p-value: {moran_result.p_sim:.4f}")
Interpreting the Output
The Moran object returns diagnostic values that determine whether observed spatial patterns are statistically meaningful:
I(Observed Statistic): Ranges roughly from -1 to +1. Positive values indicate clustering of similar values (high-high or low-low). Negative values indicate dispersion (high-low or low-high). Values near zero suggest spatial randomness.EI(Expected Value): The theoretical mean under the null hypothesis of spatial randomness. It is typically slightly negative and asymptotically approaches-1/(N-1)as sample size increases.z_norm(Z-Score): Measures how many standard deviations the observedIdeviates fromEI. Values beyond ±1.96 typically indicate significance at α=0.05.p_sim(Pseudo p-value): Derived from the empirical permutation distribution. A value <0.05 rejects the null hypothesis of no spatial autocorrelation.
Statistical significance does not equal practical significance. Always cross-reference effect size (I) with domain knowledge. For deeper mathematical context, review the theoretical foundations of Spatial Autocorrelation Metrics before drawing ecological, epidemiological, or urban planning conclusions.
Analytical vs. Permutation-Based Inference
By default, Moran uses permutation inference (permutations=999). This approach randomly shuffles the attribute vector across fixed spatial locations to build an empirical null distribution. It is robust to non-normal data and irregular spatial configurations.
Alternatively, you can request analytical approximations by setting permutations=0. The analytical method assumes normality or randomization and computes variance using closed-form equations. Use analytical inference only when:
- Your dataset exceeds 10,000 observations and permutation runtime becomes prohibitive
- You are running Monte Carlo simulations where deterministic variance is required
- You have verified that your variable distribution approximates normality
For most applied spatial workflows, permutation inference remains the gold standard due to its distribution-free properties.
Critical Best Practices
1. Always Project Your Data
Contiguity and distance-based weights fail silently or produce distorted neighborhoods when calculated on unprojected latitude/longitude coordinates. Convert to a local projected CRS (e.g., UTM or state plane) before building w.
2. Handle Spatial Islands Explicitly
Disconnected polygons (islands) have no neighbors and will cause libpysal to issue warnings. Isolated features break the spatial lag operator and bias variance estimates. Resolve them by:
- Switching to K-Nearest Neighbors (
libpysal.weights.KNN.from_geodataframe) - Using
w.remap_ids()to assign self-weights - Removing or merging isolated geometries before weight construction
3. Choose Weights Based on Process Theory
- Queen/Rook Contiguity: Ideal for administrative boundaries or raster-derived zones.
- K-Nearest Neighbors (KNN): Better for irregular point patterns or sparse networks.
- Distance Bands: Appropriate when interaction decays predictably over space (e.g., pollution plumes, retail catchments). Refer to the official libpysal documentation for implementation details, sparse matrix optimizations, and weight transformation options.
4. Row-Standardize Before Calculation
Setting w.transform = 'R' ensures that each observation’s spatial lag is a weighted average of its neighbors. Without row-standardization, areas with many neighbors dominate the statistic, inflating variance and biasing inference.
5. Increase Permutations for Precision
The default permutations=999 yields a minimum resolvable p-value of 0.001. For publication-grade results or multiple testing corrections, increase to permutations=9999. This adds computational cost but stabilizes the empirical null distribution and reduces Monte Carlo error.
Troubleshooting Common Errors
| Symptom | Likely Cause | Fix |
|---|---|---|
ValueError: Cannot compute spatial weights with disconnected components |
Islands in the dataset | Use w = libpysal.weights.Rook.from_geodataframe(gdf, silence_warnings=True) or switch to KNN |
I outside [-1, 1] |
Missing row-standardization | Add w.transform = 'R' before calling Moran() |
p_sim exactly 0.0 |
Too few permutations | Increase to permutations=9999 |
| Slow execution on large datasets | High neighbor counts + dense permutations | Use sparse matrix representations or subsample for exploratory analysis |
Next Steps in Spatial Analysis
Once global autocorrelation is confirmed, drill down into local patterns using esda.moran.Moran_Local (LISA) to identify statistically significant hotspots, coldspots, and spatial outliers. Integrate these results with regression diagnostics to validate model residuals, or visualize clusters using choropleth mapping. Mastering this workflow anchors your analytical pipeline within the broader Core Concepts of Spatial Statistics & Geostatistics, ensuring robust, reproducible spatial inference.
For advanced implementation patterns, consult the esda API reference and explore permutation inference theory in the official PySAL tutorials.