What does a positive Moran's I value mean?

A positive Moran's I indicates spatial clustering — nearby locations tend to share similar attribute values (high-high or low-low). A value near zero suggests spatial randomness, while a negative value indicates dispersion (high-low alternating patterns).
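To see the sign convention in action, here is a minimal numpy sketch (not PySAL's implementation). It places eight locations on a line, makes each one a neighbour of the next, and row-standardizes the weights. The helper names chain_weights and morans_i are illustrative. A clustered pattern gives a positive I, while a perfectly alternating pattern reaches -1.

```python
import numpy as np

def chain_weights(n: int) -> np.ndarray:
    """Row-standardized weights for n locations on a line (i and i+1 are neighbours)."""
    A = np.eye(n, k=1) + np.eye(n, k=-1)      # binary adjacency
    return A / A.sum(axis=1, keepdims=True)   # each row sums to 1

def morans_i(x: np.ndarray, W: np.ndarray) -> float:
    """Global Moran's I for attribute x and weights W."""
    z = x - x.mean()
    return float(len(x) / W.sum() * (z @ W @ z) / (z @ z))

W = chain_weights(8)
clustered = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)    # low-low, high-high
alternating = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)  # high-low alternation

I_clustered = morans_i(clustered, W)
I_alternating = morans_i(alternating, W)
print(f"clustered: {I_clustered:.2f}")      # 0.75 (positive: clustering)
print(f"alternating: {I_alternating:.2f}")  # -1.00 (negative: dispersion)
```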

Should I use analytical or permutation inference for Moran's I?

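Permutation inference (permutations=999 or higher) is preferred for most applied work. It builds the null distribution empirically instead of relying on a normal approximation, which makes it robust to non-normal data. Analytical inference (permutations=0) is faster. It relies on large-sample approximations under a normality (p_norm) or randomization (p_rand) assumption, and it is a reasonable choice for large datasets where permutation runtime is prohibitive. esda reports the analytical results either way.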

Why does my Moran's I fall outside the range [-1, 1]?

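Moran's I is not strictly bounded to [-1, 1], because its attainable range depends on the structure of the weights matrix. Values outside this range most commonly come from non-standardized (for example, binary) weights. Set w.transform = 'R' before calling Moran(), and keep esda's default transformation='r' rather than overriding it.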

How to Calculate Moran's I in PySAL

TL;DR: Construct a row-standardized spatial weights matrix with libpysal, then pass your attribute array and weights to esda.moran.Moran(y, w, permutations=999). The result object exposes I (the statistic), EI (expected value under spatial randomness), z_norm (analytical z-score), and p_sim (pseudo p-value from permutation inference). Always set w.transform = "R" before running the test.

Why This Matters

Global Moran’s I is the gateway diagnostic in any spatial analysis pipeline. Before fitting a spatial regression, interpolating surfaces, or mapping environmental exposures, you need to know whether your variable exhibits statistically significant spatial dependence. A non-significant result suggests that non-spatial methods such as ordinary regression may suffice. A significant positive I signals dependence that a spatial model should account for. This workflow is the practical entry point into the spatial autocorrelation metrics toolkit covered in its parent topic, which sits within the broader Core Concepts of Spatial Statistics & Geostatistics framework.

The mathematical formula behind the statistic is:

$$I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$

where $n$ is the number of observations, $w_{ij}$ are elements of the spatial weights matrix, $x_i$ is the attribute value at location $i$, and $\bar{x}$ is the mean of $x$.
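The formula translates almost symbol for symbol into code. This is a minimal numpy sketch for illustration, not esda's implementation, and the function name morans_i is hypothetical. It evaluates the double sums directly on four locations along a line. It then repeats the calculation with binary weights and with row-standardized weights, so you can see that the weighting scheme changes the result.

```python
import numpy as np

def morans_i(x: np.ndarray, W: np.ndarray) -> float:
    """Global Moran's I, written term by term after the formula above."""
    n = len(x)
    dev = x - x.mean()            # (x_i - x_bar)
    s0 = W.sum()                  # sum_i sum_j w_ij
    # Numerator double sum: sum_i sum_j w_ij (x_i - x_bar)(x_j - x_bar)
    cross = sum(W[i, j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return float((n / s0) * cross / np.sum(dev ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
# Four locations on a line; each is a neighbour of the next (binary weights)
W_binary = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]], dtype=float)
W_row = W_binary / W_binary.sum(axis=1, keepdims=True)  # row-standardized

I_binary = morans_i(x, W_binary)  # (4/6) * 2.5 / 5 = 1/3
I_row = morans_i(x, W_row)        # (4/4) * 2.0 / 5 = 0.4
print(f"binary: {I_binary:.4f}, row-standardized: {I_row:.4f}")  # binary: 0.3333, row-standardized: 0.4000
```

The two values differ because the weighting scheme changes both the normalizing sum of weights and the cross-product term. For that reason, the transform should be reported alongside any value of I.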

[Diagram: the three stages of the workflow, data → weights → inference]

Environment and Version Pinning

Install the minimal stack required for this workflow. No additional visualization or ML dependencies are needed.

```bash
pip install "geopandas>=1.0" "libpysal>=4.9.0" "esda>=2.5.0" "numpy>=1.23" "scipy>=1.9"
```

Why PySAL is modular: PySAL moved from a monolithic package to a focused ecosystem of subpackages. Spatial topology and neighbour definitions live in libpysal, and statistical inference lives in esda. PySAL 1.x-style imports from the top-level namespace, such as from pysal import Moran, fail on a modern install. This split reduces dependency bloat and clarifies the data pipeline.

Imports for this workflow:

```python
import geopandas as gpd
import numpy as np
import libpysal
from esda.moran import Moran
from shapely.geometry import box
```

Step-by-Step Implementation

Step 1 — Load or build your spatial dataset

```python
# Synthetic 8×8 grid of polygons — replace with your own file:
# gdf = gpd.read_file("your_data.gpkg")
cells = [box(x, y, x + 1, y + 1) for y in range(8) for x in range(8)]
# Arbitrary projected CRS (UTM zone 18N), so coordinates are in metres
gdf = gpd.GeoDataFrame(geometry=cells, crs="EPSG:32618")
gdf = gdf.reset_index(drop=True)
```

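The dataset should use a projected CRS (metres, not degrees). Contiguity (Queen/Rook) neighbours come from shared edges and vertices, so the CRS does not change them. Distance-based weights computed on latitude/longitude, however, are distorted at higher latitudes, where a degree of longitude covers less ground.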

Step 2 — Attach your attribute variable

```python
# Synthetic, spatially independent values: expect I near E[I] and a non-significant p_sim
rng = np.random.default_rng(42)
gdf["pm25"] = rng.normal(loc=12.5, scale=3.2, size=len(gdf))
```

Your column must be numeric and free of NaN. A single missing value silently propagates through the mean, the deviations, and the spatial lag. The result is an I of NaN and a meaningless permutation p-value.
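A cheap guard is to validate the attribute vector before it reaches Moran(). The check_attribute function below is a hypothetical helper, not part of esda. It rejects missing values and constant (zero-variance) vectors. If you drop rows instead, for example with gdf.dropna(subset=["pm25"]), rebuild the weights afterwards so that w and y stay aligned.

```python
import numpy as np

def check_attribute(values) -> np.ndarray:
    """Hypothetical pre-flight check: reject NaN and constant vectors before Moran()."""
    y = np.asarray(values, dtype=float)
    n_missing = int(np.isnan(y).sum())
    if n_missing:
        raise ValueError(f"{n_missing} missing value(s) in the attribute vector")
    if np.isclose(y.std(), 0.0):
        raise ValueError("attribute vector is constant (zero variance)")
    return y

y_bad = np.array([11.2, np.nan, 13.4, 12.0])
print(y_bad.mean())          # nan: one missing value poisons the mean...
print(y_bad - y_bad.mean())  # ...and therefore every deviation (x_i - x_bar)

try:
    check_attribute(y_bad)
except ValueError as err:
    print("rejected:", err)

y_ok = check_attribute([11.2, 12.9, 13.4, 12.0])  # passes and returns a float array
```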

Step 3 — Build and row-standardize the spatial weights matrix

```python
# Queen contiguity: polygons sharing at least one edge or vertex are neighbours
w = libpysal.weights.Queen.from_dataframe(gdf)

# Row-standardize: each row sums to 1.0
# This makes the spatial lag a proper weighted average of neighbour values
w.transform = "R"
```

Queen contiguity is appropriate for administrative polygons and raster-derived zones. For irregular point patterns, use libpysal.weights.KNN.from_dataframe(gdf, k=8) instead — see building custom spatial weights matrices for the full decision guide.
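The effect of Queen contiguity plus w.transform = "R" can be reproduced on a tiny 3×3 grid with plain numpy. libpysal stores weights sparsely, so this dense version is for illustration only, and queen_adjacency is a hypothetical helper. The sketch builds the binary matrix, row-standardizes it, and computes the spatial lag as a weighted average of neighbour values.

```python
import numpy as np

def queen_adjacency(nrows: int, ncols: int) -> np.ndarray:
    """Binary queen-contiguity matrix for a regular grid, cells numbered row by row."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < nrows and 0 <= cc < ncols:
                        A[r * ncols + c, rr * ncols + cc] = 1.0  # shares an edge or vertex
    return A

A = queen_adjacency(3, 3)
W = A / A.sum(axis=1, keepdims=True)  # row-standardization, the effect of w.transform = "R"
y = np.arange(9, dtype=float)         # values 0..8 laid out row by row
lag = W @ y                           # spatial lag: mean of each cell's neighbours

print(A.sum(axis=1))   # neighbour counts: corners 3, edges 5, centre 8
print(W.sum(axis=1))   # every row sums to 1
print(lag[0], lag[4])  # corner: mean of 1, 3, 4; centre: mean of all other cells (4.0)
```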

Step 4 — Compute Global Moran’s I with permutation inference

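```python
y = gdf["pm25"].values
# two_tailed applies to the analytical p-values; p_sim comes from the permutation distribution
moran_result = Moran(y, w, permutations=999, two_tailed=True)
```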

permutations=999 randomly reshuffles y across fixed spatial locations 999 times to build an empirical null distribution. This is distribution-free and robust to non-normal data — the preferred approach for most applied spatial workflows.
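Conceptually, the permutation test can be sketched from scratch in numpy. This is an illustration under simplifying assumptions, not esda's code. It uses a dense row-standardized rook matrix on a small grid and counts the permutations whose I is at least as large as the observed value, a one-sided test for positive autocorrelation. esda's own p_sim may count the tail differently, so its numbers will not match these exactly. The helpers rook_weights, morans_i, and permutation_test are hypothetical names.

```python
import numpy as np

def rook_weights(nrows: int, ncols: int) -> np.ndarray:
    """Row-standardized rook contiguity (shared edges only) for a regular grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[r * ncols + c, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(y: np.ndarray, W: np.ndarray) -> float:
    z = y - y.mean()
    return float(len(y) / W.sum() * (z @ W @ z) / (z @ z))

def permutation_test(y, W, permutations=999, seed=42):
    """Reshuffle y over fixed locations to build an empirical null distribution."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, W)
    sims = np.array([morans_i(rng.permutation(y), W) for _ in range(permutations)])
    larger = int(np.sum(sims >= observed))     # reshuffles at least as extreme
    p_sim = (larger + 1) / (permutations + 1)  # the observed arrangement counts as one draw
    z_sim = (observed - sims.mean()) / sims.std()
    return observed, p_sim, z_sim

# 6x6 grid: left half high, right half low, a strongly clustered pattern
y = np.repeat([[1.0] * 3 + [0.0] * 3], 6, axis=0).ravel()
I, p_sim, z_sim = permutation_test(y, rook_weights(6, 6))
print(f"I={I:.3f}  p_sim={p_sim:.3f}  z_sim={z_sim:.1f}")
```

The observed clustered arrangement should beat essentially every reshuffle, so p_sim is expected to land on its floor of 1/(999 + 1) = 0.001.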

Step 5 — Print and inspect results

```python
print(f"Moran's I:          {moran_result.I:.4f}")
print(f"Expected I (E[I]):  {moran_result.EI:.4f}")
print(f"Z-score:            {moran_result.z_norm:.4f}")
print(f"Pseudo p-value:     {moran_result.p_sim:.4f}")
print(f"Analytical p-value: {moran_result.p_norm:.4f}")
```

Interpreting the Output

| Output attribute | What it measures | Decision threshold |
|---|---|---|
| `I` | Observed spatial autocorrelation index | Positive → clustering; negative → dispersion; near `EI` → random |
| `EI` | Expected value under the null: $-1/(n-1)$ | Approaches 0 as $n$ grows |
| `z_norm` | Standard deviations from `EI` (analytical, normality assumption) | \|z\| > 1.96 → significant at α = 0.05 (two-tailed) |
| `p_sim` | Permutation pseudo p-value | < 0.05 rejects spatial randomness |
| `p_norm` | Analytical p-value (assumes normality) | Primary test when `permutations=0`; otherwise a cross-check |
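The thresholds in the table come from standard arithmetic. The sketch below uses only the Python standard library. It computes E[I] for the 64-cell grid built in Step 1, then the two-tailed critical value and p-value behind the |z| > 1.96 rule. two_tailed_p is an illustrative helper, not an esda function.

```python
from statistics import NormalDist

n = 64                                   # observations in the 8x8 grid from Step 1
expected_i = -1 / (n - 1)                # E[I] under spatial randomness
z_crit = NormalDist().inv_cdf(0.975)     # two-tailed critical value for alpha = 0.05

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard-normal z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(f"E[I] = {expected_i:.4f}")                  # -0.0159
print(f"critical |z| = {z_crit:.2f}")              # 1.96
print(f"p at z = 1.96: {two_tailed_p(1.96):.3f}")  # 0.050
```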

Statistical significance does not equal practical significance. With 10,000 observations, even a weak spatial pattern can yield a highly significant p_sim. Judge the effect size (the magnitude of I) against domain knowledge, and consider whether the observed dependence is strong enough to require a spatial regression model rather than OLS.

Critical Best Practices

Always project before building weights

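Contiguity weights (Queen, Rook) depend only on shared edges and vertices, so the CRS does not change them. Distance-based weights (KNN, distance band) computed on latitude/longitude, however, treat degrees as equal-length units. Because a degree of longitude shrinks towards the poles, the resulting neighbourhoods are distorted at higher latitudes. Convert to UTM, a national grid, or another metric projection before building weights, so that distances and thresholds are expressed in metres.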
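A quick numerical check shows why degrees make a poor distance unit. At 60°N, a point 1.0° of longitude away is actually closer on the ground than a point 0.8° of latitude away. A nearest-neighbour rule applied to raw degrees therefore picks the wrong neighbour. This sketch uses the haversine great-circle formula with a spherical Earth of radius 6371 km, which is an approximation. The coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

origin = (60.0, 10.0)                      # (lat, lon) at 60°N
candidates = {"east": (60.0, 11.0),        # 1.0 degree of longitude away
              "north": (60.8, 10.0)}       # 0.8 degrees of latitude away

deg_dist = {k: math.hypot(lat - origin[0], lon - origin[1]) for k, (lat, lon) in candidates.items()}
km_dist = {k: haversine_km(*origin, lat, lon) for k, (lat, lon) in candidates.items()}

nearest_deg = min(deg_dist, key=deg_dist.get)
nearest_km = min(km_dist, key=km_dist.get)
print(nearest_deg, nearest_km)                                 # north east
print({k: round(v, 1) for k, v in km_dist.items()})            # east is about 55.6 km, north about 89.0 km
```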

Row-standardize without exception

Setting w.transform = "R" makes each observation’s spatial lag the average of its neighbours’ values. With unstandardized binary weights, units with many neighbours contribute more cross-product terms and dominate the statistic, and I can drift outside $[-1, 1]$. esda's Moran() also applies its transformation argument (default "r") to the w you pass in. Passing a different value, such as transformation="B", therefore switches the weights back to binary.

Handle spatial islands explicitly

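Disconnected polygons (islands with no neighbours) make libpysal issue a UserWarning when it builds the weights. An island has no neighbours to average, so its spatial lag is zero, but its value still enters the denominator of I and distorts the statistic. Identify islands with w.islands before proceeding: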

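```python
if w.islands:
    print(f"Warning: {len(w.islands)} disconnected observations found: {w.islands}")
    # Option 1: switch to KNN so every unit has exactly k neighbours
    w = libpysal.weights.KNN.from_dataframe(gdf, k=4)
    w.transform = "R"
    # Option 2 (instead of Option 1): drop the islands and rebuild contiguity weights
    # gdf = gdf.drop(index=w.islands).reset_index(drop=True)
    # w = libpysal.weights.Queen.from_dataframe(gdf)
    # w.transform = "R"
```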
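A four-unit numpy example shows what happens to an island under row-standardization. An island's row has no neighbours to average, so its spatial lag is 0 regardless of its own value. This sketch assumes, as libpysal-style row-standardization does, that such zero rows are left as zeros rather than divided by zero. It is an illustration, not libpysal code.

```python
import numpy as np

# Binary contiguity for four units: 0-1-2 form a chain, unit 3 touches nothing
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)

row_sums = A.sum(axis=1)
islands = np.flatnonzero(row_sums == 0)  # units with no neighbours
# Row-standardize, keeping island rows at zero instead of dividing by zero
W = np.divide(A, row_sums[:, None], out=np.zeros_like(A), where=row_sums[:, None] > 0)

y = np.array([10.0, 12.0, 14.0, 50.0])
lag = W @ y
print("islands:", islands)  # [3]
print("lag:", lag)          # the island's lag is 0.0 even though its own value is 50
```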

Increase permutations for publication-grade results

The default permutations=999 yields a minimum resolvable pseudo p-value of $1/(999+1) = 0.001$. For publication, or when applying multiple-testing corrections, use permutations=9999. This lowers the floor to 0.0001 and reduces the Monte Carlo variance of the estimated p-value, at the cost of roughly 10× runtime.
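Both the resolution floor and the Monte Carlo noise can be quantified. The sketch below uses the usual binomial approximation for the standard error of a permutation p-value, sqrt(p(1 - p)/R) for R permutations, evaluated at p = 0.05. The function names are illustrative.

```python
import math

def min_p(permutations: int) -> float:
    """Smallest pseudo p-value a permutation test can report."""
    return 1 / (permutations + 1)

def mc_standard_error(p: float, permutations: int) -> float:
    """Binomial approximation to the Monte Carlo standard error of a pseudo p-value."""
    return math.sqrt(p * (1 - p) / permutations)

for r in (999, 9999):
    print(f"permutations={r}: floor={min_p(r):g}, SE at p=0.05 = {mc_standard_error(0.05, r):.4f}")
# permutations=999: floor=0.001, SE at p=0.05 = 0.0069
# permutations=9999: floor=0.0001, SE at p=0.05 = 0.0022
```

Ten times as many permutations shrinks the standard error by a factor of about sqrt(10) ≈ 3.2.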

Choose weights based on process theory

Do not default to Queen contiguity without justification. The spatial weight matrix encodes your assumption about how influence propagates across space — see spatial weight matrices for the full theoretical framework. Distance-band weights are appropriate when interaction decays predictably over space (pollution plumes, retail catchments); KNN weights suit sparse or irregular configurations.
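The practical difference between the two families shows up in the neighbour counts. This numpy sketch is not libpysal's constructors, and the coordinates and the 1.5-unit threshold are made up. A distance band gives a variable number of neighbours and would leave isolated points with none if the threshold were too small. KNN always assigns exactly k neighbours, at the cost of a matrix that is generally asymmetric.

```python
import numpy as np

# Hypothetical projected coordinates: a tight cluster of three and a pair far away
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.5, 5.0]])
D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # pairwise distances
np.fill_diagonal(D, np.inf)  # a point is never its own neighbour

# Distance band: every point within the threshold is a neighbour
threshold = 1.5
band = (D <= threshold).astype(float)

# KNN: each point's k nearest points are neighbours
k = 2
knn = np.zeros_like(D)
nearest = np.argsort(D, axis=1)[:, :k]
np.put_along_axis(knn, nearest, 1.0, axis=1)

print("distance-band neighbour counts:", band.sum(axis=1))  # varies by point
print("KNN neighbour counts:", knn.sum(axis=1))             # always k
print("KNN symmetric?", np.array_equal(knn, knn.T))         # False for these points
```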

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `UserWarning` about islands during weight construction | Disconnected polygon in dataset | Use `KNN.from_dataframe(gdf, k=4)` or remove/merge the island feature |
| `I` outside `[-1, 1]` | Non-standardized (e.g. binary) weights | Keep row-standardization: `w.transform = "R"` and esda's default `transformation="r"` in `Moran()` |
| `p_sim` exactly `0.001` | Observed I is more extreme than every permutation; 999 permutations cannot resolve smaller p-values | Increase to `permutations=9999` |
| `I` is `nan` | Attribute vector is constant (zero variance) or contains NaN | Check `y.std()` and `np.isnan(y).any()`; remove constant columns and missing values before analysis |
| Very slow on large datasets | Many neighbours per unit plus a high permutation count | Use `libpysal.weights.KNN` with small `k`; set `permutations=0` for exploratory runs |
| `ImportError: cannot import name 'Moran' from 'pysal'` | Legacy top-level import | Replace with `from esda.moran import Moran` |

Next Steps

Once global autocorrelation is confirmed, run esda.moran.Moran_Local to produce LISA cluster maps that identify statistically significant hotspots, coldspots, and spatial outliers at the unit level. For a deeper treatment of the statistical framework, inference strategies, and local variants, return to the spatial autocorrelation metrics topic. If your residuals from an OLS model show significant Moran’s I, the next step is a spatial regression model to correct for dependence.

Related

Spatial Autocorrelation Metrics — global and local statistics, Geary’s C, LISA maps
Building Custom Spatial Weights Matrices — Queen vs Rook vs KNN vs distance-band trade-offs
Spatial Weight Matrices — theory, row-standardization, and sparse representations
