Testing for Second-Order Stationarity in Python
Testing for second-order stationarity in Python requires verifying that a spatial random field maintains a constant mean, constant variance, and a covariance structure that depends exclusively on separation distance (lag), not absolute coordinates. In practice, you validate this by combining exploratory spatial data analysis (ESDA) with empirical variogram estimation, trend removal, and statistical homogeneity tests using numpy, scipy, and geostatistical libraries like gstools. If the empirical variogram reaches a stable sill, residuals show no spatially correlated drift, and variance remains homogeneous across distance bins, the process satisfies second-order stationarity and is mathematically safe for ordinary kriging or spatial regression.
The Three Mathematical Conditions
Second-order (weak) stationarity sits between intrinsic stationarity and strict stationarity. It imposes three explicit constraints that must hold across your study area:
- Constant mean:
E[Z(x)] = μfor all locationsx - Constant variance:
Var[Z(x)] = σ²for allx - Lag-dependent covariance:
Cov[Z(x), Z(x+h)] = C(h), meaning spatial correlation decays or stabilizes purely as a function of distanceh
Violations typically appear as global trends (e.g., elevation-driven temperature gradients), directional anisotropy, or heteroscedasticity. Mastering these constraints is essential when navigating the broader Stationarity & Trend Analysis workflow, as stationarity dictates which interpolation and inference methods remain valid.
Environment & Dependencies
| Component | Minimum Version | Notes |
|---|---|---|
| Python | 3.9+ | 3.10+ recommended for scipy performance optimizations |
| NumPy | 1.24+ | Required for vectorized lag calculations |
| SciPy | 1.10+ | Statistical tests (levene, shapiro) |
| GeoPandas | 0.12+ | Spatial I/O and coordinate handling |
| GSTools / scikit-gstat | 1.5+ / 1.4+ | C-extensions may require build-essential (Linux) or Xcode CLI tools (macOS) |
For Windows users, prefer conda install -c conda-forge gstools to avoid MSVC compilation errors. The implementation below relies on pure numpy/scipy for maximum portability, with optional integration points for GSTools noted in the comments.
Complete Python Implementation
The following script removes linear drift, computes an empirical semivariogram, bins distances, and tests variance homogeneity using Levene’s test. It returns a structured assessment of second-order stationarity.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import levene
import warnings
def test_second_order_stationarity(coords, values, n_bins=10, alpha=0.05):
"""
Tests second-order stationarity for spatial data.
Parameters:
coords (np.ndarray): (N, 2) array of x, y coordinates
values (np.ndarray): (N,) array of observed values
n_bins (int): Number of distance bins for variogram estimation
alpha (float): Significance level for homogeneity tests
Returns:
dict: Stationarity assessment with test statistics and pass/fail flags
"""
coords = np.asarray(coords)
values = np.asarray(values)
if coords.shape[0] != values.shape[0]:
raise ValueError("coords and values must have the same number of observations.")
if coords.shape[1] != 2:
raise ValueError("coords must be 2D (N, 2).")
# 1. Remove linear trend (fit plane: z = ax + by + c)
X = np.column_stack((coords, np.ones(len(coords))))
coeffs, _, _, _ = np.linalg.lstsq(X, values, rcond=None)
trend = X @ coeffs
residuals = values - trend
# 2. Compute pairwise distances and empirical semivariogram
dist_matrix = squareform(pdist(coords))
diff_matrix = squareform(pdist(residuals.reshape(-1, 1), metric='sqeuclidean')) * 0.5
# Flatten upper triangle
dist_flat = dist_matrix[np.triu_indices_from(dist_matrix, k=1)]
gamma_flat = diff_matrix[np.triu_indices_from(diff_matrix, k=1)]
# 3. Bin distances
max_dist = np.max(dist_flat)
bin_edges = np.linspace(0, max_dist, n_bins + 1)
bin_indices = np.digitize(dist_flat, bin_edges) - 1
bin_indices = np.clip(bin_indices, 0, n_bins - 1)
# Group semivariances by bin
bin_variances = []
for b in range(n_bins):
mask = bin_indices == b
if np.sum(mask) > 1:
bin_variances.append(gamma_flat[mask])
else:
bin_variances.append(np.array([np.nan]))
# 4. Test variance homogeneity across bins (Levene's test)
valid_bins = [v for v in bin_variances if not np.isnan(v).all() and len(v) > 1]
if len(valid_bins) >= 2:
_, levene_p = levene(*valid_bins, center='median')
variance_homogeneous = levene_p > alpha
else:
levene_p = np.nan
variance_homogeneous = False
# 5. Check for sill stabilization (coefficient of variation < 0.15 in last 3 bins)
last_bins = [v for v in valid_bins[-3:] if len(v) > 1]
if last_bins:
sill_means = [np.mean(v) for v in last_bins]
sill_cv = np.std(sill_means) / np.mean(sill_means) if np.mean(sill_means) > 0 else 1.0
sill_reached = sill_cv < 0.15
else:
sill_reached = False
# 6. Residual mean check (should be ~0 after detrending)
mean_residual = np.mean(residuals)
mean_stationary = abs(mean_residual) < (np.std(values) * 0.05)
is_stationary = variance_homogeneous and sill_reached and mean_stationary
return {
"is_stationary": is_stationary,
"mean_residual": float(mean_residual),
"mean_stationary": bool(mean_stationary),
"levene_p_value": float(levene_p),
"variance_homogeneous": bool(variance_homogeneous),
"sill_cv": float(sill_cv),
"sill_reached": bool(sill_reached),
"n_valid_bins": len(valid_bins)
}
Interpreting Results & Troubleshooting
The function returns a dictionary with three primary pass/fail flags:
mean_stationary: Confirms the detrended residuals center near zero.variance_homogeneous: Uses Levene’s test to ensure variance doesn’t systematically change across distance bands. See the official SciPy Levene documentation for test assumptions.sill_reached: Validates that the empirical semivariogram stabilizes rather than growing indefinitely.
If is_stationary returns False, diagnose the failure:
- High
levene_pfailure: Variance increases with distance. Apply a variance-stabilizing transformation (e.g., Box-Cox or log) or switch to universal kriging. sill_reachedfailure: The process exhibits long-range dependence or an unremoved trend. Increase the search radius, add polynomial detrending, or model anisotropy explicitly.mean_stationaryfailure: The linear trend removal was insufficient. Consider higher-order polynomial fitting or spatial filtering.
Once stationarity is confirmed, you can safely proceed to model fitting and spatial prediction. For advanced workflows involving anisotropy detection or automated variogram modeling, consult the Core Concepts of Spatial Statistics & Geostatistics reference guide. Always validate model assumptions with cross-validation before deploying spatial predictions in production environments.