Evaluating The Functional Realism Of Deep Learning Rainfall-Runoff Models Using Catchment Hydrology Principles

--
https://doi.org/10.1029/2025WR040076 <-- shared paper
--
“ABSTRACT: Deep learning (DL) models such as Long-Short-Term-Memory (LSTM) networks have achieved exceptional predictive accuracy in rainfall–runoff modeling. Yet these models learn from statistical correlations rather than hydrologic insights, raising the question of whether their internal functional reasoning is physically reliable. Despite previous studies highlighting unexpected outcomes from LSTMs under long-term climate shifts, functional realism - defined as the extent to which a model’s internal functioning aligns with defensible mechanisms of streamflow generation - remains largely underexplored. [The authors] introduce a hydrology-specific Explainable AI (XAI) framework that opens the black-box of LSTM. It extracts nonlinear, lag-dependent, and time-varying Impulse Response Functions (IRFs) which quantify the functional relationships that LSTM uses to reflect the isolated influence of precipitation (P), temperature (T), and potential evapotranspiration (PET) on simulated streamflow. IRFs reveal how LSTMs internalize streamflow generation during events, offering a catchment hydrology perspective for evaluating model realism. Applying this framework to 672 North American catchments with strong LSTM predictive skill, [they] find that high accuracy often masks hydrologically implausible reasoning: in over 70% of rain-dominated basins, short-term temperature rises unexpectedly raise simulated streamflow and enhance celerity rate even without rainfall; in snow-dominated regions, PET is misattributed as a driver of snowmelt-related flow and enhances the catchment’s celerity rate. [They] conclude that correlation-driven learning can compromise the robustness of LSTM-based forecasts under weather extremes and short-term and long-term climatic shifts. [Their] framework bridges deep learning with hydrologic understanding and offers a scalable diagnostic for assessing the functional realism of DL models across diverse catchment types.
PLAIN LANGUAGE SUMMARY: Rainfall-runoff models help predict how precipitation, snowmelt, temperature, and potential evapotranspiration influence streamflow, which is essential for managing water resources and preparing for extreme events. Deep learning models like Long-Short-Term-Memory (LSTM) networks are increasingly used for this task due to their strong predictive performance. It is known that these models rely on correlations in the data rather than true cause-and-effect relationships grounded in hydrologic processes. What remains unclear is whether this correlation-based learning leads to hydrologically unrealistic behavior across different types of catchments and under varying weather and climate conditions. To address this question, [they] developed a new Explainable AI tool that reveals how LSTMs respond to lagged climate inputs such as P, T, and PET, allowing [them] to assess the functional realism of model behavior. Using this tool, [they] analyzed LSTM predictions across 672 catchments in the US and Canada. [They] found that even in catchments where LSTMs performed very well, they often relied on misleading relationships, for example, predicting increased streamflow following heatwaves without rainfall, or interpreting PET, rather than T, as the main driver of snowmelt. These findings underscore the importance of evaluating not only model accuracy but also how well the model aligns with established hydrologic principles.
KEY POINTS:
• In most rain-dominated catchments, Long-Short-Term-Memory networks (LSTMs) show positive links between short-term rises in T or PET and streamflow (and celerity rate)
• In many snow-dominated catchments, LSTMs produce enhanced streamflow and celerity with PET rises, treating PET as a proxy for temperature
• [Their] proposed Explainable AI framework serves as a screening tool to evaluate the trustworthiness of deep learning hydrologic models.”
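For readers new to impulse response functions, the general idea can be sketched with a simple perturbation experiment: bump one input at one time step, rerun the model, and record how simulated streamflow changes at each later lag. The Python sketch below is not the authors' XAI framework. It is a minimal illustration under stated assumptions: `toy_model` is a hypothetical linear-reservoir stand-in for a trained LSTM, and `impulse_response`, the recession constant `k`, and the synthetic forcing series are all invented for this example.

```python
import numpy as np

def toy_model(p, t, k=0.3):
    """Hypothetical stand-in for a trained rainfall-runoff model.

    A single linear reservoir fed by precipitation p. Temperature t is
    deliberately ignored, so a realistic diagnostic should report a zero
    temperature response for this model.
    """
    q = np.zeros(len(p))
    storage = 0.0
    for i in range(len(p)):
        storage += p[i]          # rainfall enters storage
        q[i] = k * storage       # outflow is a fixed fraction of storage
        storage -= q[i]
    return q

def impulse_response(model, p, t, var, t0, delta=1.0, max_lag=5):
    """Perturbation-based impulse response (illustrative helper).

    Adds `delta` to input `var` ("P" or "T") at time step t0 and returns
    the change in simulated flow per unit perturbation at lags 0..max_lag.
    """
    base = model(p, t)
    p2 = np.array(p, dtype=float)   # copies, so the inputs stay untouched
    t2 = np.array(t, dtype=float)
    if var == "P":
        p2[t0] += delta
    else:
        t2[t0] += delta
    perturbed = model(p2, t2)
    window = slice(t0, t0 + max_lag + 1)
    return (perturbed[window] - base[window]) / delta

# Synthetic forcing, for illustration only.
rng = np.random.default_rng(0)
p = rng.exponential(2.0, size=60)
t = 10.0 + 5.0 * np.sin(np.arange(60) / 10.0)

irf_p = impulse_response(toy_model, p, t, "P", t0=20)
irf_t = impulse_response(toy_model, p, t, "T", t0=20)
print("P response by lag:", np.round(irf_p, 3))
print("T response by lag:", np.round(irf_t, 3))
```

In this toy case the precipitation response decays geometrically with lag and the temperature response is zero, which is what the reservoir physics implies. The abstract's concern is the opposite situation: a trained LSTM showing a positive streamflow response to a temperature pulse with no rainfall. Because an LSTM is nonlinear, its response would also depend on the baseline state and on the size of the perturbation, which this linear sketch does not capture.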
#water #hydrology #surfacewater #pluvial #fluvial #rainfall #snow #snowmelt #runoff #precipitation #model #blackbox #robustness #functionalrealism #screening #parameters #accuracy #hydrologic #principles #trustworthy #modeling #spatialanalysis #spatial #mapping #GIS #spatiotemporal #USA #CONUS #AI #ExplainableAI #celerity #machinelearning #artificialintelligence #LSTM #deeplearning #evapotranspiration #waterresources #extremeweather #flood #flooding #risk #hazard #monitoring #prediction #catchments #streamflow #geomorphometry #network #flow #calibration

