Zheng Duan

Associate senior lecturer

Estimation of dissolved organic carbon from inland waters at a large scale using satellite data and machine learning methods

Authors

  • Lasse Harkort
  • Zheng Duan

Summary, in English

Dissolved Organic Carbon (DOC) in inland waters plays an essential role in the global carbon cycle and has significant public health effects. Machine learning (ML) combined with remote sensing has emerged as a powerful and promising approach for quantifying water quality parameters from space. However, DOC sample data for inland waters are limited, so little is known about the potential to quantify DOC content in inland waters, especially over large areas. This study presents the first attempt to estimate DOC in inland waters over a large area using satellite data, ML methods and the newly published open-source dataset AquaSat. Four ML approaches, namely Random Forest Regression (RFR), Support Vector Regression (SVR), Gaussian Process Regression (GPR) and a Multilayer Backpropagation Neural Network (MBPNN), were trained on more than 16,000 samples across the continental United States matched with satellite data from the Landsat 5, 7 and 8 missions. The Landsat data were further extended with environmental data from the ERA5-Land product and used as input to train the ML algorithms. Our results show that including environmental data as inputs considerably improved the prediction of DOC for all ML algorithms, with GPR showing the most promising performance and moderate estimation errors (RMSE: 4.08 mg/L). Permutation feature importance analysis showed that the visible green band (from Landsat) and the monthly average air temperature (from ERA5-Land) were the most important variables for the ML approaches. The results demonstrate the predictive strength of GPR and its useful ability to derive per-pixel standard deviations for detailed analysis. Our results further highlight the important role of considering environmental processes to explain DOC variations over large scales.
The application and performance of GPR in mapping spatiotemporal variations of DOC across an entire water body are discussed using Lake Okeechobee (the 8th largest freshwater lake in the U.S.) as an illustrative example. While the performance evaluation showed that DOC concentrations can be retrieved with adequate accuracy, algorithm development was challenged by the heterogeneous nature of large-scale open-source in situ data, issues related to atmospheric correction, and the low spatial and temporal resolution of the environmental predictors. This research demonstrates how open-source, large-scale datasets like AquaSat, combined with ML and satellite remote sensing, can make large-scale estimation of inland water DOC more realistic, while highlighting its remaining limitations and challenges.
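The summary ranks predictors using permutation feature importance: a feature matters if randomly shuffling its values degrades the model's error (here RMSE). The sketch below is a generic, dependency-free illustration of that technique, not the authors' code or pipeline. The helper names (`rmse`, `permutation_importance`), the toy linear model and the feature names `green_band`, `air_temperature` and `unused_feature` are all hypothetical; in the study itself, the fitted models were RFR, SVR, GPR and MBPNN trained on AquaSat samples with Landsat and ERA5-Land inputs.

```python
import random
from math import sqrt
from typing import Callable, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error (the error metric reported in the summary)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: List[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in RMSE when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy setup: a fixed "model" standing in for a trained regressor.
FEATURES = ["green_band", "air_temperature", "unused_feature"]


def toy_model(row: List[float]) -> float:
    # Depends on the first two features only; the third is ignored.
    return 2.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[i * 0.1, float(i * 7 % 11), data_rng.random()] for i in range(50)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
for name, score in sorted(zip(FEATURES, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

Because the toy model ignores `unused_feature`, shuffling that column leaves every prediction unchanged, so its importance is exactly zero, while shuffling either of the other two inputs raises the RMSE. Importances computed this way are model-specific, which is why such an analysis is typically repeated for each trained algorithm.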

Department/s

  • Dept of Physical Geography and Ecosystem Science
  • BECC: Biodiversity and Ecosystem services in a Changing Climate
  • MERGE: ModElling the Regional and Global Earth system

Publishing year

2023-02-01

Language

English

Publication/Series

Water Research

Volume

229

Document type

Journal article

Publisher

Elsevier

Topic

  • Oceanography, Hydrology, Water Resources

Keywords

  • Dissolved organic carbon
  • Landsat
  • Machine learning
  • Open source data
  • Remote sensing
  • Water quality

Status

Published

ISBN/ISSN/Other

  • ISSN: 0043-1354