About this book

“Data-Driven Modeling: Using MATLAB® in Water Resources and Environmental Engineering” provides a systematic account of major concepts and methodologies for data-driven models and presents a unified framework that makes the subject more accessible to and applicable for researchers and practitioners. It integrates important theories and applications of data-driven models and uses them to deal with a wide range of problems in the field of water resources and environmental engineering such as hydrological forecasting, flood analysis, water quality monitoring, regionalizing climatic data, and general function approximation. The book presents the statistical-based models including basic statistical analysis, nonparametric and logistic regression methods, time series analysis and modeling, and support vector machines. It also deals with the analysis and modeling based on artificial intelligence techniques including static and dynamic neural networks, statistical neural networks, fuzzy inference systems, and fuzzy regression. The book also discusses hybrid models as well as multi-model data fusion to wrap up the covered models and techniques. The source files of relatively simple and advanced programs demonstrating how to use the models are presented together with practical advice on how to best apply them. The programs, which have been developed using the MATLAB® unified platform, can be found on extras.springer.com. The main audience of this book includes graduate students in water resources engineering, environmental engineering, agricultural engineering, and natural resources engineering. This book may be adapted for use as a senior undergraduate and graduate textbook by focusing on selected topics. Alternatively, it may also be used as a valuable resource book for practicing engineers, consulting engineers, scientists and others involved in water resources and environmental engineering.

Table of Contents

Frontmatter

Problems in water resources and environmental management, such as simulating natural events, warning of natural disasters, and analyzing the impact of development scenarios, are of significant importance in a changing environment. Given the complexity of natural phenomena and our limited knowledge of mathematical modeling, solving them can be challenging. The recent development of data-driven models has improved the tools available for the complex process of real-world modeling. Soft computing and statistical models are two common groups of data-driven models that can be employed to solve water resources and environmental problems. Data-driven models are mathematical models that use experimental data to analyze real-world phenomena. In contrast to physical models, they do not need a specific laboratory setup and are therefore significantly cheaper. Also, in contrast to analytical models, data-driven models can be used for problems where we do not have enough knowledge about the intrinsic complexity of the phenomena. This chapter presents a brief review of the different types of models that can be used for water resources and environmental problems, reviews the process of model selection for a specific problem, and investigates the general approach to using data-driven models. The advanced stage of developing a model is discussed in the last section.

A stochastic variable is a combination of two components: a deterministic variable, D, and a random variable, ε. While D can be modeled by a range of mathematical models, ε is described by probability theory using a probability distribution function (pdf). Depending on whether the random variable is discrete or continuous, it is defined by a discrete or a continuous pdf. The discrete distribution functions of Bernoulli, binomial, and Poisson are reviewed in this chapter, along with the continuous distribution functions of exponential, uniform, normal, and extreme value. One of the most applicable fields of distribution functions is frequency analysis, which is discussed in another section of this chapter. As far as the statistical analysis of real problems is concerned, hypothesis tests are widely used for deciding on either the parameters of one or several populations or the type of distribution function that best fits the data. Hypothesis tests follow a general approach, which should be adapted to specific problems by defining appropriate test statistics and critical values. The tests on the statistical parameters of populations are reviewed in this chapter. Furthermore, two well-known tests, chi-square and Kolmogorov–Smirnov, are presented to decide on the best distribution function for a specific random variable. Each of the above calculations is supported by the related commands and programs provided in MATLAB.
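As a rough illustration of the Kolmogorov–Smirnov idea mentioned above, the following Python sketch compares the empirical distribution of a sample with a fitted normal distribution. It is not one of the book's MATLAB programs; the function name `ks_statistic`, the peak-flow values, and the large-sample critical value 1.36/sqrt(n) at the 5% level are assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF of `sample` and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

# Hypothetical annual peak flows (m^3/s), made up for illustration only.
peak_flows = [312.0, 285.0, 401.0, 356.0, 298.0,
              340.0, 377.0, 265.0, 330.0, 389.0]

# Candidate distribution: a normal fitted by the sample mean and std. dev.
fitted = NormalDist(mean(peak_flows), stdev(peak_flows))
d_stat = ks_statistic(peak_flows, fitted)

# Assumed large-sample approximation of the 5% critical value.
d_crit = 1.36 / len(peak_flows) ** 0.5
reject_normal = d_stat > d_crit
print(f"D = {d_stat:.3f}, critical = {d_crit:.3f}, reject = {reject_normal}")
```

Because the distribution parameters are estimated from the same sample, the standard critical values are only approximate here; the sketch is meant to show the mechanics of the test statistic, not a rigorous test.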

Regression analysis aims to study the relationship between one variable, usually called the dependent variable, and several other variables, often called the independent variables. These models are among the most popular data-driven models because they are easy to apply and their techniques are very well known. Regression models range from linear to nonlinear and from parametric to nonparametric. In the field of water resources and environmental engineering, regression analysis is widely used for prediction, forecasting, estimation of missing data, and, in general, interpolation and extrapolation of data. This chapter presents models for point and interval estimation of dependent variables using different regression methods. The multiple linear regression model, conventional nonlinear regression models, the K-nearest neighbor nonparametric model, and the logistic regression model are presented in different sections of this chapter. Each model is supported by related commands and programs provided in MATLAB.
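To make the K-nearest neighbor nonparametric model a little more concrete, here is a minimal Python sketch that estimates the dependent variable as the plain average of the k closest observations. It is an illustration under assumptions, not the book's MATLAB code: the name `knn_predict`, the unweighted average, the Euclidean distance on unscaled predictors, and the data values are all hypothetical.

```python
def knn_predict(x_train, y_train, x_query, k=3):
    """Average the responses of the k training points closest to x_query."""
    if not 1 <= k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training points")

    def sq_dist(a, b):
        # Squared Euclidean distance between two predictor vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Sort training indices by distance to the query point.
    order = sorted(range(len(x_train)),
                   key=lambda i: sq_dist(x_train[i], x_query))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k

# Hypothetical data: (rainfall in mm, antecedent flow in m^3/s) -> next flow.
x_train = [(10.0, 5.0), (20.0, 6.0), (30.0, 8.0), (40.0, 9.0), (50.0, 12.0)]
y_train = [6.0, 8.0, 10.0, 13.0, 16.0]

estimate = knn_predict(x_train, y_train, (28.0, 7.5), k=3)
print(round(estimate, 3))
```

In practice the predictors would usually be standardized first, and the neighbors are often weighted by distance; both choices are left out to keep the sketch short.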

Modeling of time series involves dealing with the important temporal dimension, which represents and processes sequential inputs. Many statistical methods are used to model and forecast time series data, such as autoregressive (AR), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and autoregressive moving average with exogenous inputs (ARMAX) models. Time series modeling involves techniques that relate time series data, as dependent variables, to predictors that are all functions of time. Many examples of time series data exist in the field of water resources and environmental engineering, including streamflow data, rainfall data, and time series of total dissolved solids in a river. This variety makes the application of time series very interesting in those fields. Time series modeling usually serves two major applications: forecasting and synthetic data generation. This chapter reviews the basic mathematical representation as well as the applicable fields of the well-known time series models. In addition to time series analysis, different models and applications are presented through programs developed in MATLAB.
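The two applications named above, forecasting and synthetic data generation, can be sketched for the simplest autoregressive case, AR(1). The Python code below is an assumption-laden illustration rather than the book's MATLAB programs: the function names, the conditional least-squares fit around the sample mean, and the Gaussian noise are choices made here.

```python
import random

def fit_ar1(series):
    """Least-squares estimate of phi in x[t] - m = phi * (x[t-1] - m) + e[t]."""
    m = sum(series) / len(series)
    dev = [x - m for x in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev[:-1])
    return m, num / den

def forecast_ar1(last_value, m, phi, steps):
    """Iterate the fitted recursion forward with the noise term set to zero."""
    out, x = [], last_value
    for _ in range(steps):
        x = m + phi * (x - m)
        out.append(x)
    return out

def generate_ar1(m, phi, sigma, n, seed=0):
    """Synthetic series from the same recursion with Gaussian noise."""
    rng = random.Random(seed)
    x, out = m, []
    for _ in range(n):
        x = m + phi * (x - m) + rng.gauss(0.0, sigma)
        out.append(x)
    return out

# Hypothetical parameters: generate a synthetic record, refit, then forecast.
synthetic = generate_ar1(m=50.0, phi=0.7, sigma=1.0, n=2000, seed=1)
m_hat, phi_hat = fit_ar1(synthetic)
next_three = forecast_ar1(synthetic[-1], m_hat, phi_hat, steps=3)
```

The forecast decays geometrically toward the estimated mean, which is the characteristic behavior of an AR(1) model without new noise.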

Artificial neural networks (ANNs), the best-known artificial intelligence models, are collections of neurons with a specific architecture formed by the relationships between neurons in different layers. A neuron is a mathematical unit, and an artificial neural network that consists of neurons is a complex, nonlinear system. ANNs may have different architectures, which result in different types of ANNs. A static ANN known as the multilayer perceptron (MLP) is the most widely applied ANN in different fields of engineering. This type of ANN is presented in this chapter, and details of its calibration and validation are discussed. Furthermore, dynamic ANNs, which are designed to consider the temporal dimension of data in the modeling process, are presented. In this chapter, dynamic ANNs including input delay networks, recurrent networks, and a combination of both are discussed in detail. Statistical neural networks, namely, the radial basis estimator, the generalized neural network, and the probabilistic neural network, which are all developed based on statistical estimation, are the third type of ANN presented in this chapter. How to calibrate and validate all of the models with MATLAB code and commands is discussed. Applications of the models in function approximation and data classification are presented through different examples.
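As a small illustration of how a static MLP maps inputs to an output, the Python sketch below evaluates one hidden layer of tanh neurons followed by a single linear output neuron. The architecture, the function name `mlp_forward`, and the weights are hypothetical and hand-picked; no calibration (training) is shown, and this is not the book's MATLAB code.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden tanh layer followed by a single linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for a network with 2 inputs and 2 hidden neurons.
w_hidden = [[1.0, -1.0], [0.5, 0.5]]
b_hidden = [0.0, 0.0]
w_out = [2.0, -1.0]
b_out = 0.5

y = mlp_forward([1.0, 1.0], w_hidden, b_hidden, w_out, b_out)
```

Each hidden neuron is just a weighted sum passed through a nonlinear function; the nonlinearity of the whole network comes from composing these units.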

Classifying data is a common task in data-driven modeling. Using support vector machines, we can separate classes of data by a hyperplane. Support vector machines (SVMs) are a set of related supervised learning methods that analyze data and recognize patterns, and they are used for classification and regression analysis. The formulation of SVM uses the structural risk minimization principle, which has been shown to be superior to the traditional empirical risk minimization principle used by conventional neural networks. This chapter briefly presents the principles of classification and regression analysis by support vector machines. Related MATLAB programs are also presented.
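The idea of separating classes by a hyperplane can be illustrated with a few lines of Python. The sketch below only evaluates a given linear decision function and its margin width; it does not train an SVM. The hyperplane coefficients `w` and `b`, the function names, and the sample points are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / sum(wi * wi for wi in w) ** 0.5

# Hypothetical, hand-picked hyperplane (not trained): x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
labels = [classify(w, b, p) for p in [(0.5, 1.0), (3.0, 2.0)]]
width = margin_width(w)
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while the training points stay on the correct side, which is where the structural risk minimization principle enters.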

While variables in mathematics usually take numerical values, fuzzy logic applications often use non-numeric linguistic variables to facilitate the expression of rules and facts. The idea of fuzzy logic is very suitable for engineering applications where a precise representation of the real world is sought. In contrast to statistical methods, fuzzy models do not need very strong assumptions and requirements. As far as the engineering application of fuzzy logic is concerned, two approaches are usually followed: (1) developing fuzzy extensions of classic methods and models and (2) developing models that originate from fuzzy logic itself. Basic information on fuzzy logic, fuzzy clustering, fuzzy inference systems, and fuzzy regression are the main subjects presented in this chapter. The related MATLAB commands are presented and discussed to support applied modeling with the presented methods.
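A linguistic variable can be illustrated with membership functions that map a numerical value to degrees of belonging to terms such as "low" or "high". The Python sketch below uses triangular membership functions; the term names, breakpoints, and the function name `triangular` are assumptions for illustration and are not taken from the book.

```python
def triangular(x, a, b, c):
    """Membership rising linearly from a to the peak at b, then falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for river flow (m^3/s); breakpoints are made up.
terms = {
    "low": (0.0, 20.0, 50.0),
    "medium": (20.0, 50.0, 80.0),
    "high": (50.0, 80.0, 110.0),
}

flow = 35.0
memberships = {name: triangular(flow, *abc) for name, abc in terms.items()}
```

A flow of 35 belongs partly to "low" and partly to "medium" at the same time, which is the kind of graded statement that fuzzy rules operate on.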

The need for increased accuracy and precision in data-driven models has motivated researchers to develop innovative models. Hybrid models and multi-model ensemble estimations are applied to increase the accuracy and precision of single models. To show how different models can be combined in a way that reinforces each other's abilities, the chapter begins with a summary of the characteristics of the models presented in the previous chapters of the book. The models are compared based on different criteria to give readers ideas on how to take advantage of the models' strengths and avoid their weaknesses through hybrid models and the multi-model data fusion approach. The chapter continues with examples of hybrid models and general techniques of multi-model data fusion. The multi-model data fusion approach includes an important process of individual model generation, which is discussed in the last section of the chapter.
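One simple way to picture multi-model data fusion is a weighted average in which each individual model's weight is inversely proportional to its validation error. The Python sketch below shows that scheme only as an example; the source does not say which fusion techniques the chapter uses, and the function names, error values, and forecasts are hypothetical.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalized to sum to one."""
    inv = [1.0 / e for e in errors]
    total = sum(inv)
    return [v / total for v in inv]

def fuse(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Hypothetical validation RMSEs and forecasts of three individual models.
rmse = [2.0, 4.0, 4.0]
weights = inverse_error_weights(rmse)
combined = fuse([100.0, 110.0, 90.0], weights)
```

The model with the smallest validation error dominates the combined estimate, while the weaker models still contribute and can offset each other's errors.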

Shahab Araghinejad

Backmatter

Further Information

Title

Data-Driven Modeling: Using MATLAB® in Water Resources and Environmental Engineering