Performance metrics¶
-
class pystran.Evaluation
(observed, modelled)[source]¶ Class for deriving different evaluation criteria.
References
[E1] (1, 2, 3) Gupta H.V., Sorooshian S., Yapo P.O. (1998), Toward improved calibration of hydrologic models: Multiple and noncommensurable measures of information, Water Resources Research, pp 751-763
[E2] H. Hauduc, M. B. Neumann, D. Muschalla, V. Gamerith, S. Gillot and P.A. Vanrolleghem (2011), Towards quantitative quality criteria to evaluate simulation results in wastewater treatment – A critical review. Proceedings 8th symposium on systems analysis and integrated assessment (Watermatex 2011)
-
AME
()[source]¶ Absolute Maximum Error
Notes
The absolute maximum error indicates the maximum error of the model [E1]. This criterion is very sensitive to outliers.
- range: [0, inf]
- optimum: 0
- category: Absolute criteria
-
APBIAS
()[source]¶ Absolute Percent Bias
Notes
Useful in combination with PBIAS: for example, if PBIAS is small while APBIAS is very large, one could conclude that the volumes are fine but the timing is off (continuous gap).
- range: [0, inf]
- optimum: 0
- category: Total Relative error criteria
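The interplay between PBIAS and APBIAS can be illustrated with a minimal NumPy sketch. The functions `pbias` and `apbias` below are hypothetical helpers, not the pystran implementation, and the sign convention (observed minus modelled) and the scaling to percent are assumptions.

```python
import numpy as np

def pbias(observed, modelled):
    """Percent bias: signed errors summed, relative to the observed total (assumed convention)."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return 100.0 * np.sum(observed - modelled) / np.sum(observed)

def apbias(observed, modelled):
    """Absolute percent bias: absolute errors summed, relative to the observed total."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return 100.0 * np.sum(np.abs(observed - modelled)) / np.sum(observed)

observed = [1.0, 2.0, 3.0, 4.0]
modelled = [2.0, 1.0, 4.0, 3.0]  # same total volume, but every time step is off by 1

print(pbias(observed, modelled))   # signed errors cancel
print(apbias(observed, modelled))  # absolute errors do not cancel
```

In this example the signed errors cancel, so the percent bias is zero while the absolute percent bias is not, which is the pattern the note describes.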
-
BIAS
(optim=False)[source]¶ Bias E[obs-mod]
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
- range: [-inf, inf]
- optimum: 0
- category: Total Relative error criteria
-
CrBal
(optim=False)[source]¶ Balance Criterion
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value (1 - CrBal)
Notes
[E9] use the balance criterion to measure the ability of the model to reproduce the same cumulative values as observed. The difference between the inverse fractions penalises larger differences between observed and modelled cumulative values.
- range: [-inf, 1]
- optimum: 1
- category: Total Relative error criteria
-
HighFDCE
()[source]¶ Flow Duration Curve based high flow criterion
Notes
Uses the upper part (lowest percentiles) of the Flow Duration Curve to focus on high flow regimes. Always use in combination with a second criterion to make sure the timing of the model is also satisfactory. Used in [E15].
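As background, a flow duration curve can be sketched as below. This is a generic construction, not the pystran code: `flow_duration_curve` is a hypothetical helper, and the Weibull plotting position `rank / (n + 1)` is an assumption. Which part of the curve the library uses for the high-flow criterion is not stated here.

```python
import numpy as np

def flow_duration_curve(flows):
    """Return (exceedance probability, flows sorted from high to low)."""
    flows = np.asarray(flows, dtype=float)
    sorted_flows = np.sort(flows)[::-1]       # highest flow first
    ranks = np.arange(1, flows.size + 1)      # rank 1 = highest flow
    exceedance = ranks / (flows.size + 1.0)   # assumed Weibull plotting position
    return exceedance, sorted_flows

exceedance, sorted_flows = flow_duration_curve([3.0, 1.0, 2.0])
print(exceedance)
print(sorted_flows)
```

The highest flows sit at the lowest exceedance percentiles, which is why the upper part of the curve corresponds to the lowest percentiles.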
-
IA
(optim=False)[source]¶ Index of agreement
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
Index of agreement is the ratio of the sum of squared errors (SSE) and the largest potential error with respect to the mean of the observed values [E4]. It is sensitive to the model mean and to the peak values, and insensitive to low-magnitude values.
- range: [0, 1]
- optimum: 1
- category: comparison with reference model
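A minimal sketch of the index of agreement in its common Willmott form, where the ratio described above is subtracted from one. The helper `index_of_agreement` is hypothetical and may differ in detail from the pystran implementation.

```python
import numpy as np

def index_of_agreement(observed, modelled):
    """Willmott's index of agreement: 1 - SSE / largest potential error."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    obs_mean = observed.mean()
    sse = np.sum((observed - modelled) ** 2)
    # Potential error: distances of both series from the observed mean.
    potential = np.sum((np.abs(modelled - obs_mean) + np.abs(observed - obs_mean)) ** 2)
    return 1.0 - sse / potential

print(index_of_agreement([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
print(index_of_agreement([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # model equals the observed mean
```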
-
LowFDCE
()[source]¶ Flow Duration Curve based low flow criterion
Notes
Uses the lower part (highest percentiles) of the Flow Duration Curve to focus on low flow regimes. Always use in combination with a second criterion to make sure the timing of the model is also satisfactory. Used in [E15].
-
MAE
()[source]¶ Mean Absolute Error
Notes
The mean absolute error indicates the average magnitude of the model error (accuracy) [E4]. Taking the absolute value avoids error compensation, but does not indicate the direction of the deviation.
- range: [0, inf]
- optimum: 0
- category: Absolute criteria
References
[E4] (1, 2, 3) Willmott C.J., Ackleson S.G., Davis R.E., Feddema J.J., Klink K.M., Legates D.R., O’Donnell J. and Rowe C.M. (1985) Statistics for the evaluation and comparison of models. Journal of Geophysical Research, 90(C5), 8995-9005.
-
MAPE
()[source]¶ Mean Absolute Percent Error
Notes
The mean absolute percent error used by [E3] is close to MARE. However, the errors are relative to the predicted values instead of the observed values. Consequently, under-predicted values are penalised more heavily (for a similar error). This is an interesting criterion for situations in which one wants to determine a risk to reach concentration limits.
- range: [0, inf]
- optimum: 0
- category: Relative criteria
-
MARE
()[source]¶ Mean Absolute Relative Error
Notes
The mean absolute relative error is similar to the Mean Relative Error, but avoids the compensation of errors [E7].
- range: [0, inf]
- optimum: 0
- category: Relative criteria
References
[E7] Petersen B., Gernaey K., Henze M. and Vanrolleghem P.A. (2002) Evaluation of an ASM1 model calibration procedure on a municipal-industrial wastewater treatment plant. Journal of Hydroinformatics, 4(1), 15-38.
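The difference between MARE and MAPE, as described in the notes above, can be sketched as follows. The helpers `mare` and `mape` are hypothetical; expressing MARE as a fraction and MAPE as a percentage is an assumption, and pystran may scale them differently.

```python
import numpy as np

def mare(observed, modelled):
    """Mean absolute relative error, relative to the observed values (as a fraction)."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return np.mean(np.abs(observed - modelled) / observed)

def mape(observed, modelled):
    """Mean absolute percent error, relative to the predicted values (in percent)."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return 100.0 * np.mean(np.abs(observed - modelled) / modelled)

# The same absolute error of 5, once as under-prediction and once as over-prediction.
print(mare([10.0], [5.0]), mape([10.0], [5.0]))
print(mare([10.0], [15.0]), mape([10.0], [15.0]))
```

MARE is identical in both cases, whereas MAPE is larger for the under-prediction because the error is divided by the smaller predicted value.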
-
ME
(optim=False)[source]¶ Mean Error
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
The mean of residuals allows highlighting the existence of systematic bias, i.e. characteristic of a model leading to systematic over- or under-prediction [E1]. However, with this criterion errors can compensate each other, so no information on the magnitude of the errors is obtained.
- range: [-inf, inf]
- optimum: 0
- category: Absolute criteria
References
[E3] (1, 2, 3) Power M. (1993) The predictive validation of ecological and environmental models. Ecological Modelling, 68(1-2), 33-50.
-
MPE
(optim=False)[source]¶ Mean Percent Error
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
See also
MRE
Notes
The mean percent error [E3] provides the average relative model error. However, negative and positive errors can compensate for each other.
- range: [-inf, inf]
- optimum: 0
- category: Relative criteria
-
MRE
(optim=False)[source]¶ Mean Relative Error
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
The mean relative error [E5] provides the average relative model error. However, negative and positive errors can compensate for each other.
- range: [-inf, inf]
- optimum: 0
- category: Relative criteria
-
MSDE
()[source]¶ Mean Square Derivative Error
Notes
The mean square derivative error is the mean of the squared differences between the predicted and observed variations between two time steps [E5]. This criterion penalizes noisy time series and series with timing errors; it thus allows evaluating the timing of the peaks.
- range: [0, inf]
- optimum: 0
- category: Absolute criteria
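A minimal sketch of the mean square derivative error, assuming the variations are first differences between consecutive time steps. The helper `msde` is hypothetical and not taken from pystran.

```python
import numpy as np

def msde(observed, modelled):
    """Mean of the squared differences between observed and modelled step-to-step changes."""
    d_obs = np.diff(np.asarray(observed, dtype=float))
    d_mod = np.diff(np.asarray(modelled, dtype=float))
    return np.mean((d_obs - d_mod) ** 2)

observed = [0.0, 1.0, 0.0, 0.0]
shifted = [0.0, 0.0, 1.0, 0.0]   # same peak, one time step late
offset = [1.0, 2.0, 1.0, 1.0]    # correct dynamics, constant bias of 1

print(msde(observed, shifted))
print(msde(observed, offset))
```

Under this definition a shifted peak is penalised, while a constant offset is not, so the criterion says nothing about bias and is best combined with another measure.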
-
MSE
()[source]¶ Mean Squared Error
Notes
The mean square error avoids error compensations and emphasises high errors [E4].
- range: [0, inf]
- optimum: 0
- category: Absolute criteria
-
MSLE
()[source]¶ Mean Squared Logarithm Error
Notes
The mean square logarithm error is the mean of the squared differences between the natural logarithms of the predicted and observed values [E5]. It emphasises low magnitude errors.
- range: [0, inf]
- optimum: 0
- category: Absolute criteria
References
[E5] (1, 2, 3, 4, 5, 6, 7) Dawson C.W., Abrahart R.J. and See L.M. (2010) HydroTest: Further development of a web resource for the standardised assessment of hydrological models. Environmental Modelling and Software, 25(11), 1481-1482.
-
MSRE
()[source]¶ Mean Square Relative Error
Notes
The mean square relative error avoids compensation of errors and emphasises larger relative errors [E5].
- range: [0, inf]
- optimum: 0
- category: Relative criteria
-
MSSoE
()[source]¶ Mean Squared Sorted Errors
Notes
The mean square error of sorted errors is calculated based on sorted observed and predicted data (van Griensven and Bauwens, 2003). Observations and predictions are sorted independently one from the other. The sorted series are then compared (comparison of their cumulative distributions) and it is evaluated whether the model reproduces the same distribution as the observed data.
The time of occurrence of a given value of the variable is not accounted for in the MSSoE method.
- range: [0, inf]
- optimum: 0
- category: Absolute criteria, timestep independent
References
[E6] van Griensven A. and Bauwens W. (2003) Multiobjective autocalibration for semidistributed water quality models. Water Resources Research, 39(12), SWC91-SWC99.
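The independent sorting described in the notes above can be sketched as follows. The helpers `mse` and `mssoe` are hypothetical illustrations, not the pystran implementation.

```python
import numpy as np

def mse(observed, modelled):
    """Mean squared error on the series as given (time step by time step)."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return np.mean((observed - modelled) ** 2)

def mssoe(observed, modelled):
    """Mean squared error after sorting both series independently."""
    return mse(np.sort(observed), np.sort(modelled))

observed = [1.0, 3.0, 2.0]
modelled = [3.0, 2.0, 1.0]  # same values as observed, but at the wrong time steps

print(mse(observed, modelled))
print(mssoe(observed, modelled))
```

The modelled series reproduces the observed distribution exactly, so the sorted criterion is zero even though the time-step-wise error is not.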
-
MeAPE
()[source]¶ Median Absolute Percent Error
Notes
Median of the absolute relative error expressed in percentage [E5]. This criterion is less affected by outliers and by the shape of the error distribution than the MARE criterion.
- range: [0, inf]
- optimum: 0
- category: Relative criteria
-
NSC
()[source]¶ Number of Sign Changes of the residuals
Notes
The number of sign changes [E1] counts the number of times the sign of the residual (Oi-Pi) changes. The minimum value is zero and the maximum is n. A value close to zero indicates a systematic error (an over- or under-estimating model) but a more consistent model. A value close to n indicates a random error.
- range: [0, nsize]
- optimum: /
- category: Absolute criteria
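Counting sign changes can be sketched as below. The helper `nsc` is hypothetical, and skipping residuals that are exactly zero is an assumption; pystran may treat them differently.

```python
import numpy as np

def nsc(observed, modelled):
    """Count how often the residual (observed - modelled) changes sign."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(modelled, dtype=float)
    signs = np.sign(residuals)
    signs = signs[signs != 0]  # assumption: exact zeros are ignored
    return int(np.sum(signs[1:] != signs[:-1]))

print(nsc([2.0, 0.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]))  # residuals alternate in sign
print(nsc([2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]))  # systematic under-estimation
```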
-
NSE
(optim=False)[source]¶ Nash-Sutcliffe Efficiency criterion
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
Widely used criterion in hydrology, with values ranging from -inf to 1. A zero value means the model is not better than the ‘no knowledge’ model, which is characterised by the mean of the observations. Sensitive to extreme values.
- range: [-inf, 1]
- optimum: 1
- category: comparison with reference model
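The standard Nash-Sutcliffe formula can be sketched in a few lines. The helper `nse` is a hypothetical stand-alone function, not the pystran method, and it does not reproduce the `optim` translation.

```python
import numpy as np

def nse(observed, modelled):
    """Nash-Sutcliffe efficiency: 1 - SSE / variability of the observations around their mean."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    sse = np.sum((observed - modelled) ** 2)
    reference = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - sse / reference

print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect model
print(nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 'no knowledge' model: the observed mean
```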
-
NSE_BIAS
()[source]¶ Combination of Nash-Sutcliffe and BIAS
Notes
The criterion combines the effect of both measures and is proposed in [E16]. Here an adaptation is implemented that takes the absolute value of the bias, making the function symmetrical around the optimal value.
References
[E16] Viney, N.R., J. Perraud, J. Vaze, F.H.S. Chiew, D.A. Post and A. Yang (2009b). The usefulness of bias constraints in model calibration for regionalisation to ungauged catchments. Proceedings, MODSIM 200
-
NSE_FDChigh
(w1=1.0, w2=1.0)[source]¶ Nash-Sutcliffe (mod) + high flow; same weighting factor: if the error is larger, both are larger too!
Parameters: w1 : float (0-1)
weighting factor 1, NSE
w2 : float (0-1)
weighting factor 2, FDC
-
NSE_FDClow
(w1=1.0, w2=1.0)[source]¶ Nash-Sutcliffe (mod) + low flow; same weighting factor: if the error is larger, both are larger too!
Parameters: w1 : float (0-1)
weighting factor 1, NSE
w2 : float (0-1)
weighting factor 2, FDC
-
NSE_boxcox
(optim=False, llambda=0.25)[source]¶ Nash-Sutcliffe Efficiency criterion with boxcox transformed values
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
Widely used criterion in hydrology, with values ranging from -inf to 1. A zero value means the model is not better than the ‘no knowledge’ model, which is characterised by the mean of the observations.
Model residuals typically increase with higher flow values. This means that the model residual variance or standard deviation typically increases with increasing flow. It also means that the higher flow values receive more weight in the goodness-of-fit statistics [E10].
- range: [-inf, 1]
- optimum: 1
- category: comparison with reference model
References
[E10] Willems, P. A Time Series Tool to Support the Multi-criteria Performance Evaluation of Rainfall-runoff Models. Environmental Modelling & Software 24, no. 3 (March 2009): 311–321. http://linkinghub.elsevier.com/retrieve/pii/S1364815208001606.
-
NSE_log
(optim=False)[source]¶ Nash-Sutcliffe Efficiency criterion with logarithmic values
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
Widely used criterion in hydrology, with values ranging from -inf to 1. A zero value means the model is not better than the ‘no knowledge’ model, which is characterised by the mean of the observations. The log values of the observed and modelled values are used to give more emphasis to the lower values.
- range: [-inf, 1]
- optimum: 1
- category: comparison with reference model
-
NSE_sqrt
(optim=False)[source]¶ Nash-Sutcliffe Efficiency criterion with root values
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
Widely used criterion in hydrology, with values ranging from -inf to 1. A zero value means the model is not better than the ‘no knowledge’ model, which is characterised by the mean of the observations. The root values of the observed and modelled values are used to give more emphasis to the lower values.
- range: [-inf, 1]
- optimum: 1
- category: comparison with reference model
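The effect of the Box-Cox, log and root transformations in the NSE variants above can be illustrated with a small sketch. The helpers `nse`, `box_cox` and `nse_transformed` are hypothetical; the Box-Cox form `(x**llambda - 1) / llambda` is the standard definition, but whether pystran applies it in exactly this way is an assumption.

```python
import numpy as np

def nse(observed, modelled):
    """Nash-Sutcliffe efficiency on the values as given."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return 1.0 - np.sum((observed - modelled) ** 2) / np.sum((observed - observed.mean()) ** 2)

def box_cox(values, llambda=0.25):
    """Standard Box-Cox transform for llambda != 0 (positive values only)."""
    values = np.asarray(values, dtype=float)
    return (values ** llambda - 1.0) / llambda

def nse_transformed(observed, modelled, transform):
    """Apply the same transform to both series, then compute the NSE."""
    return nse(transform(np.asarray(observed, dtype=float)),
               transform(np.asarray(modelled, dtype=float)))

observed = [1.0, 4.0, 100.0]
low_flow_error = [2.0, 4.0, 100.0]   # error of 1 on the lowest value
high_flow_error = [1.0, 4.0, 99.0]   # error of 1 on the highest value

for name, transform in [("log", np.log), ("sqrt", np.sqrt), ("boxcox", box_cox)]:
    print(name,
          nse_transformed(observed, low_flow_error, transform),
          nse_transformed(observed, high_flow_error, transform))
```

The untransformed NSE rates both modelled series identically, whereas each transformed variant scores the series with the low-flow error lower, which is the shift of emphasis the notes describe.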
-
PBIAS
(optim=False)[source]¶ Percent Bias
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
The percent bias [E5] and relative volume error are the sum of errors related to the sum of observed values, expressed as relative value or in percentage. This criterion measures an overall adequacy, but the errors can be compensated.
(Also known as DEVRV, the deviation of runoff volumes; from "Statistical evaluation of WATFLOOD", Angela MacLean, University of Waterloo)
- range: [-inf, inf]
- optimum: 0
- category: Total Relative error criteria
-
PDIFF
(optim=False)[source]¶ Peak Difference
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
This criterion evaluates how well the highest modelled value matches the highest observed value in percent. However, it does not take into account whether the max(Oi) and max(Pi) occur at the same time-step i.
Consequently, in case of multiple events on the same time-series, first the single events must be extracted from the whole time series to have less chance to mix up with peaks from another event.
- range: [-inf, inf]
- optimum: 0
- category: single event
-
PEP
(optim=False)[source]¶ Percent Error In Peak
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
This criterion evaluates how well the highest modelled value matches the highest observed value in percent. However, it does not take into account whether the max(Oi) and max(Pi) occur at the same time-step i.
Consequently, in case of multiple events on the same time-series, first the single events must be extracted from the whole time series to have less chance to mix up with peaks from another event.
- range: [-inf, inf]
- optimum: 0
- category: single event
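The two peak criteria can be sketched as below, assuming the common definitions in which PDIFF is the difference between the two maxima and PEP expresses that difference as a percentage of the observed peak. The helpers `pdiff` and `pep` are hypothetical, and the sign convention (observed minus modelled) is an assumption.

```python
import numpy as np

def pdiff(observed, modelled):
    """Peak difference: highest observed value minus highest modelled value."""
    return float(np.max(observed) - np.max(modelled))

def pep(observed, modelled):
    """Percent error in peak: peak difference relative to the observed peak."""
    return 100.0 * pdiff(observed, modelled) / float(np.max(observed))

observed = [1.0, 5.0, 2.0]
modelled = [4.0, 1.0, 1.0]  # modelled peak occurs one time step too early

print(pdiff(observed, modelled))
print(pep(observed, modelled))
```

Neither value reflects that the modelled peak occurs at a different time step, which is the limitation pointed out in the notes.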
-
PI
(optim=False)[source]¶ Coefficient of Persistence
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
The coefficient of persistence is close to the NSE criterion, but the simplistic model used is the last observed value instead of the mean of the observed values [E12].
- range: [0, 1]
- optimum: 1
- category: comparison with reference model
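A minimal sketch of the coefficient of persistence, where the reference model predicts each value as the previous observation. The helper `persistence_index` is hypothetical, and dropping the first time step (which has no previous observation) is an assumption.

```python
import numpy as np

def persistence_index(observed, modelled):
    """1 - SSE of the model / SSE of the 'last observed value' model."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    sse_model = np.sum((observed[1:] - modelled[1:]) ** 2)
    sse_naive = np.sum((observed[1:] - observed[:-1]) ** 2)
    return 1.0 - sse_model / sse_naive

observed = [1.0, 2.0, 3.0, 4.0]
print(persistence_index(observed, observed))              # perfect model
print(persistence_index(observed, [0.0, 1.0, 2.0, 3.0]))  # model repeats the last observation
```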
-
R4MS4E
()[source]¶ Root 4 Mean Square 4 Error
See also
RMSE
Notes
To put even more emphasis on the larger errors, the fourth root of the mean of the errors raised to the fourth power is used [E5].
-
RAE
()[source]¶ Relative Absolute Error
Notes
The RAE compares the sum of the absolute residuals to the residuals of the no knowledge model (mean of observed values) [E11]. This criterion does not allow error compensation.
- range: [0, inf]
- optimum: 0
- category: comparison with reference model
References
[E11] (1, 2) Legates D.R. and McCabe G.J. (1999) Evaluating the use of ‘goodness-of-fit’ measures in hydrologic and hydroclimatic model validation. Water Resources Research, 35(1), 233-241
-
RCOEF
(optim=False)[source]¶ Correlation coefficient
Parameters: optim : bool
if True, the objective is translated for use in optimizations where the algorithm seeks a minimum value
Notes
Used to describe how well a regression line fits a set of data; compares variability in observed and modelled values. In general not the best criterion to check model performance; see [E11] for more details.
- range: [0, 1]
- optimum: 1
- category: comparison with reference model
-
RFLAUT
(theta=1, method='biomath')[source]¶ First (or higher) lag autocorrelation; higher values of theta give higher lags
Parameters: theta : int
lag to calculate
method : biomath|gupta|anders
method to calculate the correlation
Notes
Calculates the first lag of the autocorrelation of the residuals, according to the version proposed by [E1] when method ‘gupta1998’ is chosen. The default is the biomath version, as proposed by Gujer (2008); more information is given in [E12].
- range: [0, 1]
- optimum: 0
- category: others
References
[E13] Cierkens, Katrijn. Investigating Bioprocess Model Output Uncertainty as Function of Input Data Quantity and Model Structure. Ghent University, 2010.
-
RMAE
()[source]¶ Relative Mean Absolute Error
Notes
The relative mean absolute error is the sum of absolute errors related to the sum of observed data [E8]. The difference with the PBIAS and RVE is that errors are not compensated.
- range: [0, inf]
- optimum: 0
- category: Total Relative error criteria
References
[E8] (1, 2) Elliott J.A., Irish A.E., Reynolds C.S. and Tett P. (2000) Modelling freshwater phytoplankton communities: an exercise in validation. Ecological Modelling, 128(1), 19-26.
-
RMSE
()[source]¶ Root Mean Square Error
Notes
The root mean square error is an absolute criterion that is often used [E4]. It indicates the overall agreement between predicted and observed data. Squaring avoids error compensation and emphasises larger errors. The root provides a criterion in actual units. Consequently, this quality criterion can be compared to the MAE to provide information on the prominence of outliers in the dataset.
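The comparison between RMSE and MAE can be sketched with two error patterns that share the same mean absolute error. The helpers `mae` and `rmse` are hypothetical stand-alone functions, not the pystran methods.

```python
import numpy as np

def mae(observed, modelled):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(observed, dtype=float) - np.asarray(modelled, dtype=float)))

def rmse(observed, modelled):
    """Root mean square error, in the units of the data."""
    return np.sqrt(np.mean((np.asarray(observed, dtype=float) - np.asarray(modelled, dtype=float)) ** 2))

observed = [0.0, 0.0, 0.0, 0.0]
spread_error = [1.0, 1.0, 1.0, 1.0]   # the same small error at every step
outlier_error = [0.0, 0.0, 0.0, 4.0]  # one large error, otherwise perfect

print(mae(observed, spread_error), rmse(observed, spread_error))
print(mae(observed, outlier_error), rmse(observed, outlier_error))
```

When the errors are evenly spread, both criteria coincide; a single outlier leaves the MAE unchanged but doubles the RMSE in this example, so a large gap between the two points to outliers.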
-
RMSE_boxcox
(llambda=0.25)[source]¶ Root Mean Square Error with Box-Cox transformed values
Notes
The root mean square error is an absolute criterion that is often used [E4]. It indicates the overall agreement between predicted and observed data. Squaring avoids error compensation and emphasises larger errors. The root provides a criterion in actual units. Consequently, this quality criterion can be compared to the MAE to provide information on the prominence of outliers in the dataset. Also applied in [E14].
-
RMSE_log
(llambda=0.25)[source]¶ Root Mean Square Error with Box-Cox transformed values
Notes
The root mean square error is an absolute criterion that is often used [E4]. It indicates the overall agreement between predicted and observed data. Squaring avoids error compensation and emphasises larger errors. The root provides a criterion in actual units. Consequently, this quality criterion can be compared to the MAE to provide information on the prominence of outliers in the dataset.
-
RRMSE
()[source]¶ Relative Root Mean Square Error
See also
RMSE
Notes
The relative Root Mean Square Error is the Root Mean Square Error divided by the mean of the observations.
-
RSR
()[source]¶ RMSE-observation standard deviation ratio
Notes
The RMSE-observation standard deviation ratio is the RMSE of the predicted data divided by the RMSE of the no knowledge model (mean of observed values), [E12]. It is a scaled criterion that emphasises larger errors and can be, as for MAE and RMSE, compared to the RAE to indicate the influence of larger errors.
- range: [0, inf]
- optimum: 0
- category: comparison with reference model
References
[E12] (1, 2, 3) Moriasi D.N., Arnold J.G., Van Liew M.W., Bingner R.L., Harmel R.D. and Veith T.L. (2007) Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of the ASABE, 50(3), 885-900
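RSR and RAE both scale the model error by the error of the no knowledge model, one with squared and one with absolute residuals. The sketch below uses hypothetical helpers `rsr` and `rae` and is not the pystran implementation.

```python
import numpy as np

def rsr(observed, modelled):
    """RMSE of the model divided by the RMSE of the observed-mean model."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    # The 1/n factors cancel, so sums of squares are sufficient.
    return np.sqrt(np.sum((observed - modelled) ** 2)) / np.sqrt(np.sum((observed - observed.mean()) ** 2))

def rae(observed, modelled):
    """Sum of absolute residuals divided by that of the observed-mean model."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    return np.sum(np.abs(observed - modelled)) / np.sum(np.abs(observed - observed.mean()))

observed = [1.0, 2.0, 3.0]
print(rsr(observed, [2.0, 2.0, 2.0]), rae(observed, [2.0, 2.0, 2.0]))  # model equals the observed mean
print(rsr(observed, observed), rae(observed, observed))                # perfect model
```

With these definitions a value of 1 means the model performs no better than the mean of the observations, and the RSR equals the square root of one minus the NSE.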
-
RVE
()[source]¶ Relative Volume Error
Notes
The relative volume error is the sum of errors relative to the sum of observed values, expressed as a relative value or as a percentage. This criterion measures overall adequacy, but errors can compensate each other.
- range: [-inf, inf]
- optimum: 0
- category: Total Relative error criteria
-
SARE
()[source]¶ Sum of Absolute Relative Error
Notes
- range: [0, inf]
- optimum: 0
- category: Relative criteria
-
SFDCE
()[source]¶ Slope of the flow duration curve error
Notes
Based on the FDC, measures how well the model captures the distribution of mid-level flow. The slope of a watershed’s flow duration curve indicates the variability, or flashiness, of its flow magnitudes. The SFDCE metric is thus simply the absolute error in the slope of the flow duration curve between the 30 and 70 percentile flows.
References
[E14] van Werkhoven, Kathryn, Thorsten Wagener, Patrick Reed, and Yong Tang. Sensitivity-guided Reduction of Parametric Dimensionality for Multi-objective Calibration of Watershed Models. Advances in Water Resources 32, no. 8 (2009): 1154–1169. http://dx.doi.org/10.1016/j.advwatres.2009.03.002.
-
SSE
()[source]¶ Sum of Squared Errors (of prediction)
Notes
- range: [0, inf]
- optimum: 0
- category: Absolute criteria
-
TMC
()[source]¶ Total Mass Controller
Notes
[E6] use the Total Mass Controller criterion as an objective function. This criterion compares the cumulative predicted and observed values.
- range: [0, inf]
- optimum: 0
- category: Total Relative error criteria
-
ThInC
()[source]¶ Theil’s Inequality Coefficient
Notes
Theil’s inequality coefficient used by [E3] and [E8] is the mean square error divided by the sum of observed data. This criterion avoids error compensation and emphasises larger errors.
- range: [0, inf]
- optimum: 0
- category: Total Relative error criteria
-