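This article briefly surveys the evaluation metrics commonly used in time series forecasting (and in other machine learning and deep learning applications), as a reference for later research.
Related reading: A Compendium of Common Formulas in Probability and Mathematical Statistics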
1 Deterministic Forecast Metrics
1.1 Error Metrics
Error-Based: Absolute / Relative / Cumulative / Scaled
1.1.1 MAE
Mean Absolute Error (MAE): the average magnitude of the errors in a set of predictions, ignoring their direction.
\text{MAE}=\frac1n \sum\limits_{t=1}^n|y_t-\hat y_t|
1.1.2 MSE
Mean Squared Error (MSE): the average of the squared errors.
\text{MSE}=\frac1n \sum\limits_{t=1}^n(y_t-\hat y_t)^2
1.1.3 RMSE
Root Mean Squared Error (RMSE): the square root of the MSE. Squaring before averaging gives larger errors more weight, and taking the root returns the result to the units of y_t.
\text{RMSE}=\sqrt{\frac1n \sum\limits_{t=1}^n(y_t-\hat y_t)^2}
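MAE, MSE, and RMSE differ only in how the per-step errors are aggregated, which a few lines of NumPy make concrete. This is a minimal sketch rather than a reference implementation; the function names and the sample values are illustrative assumptions.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: average of |y_t - y_hat_t|."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mse(y, y_hat):
    """Mean squared error: average of (y_t - y_hat_t)^2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

def rmse(y, y_hat):
    """Root mean squared error: square root of the MSE."""
    return float(np.sqrt(mse(y, y_hat)))

# Illustrative values (assumed, not taken from any dataset).
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.0, 5.0, 4.0, 8.0]
print(mae(y, y_hat), mse(y, y_hat), rmse(y, y_hat))
```

On these values the errors are 1, 0, -2, -1, so the MAE is 1.0 and the MSE is 1.5. The RMSE comes out above the MAE because the single error of size 2 is weighted more heavily.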
1.1.4 MAPE
Mean Absolute Percentage Error (MAPE): the average absolute error expressed as a percentage of the actual value. It is undefined whenever some y_t = 0.
\text{MAPE}=\frac1n\sum\limits_{t=1}^n\left|\frac{y_t-\hat y_t}{y_t}\right| \times 100
1.1.5 sMAPE
Symmetric Mean Absolute Percentage Error (sMAPE): a percentage error that scales each absolute error by the mean of |y_t| and |\hat y_t| rather than by |y_t| alone.
\text{sMAPE}=\frac1n\sum\limits_{t=1}^n \frac{|y_t-\hat y_t|}{\frac{|y_t|+|\hat y_t|}2} \times 100
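The two percentage errors differ only in their denominators, as the following sketch shows. The function names, the sample values, and the choice to raise an error for zero actual values are assumptions made for illustration.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    if np.any(y == 0):
        # Assumed policy: refuse to divide by a zero actual value.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)

def smape(y, y_hat):
    """Symmetric MAPE, in percent; denominator is the mean of |y| and |y_hat|."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    denom = (np.abs(y) + np.abs(y_hat)) / 2
    return float(np.mean(np.abs(y - y_hat) / denom) * 100)

# Illustrative values (assumed).
y = [100.0, 200.0, 50.0]
y_hat = [110.0, 180.0, 50.0]
print(mape(y, y_hat), smape(y, y_hat))
```

The sketch leaves the case y_t = \hat y_t = 0 in sMAPE unhandled; that term is 0/0 and needs an explicit convention in practice.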
1.1.6 MFE
Mean Forecast Error (MFE): the average of the signed forecast errors, indicating bias. With errors defined as y_t-\hat y_t, a positive MFE means the forecasts are too low on average.
\text{MFE}=\frac1n\sum\limits_{t=1}^n (y_t-\hat y_t)
1.1.7 CFE
Cumulative Forecast Error (CFE): the sum of all forecast errors, measuring total bias over the forecast horizon.
\text{CFE}=\sum\limits_{t=1}^n (y_t-\hat y_t)
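Because MFE and CFE keep the sign of each error, positive and negative errors cancel. A short sketch with assumed function names and sample values:

```python
import numpy as np

def cfe(y, y_hat):
    """Cumulative forecast error: sum of signed errors y_t - y_hat_t."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sum(y - y_hat))

def mfe(y, y_hat):
    """Mean forecast error: CFE divided by the number of points."""
    return cfe(y, y_hat) / len(y)

# Illustrative values (assumed): signed errors are 1, 0, -2, -1.
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.0, 5.0, 4.0, 8.0]
print(cfe(y, y_hat), mfe(y, y_hat))
```

Here the CFE is -2.0 and the MFE is -0.5, so these forecasts are on average too high, even though the mean absolute error of the same values is 1.0.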
1.1.8 MASE
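Mean Absolute Scaled Error (MASE): the MAE scaled by the MAE of a naïve forecast. The naïve forecast is the baseline model and is not unique; the formula below uses the one-step naïve forecast \hat y_t = y_{t-1}. A value below 1 means the model beats that baseline.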
\text{MASE}=\frac{\frac1n \sum\limits_{t=1}^n|y_t-\hat y_t|}{\frac1{n-1} \sum\limits_{t=2}^n|y_t-y_{t-1}|}
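The formula can be sketched directly with `np.diff`, which yields y_t - y_{t-1} for t = 2, ..., n. The function name, the sample values, and the error raised for a constant series are assumptions of this sketch.

```python
import numpy as np

def mase(y, y_hat):
    """MAE of the forecast divided by the MAE of the one-step naive forecast."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    model_mae = np.mean(np.abs(y - y_hat))
    naive_mae = np.mean(np.abs(np.diff(y)))  # mean of |y_t - y_{t-1}|, t = 2..n
    if naive_mae == 0:
        # Assumed policy: a constant series gives a zero denominator.
        raise ValueError("naive forecast has zero error; MASE is undefined")
    return float(model_mae / naive_mae)

# Illustrative values (assumed).
y = [10.0, 12.0, 11.0, 13.0]
y_hat = [11.0, 11.0, 12.0, 12.0]
print(mase(y, y_hat))
```

The model's MAE is 1 and the naïve MAE is 5/3, giving a MASE of 0.6.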
1.2 Explained Variance Metrics
1.2.1 R²
Coefficient of Determination (R²): the proportion of variance explained by the model.
R^2= 1 - \frac{\sum\limits_{t=1}^n(y_t-\hat y_t)^2}{\sum\limits_{t=1}^n(y_t-\bar y)^2}
1.2.2 Adjusted R²
Adjusted Coefficient of Determination (Adjusted R²): R² adjusted for the number of predictors k.
\text{Adjusted}\ R^2=1-\frac{(1-R^2)(n-1)}{n-k-1}
1.2.3 EVS
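Explained Variance Score (EVS): the proportion of the variance of y_t explained by the model. Unlike R², it is unaffected by a constant bias in the forecasts, because the variance of the residuals ignores their mean.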
\text{EVS}=1-\frac{\text{Var}(y_t-\hat y_t)}{\text{Var}(y_t)}
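The three explained-variance metrics can be compared side by side. In this sketch the function names and sample values are assumptions, and `np.var` is used with its default population normalization in both the numerator and the denominator of EVS, so the choice of normalization cancels.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

def adjusted_r2(y, y_hat, k):
    """R^2 adjusted for k predictors; requires n > k + 1."""
    n = len(y)
    return float(1 - (1 - r2(y, y_hat)) * (n - 1) / (n - k - 1))

def evs(y, y_hat):
    """Explained variance score: 1 - Var(residuals) / Var(y)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(1 - np.var(y - y_hat) / np.var(y))

# Illustrative values (assumed); k = 1 predictor is also an assumption.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.9, 5.4]
print(r2(y, y_hat), adjusted_r2(y, y_hat, k=1), evs(y, y_hat))
```

Here the residuals have mean -0.1, so the EVS (about 0.982) is slightly higher than R² (0.977): EVS discounts the constant part of the error.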
1.3 Model Selection Metrics
1.3.1 AIC
Akaike Information Criterion (AIC): a trade-off between goodness of fit and model complexity. Lower values are preferred.
\text{AIC}=2k-2\ln \hat L
1.3.2 BIC
Bayesian Information Criterion (BIC): similar to AIC, but with a stronger penalty on the number of parameters whenever \ln n > 2, that is, for n \ge 8.
\text{BIC}=k\ln n - 2 \ln \hat L
1.3.3 HQC
Hannan-Quinn Criterion (HQC): an alternative to AIC and BIC with a different penalty term.
\text{HQC}= 2k\ln(\ln n) - 2\ln \hat L
1.3.4 AICc
Corrected Akaike Information Criterion (AICc): AIC with a correction for small sample sizes.
\text{AICc}=\text{AIC}+\frac{2k(k+1)}{n-k-1}
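All four criteria share the term -2\ln \hat L and differ only in the penalty, so they can be computed together from the maximized log-likelihood. The function name and the input values below are illustrative assumptions, not output from a fitted model.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Return AIC, BIC, HQC and AICc from ln(L_hat), k parameters, n samples."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    hqc = 2 * k * math.log(math.log(n)) - 2 * log_likelihood
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return {"AIC": aic, "BIC": bic, "HQC": hqc, "AICc": aicc}

# Assumed inputs: ln(L_hat) = -120, 3 parameters, 50 observations.
ic = information_criteria(log_likelihood=-120.0, k=3, n=50)
for name, value in ic.items():
    print(f"{name}: {value:.3f}")
```

With n = 50 the per-parameter penalties are 2 for AIC, about 2.73 for HQC, and about 3.91 for BIC, so the criteria order as AIC < HQC < BIC for the same fit.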
2 Probabilistic Forecast Metrics
2.1 Error Metrics
2.1.1 Log Score
Logarithmic Score (Log Score): the negative mean logarithm of the probability p_t that the forecast assigned to the actual outcome. Lower is better.
\text{LogScore}=-\frac1n \sum_{t=1}^n\log p_t
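A minimal sketch of the log score, assuming each p_t is the probability the forecast assigned to the outcome that actually occurred. The function name and sample probabilities are illustrative.

```python
import numpy as np

def log_score(p):
    """Negative mean log of the probabilities assigned to the observed outcomes."""
    p = np.asarray(p, dtype=float)
    if np.any(p <= 0):
        raise ValueError("every p_t must be strictly positive")
    return float(-np.mean(np.log(p)))

# Illustrative probabilities (assumed).
print(log_score([0.9, 0.8, 0.5]))   # fairly confident, mostly right
print(log_score([0.5, 0.5, 0.5]))   # uninformative coin-flip forecast
```

A forecast that always assigns probability 1 to the observed outcome scores 0, and the score grows without bound as any p_t approaches 0.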
2.1.2 CRPS
Continuous Ranked Probability Score (CRPS): the integrated squared difference between the predicted cumulative distribution function and the step function at the observed value.
\text{CRPS}=\int_{-\infty}^{+\infty}(\hat F(z)- I_{z\ge y_t})^2\text{d}z
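The integral rarely needs to be evaluated directly. When the forecast distribution is represented by m samples x_1, ..., x_m (an assumption of this sketch; the text does not say how \hat F is obtained) and \hat F is taken to be their empirical CDF, the integral equals the mean of |x_i - y_t| minus half the mean of |x_i - x_j| over all pairs. The function name and sample values are illustrative.

```python
import numpy as np

def crps_ensemble(samples, y):
    """CRPS of the empirical CDF of `samples` against a single observation y."""
    x = np.asarray(samples, dtype=float)
    # Mean absolute distance between the samples and the observation.
    term_obs = np.mean(np.abs(x - y))
    # Half the mean absolute distance between all pairs of samples.
    term_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term_obs - term_spread)

# Illustrative two-member ensemble (assumed) and an observation of 1.
print(crps_ensemble([0.0, 2.0], 1.0))
```

For the ensemble {0, 2} and observation 1, \hat F equals 0.5 on [0, 2), so the integrand is 0.25 on an interval of length 2 and the CRPS is 0.5, matching the function's result. With a single sample the score reduces to the absolute error. To score a whole series, average the per-observation values over t.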
2.2 Interval Metrics
2.2.1 PICP
Prediction Interval Coverage Probability (PICP): the proportion of observed values that fall within their prediction intervals.
\text{PICP}=\frac1n\sum\limits_{t=1}^n I_{y_t \in [\hat y_{\text{lower},t},\hat y_{\text{upper},t}]}
2.2.2 PIW
Prediction Interval Width (PIW): the average width of the prediction intervals, a measure of how precise the forecasts are.
\text{PIW}=\frac1n\sum\limits_{t=1}^n (\hat y_{\text{upper},t} - \hat y_{\text{lower},t})
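Coverage and width are best read together, since very wide intervals trivially achieve high coverage. A sketch with assumed function names and sample values:

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of observations inside the closed interval [lower_t, upper_t]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def piw(lower, upper):
    """Average width of the prediction intervals."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    return float(np.mean(upper - lower))

# Illustrative observations and intervals (assumed).
y = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 2.5, 2.0, 3.0]
upper = [2.0, 3.5, 4.0, 3.5]
print(picp(y, lower, upper), piw(lower, upper))
```

Two of the four observations fall inside their intervals, so the PICP is 0.5, and the average width is 1.375.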
2.3 Other
2.3.1 Quantile Loss
Quantile Loss (also called Pinball Loss): penalizes over- and under-prediction asymmetrically according to the quantile level \tau.
\text{QuantileLoss}=\frac1n \sum\limits_{t=1}^n (I_{y_t \ge \hat y_t}\tau(y_t-\hat y_t)+I_{y_t<\hat y_t}(1-\tau)(\hat y_t-y_t))
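The two indicator branches of the formula collapse into a single maximum, since for d = y_t - \hat y_t exactly one of \tau d and (\tau - 1) d is non-negative. The function name and sample values in this sketch are assumptions.

```python
import numpy as np

def quantile_loss(y, y_hat, tau):
    """Pinball loss at quantile level tau, averaged over all points."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    d = y - y_hat
    # tau * d when the forecast is too low, (1 - tau) * |d| when it is too high.
    return float(np.mean(np.maximum(tau * d, (tau - 1) * d)))

# Illustrative values (assumed): d = -2, 1, 0.
y = [10.0, 20.0, 30.0]
y_hat = [12.0, 19.0, 30.0]
print(quantile_loss(y, y_hat, tau=0.9))
print(quantile_loss(y, y_hat, tau=0.5))
```

At \tau = 0.5 the loss is exactly half the mean absolute error (0.5 here), while at \tau = 0.9 an under-prediction costs nine times as much as an over-prediction of the same size.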
2.3.2 Sharpness
Sharpness: the concentration of the predictive distributions, evaluated through the width of the intervals or the variance.
\text{Sharpness}=\frac1n \sum\limits_{t=1}^n\text{Var}(\hat y_t)
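The variance term in the formula is the variance of the predictive distribution at each time point. The sketch below assumes that each predictive distribution is given as a row of samples and uses the population variance; both choices, like the function name and sample values, are assumptions.

```python
import numpy as np

def sharpness(samples):
    """Mean predictive variance; `samples` has shape (n_times, n_samples)."""
    samples = np.asarray(samples, dtype=float)
    per_time_variance = np.var(samples, axis=1)  # variance across samples at each t
    return float(np.mean(per_time_variance))

# Illustrative predictive samples (assumed): 3 time points, 2 samples each.
samples = [[1.0, 3.0],
           [2.0, 2.0],
           [0.0, 4.0]]
print(sharpness(samples))
```

The per-time variances are 1, 0, and 4, so the sharpness is 5/3. Smaller values indicate more concentrated forecasts, but sharpness says nothing about whether the observations fall where the forecasts concentrate, so it is usually read alongside a coverage measure.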
3 Notation
The symbols used in the formulas throughout this article are defined as follows:
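Symbol | Meaning |
---|---|
n | Number of samples (total number of time points) |
t | Time index (t = 1, 2, \cdots, n) |
y_t | Actual observed value at time t |
\hat y_t | Predicted value at time t |
\bar y | Mean of the actual observed values |
k | Number of model parameters |
\hat L | Maximized value of the model's likelihood function |
\hat F(z) | Cumulative distribution function of the predictive distribution |
I_{\text{condition}} | Indicator function (1 if the condition holds, 0 otherwise) |
p_t | Predicted probability |
\tau | Quantile level |
\hat y_{\text{lower},t}, \hat y_{\text{upper},t} | Lower and upper bounds of the prediction interval |
\text{Var}(\cdot) | Variance |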