1 Resources

2 Datasets

3 Homoscedasticity

Homoscedasticity means "having the same scatter": the variance of the errors is the same for all data points.

As variance is just the standard deviation squared, you might also see homoscedasticity described as a condition where the standard deviations are equal for all points.

Running a test without checking for equal variances can have a significant impact on your results and may even invalidate them completely.

Many statistical tests assume equal variances; if that assumption does not hold, they can produce incorrect results.
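As a rough illustration of checking for equal variances before running a test, the sketch below compares the largest and smallest sample variance across groups. The groups, their sizes, and the helper name `variance_ratio` are assumptions made up for this example, and the ratio is only an informal check, not a formal test.

```python
import random
import statistics

def variance_ratio(groups):
    """Largest sample variance divided by the smallest across the groups."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Three hypothetical groups with the same spread (sd = 2) but different means.
equal_spread = [[rng.gauss(mu, 2.0) for _ in range(1000)] for mu in (10, 12, 14)]
# Three hypothetical groups whose spread differs (sd = 1, 2, 4).
unequal_spread = [[rng.gauss(10, sd) for _ in range(1000)] for sd in (1.0, 2.0, 4.0)]

# A ratio close to 1 is what equal variances look like; the population
# ratio for the second set of groups is (4 / 1) ** 2 = 16.
ratio_equal = variance_ratio(equal_spread)
ratio_unequal = variance_ratio(unequal_spread)
print(f"equal spread:   max/min variance = {ratio_equal:.2f}")
print(f"unequal spread: max/min variance = {ratio_unequal:.2f}")
```

Note that the group means differ in the first set without affecting the ratio: homoscedasticity is about spread, not location.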

4 Heteroscedasticity

Heteroscedasticity means "having different scatter": the points lie at widely varying distances from the regression line.

4.1 Why is it important to check for heteroscedasticity?

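In a linear regression model, every part of Y that X cannot explain ends up in the error term (possibly along with factors that have not yet been identified). The robustness of the model therefore depends on the error term: if the errors are not homoscedastic (for example, if the standard error changes with X), the fitted model is not reliable.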
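To make the role of the error term concrete, here is a minimal sketch that fits a straight line by ordinary least squares and then compares the residual spread for small and large X. The data-generating process (noise standard deviation equal to 0.5 * x) and the helper name `fit_line` are assumptions chosen for illustration only.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(1)  # fixed seed so the sketch is repeatable
xs = [rng.uniform(1, 10) for _ in range(2000)]
# Assumed process: true line 3 + 2x, with noise whose sd grows with x.
ys = [3.0 + 2.0 * x + rng.gauss(0, 0.5 * x) for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Split the residuals at the median of x and compare their spread.
cut = statistics.median(xs)
low = [e for x, e in zip(xs, residuals) if x <= cut]
high = [e for x, e in zip(xs, residuals) if x > cut]
sd_low, sd_high = statistics.pstdev(low), statistics.pstdev(high)

print(f"fitted slope: {slope:.3f}")
print(f"residual sd for small x: {sd_low:.2f}, for large x: {sd_high:.2f}")
```

In this setup the fitted slope still lands near the true value, but a single "typical error" figure would overstate the noise for small X and understate it for large X.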

4.2 How to detect heteroscedasticity?

4.2.1 Graphical Methods

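Analyze the error term (residual analysis).

  • Residual vs. Fitted Values Plot (residual scatter plot): the points should fluctuate randomly above and below the horizontal line at residual = 0, with no U-shaped curve; this checks that the residuals are independent. A funnel shape, where the spread widens or narrows along the fitted values, indicates heteroscedasticity.

  • Scale-Location Plot (square root of the absolute standardized residuals against fitted values): uses studentized residuals. The square root is plotted because sqrt(|E|) is much less skewed than |E| for Gaussian zero-mean E. Values below 2 are considered normal. Without the square root this is the standardized residual plot, where points normally fall between -2 and 2.

  • Normality Q-Q Plot (Q-Q plot of the residuals): points along a slanted straight line are fine. A U-shaped curve means the residuals are not normally distributed and the normality assumption does not hold; this is a check of residual normality.

  • Leverage plot: based on Cook's distance; shows whether the data contain outliers or unusually influential points.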
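The quantities behind these four plots can be computed by hand for a simple linear regression. The sketch below is one way to do it under stated assumptions: the data are hypothetical (20 points near a line plus one deliberately odd point), the residuals are internally studentized, and the plotting step is left out so that only the standard library is needed.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical data: 20 points near y = 1 + 2x, plus one deliberately odd point.
xs = [float(x) for x in range(1, 21)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]
xs.append(30.0)  # far from the other x values, so high leverage
ys.append(20.0)  # far below the trend, so a large residual

n, p = len(xs), 2  # p = number of fitted coefficients (intercept and slope)
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Leverage: diagonal of the hat matrix for simple linear regression.
leverage = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]
# Residual standard error with n - p degrees of freedom.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))
# Internally studentized (standardized) residuals.
std_resid = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)]

# Residual vs. fitted plot: (fitted, residual) pairs.
resid_vs_fitted = list(zip(fitted, residuals))
# Scale-location plot: (fitted, sqrt(|standardized residual|)) pairs.
scale_location = [(f, math.sqrt(abs(r))) for f, r in zip(fitted, std_resid)]
# Normal Q-Q plot: theoretical normal quantiles vs. sorted standardized residuals.
normal = statistics.NormalDist()
qq = [(normal.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(sorted(std_resid))]
# Leverage plot: Cook's distance combines residual size and leverage.
cooks = [r ** 2 * h / (p * (1.0 - h)) for r, h in zip(std_resid, leverage)]

worst = max(range(n), key=cooks.__getitem__)
print(f"largest Cook's distance: index {worst}, D = {cooks[worst]:.2f}")
```

Each list of pairs can be handed to any plotting tool; the point with the largest Cook's distance is the first candidate to inspect as an outlier or influential observation.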

4.2.2 Statistical Tests

4.3 How to rectify?

5 Draft

Breusch-Pagan test
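A minimal sketch of the idea behind the Breusch-Pagan test for a single predictor follows. It uses the studentized (Koenker) form, in which the statistic is n times the R-squared of an auxiliary regression of the squared residuals on x, compared against a chi-square distribution with 1 degree of freedom. The helper names and the simulated data are assumptions for illustration; for real work a statistics library is the safer choice.

```python
import math
import random
import statistics

def ols(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope

def r_squared(xs, ys):
    """Coefficient of determination of the OLS fit of ys on xs."""
    a, b = ols(xs, ys)
    y_bar = statistics.fmean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def breusch_pagan(xs, ys):
    """Return (LM statistic, p-value) for one predictor, studentized form."""
    a, b = ols(xs, ys)
    sq_resid = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    # Auxiliary regression: do the squared residuals depend on x?
    lm = max(len(xs) * r_squared(xs, sq_resid), 0.0)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

rng = random.Random(3)  # fixed seed so the sketch is repeatable
xs = [rng.uniform(1, 10) for _ in range(500)]
y_homo = [3.0 + 2.0 * x + rng.gauss(0, 2.0) for x in xs]        # constant noise sd
y_hetero = [3.0 + 2.0 * x + rng.gauss(0, 0.5 * x) for x in xs]  # noise sd grows with x

lm_homo, p_homo = breusch_pagan(xs, y_homo)
lm_hetero, p_hetero = breusch_pagan(xs, y_hetero)
print(f"constant noise: LM = {lm_homo:.2f}, p = {p_homo:.4f}")
print(f"growing noise:  LM = {lm_hetero:.2f}, p = {p_hetero:.2e}")
```

A small p-value is evidence against homoscedasticity; a large one only means the test found no linear relationship between the squared residuals and x.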

6 Practice