
1 Resources

2 Concepts

An observation that is an outlier is not necessarily an influential point.

An observation with high leverage is not necessarily an influential point either.


A data point is influential if it unduly influences any part of a regression analysis, such as the predicted responses, the estimated slope coefficients, or the hypothesis test results.

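In other words, whether an observation is influential is judged by fitting the model with and without it and checking whether the predicted Y values, the regression coefficients, or the hypothesis test results change noticeably.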
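
The include-versus-exclude comparison can be sketched for simple linear regression as follows. This is a minimal illustration, not a standard diagnostic: the helper names `fit_line` and `coefficient_shift` and the data are made up for this example, and only the coefficients are compared (not the test results).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def coefficient_shift(xs, ys, i):
    """Change in (intercept, slope) caused by including observation i."""
    b0_all, b1_all = fit_line(xs, ys)
    # Refit with observation i left out.
    b0_wo, b1_wo = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    return b0_all - b0_wo, b1_all - b1_wo


# Made-up data: the first five points lie exactly on y = 2x;
# the last point has an extreme x and an off-trend y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 14.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 10.0]

shift_last = coefficient_shift(xs, ys, 5)  # drop the extreme point
shift_mid = coefficient_shift(xs, ys, 2)   # drop an ordinary point
print("slope shift, extreme point:", shift_last[1])
print("slope shift, ordinary point:", shift_mid[1])
```

In this toy data set, dropping the last point changes the slope far more than dropping a point from the middle, which is the behaviour the definition describes.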

2.1 1. Outlier

An outlier is a data point whose response y does not follow the general trend of the rest of the data.

That is, the observed response lies far from the Y value the model predicts for it.

2.2 2. Leverage

A data point has high leverage if it has "extreme" predictor x values.

That is, the observation's x value lies far from the x values of the rest of the data.

Derivation:

\[ \begin{aligned} y &= X\beta + \epsilon \\ \hat{\beta} &= (X^{'}X)^{-1}X^{'}y \\ \hat{y} &= X\hat{\beta} = X(X^{'}X)^{-1}X^{'}y \\ H &= X(X^{'}X)^{-1}X^{'} \\ \hat{y}_i &= h_{i1}y_1 + h_{i2}y_2 + \cdots + h_{ii}y_i + \cdots + h_{in}y_n \quad \text{for } i = 1, \ldots, n \end{aligned} \]

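H is called the hat matrix. Why the name? Because \(\hat{y} = Hy\): multiplying y by H turns it into \(\hat{y}\), so H "puts the hat on" y. The diagonal elements \(h_{ii}\) are the leverages; the larger \(h_{ii}\) is, the larger the role \(y_i\) plays in determining \(\hat{y}_i\).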
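
As a small sketch of \(\hat{y} = Hy\), the code below builds H for simple linear regression, where each row of X is \([1, x_i]\) and \(X^{'}X\) is a 2x2 matrix that can be inverted by hand. The function name `hat_matrix` and the data are illustrative assumptions; for more predictors a linear algebra library would be the usual choice.

```python
def hat_matrix(xs):
    """H = X (X'X)^{-1} X' for the design matrix X with rows [1, x_i]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # X'X = [[n, sx], [sx, sxx]]; invert the 2x2 matrix explicitly.
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]
    H = []
    for xi in xs:
        # Row vector [1, x_i] times (X'X)^{-1}.
        a = inv[0][0] + xi * inv[1][0]
        b = inv[0][1] + xi * inv[1][1]
        # Then times each column [1, x_j]' of X'.
        H.append([a + b * xj for xj in xs])
    return H


# Made-up data with one extreme x value (the last point).
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [1.2, 1.9, 3.2, 3.8, 9.5]

H = hat_matrix(xs)
# Each fitted value is a weighted sum of all observed responses.
y_hat = [sum(h * y for h, y in zip(row, ys)) for row in H]
leverages = [H[i][i] for i in range(len(xs))]
print("fitted values:", y_hat)
print("leverages:", leverages)
```

The last observation has by far the largest \(h_{ii}\), so its own response \(y_i\) dominates its fitted value.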

Here are some important properties of the leverages:
  1. The leverage \(h_{ii}\) is a measure of the distance between the x value for the \(i\)th data point and the mean of the x values for all n data points.
  2. The leverage \(h_{ii}\) is a number between 0 and 1, inclusive.
  3. The sum of the \(h_{ii}\) equals p, the number of parameters (regression coefficients including the intercept).
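
These properties are easy to see in the special case of simple linear regression with an intercept (one predictor, so \(p = 2\)). That restriction is an assumption of this worked example; the properties themselves hold in general. In this case the leverage has the closed form

\[ \begin{aligned} h_{ii} &= \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2} \\ \sum_{i=1}^{n} h_{ii} &= n \cdot \dfrac{1}{n} + \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2} = 1 + 1 = 2 = p \end{aligned} \]

The first line shows that \(h_{ii}\) grows with the squared distance of \(x_i\) from \(\bar{x}\) (property 1), and the second line recovers property 3.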

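How can \(h_{ii}\) be used to judge whether an observation's x value is extreme?

Answer: the mean leverage is \(\bar{h} = \dfrac{\sum_{i = 1}^{n}h_{ii}}{n} = \dfrac{p}{n}\). As a rule of thumb, if \(h_{ii} \gt 3\dfrac{p}{n}\), that is, more than three times the mean leverage, the \(i\)th observation is a high-leverage point.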
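
The rule above can be sketched in a few lines. The example assumes simple linear regression with an intercept (\(p = 2\)), for which the leverage has the closed form \(h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2\); the helper names and the data are hypothetical.

```python
def leverages_simple(xs):
    """Leverages h_ii for simple linear regression with an intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]


def high_leverage_indices(leverages, p, factor=3.0):
    """Indices whose leverage exceeds factor * p / n (here 3 * mean leverage)."""
    n = len(leverages)
    cutoff = factor * p / n
    return [i for i, h in enumerate(leverages) if h > cutoff]


# Made-up data: nine evenly spaced x values plus one far to the right.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]

hs = leverages_simple(xs)
flagged = high_leverage_indices(hs, p=2)  # cutoff = 3 * 2 / 10 = 0.6
print("flagged indices:", flagged)        # -> [9]
```

Note that the cutoff \(3p/n\) is only meaningful when it is below 1, which for \(p = 2\) requires more than six observations.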

3 Practice