Ordinary Least Squares Regression

An Interactive Explanation

Use the dials to adjust scalar values.

Points surrounded by a gray circle can be dragged.

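Statistical regression is, at its core, a way to predict unknown quantities from a batch of existing data. For example, suppose we start out knowing the height and hand size of a group of people in a "sample population", and we want a way to predict height from hand size for people who are not in the sample. By applying OLS (ordinary least squares, a form of linear regression), we get an equation that takes hand size, the "independent variable", as input and gives height, the "dependent variable", as output.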

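Below, OLS runs behind the scenes to produce the regression equation. The constants in the regression, called "betas", are what OLS outputs. Here, beta_1 is the intercept: it tells us what height the line predicts when hand size is zero. beta_2 is the coefficient on hand size: it tells us how much taller we should expect someone to be for each one-unit increase in hand size. Drag the sample data to see the betas change.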

[Figure: scatter plot of height (vertical axis, 0 to 100) against hand size (horizontal axis, 0 to 100) with the fitted regression line. Beta 1: the y-intercept of the regression line. Beta 2: the slope of the regression line. Equation shown: 5.88 + 0.82 * hand size = height]
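To make the roles of the two betas concrete, here is a minimal sketch in Python with NumPy. The five data points are made up for illustration and are not the ones in the figure; `predict_height` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical sample: hand size (input) and height (output), arbitrary units.
hand_size = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
height = np.array([15.0, 21.0, 31.0, 38.0, 47.0])

# np.polyfit with deg=1 fits a least-squares line and returns the
# coefficients highest power first: (slope, intercept).
beta_2, beta_1 = np.polyfit(hand_size, height, deg=1)


def predict_height(x):
    """Predicted height for hand size x: beta_1 + beta_2 * x."""
    return beta_1 + beta_2 * x


print(f"{beta_1:.2f} + {beta_2:.2f} * hand size = height")

# The intercept is the prediction at hand size zero.
print(predict_height(0.0))

# The slope is the change in prediction per one-unit step in hand size.
print(predict_height(26.0) - predict_height(25.0))
```

The last two prints show the two readings described above: beta_1 is the predicted height at a hand size of zero, and beta_2 is the extra height predicted for each additional unit of hand size.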

At some point, you probably asked your parents, "Where do betas come from?" Let's raise the curtain on how OLS finds its betas.

Error is the difference between prediction and reality: the vertical distance between a real data point and the regression line. OLS is concerned with the squares of the errors. It tries to find the line going through the sample data that minimizes the sum of the squared errors. Below, the squared errors are represented as squares, and your job is to choose betas (the slope and intercept of the regression line) so that the total area of all the squares (the sum of the squared errors) is as small as possible. That's OLS!

[Figure: the sample data with each squared error drawn as a square; axes x and y, each from 0 to 100. Equation shown: -6.88 + 0.99 * hand size = height]
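The "total area of the squares" can be computed directly. The sketch below uses made-up data and a hypothetical helper, `sum_squared_errors`, and compares the area at the OLS betas (from the standard closed-form formulas for one independent variable) against a few nudged alternatives.

```python
import numpy as np

# Hypothetical sample data (not the points in the article's figure).
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([15.0, 21.0, 31.0, 38.0, 47.0])


def sum_squared_errors(beta_1, beta_2):
    """Total area of the squares: sum of (actual - predicted) ** 2."""
    errors = y - (beta_1 + beta_2 * x)
    return float(np.sum(errors ** 2))


# Closed-form OLS solution for a single independent variable.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

best = sum_squared_errors(intercept, slope)

# Nudging either beta away from the OLS values makes the total area grow.
nudged = [
    sum_squared_errors(intercept + 1.0, slope),
    sum_squared_errors(intercept - 1.0, slope),
    sum_squared_errors(intercept, slope + 0.05),
    sum_squared_errors(intercept, slope - 0.05),
]

print("OLS sum of squared errors:", best)
print("after nudging the betas:  ", nudged)
```

Every nudged value comes out larger than the OLS value, which is what "minimizes the sum of the squared errors" means in practice.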

Now, real scientists and even sociologists rarely do regression with just one independent variable, but OLS works exactly the same with more. Below is OLS with two independent variables. Instead of the errors being relative to a line, though, they're now relative to a plane in 3D space. So now the job of OLS is to find the equation for that plane. The slice of the plane through each axis is shown in the first two figures.

[Figures: slices of the fitted plane, y against x1 and y against x2; each axis runs from 0 to 100]
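With two independent variables the mechanics are the same, only with one more column. As a rough sketch, assuming NumPy and made-up, noise-free data generated from a known plane, `np.linalg.lstsq` recovers the plane's equation:

```python
import numpy as np

# Hypothetical data built from a known plane: y = 2 + 0.5 * x1 + 1.5 * x2.
# No noise is added, so OLS should recover these betas (up to rounding).
x1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
x2 = np.array([5.0, 3.0, 8.0, 1.0, 9.0, 4.0])
y = 2.0 + 0.5 * x1 + 1.5 * x2

# Design matrix: a column of ones for the intercept, then one column
# per independent variable.
X = np.column_stack([np.ones_like(x1), x1, x2])

# lstsq returns the betas that minimize the sum of squared errors.
betas, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, coef_x1, coef_x2 = betas

print(f"{intercept:.2f} + {coef_x1:.2f} * x1 + {coef_x2:.2f} * x2 = y")
```

Adding a third or fourth variable would only mean adding more columns to `X`; the call itself does not change.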

By playing with the dots, you can see that, when there are multiple variables involved, the true relationships can be very counterintuitive. That's why we have statistics: to make us unsure about things.
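One way to see how counterintuitive this can get is a constructed example in which a variable's coefficient changes sign once a second variable is included. The data below are invented for illustration: x1 and x2 rise together, and y is defined to fall with x1 and rise with x2.

```python
import numpy as np

# Hypothetical data: x2 tracks x1 closely, and y is defined as
# y = 5 - 1 * x1 + 2 * x2, so the true effect of x1 on y is negative.
x1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
x2 = np.array([12.0, 19.0, 33.0, 38.0, 51.0, 57.0])
y = 5.0 - 1.0 * x1 + 2.0 * x2

# Regression of y on x1 alone: x1 also picks up the effect of x2.
simple_slope, simple_intercept = np.polyfit(x1, y, deg=1)

# Regression of y on both variables.
X = np.column_stack([np.ones_like(x1), x1, x2])
betas, *_ = np.linalg.lstsq(X, y, rcond=None)

print("slope on x1, x1 alone:      ", simple_slope)
print("coefficient on x1, with x2: ", betas[1])
```

Taken alone, x1 appears to push y up; with x2 in the equation, its coefficient is negative. Neither fit is "wrong": they answer different questions about the same data.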

Below, see if you can choose the betas to minimize the sum of squared errors.

[Interactive figure: choose the betas by hand. Panels plot y against x1 and against x2, each axis from 0 to 100. Equation shown: 2.09 + 0.00 * hand size + 0.00 * hand size = height]

There are many other prediction techniques much more complicated than OLS, like logistic regression, weighted least-squares regression, robust regression and the growing family of non-parametric methods.