- This is part of probstat.
In this section, we shall discuss linear regression, focusing on one-variable (simple) linear regression.
Model
We consider two variables $X$ and $Y$, where $Y$ is a function of $X$. We refer to $X$ as the independent or input variable, and to $Y$ as the dependent variable. We consider a linear relationship between the independent variable and the dependent variable: we assume that there exist hidden parameters $\beta_0$ and $\beta_1$ such that

$$Y = \beta_0 + \beta_1 X + \epsilon,$$

where $\epsilon$ is a random error. We further assume that the error is unbiased, i.e., $E[\epsilon] = 0$, and is independent of $X$.
Input: As an input to the regression process, we are given a set of $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ generated from the previous equation.
Goal: We want to estimate $\beta_0$ and $\beta_1$.
The least squares estimators
Denote our estimate for $\beta_0$ as $b_0$ and our estimate for $\beta_1$ as $b_1$. Using these estimates, the error at data point $i$ is

$$e_i = y_i - (b_0 + b_1 x_i).$$

We focus on the sum of squared errors, i.e.,

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2.$$
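As a sketch, the sum of squared errors can be computed directly from its definition; the data points and candidate parameter values below are illustrative:

```python
def sse(data, b0, b1):
    """Sum of squared errors of the line y = b0 + b1*x over (x, y) pairs."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in data)

# Illustrative data points and candidate parameters.
data = [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8)]
total = sse(data, 1.0, 1.0)  # residuals are roughly 0.1, -0.1, 0.2, -0.2
```

Different candidate pairs $(b_0, b_1)$ give different $SSE$ values; the least squares method picks the pair with the smallest one.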
The method of least squares uses as estimators the parameters that minimize the sum of squared errors. Therefore, we want to find $b_0$ and $b_1$ that minimize $SSE$. To do so, we partially differentiate $SSE$ with respect to $b_0$ and $b_1$:

$$\frac{\partial SSE}{\partial b_0} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right),$$

$$\frac{\partial SSE}{\partial b_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right).$$

We set these two derivatives to zero to find the minimum and obtain the two equations we have to solve:

$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i,$$

$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2.$$

Before solving these two equations, let's define

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i,$$

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

In this notation, the solution of the two equations is

$$b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}.$$
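The closed-form least squares estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$ can be sketched in a few lines; the small data set is illustrative:

```python
def least_squares(data):
    """Least squares estimates (b0, b1) for y = b0 + b1*x over (x, y) pairs."""
    n = len(data)
    xbar = sum(x for x, _ in data) / n
    ybar = sum(y for _, y in data) / n
    sxx = sum((x - xbar) ** 2 for x, _ in data)           # S_xx
    sxy = sum((x - xbar) * (y - ybar) for x, y in data)   # S_xy
    b1 = sxy / sxx          # slope estimate: b1 = S_xy / S_xx
    b0 = ybar - b1 * xbar   # intercept estimate: b0 = ybar - b1*xbar
    return b0, b1

# Perfectly linear data y = 1 + 2x recovers the parameters exactly.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
print(least_squares(data))  # → (1.0, 2.0)
```

On noisy data the estimates only approximate the hidden parameters; how close they are is the subject of the next two sections.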
Distribution of regression parameters
Statistical tests on regression parameters