Machine Learning at U of C
This page contains a list of topics, definitions, and results from the Machine Learning course at the University of Chicago.
== Week 1: Introduction and OLS ==
=== Learning problem ===

Given a distribution <math>p</math> on <math>Z = X \times Y</math>, we want to learn the objective function <math>f_p(x) = \mathbb{E}_p[y|x]</math> (with respect to the distribution <math>p</math>).
=== Learning Algorithms ===

Let <math>Z</math> be the set of possible samples. A learning algorithm is a function <math>A\colon \bigcup_{n \ge 1} Z^n \to F</math> that maps a finite sequence of samples to a measurable function (here <math>F</math> denotes the class of all measurable functions). Sometimes we consider a class of computable functions instead.
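For concreteness, here is a minimal Python sketch (an illustration added here, not from the original notes) of a learning algorithm in this sense: a function that takes finitely many samples from <math>Z</math> and returns a function. The constant-mean learner is a hypothetical choice made only to make the types visible.

<pre>
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]  # a pair (x, y) drawn from Z = X x Y

def mean_learner(samples: Sequence[Sample]) -> Callable[[float], float]:
    """A deliberately trivial learning algorithm: samples in, function out.

    It ignores x and always predicts the sample mean of y; the point is only
    to make the type A : Z^n -> F concrete, not to learn anything useful.
    """
    y_mean = sum(y for _, y in samples) / len(samples)
    return lambda x: y_mean

# The output h is itself a function from X to Y.
h = mean_learner([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(h(10.0))  # 3.0, regardless of x
</pre>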
=== Learning errors ===

Suppose the learning algorithm outputs <math>h</math>. The learning error can be measured by

:<math>\mathrm{err}(h) = \mathbb{E}_p[(h(x) - y)^2]</math>

One can prove that minimizing this quantity reduces to the problem of minimizing the following quantity:

:<math>\mathbb{E}_p[(h(x) - f_p(x))^2]</math>

And that is the reason why we try to learn <math>f_p(x) = \mathbb{E}_p[y|x]</math>. In other words, we claim that

:<math>\mathbb{E}_p[(h(x) - y)^2] = \mathbb{E}_p[(f_p(x) - y)^2] + \mathbb{E}_p[(h(x) - f_p(x))^2]</math>

The proof is easy. Writing <math>h(x) - y = (f_p(x) - y) + (h(x) - f_p(x))</math> and expanding the square, we get

:<math>\mathbb{E}_p[(h(x) - y)^2] = \mathbb{E}_p[(f_p(x) - y)^2] + \mathbb{E}_p[(h(x) - f_p(x))^2] + 2\,\mathbb{E}_p[(f_p(x) - y)(h(x) - f_p(x))]</math>

Then observe that:
* The first term only depends on the distribution <math>p</math>, not on <math>h</math>.
* The second term is equal to <math>\mathbb{E}_p[(h(x) - f_p(x))^2]</math>, the quantity above.
* The third term is zero: conditioning on <math>x</math>,

:<math>\mathbb{E}_p[(f_p(x) - y)(h(x) - f_p(x))] = \mathbb{E}_x\big[(f_p(x) - \mathbb{E}_p[y|x])\,(h(x) - f_p(x))\big] = 0</math>

since <math>f_p(x) = \mathbb{E}_p[y|x]</math>.

Hence minimizing <math>\mathbb{E}_p[(h(x) - y)^2]</math> over <math>h</math> is equivalent to minimizing <math>\mathbb{E}_p[(h(x) - f_p(x))^2]</math>.
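This decomposition is easy to check numerically. The following Monte Carlo sketch is illustrative only; the choices <math>f_p(x) = 2x</math>, Gaussian noise, and the hypothesis <math>h</math> are assumptions made for the example.

<pre>
import numpy as np

rng = np.random.default_rng(0)

# Synthetic distribution p: x ~ Uniform[0, 1], y = f_p(x) + noise,
# with f_p(x) = E_p[y|x] = 2x (an arbitrary choice for this check).
n = 200_000
x = rng.uniform(0.0, 1.0, n)
f_p = 2.0 * x
y = f_p + rng.normal(0.0, 0.5, n)

h = 1.5 * x + 0.3  # an arbitrary hypothesis h(x)

lhs = np.mean((h - y) ** 2)                              # E[(h(x) - y)^2]
rhs = np.mean((f_p - y) ** 2) + np.mean((h - f_p) ** 2)  # the two terms
print(lhs, rhs)  # agree up to Monte Carlo error
</pre>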
=== Example 1 ===

When the class <math>Y = \{-1, 1\}</math>, i.e. a classification problem, our objective <math>f_p(x)</math> reduces to

:<math>f_p(x) = \mathbb{E}_p[y|x] = \Pr[y=1 \mid x] - \Pr[y=-1 \mid x]</math>

One can show that the function

:<math>h_{*}(x) = \operatorname{sgn} \mathbb{E}_p[y|x]</math>

minimizes the loss (proof omitted).
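To see why this is plausible: for <math>h(x) \in \{-1, 1\}</math>, <math>\mathbb{E}_p[(h(x) - y)^2 \mid x] = 2 - 2\,h(x) f_p(x)</math>, which is smallest when <math>h(x)</math> has the same sign as <math>f_p(x)</math>. A quick simulation (illustrative only, not from the original notes; the value <math>\Pr[y=1|x] = 0.7</math> is an arbitrary assumption):

<pre>
import numpy as np

rng = np.random.default_rng(1)

# Fix one x and assume Pr[y=1|x] = 0.7, so f_p(x) = 0.7 - 0.3 = 0.4 > 0
# and h_*(x) = sgn f_p(x) = +1.
p_plus = 0.7
y = rng.choice([1.0, -1.0], size=500_000, p=[p_plus, 1.0 - p_plus])

for h in (+1.0, -1.0):
    # E[(h - y)^2 | x] = 2 - 2 * h * f_p(x)
    print(h, np.mean((h - y) ** 2))  # ~1.2 for h = +1, ~2.8 for h = -1
</pre>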
=== Example 2 ===

When <math>Y = \mathbb{R}</math>, the problem is just like regression, where we try to regress <math>f_p(x)</math>.
=== Ordinary Least Squares ===

If the relation is linear, i.e.

:<math>y = X \beta + \epsilon</math>

OLS provably gives the minimum squared error.

Consider the error

:<math>\|y - X \beta\|^2 = (y - X \beta)^T (y - X \beta)</math>
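Expanding this error and setting its gradient with respect to <math>\beta</math> to zero yields the normal equations <math>X^T X \beta = X^T y</math>. The following sketch (not part of the original notes; the synthetic data and dimensions are arbitrary assumptions) solves them directly and cross-checks against NumPy's least-squares solver.

<pre>
import numpy as np

rng = np.random.default_rng(2)

# Synthetic linear model y = X beta + eps (sizes are arbitrary choices).
n, d = 1000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0.0, 0.1, size=n)

# OLS via the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check: np.linalg.lstsq also minimizes ||y - X beta||^2.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))  # True
</pre>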