Difference between revisions of "Machine Learning at U of C"
Line 3: | Line 3: | ||
== Week 1: Introduction and OLS == | == Week 1: Introduction and OLS == | ||
=== Learning problem === | === Learning problem === | ||
− | Given a distribution <math>\mathbb{P} </math> on <math> X \times Y</math>. We want to learn the objective function <math>f_p(x) = \mathbb{E}_p[y|x]</math> (with respect to the distribution <math>\mathbb{P}</math>). | + | Given a distribution <math>\mathbb{P} </math> on <math> X \times Y</math>. We want to learn the objective function <math>f_p(x) = \mathbb{E}_p[y|x]</math> (with respect to the distribution :<math>\mathbb{P}</math>). |
Line 9: | Line 9: | ||
Let Z be the set of possible samples. The learning algorithm is a function that maps a number of samples to a measurable function (denoted here by F a class of all measurable functions). Sometimes we consider a class of computable functions instead. | Let Z be the set of possible samples. The learning algorithm is a function that maps a number of samples to a measurable function (denoted here by F a class of all measurable functions). Sometimes we consider a class of computable functions instead. | ||
− | <math>A: \cup_{n=1}^{\infty} Z^n \rightarrow F</math> | + | :<math>A: \cup_{n=1}^{\infty} Z^n \rightarrow F</math> |
=== Learning errors === | === Learning errors === | ||
Suppose the learning algorithm outputs h. The learning error can be measured by | Suppose the learning algorithm outputs h. The learning error can be measured by | ||
− | <math>\int (y-h(x))^2 dP </math> | + | :<math>\int (y-h(x))^2 dP </math> |
One can prove that minimizing this quantity could be reduced to the problem of minimizing the following quantity. | One can prove that minimizing this quantity could be reduced to the problem of minimizing the following quantity. | ||
− | <math>||f_p - h||^2_{l_2(\mathbb{P})} = \int (f_p(x)-h(x))^2 P_x(x) dx </math> | + | :<math>||f_p - h||^2_{l_2(\mathbb{P})} = \int (f_p(x)-h(x))^2 P_x(x) dx </math> |
And that's the reason why we try to learn <math>\mathbb{E}_p[y|x]</math> | And that's the reason why we try to learn <math>\mathbb{E}_p[y|x]</math> | ||
Line 24: | Line 24: | ||
In other word, we claim that | In other word, we claim that | ||
− | <math>argmin_{h} \int (y-h(x))^2 dP = argmin_{h} ||f_p - h||^2_{l_2(\mathbb{P})}</math> | + | :<math>argmin_{h} \int (y-h(x))^2 dP = argmin_{h} ||f_p - h||^2_{l_2(\mathbb{P})}</math> |
The proof is easy. | The proof is easy. | ||
− | <math> | + | :<math>\int (y-h(x))^2 dP = \int ((y-f_p(x)) + (f_p(x)- h(x)))^2)dP </math> |
− | \int (y-h(x))^2 dP = \int ((y-f_p(x)) + (f_p(x)- h(x)))^2)dP | ||
− | </math> | ||
We get | We get | ||
− | <math> | + | :<math> \int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP + \int (f_p(x)- h(x))^2 dP + 2 \int (y-f_p(x)) (f_p(x)-h(x)) dP </math> |
− | \int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP + \int (f_p(x)- h(x))^2 dP + 2 \int (y-f_p(x)) (f_p(x)-h(x)) dP | ||
− | </math> | ||
Line 44: | Line 40: | ||
* The third term is zero | * The third term is zero | ||
− | <math> \int \int (y-f_p(x)) (f_p(x)-h(x)) p(x,y) dy dx = \int p(x) (f_p(x)- h(x)) [ \int (y-f_p(x)) p(y|x) dy ] dx </math> | + | :<math> \int \int (y-f_p(x)) (f_p(x)-h(x)) p(x,y) dy dx = \int p(x) (f_p(x)- h(x)) [ \int (y-f_p(x)) p(y|x) dy ] dx </math> |
Observe also that the term <math>\int (y-f_p(x)) p(y|x) dy = \mathbb{E}[y-\mathbb{E}[y|x] | x]</math> which is zero. | Observe also that the term <math>\int (y-f_p(x)) p(y|x) dy = \mathbb{E}[y-\mathbb{E}[y|x] | x]</math> which is zero. | ||
Line 51: | Line 47: | ||
* The second term is equal to | * The second term is equal to | ||
− | <math> | + | :<math> |
\int_{X} \int_{Y} (f_p(x)- h(x))^2 p(x,y) dy dx = \int_{X} (f_p(x) - h(x))^2 \int_{Y} p(x,y)dy dx = ||f_p - h||^2_{l_2(\mathbb{P})} | \int_{X} \int_{Y} (f_p(x)- h(x))^2 p(x,y) dy dx = \int_{X} (f_p(x) - h(x))^2 \int_{Y} p(x,y)dy dx = ||f_p - h||^2_{l_2(\mathbb{P})} | ||
</math> | </math> |
Revision as of 10:20, 30 March 2007
This page contains a list of topics, definitions, and results from the Machine Learning course at the University of Chicago.
Week 1: Introduction and OLS
Learning problem
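Given a distribution <math>\mathbb{P}</math> on <math>X \times Y</math>, we want to learn the objective function <math>f_p(x) = \mathbb{E}_p[y|x]</math> (with respect to the distribution <math>\mathbb{P}</math>).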
Learning Algorithms
Let Z be the set of possible samples. A learning algorithm is a function that maps a finite sequence of samples to a measurable function; here F denotes the class of all measurable functions. Sometimes we consider the class of computable functions instead.
:<math>A: \cup_{n=1}^{\infty} Z^n \rightarrow F</math>
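As a minimal sketch of the idea that a learning algorithm is a map from finite samples to functions, the following toy learner returns a constant predictor. The names `Sample`, `Hypothesis`, and `mean_learner` are hypothetical and chosen only for illustration; the course notes do not prescribe any particular algorithm.

```python
from statistics import fmean
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]            # one element z = (x, y) of Z
Hypothesis = Callable[[float], float]   # one element h of F


def mean_learner(samples: Sequence[Sample]) -> Hypothesis:
    """Toy algorithm A: map a finite sample to the constant function h(x) = mean(y)."""
    if not samples:
        raise ValueError("need at least one sample (n >= 1)")
    y_bar = fmean(y for _, y in samples)

    def h(x: float) -> float:
        # Ignores x on purpose: this is the simplest possible hypothesis.
        return y_bar

    return h


h = mean_learner([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(h(10.0))
```

The input may have any length n >= 1, which mirrors the union over all sample sizes in the domain of A.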
Learning errors
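Suppose the learning algorithm outputs h. The learning error can be measured by
:<math>\int (y-h(x))^2 dP </math>
One can prove that minimizing this quantity reduces to minimizing the following quantity.
:<math>||f_p - h||^2_{l_2(\mathbb{P})} = \int (f_p(x)-h(x))^2 P_x(x) dx </math>
That is why we try to learn <math>\mathbb{E}_p[y|x]</math>.
In other words, we claim that
:<math>\operatorname{argmin}_{h} \int (y-h(x))^2 dP = \operatorname{argmin}_{h} ||f_p - h||^2_{l_2(\mathbb{P})}</math>
The proof is easy. Add and subtract <math>f_p(x)</math> inside the square:
:<math>\int (y-h(x))^2 dP = \int ((y-f_p(x)) + (f_p(x)- h(x)))^2 dP </math>
We get
:<math> \int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP + \int (f_p(x)- h(x))^2 dP + 2 \int (y-f_p(x)) (f_p(x)-h(x)) dP </math>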
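Then observe that:
- The first term depends only on the distribution <math>\mathbb{P}</math>, not on h.
- The third term is zero:
:<math> \int \int (y-f_p(x)) (f_p(x)-h(x)) p(x,y) dy dx = \int p(x) (f_p(x)- h(x)) [ \int (y-f_p(x)) p(y|x) dy ] dx </math>
:Observe that the inner term <math>\int (y-f_p(x)) p(y|x) dy = \mathbb{E}[y-\mathbb{E}[y|x] | x]</math>, which is zero.
- The second term is equal to
:<math>\int_{X} \int_{Y} (f_p(x)- h(x))^2 p(x,y) dy dx = \int_{X} (f_p(x) - h(x))^2 \int_{Y} p(x,y)dy dx = ||f_p - h||^2_{l_2(\mathbb{P})}</math>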
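The three-term argument can be checked numerically on a small finite distribution, where the integrals become sums. The sketch below is only an illustration: the joint distribution `p` and the competing hypothesis `h` are made-up assumptions, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution p(x, y) on X = {0, 1}, Y = {0, 1, 2}.
p = {
    (0, 0): Fr(1, 8), (0, 1): Fr(1, 4), (0, 2): Fr(1, 8),
    (1, 0): Fr(1, 16), (1, 1): Fr(3, 16), (1, 2): Fr(1, 4),
}
assert sum(p.values()) == 1


def marginal_x(x):
    """Marginal probability p(x)."""
    return sum(w for (xi, _), w in p.items() if xi == x)


def f_p(x):
    """Regression function E[y | x]."""
    return sum(y * w for (xi, y), w in p.items() if xi == x) / marginal_x(x)


def h(x):
    """An arbitrary competing hypothesis (assumption for illustration)."""
    return Fr(3, 2) * x + Fr(1, 4)


def expect(g):
    """Sum of g(x, y) weighted by p(x, y), i.e. the integral with respect to P."""
    return sum(g(x, y) * w for (x, y), w in p.items())


total = expect(lambda x, y: (y - h(x)) ** 2)
first = expect(lambda x, y: (y - f_p(x)) ** 2)
second = expect(lambda x, y: (f_p(x) - h(x)) ** 2)
cross = expect(lambda x, y: (y - f_p(x)) * (f_p(x) - h(x)))

# The cross term vanishes, so the error splits into the first two terms.
print(cross == 0, total == first + second)
```

Since `first` does not involve `h`, changing `h` can only change `second`, which is the squared distance to `f_p`.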
Example 1
When the class , i.e. classification problem, our objective reduces to
One can show that the function
minimizes the loss (proof omitted)
Example 2
When , the problem is just like regression where we try to regress :
Ordinary Least Squares
If the relation is linear,
OLS provably gives the minimum squared error.
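As a hedged illustration of this claim on a finite sample, the sketch below fits a one-feature linear model with an intercept using the closed-form least-squares solution, then checks that a few perturbed coefficient pairs give a larger empirical squared error. The model form, the helper names `ols_fit` and `squared_error`, and the data are assumptions made for the example; they are not taken from the notes.

```python
from statistics import fmean


def ols_fit(xs, ys):
    """Closed-form least squares for the model y ~ a + b * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx           # slope
    a = y_bar - b * x_bar   # intercept
    return a, b


def squared_error(a, b, xs, ys):
    """Empirical squared error of the line a + b * x on the sample."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up, roughly linear data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

a, b = ols_fit(xs, ys)
best = squared_error(a, b, xs, ys)

# Every perturbed line should do strictly worse on this sample.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.05, -0.05)]:
    assert squared_error(a + da, b + db, xs, ys) > best
```

The check covers only a handful of perturbations on one sample, so it is a sanity test of the statement rather than a proof.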
Consider the error