Machine Learning at U of C

This page contains a list of topics, definitions, and results from the Machine Learning course at the University of Chicago.
 
== Week 1: Introduction and OLS ==

=== Learning problem ===
Given a distribution <math>\mathbb{P}</math> on <math>X \times Y</math>, we want to learn the objective function <math>f_p(x) = \mathbb{E}_p[y|x]</math> (with respect to the distribution <math>\mathbb{P}</math>).
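To make the objective concrete, here is a minimal numerical sketch (not from the course; the distribution and all names are illustrative): for a synthetic <math>\mathbb{P}</math> with <math>y = \sin(x) + \varepsilon</math>, the objective function is <math>f_p(x) = \sin(x)</math>, and the conditional mean can be checked by averaging samples of <math>y</math> drawn at a fixed <math>x</math>.

<pre>
import numpy as np

# Illustrative synthetic distribution P on X x Y (an assumption, not from the notes):
#   x ~ Uniform(0, 2*pi),  y | x ~ Normal(sin(x), 0.3^2)
# so the objective function is f_p(x) = E[y | x] = sin(x).
rng = np.random.default_rng(0)

def sample_y_given_x(x, n):
    """Draw n samples from the conditional distribution of y given x."""
    return np.sin(x) + 0.3 * rng.standard_normal(n)

x0 = 1.0
y_samples = sample_y_given_x(x0, 100_000)
print("empirical mean of y | x = 1.0 :", y_samples.mean())  # close to 0.8415
print("f_p(1.0) = sin(1.0)           :", np.sin(x0))
</pre>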
  
  
=== Learning Algorithms ===

Let Z be the set of possible samples. A learning algorithm is a function that maps a finite sequence of samples to a measurable function (here F denotes the class of all measurable functions). Sometimes we consider a class of computable functions instead.

:<math>A: \cup_{n=1}^{\infty} Z^n \rightarrow F</math>
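As a concrete, purely illustrative instance of such a map A, the sketch below takes a finite sample and returns a callable hypothesis h, i.e. an element of F; the nearest-neighbour rule is just an example, not an algorithm from the notes.

<pre>
import numpy as np

def knn_learner(xs, ys, k=15):
    """An example learning algorithm A: it maps a finite sample
    ((x_1, y_1), ..., (x_n, y_n)) to a function h: X -> R."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    def h(x):
        # Predict with the average y-value of the k nearest sample points.
        nearest = np.argsort(np.abs(xs - x))[:k]
        return ys[nearest].mean()

    return h

# Usage: the output of the algorithm is itself a function on X.
rng = np.random.default_rng(1)
xs = rng.uniform(0, 2 * np.pi, 200)
ys = np.sin(xs) + 0.3 * rng.standard_normal(200)
h = knn_learner(xs, ys)
print(h(1.0), np.sin(1.0))  # the prediction should be near sin(1.0)
</pre>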
  
 
=== Learning errors ===

Suppose the learning algorithm outputs h. The learning error can be measured by

:<math>\int (y-h(x))^2 dP </math>
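The display above is the expected squared error under <math>\mathbb{P}</math>. A small Monte Carlo sketch (the same illustrative synthetic distribution as before, not from the notes) estimates it by averaging <math>(y - h(x))^2</math> over fresh samples.

<pre>
import numpy as np

rng = np.random.default_rng(2)

def sample_P(n):
    """Draw n i.i.d. pairs (x, y) from the illustrative joint distribution P."""
    x = rng.uniform(0, 2 * np.pi, n)
    y = np.sin(x) + 0.3 * rng.standard_normal(n)
    return x, y

def learning_error(h, n=200_000):
    """Monte Carlo estimate of the integral of (y - h(x))^2 dP."""
    x, y = sample_P(n)
    return np.mean((y - h(x)) ** 2)

print(learning_error(np.sin))              # ~ 0.09: only the noise variance remains
print(learning_error(lambda x: 0.0 * x))   # ~ 0.59: a worse hypothesis pays extra
</pre>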
  
 
One can prove that minimizing this quantity reduces to minimizing the following quantity:

:<math>||f_p - h||^2_{l_2(\mathbb{P})} = \int (f_p(x)-h(x))^2 P_x(x) dx </math>

This is the reason why we try to learn <math>\mathbb{E}_p[y|x]</math>.
In other words, we claim that

:<math>\arg\min_{h} \int (y-h(x))^2 dP = \arg\min_{h} ||f_p - h||^2_{l_2(\mathbb{P})}</math>
  
 
The proof is easy. Expanding

:<math>\int (y-h(x))^2 dP  = \int \left( (y-f_p(x)) + (f_p(x)- h(x)) \right)^2 dP </math>

we get

:<math>\int (y-h(x))^2 dP = \int (y-f_p(x))^2 dP + \int (f_p(x)- h(x))^2 dP + 2 \int (y-f_p(x)) (f_p(x)-h(x)) dP</math>
 
  
  
Then observe that:

* The first term only depends on the distribution, not on h.
* The third term is zero:

:<math> \int \int (y-f_p(x)) (f_p(x)-h(x)) p(x,y) dy dx = \int p(x) (f_p(x)- h(x)) [ \int (y-f_p(x)) p(y|x) dy ] dx  </math>

Observe also that the inner term <math>\int (y-f_p(x)) p(y|x) dy = \mathbb{E}[y-\mathbb{E}[y|x] | x]</math> is zero.
* The second term is equal to

:<math>\int_{X} \int_{Y} (f_p(x)- h(x))^2 p(x,y) dy dx = \int_{X} (f_p(x) - h(x))^2 \int_{Y} p(x,y) dy dx = ||f_p - h||^2_{l_2(\mathbb{P})}</math>
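A quick numerical sanity check of this decomposition (again on the illustrative synthetic distribution, not from the notes): for an arbitrary h, the total error splits into the term that depends only on <math>\mathbb{P}</math> plus <math>||f_p - h||^2_{l_2(\mathbb{P})}</math>, with the cross term close to zero.

<pre>
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.uniform(0, 2 * np.pi, n)              # x ~ P_x
y = np.sin(x) + 0.3 * rng.standard_normal(n)  # y | x centered at f_p(x) = sin(x)

f_p = np.sin
h = lambda t: 0.5 * t - 1.0                   # an arbitrary hypothesis

total  = np.mean((y - h(x)) ** 2)                      # int (y - h(x))^2 dP
first  = np.mean((y - f_p(x)) ** 2)                    # depends only on P
second = np.mean((f_p(x) - h(x)) ** 2)                 # ||f_p - h||^2_{l2(P)}
cross  = 2 * np.mean((y - f_p(x)) * (f_p(x) - h(x)))   # should vanish

print(total, first + second + cross)  # the two agree
print(cross)                          # ~ 0, as the argument shows
</pre>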


=== Example 1 ===

When the class , i.e. a classification problem, our objective reduces to

One can show that the function

minimizes the loss (proof omitted).

=== Example 2 ===

When , the problem is just like regression, where we try to regress

=== Ordinary Least Square ===

If the relation is linear,

OLS provably gives the minimum squared error.

Consider the error
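A minimal OLS sketch (illustrative data and names, not the derivation from the lecture): under a linear relation plus noise, solving the normal equations gives the coefficient vector that minimizes the empirical squared error.

<pre>
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 3
X = rng.standard_normal((n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(n)   # assumed linear relation plus noise

# Ordinary least squares: w_hat = argmin_w ||y - X w||^2,
# computed here from the normal equations (X^T X) w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(w_hat)                           # close to w_true
print(np.mean((y - X @ w_hat) ** 2))   # minimum empirical squared error on this sample
</pre>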

=== Tikhonov Regularization ===

== Week 2 ==