Machine Learning at U of C
This page contains a list of topics, definitions, and results from the Machine Learning course at the University of Chicago.
== Week 1: Introduction and OLS ==
=== Learning problem ===

Given a distribution <math>p</math> on <math>Z = X \times Y</math>, we want to learn the objective function <math>f_p(x) = \mathbb{E}_p[y|x]</math> (with respect to the distribution <math>p</math>).
=== Learning Algorithms ===

Let <math>Z</math> be the set of possible samples. A learning algorithm is a function <math>A\colon \bigcup_{n \ge 1} Z^n \to F</math> that maps a finite sequence of samples to a measurable function (here <math>F</math> denotes the class of all measurable functions). Sometimes we consider a class of computable functions instead.
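For concreteness, here is a minimal Python sketch (an illustration added here, not from the original notes) of a learning algorithm in this sense: a function that takes finitely many samples from <math>Z</math> and returns a function. The constant-mean learner is a hypothetical choice made only to make the types visible.

<pre>
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]  # a pair (x, y) drawn from Z = X x Y

def mean_learner(samples: Sequence[Sample]) -> Callable[[float], float]:
    """A deliberately trivial learning algorithm: samples in, function out.

    It ignores x and always predicts the sample mean of y; the point is only
    to make the type A : Z^n -> F concrete, not to learn anything useful.
    """
    y_mean = sum(y for _, y in samples) / len(samples)
    return lambda x: y_mean

# The output h is itself a function from X to Y.
h = mean_learner([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(h(10.0))  # 3.0, regardless of x
</pre>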
=== Learning errors ===

Suppose the learning algorithm outputs <math>h</math>. The learning error can be measured by

:<math>\mathrm{err}(h) = \mathbb{E}_p[(h(x) - y)^2]</math>

One can prove that minimizing this quantity reduces to the problem of minimizing the following quantity:

:<math>\mathbb{E}_p[(h(x) - f_p(x))^2]</math>

And that is the reason why we try to learn <math>f_p(x) = \mathbb{E}_p[y|x]</math>. In other words, we claim that

:<math>\mathbb{E}_p[(h(x) - y)^2] = \mathbb{E}_p[(f_p(x) - y)^2] + \mathbb{E}_p[(h(x) - f_p(x))^2]</math>

The proof is easy. Writing <math>h(x) - y = (f_p(x) - y) + (h(x) - f_p(x))</math> and expanding the square, we get

:<math>\mathbb{E}_p[(h(x) - y)^2] = \mathbb{E}_p[(f_p(x) - y)^2] + \mathbb{E}_p[(h(x) - f_p(x))^2] + 2\,\mathbb{E}_p[(f_p(x) - y)(h(x) - f_p(x))]</math>

Then observe that:
* The first term only depends on the distribution <math>p</math>, not on <math>h</math>.
* The second term is equal to <math>\mathbb{E}_p[(h(x) - f_p(x))^2]</math>, the quantity above.
* The third term is zero: conditioning on <math>x</math>,

:<math>\mathbb{E}_p[(f_p(x) - y)(h(x) - f_p(x))] = \mathbb{E}_x\big[(f_p(x) - \mathbb{E}_p[y|x])\,(h(x) - f_p(x))\big] = 0</math>

since <math>f_p(x) = \mathbb{E}_p[y|x]</math>.

Hence minimizing <math>\mathbb{E}_p[(h(x) - y)^2]</math> over <math>h</math> is equivalent to minimizing <math>\mathbb{E}_p[(h(x) - f_p(x))^2]</math>.
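This decomposition is easy to check numerically. The following Monte Carlo sketch is illustrative only; the choices <math>f_p(x) = 2x</math>, Gaussian noise, and the hypothesis <math>h</math> are assumptions made for the example.

<pre>
import numpy as np

rng = np.random.default_rng(0)

# Synthetic distribution p: x ~ Uniform[0, 1], y = f_p(x) + noise,
# with f_p(x) = E_p[y|x] = 2x (an arbitrary choice for this check).
n = 200_000
x = rng.uniform(0.0, 1.0, n)
f_p = 2.0 * x
y = f_p + rng.normal(0.0, 0.5, n)

h = 1.5 * x + 0.3  # an arbitrary hypothesis h(x)

lhs = np.mean((h - y) ** 2)                              # E[(h(x) - y)^2]
rhs = np.mean((f_p - y) ** 2) + np.mean((h - f_p) ** 2)  # the two terms
print(lhs, rhs)  # agree up to Monte Carlo error
</pre>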
=== Example 1 ===

When the class <math>Y = \{-1, 1\}</math>, i.e. a classification problem, our objective <math>f_p(x)</math> reduces to

:<math>f_p(x) = \mathbb{E}_p[y|x] = \Pr[y=1 \mid x] - \Pr[y=-1 \mid x]</math>

One can show that the function

:<math>h_{*}(x) = \operatorname{sgn} \mathbb{E}_p[y|x]</math>

minimizes the loss (proof omitted).
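To see why this is plausible: for <math>h(x) \in \{-1, 1\}</math>, <math>\mathbb{E}_p[(h(x) - y)^2 \mid x] = 2 - 2\,h(x) f_p(x)</math>, which is smallest when <math>h(x)</math> has the same sign as <math>f_p(x)</math>. A quick simulation (illustrative only, not from the original notes; the value <math>\Pr[y=1|x] = 0.7</math> is an arbitrary assumption):

<pre>
import numpy as np

rng = np.random.default_rng(1)

# Fix one x and assume Pr[y=1|x] = 0.7, so f_p(x) = 0.7 - 0.3 = 0.4 > 0
# and h_*(x) = sgn f_p(x) = +1.
p_plus = 0.7
y = rng.choice([1.0, -1.0], size=500_000, p=[p_plus, 1.0 - p_plus])

for h in (+1.0, -1.0):
    # E[(h - y)^2 | x] = 2 - 2 * h * f_p(x)
    print(h, np.mean((h - y) ** 2))  # ~1.2 for h = +1, ~2.8 for h = -1
</pre>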
=== Example 2 ===

When <math>Y = \mathbb{R}</math>, the problem is just like regression, where we try to regress <math>f_p(x)</math>.
=== Ordinary Least Squares ===

If the relation is linear, i.e.

:<math>y = X \beta + \epsilon</math>

OLS provably gives the minimum squared error.

Consider the error

:<math>\|y - X \beta\|^2 = (y - X \beta)^T (y - X \beta)</math>
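Expanding this error and setting its gradient with respect to <math>\beta</math> to zero yields the normal equations <math>X^T X \beta = X^T y</math>. The following sketch (not part of the original notes; the synthetic data and dimensions are arbitrary assumptions) solves them directly and cross-checks against NumPy's least-squares solver.

<pre>
import numpy as np

rng = np.random.default_rng(2)

# Synthetic linear model y = X beta + eps (sizes are arbitrary choices).
n, d = 1000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0.0, 0.1, size=n)

# OLS via the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check: np.linalg.lstsq also minimizes ||y - X beta||^2.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))  # True
</pre>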