Note 1: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID:
added
Note Type: Horvath Cloze
GUID:
@%wrCnWSYb
Previous
Note did not exist
New Note
Front
The loss gradient is n-dimensional, where each axis corresponds to a parameter of the model. We seek the global minimum, i.e. where the mean loss is smallest.
Back
The loss gradient is n-dimensional, where each axis corresponds to a parameter of the model. We seek the global minimum, i.e. where the mean loss is smallest.
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Text | | The loss gradient is {{c1::n-dimensional}}, where each axis corresponds to {{c2::a parameter of the model}}. We seek the {{c3::global minimum}}, i.e. where the {{c3::mean loss is smallest}}. |
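As a quick illustration of this note, a minimal numpy sketch (the synthetic data, step size, and iteration count are illustrative assumptions, not part of the deck): the gradient of the mean squared loss has one component per model parameter, and stepping against it moves toward the minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 points, n = 3 parameters
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

w = np.zeros(3)                          # one coordinate per axis of the loss surface
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)    # n-dimensional gradient of the mean squared loss
    w -= 0.1 * grad                      # step downhill toward the minimum
print(w)                                 # close to w_true
```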
Note 2: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID:
added
Note Type: Horvath Cloze
GUID:
Cz3eAYn$g6
Previous
Note did not exist
New Note
Front
The absolute (L1) loss is defined as \(\ell_{\text{abs}}(r) = |r|\). Its main drawback is that it is not differentiable at zero.
Back
The absolute (L1) loss is defined as \(\ell_{\text{abs}}(r) = |r|\). Its main drawback is that it is not differentiable at zero.
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Text | | The {{c1::absolute (L1)}} loss is defined as \(\ell_{\text{abs}}(r) = {{c2::|r|}}\). Its main drawback is that it is {{c3::not differentiable at zero}}. |
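A minimal sketch of the drawback (pure illustration): the one-sided difference quotients of \(|r|\) disagree at zero, so no single derivative exists there.

```python
def l1_loss(r):
    return abs(r)

h = 1e-8
right = (l1_loss(0 + h) - l1_loss(0)) / h  # one-sided slope from the right: +1
left = (l1_loss(0) - l1_loss(0 - h)) / h   # one-sided slope from the left: -1
print(right, left)  # the slopes disagree, so |r| is not differentiable at 0
```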
Note 3: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID:
added
Note Type: Horvath Classic
GUID:
Ek0p90pSOj
Previous
Note did not exist
New Note
Front
Why does the Huber loss use \(\delta|r| - \frac{1}{2}\delta^2\) (not just \(\delta|r|\)) for the linear region?
Back
Why does the Huber loss use \(\delta|r| - \frac{1}{2}\delta^2\) (not just \(\delta|r|\)) for the linear region?
To ensure continuous differentiability at \(r = \pm\delta\):
- The slopes must match: \(L_{\text{square}}' = r\), so the abs part gets slope \(\pm\delta\).
- The y-values must match: square gives \(\frac{1}{2}\delta^2\) but \(\delta|\pm\delta| = \delta^2\), so we subtract \(\frac{1}{2}\delta^2\).
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Front | | Why does the Huber loss use \(\delta|r| - \frac{1}{2}\delta^2\) (not just \(\delta|r|\)) for the linear region? |
| Back | | To ensure <b>continuous differentiability</b> at \(r = \pm\delta\):<br><ul><li>The slopes must match: \(L_{\text{square}}' = r\), so the abs part gets slope \(\pm\delta\).</li><li>The y-values must match: square gives \(\frac{1}{2}\delta^2\) but \(\delta|\pm\delta| = \delta^2\), so we subtract \(\frac{1}{2}\delta^2\).</li></ul> |
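A quick numerical check of this argument (the choice \(\delta = 1.5\) is arbitrary): with the \(-\frac{1}{2}\delta^2\) correction, both the values and the slopes of the two pieces agree at \(r = \delta\).

```python
delta, eps = 1.5, 1e-6

square = lambda r: 0.5 * r**2
linear = lambda r: delta * abs(r) - 0.5 * delta**2

# values match at the boundary r = delta
print(square(delta), linear(delta))                 # both 1.125

# slopes match too (finite differences on either side of the boundary)
print((square(delta) - square(delta - eps)) / eps)  # ~1.5, i.e. delta
print((linear(delta + eps) - linear(delta)) / eps)  # ~1.5, i.e. delta
```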
Note 4: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID:
added
Note Type: Horvath Cloze
GUID:
JHuwTa2Ri*
Previous
Note did not exist
New Note
Front
When computing loss over multiple datapoints, we take the average loss across all points.
Back
When computing loss over multiple datapoints, we take the average loss across all points.
e.g. mean squared error or mean Huber loss
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Text | | When computing loss over {{c1::multiple datapoints}}, we take the {{c2::average loss}} across all points. |
| Extra | | e.g. mean squared error or mean Huber loss |
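For concreteness, a tiny sketch of the aggregation step (the residual values are made up): the per-point losses are averaged, giving e.g. the mean squared error.

```python
import numpy as np

residuals = np.array([0.5, -1.2, 0.3, 2.0])  # r_i = y_i - yhat_i, illustrative values
mse = np.mean(0.5 * residuals**2)            # mean squared loss across all points
mae = np.mean(np.abs(residuals))             # mean absolute loss across all points
print(mse, mae)
```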
Note 5: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID:
added
Note Type: Horvath Classic
GUID:
RJES!%$LBz
Previous
Note did not exist
New Note
Front
Derive the normal equation for MSE regression from first principles.
Back
Derive the normal equation for MSE regression from first principles.
We want \(\hat{\boldsymbol{w}}\) minimizing \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\).
Set gradient to zero:
\(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0\)
\(\iff \boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}\)
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Front | | Derive the <b>normal equation</b> for MSE regression from first principles. |
| Back | | We want \(\hat{\boldsymbol{w}}\) minimizing \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\).<br><br>Set gradient to zero:<br>\(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0\)<br>\(\iff \boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}\) |
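A minimal numpy sketch of the result (synthetic data, illustrative only): solving \(\boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}\) directly reproduces what a standard least-squares solver finds.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# solve the normal equation X^T X w = X^T y directly
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# compare against numpy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_normal, w_lstsq))  # True
```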
Note 6: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID:
added
Note Type: Horvath Classic
GUID:
U1Eb8hej*j
Previous
Note did not exist
New Note
Front
Write the MSE regression objective in matrix form and state the optimal weight vector.
Back
Write the MSE regression objective in matrix form and state the optimal weight vector.
Objective: \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\)
Optimal weights: \(\hat{\boldsymbol{w}} = \arg\min_{\boldsymbol{w} \in \mathbb{R}^d} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\)
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Front | | Write the MSE regression objective in matrix form and state the optimal weight vector. |
| Back | | Objective: \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\)<br><br>Optimal weights: \(\hat{\boldsymbol{w}} = \arg\min_{\boldsymbol{w} \in \mathbb{R}^d} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\) |
Note 7: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID:
added
Note Type: Horvath Classic
GUID:
UWJy4WStr%
Previous
Note did not exist
New Note
Front
What is a loss function in regression?
Back
What is a loss function in regression?
A function \(\ell(r)\) that characterizes how "bad" a prediction is, where \(r = y - \hat{y}\) is the residual.
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Front | | What is a <b>loss function</b> in regression? |
| Back | | A function \(\ell(r)\) that characterizes how "bad" a prediction is, where \(r = y - \hat{y}\) is the residual. |
Note 8: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID:
added
Note Type: Horvath Cloze
GUID:
g6K#9WP@pG
Previous
Note did not exist
New Note
Front
The advantage of Huber loss over L1 and L2 is that it is less sensitive to outliers than L2 while still being differentiable everywhere (unlike L1).
Back
The advantage of Huber loss over L1 and L2 is that it is less sensitive to outliers than L2 while still being differentiable everywhere (unlike L1).
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Text | | The advantage of Huber loss over L1 and L2 is that it is {{c1::less sensitive to outliers than L2}} while still being {{c2::differentiable everywhere (unlike L1)}}. |
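To make the trade-off concrete, a small illustrative comparison (\(\delta = 1\) chosen arbitrarily): on a large outlier residual the squared loss explodes, while the Huber loss grows only linearly, like L1.

```python
r, delta = 10.0, 1.0                     # an outlier residual, illustrative threshold
l2 = 0.5 * r**2                          # 50.0 -- the squared loss explodes
l1 = abs(r)                              # 10.0
huber = delta * abs(r) - 0.5 * delta**2  # 9.5  -- linear growth, like L1
print(l2, l1, huber)
```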
Note 9: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID:
added
Note Type: Horvath Cloze
GUID:
k0Ztc^E2Jy
Previous
Note did not exist
New Note
Front
Setting \(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 0\) gives \({{c1::2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0}}\), which simplifies to the normal equation: \({{c2::\boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}}}\).
Back
Setting \(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 0\) gives \({{c1::2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0}}\), which simplifies to the normal equation: \({{c2::\boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}}}\).
At a minimum the gradient must be zero. The factor 2 cancels out, leaving the normal equation.
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Text | | Setting \(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 0\) gives \({{c1::2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0}}\), which simplifies to the <b>normal equation</b>: \({{c2::\boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}}}\). |
| Extra | | At a minimum the gradient must be zero. The factor 2 cancels out, leaving the normal equation. |
Note 10: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID:
added
Note Type: Horvath Cloze
GUID:
rthT$U8vno
Previous
Note did not exist
New Note
Front
The square (L2) loss is defined as \(\ell(r) = {{c2::\frac{1}{2} r^2}}\). The \(\frac{1}{2}\) factor is included because it makes the derivative clean: \(L_{\text{square}}' = r\).
Back
The square (L2) loss is defined as \(\ell(r) = {{c2::\frac{1}{2} r^2}}\). The \(\frac{1}{2}\) factor is included because it makes the derivative clean: \(L_{\text{square}}' = r\).
Very sensitive to outliers since the residual is squared.
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Text | | The {{c1::square (L2)}} loss is defined as \(\ell(r) = {{c2::\frac{1}{2} r^2}}\). The \(\frac{1}{2}\) factor is included because {{c3::it makes the derivative clean: \(L_{\text{square}}' = r\)}}. |
| Extra | | Very sensitive to outliers since the residual is squared. |
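A one-line finite-difference check of the claim (purely illustrative): the derivative of \(\frac{1}{2}r^2\) at any \(r\) is just \(r\), with no stray factor of 2.

```python
loss = lambda r: 0.5 * r**2
h = 1e-6
for r in (-2.0, 0.0, 3.0):
    print(r, (loss(r + h) - loss(r - h)) / (2 * h))  # central difference, approximately r
```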
Note 11: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID:
added
Note Type: Horvath Classic
GUID:
uMZ9&uAV6k
Previous
Note did not exist
New Note
Front
Write the definition of the Huber loss.
Back
Write the definition of the Huber loss.
\[\ell_{\text{huber}}(r) = \begin{cases} \frac{1}{2} r^2 & |r| \le \delta, \\ \delta |r| - \frac{1}{2} \delta^2 & |r| > \delta. \end{cases}\]
Uses square loss for \([-\delta, \delta]\) and absolute loss outside.
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Front | | Write the definition of the <b>Huber loss</b>. |
| Back | | \[\ell_{\text{huber}}(r) = \begin{cases} \frac{1}{2} r^2 & |r| \le \delta, \\ \delta |r| - \frac{1}{2} \delta^2 & |r| > \delta. \end{cases}\]<br>Uses square loss for \([-\delta, \delta]\) and absolute loss outside. |
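A direct numpy transcription of this definition (the vectorized form and the default \(\delta\) are implementation choices, not from the deck):

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Square loss on [-delta, delta], shifted absolute loss outside."""
    r = np.asarray(r)
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2,
                    delta * np.abs(r) - 0.5 * delta**2)

print(huber_loss([-3.0, -0.5, 0.0, 0.5, 3.0]))  # quadratic near 0, linear in the tails
```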
Note 12: ETH::Electives::IML
Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID:
added
Note Type: Horvath Cloze
GUID:
xwS2i87ME6
Previous
Note did not exist
New Note
Front
The asymmetric loss is \(\ell_\tau(r) = {{c1::\tau \max\{r,0\} + (1-\tau)\max\{-r,0\}}}\). A higher \(\tau\) means steeper penalty for under-shooting (positive residual), lower \(\tau\) means steeper penalty for over-shooting.
Back
The asymmetric loss is \(\ell_\tau(r) = {{c1::\tau \max\{r,0\} + (1-\tau)\max\{-r,0\}}}\). A higher \(\tau\) means steeper penalty for under-shooting (positive residual), lower \(\tau\) means steeper penalty for over-shooting.
Like absolute loss, but with two different slopes on each side of the y-axis.
Field-by-field Comparison
| Field | Before | After |
|---|---|---|
| Text | | The asymmetric loss is \(\ell_\tau(r) = {{c1::\tau \max\{r,0\} + (1-\tau)\max\{-r,0\}}}\). A higher \(\tau\) means {{c2::steeper penalty for under-shooting (positive residual)}}, lower \(\tau\) means {{c3::steeper penalty for over-shooting}}. |
| Extra | | Like absolute loss, but with two different slopes on each side of the y-axis. |
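A small numpy sketch of this loss (the \(\tau\) values are illustrative): with \(r = y - \hat{y}\), a high \(\tau\) makes positive residuals expensive and a low \(\tau\) makes negative residuals expensive.

```python
import numpy as np

def asymmetric_loss(r, tau):
    r = np.asarray(r)
    return tau * np.maximum(r, 0) + (1 - tau) * np.maximum(-r, 0)

r = np.array([-1.0, 1.0])           # one negative, one positive residual
print(asymmetric_loss(r, tau=0.9))  # [0.1 0.9] -> positive residual penalized more
print(asymmetric_loss(r, tau=0.1))  # [0.9 0.1] -> negative residual penalized more
```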