Anki Deck Changes

Commit: f107f764 - Add IML week 1 notes

Author: Jonas B <65017752+Scr1pting@users.noreply.github.com>

Date: 2026-02-22T20:38:31+01:00

Changes: 12 note(s) changed (12 added, 0 modified, 0 deleted)

Note 1: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID: @%wrCnWSYb
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::2._Gradient
The loss gradient is n-dimensional, where each component corresponds to a parameter of the model. We seek the global minimum, i.e. where the mean loss is smallest.

Back

ETH::Electives::IML::1._Regression::2._Gradient
The loss gradient is n-dimensional, where each component corresponds to a parameter of the model. We seek the global minimum, i.e. where the mean loss is smallest.
Field-by-field Comparison
Field Before After
Text The loss gradient is {{c1::n-dimensional}}, where each component corresponds to {{c2::a parameter of the model}}. We seek the {{c3::global minimum}}, i.e. where the {{c3::mean loss is smallest}}.
Tags: ETH::Electives::IML::1._Regression::2._Gradient
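
As a side note on the "one dimension per parameter" statement (standard notation, not from the card itself): for a mean loss \(L(\boldsymbol{w})\) over parameters \(w_1, \dots, w_n\), the gradient stacks one partial derivative per parameter,
\[\nabla_{\boldsymbol{w}} L(\boldsymbol{w}) = \left( \frac{\partial L}{\partial w_1}, \dots, \frac{\partial L}{\partial w_n} \right)^{\top} \in \mathbb{R}^n.\]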

Note 2: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID: Cz3eAYn$g6
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
The absolute (L1) loss is defined as \(\ell_{\text{abs}}(r) = |r|\). Its main drawback is that it is not differentiable at zero.

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
The absolute (L1) loss is defined as \(\ell_{\text{abs}}(r) = |r|\). Its main drawback is that it is not differentiable at zero.
Field-by-field Comparison
Field Before After
Text The {{c1::absolute (L1)}} loss is defined as \(\ell_{\text{abs}}(r) = {{c2::|r|}}\). Its main drawback is that it is {{c3::not differentiable at zero}}.
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions
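
For context, the kink at zero can be written out (standard result, not on the card): for \(r \neq 0\),
\[\ell_{\text{abs}}'(r) = \operatorname{sign}(r) = \begin{cases} +1 & r > 0, \\ -1 & r < 0, \end{cases}\]
and at \(r = 0\) the left slope \(-1\) and right slope \(+1\) disagree, so no derivative exists there.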

Note 3: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID: Ek0p90pSOj
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
Why does the Huber loss use \(\delta|r| - \frac{1}{2}\delta^2\) (not just \(\delta|r|\)) for the linear region?

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
Why does the Huber loss use \(\delta|r| - \frac{1}{2}\delta^2\) (not just \(\delta|r|\)) for the linear region?

To ensure continuous differentiability at \(r = \pm\delta\):
  • The slopes must match: \(L_{\text{square}}' = r\), so the abs part gets slope \(\pm\delta\).
  • The y-values must match: square gives \(\frac{1}{2}\delta^2\) but \(\delta|\pm\delta| = \delta^2\), so we subtract \(\frac{1}{2}\delta^2\).
Field-by-field Comparison
Field Before After
Front Why does the Huber loss use \(\delta|r| - \frac{1}{2}\delta^2\) (not just \(\delta|r|\)) for the linear region?
Back To ensure <b>continuous differentiability</b> at \(r = \pm\delta\):<br><ul><li>The slopes must match: \(L_{\text{square}}' = r\), so the abs part gets slope \(\pm\delta\).</li><li>The y-values must match: square gives \(\frac{1}{2}\delta^2\) but \(\delta|\pm\delta| = \delta^2\), so we subtract \(\frac{1}{2}\delta^2\).</li></ul>
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions
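
Writing out both matching conditions at \(r = \delta\) (the same argument as the back of the card, made explicit):
\[\delta\cdot\delta - \tfrac{1}{2}\delta^2 = \tfrac{1}{2}\delta^2 \quad\text{(values agree)}, \qquad \frac{d}{dr}\,\tfrac{1}{2}r^2\Big|_{r=\delta} = \delta = \frac{d}{dr}\big(\delta r - \tfrac{1}{2}\delta^2\big) \quad\text{(slopes agree)}.\]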

Note 4: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID: JHuwTa2Ri*
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
When computing loss over multiple datapoints, we take the average loss across all points.

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
When computing loss over multiple datapoints, we take the average loss across all points.

e.g. mean squared error or mean Huber loss
Field-by-field Comparison
Field Before After
Text When computing loss over {{c1::multiple datapoints}}, we take the {{c2::average loss}} across all points.
Extra e.g. mean squared error or mean Huber loss
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions
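
Written out, for a dataset \(\{(x_i, y_i)\}_{i=1}^n\) with predictions \(\hat{y}_i\) (standard notation, added for context):
\[L = \frac{1}{n} \sum_{i=1}^{n} \ell(y_i - \hat{y}_i),\]
which with the square or Huber \(\ell\) gives the mean squared error (up to the \(\frac{1}{2}\) factor) or the mean Huber loss.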

Note 5: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID: RJES!%$LBz
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::2._Gradient
Derive the normal equation for MSE regression from first principles.

Back

ETH::Electives::IML::1._Regression::2._Gradient
Derive the normal equation for MSE regression from first principles.

We want \(\hat{\boldsymbol{w}}\) minimizing \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\).

Set gradient to zero:
\(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0\)
\(\iff \boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}\)
Field-by-field Comparison
Field Before After
Front Derive the <b>normal equation</b> for MSE regression from first principles.
Back We want \(\hat{\boldsymbol{w}}\) minimizing \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\).<br><br>Set gradient to zero:<br>\(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0\)<br>\(\iff \boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}\)
Tags: ETH::Electives::IML::1._Regression::2._Gradient
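
A minimal NumPy sketch of the result, assuming a small synthetic dataset and a full-rank design matrix (names and data are illustrative, not from the notes):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # design matrix: 100 points, 3 features
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy targets

# Normal equation: X^T X w = X^T y (solve the system instead of inverting explicitly)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_hat, w_lstsq)

Solving the linear system directly is the usual choice over forming \((\boldsymbol{X}^\top\boldsymbol{X})^{-1}\) explicitly, which is numerically less stable.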

Note 6: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID: U1Eb8hej*j
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::2._Gradient
Write the MSE regression objective in matrix form and state the optimal weight vector.

Back

ETH::Electives::IML::1._Regression::2._Gradient
Write the MSE regression objective in matrix form and state the optimal weight vector.

Objective: \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\)

Optimal weights: \(\hat{\boldsymbol{w}} = \arg\min_{\boldsymbol{w} \in \mathbb{R}^d} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\)
Field-by-field Comparison
Field Before After
Front Write the MSE regression objective in matrix form and state the optimal weight vector.
Back Objective: \(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\)<br><br>Optimal weights: \(\hat{\boldsymbol{w}} = \arg\min_{\boldsymbol{w} \in \mathbb{R}^d} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2\)
Tags: ETH::Electives::IML::1._Regression::2._Gradient
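
The matrix form is shorthand for the per-point sum, with \(\boldsymbol{x}_i^\top\) the \(i\)-th row of \(\boldsymbol{X}\):
\[\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|_2^2 = \sum_{i=1}^{n} \big(y_i - \boldsymbol{x}_i^\top \boldsymbol{w}\big)^2.\]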

Note 7: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID: UWJy4WStr%
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
What is a loss function in regression?

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
What is a loss function in regression?

A function \(\ell(r)\) that characterizes how "bad" a prediction is, where \(r = y - \hat{y}\) is the residual.
Field-by-field Comparison
Field Before After
Front What is a <b>loss function</b> in regression?
Back A function \(\ell(r)\) that characterizes how "bad" a prediction is, where \(r = y - \hat{y}\) is the residual.
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions
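
A concrete instance (illustrative numbers, not from the card): with \(y = 3\) and \(\hat{y} = 2.5\) the residual is \(r = 0.5\), so under the square loss \(\ell(r) = \frac{1}{2}r^2\) this prediction costs \(\frac{1}{2}(0.5)^2 = 0.125\).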

Note 8: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID: g6K#9WP@pG
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
The advantage of Huber loss over L1 and L2 is that it is less sensitive to outliers than L2 while still being differentiable everywhere (unlike L1).

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
The advantage of Huber loss over L1 and L2 is that it is less sensitive to outliers than L2 while still being differentiable everywhere (unlike L1).
Field-by-field Comparison
Field Before After
Text The advantage of Huber loss over L1 and L2 is that it is {{c1::less sensitive to outliers than L2}} while still being {{c2::differentiable everywhere (unlike L1)}}.
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions

Note 9: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID: k0Ztc^E2Jy
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::2._Gradient
Setting \(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 0\) gives \(2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0\), which simplifies to the normal equation: \(\boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}\).

Back

ETH::Electives::IML::1._Regression::2._Gradient
Setting \(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 0\) gives \(2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0\), which simplifies to the normal equation: \(\boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}\).

At a minimum the gradient must be zero. The factor 2 cancels out, leaving the normal equation.
Field-by-field Comparison
Field Before After
Text Setting \(\nabla_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{w}}\|_2^2 = 0\) gives \({{c1::2\boldsymbol{X}^\top(\boldsymbol{X}\hat{\boldsymbol{w}} - \boldsymbol{y}) = 0}}\), which simplifies to the <b>normal equation</b>: \({{c2::\boldsymbol{X}^\top \boldsymbol{X} \hat{\boldsymbol{w}} = \boldsymbol{X}^\top \boldsymbol{y}}}\).
Extra At a minimum the gradient must be zero. The factor 2 cancels out, leaving the normal equation.
Tags: ETH::Electives::IML::1._Regression::2._Gradient
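
Assuming \(\boldsymbol{X}^\top \boldsymbol{X}\) is invertible (a condition the card does not state), the normal equation can be solved in closed form:
\[\hat{\boldsymbol{w}} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}.\]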

Note 10: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID: rthT$U8vno
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
The square (L2) loss is defined as \(\ell(r) = \frac{1}{2} r^2\). The \(\frac{1}{2}\) factor is included because it makes the derivative clean: \(L_{\text{square}}' = r\).

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
The square (L2) loss is defined as \(\ell(r) = \frac{1}{2} r^2\). The \(\frac{1}{2}\) factor is included because it makes the derivative clean: \(L_{\text{square}}' = r\).

Very sensitive to outliers since the residual is squared.
Field-by-field Comparison
Field Before After
Text The {{c1::square (L2)}} loss is defined as \(\ell(r) = {{c2::\frac{1}{2} r^2}}\). The \(\frac{1}{2}\) factor is included because {{c3::it makes the derivative clean: \(L_{\text{square}}' = r\)}}.
Extra Very sensitive to outliers since the residual is squared.
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions
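
The "clean derivative" claim, written out:
\[\frac{d}{dr}\,\frac{1}{2} r^2 = r, \qquad\text{whereas}\qquad \frac{d}{dr}\, r^2 = 2r.\]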

Note 11: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Classic
GUID: uMZ9&uAV6k
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
Write the definition of the Huber loss.

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
Write the definition of the Huber loss.

\[\ell_{\text{huber}}(r) = \begin{cases} \frac{1}{2} r^2 & |r| \le \delta, \\ \delta |r| - \frac{1}{2} \delta^2 & |r| > \delta. \end{cases}\]
Uses square loss for \([-\delta, \delta]\) and absolute loss outside.
Field-by-field Comparison
Field Before After
Front Write the definition of the <b>Huber loss</b>.
Back \[\ell_{\text{huber}}(r) = \begin{cases} \frac{1}{2} r^2 &amp; |r| \le \delta, \\ \delta |r| - \frac{1}{2} \delta^2 &amp; |r| &gt; \delta. \end{cases}\]<br>Uses square loss for \([-\delta, \delta]\) and absolute loss outside.
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions
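
A small Python sketch of the piecewise definition (function and variable names are illustrative, not from the notes):

import numpy as np

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    r = np.asarray(r, dtype=float)
    quadratic = 0.5 * r**2
    linear = delta * np.abs(r) - 0.5 * delta**2
    return np.where(np.abs(r) <= delta, quadratic, linear)

# At |r| = delta both branches give 0.5 * delta**2, so the loss is continuous there.
print(huber_loss([0.5, 1.0, 3.0], delta=1.0))  # [0.125 0.5   2.5  ]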

Note 12: ETH::Electives::IML

Deck: ETH::Electives::IML
Note Type: Horvath Cloze
GUID: xwS2i87ME6
added

Previous

Note did not exist

New Note

Front

ETH::Electives::IML::1._Regression::1._Loss_Functions
The asymmetric loss is \(\ell_\tau(r) = \tau \max\{r,0\} + (1-\tau)\max\{-r,0\}\). A higher \(\tau\) means steeper penalty for positive residuals (under-prediction), lower \(\tau\) means steeper penalty for negative residuals (over-prediction).

Back

ETH::Electives::IML::1._Regression::1._Loss_Functions
The asymmetric loss is \(\ell_\tau(r) = \tau \max\{r,0\} + (1-\tau)\max\{-r,0\}\). A higher \(\tau\) means steeper penalty for positive residuals (under-prediction), lower \(\tau\) means steeper penalty for negative residuals (over-prediction).

Like absolute loss, but with two different slopes on each side of the y-axis.
Field-by-field Comparison
Field Before After
Text The asymmetric loss is \(\ell_\tau(r) = {{c1::\tau \max\{r,0\} + (1-\tau)\max\{-r,0\}}}\). A higher \(\tau\) means {{c2::steeper penalty for positive residuals (under-prediction)}}, lower \(\tau\) means {{c3::steeper penalty for negative residuals (over-prediction)}}.
Extra Like absolute loss, but with two different slopes on each side of the y-axis.
Tags: ETH::Electives::IML::1._Regression::1._Loss_Functions
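
A matching sketch of the asymmetric (pinball) loss, with illustrative names; the slope is \(\tau\) on positive residuals and \(1-\tau\) on negative ones:

import numpy as np

def asymmetric_loss(r, tau=0.5):
    """Pinball loss: slope tau for r > 0, slope (1 - tau) for r < 0."""
    r = np.asarray(r, dtype=float)
    return tau * np.maximum(r, 0.0) + (1.0 - tau) * np.maximum(-r, 0.0)

print(asymmetric_loss([-2.0, 2.0], tau=0.9))  # [0.2 1.8]: positive residuals penalized more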