To simplify notation, we index the weight of a hypothesis not by $n$, as in $w(n(h))$, but directly by $h$, writing $w(h)$.
Altogether, the SRM rule becomes:
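A sketch of this rule, following the standard formulation in Shalev-Shwartz & Ben-David's *Understanding Machine Learning* (Chapter 7), and assuming from the surrounding discussion the notation $L_S(h)$ for the empirical risk, $m$ for the sample size, $\delta$ for the confidence parameter, and $\epsilon_{n(h)}$ for the rate function of the subclass containing $h$:

$$h_S \in \arg\min_{h \in \mathcal{H}} \left[ L_S(h) + \epsilon_{n(h)}\big(m,\, w(h) \cdot \delta\big) \right]$$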
Given an instance $x$, the algorithm Memorize returns the majority label among the instances in the sample $S$ that are equal to $x$. If no instance in $S$ equals $x$, it predicts the majority label of $S$. It can be shown that this algorithm is consistent.
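As a concrete illustration, here is a minimal Python sketch of this memorization rule; the function name `memorize_predict` and the representation of $S$ as a list of `(instance, label)` pairs are choices made here for illustration, not part of the original text.

```python
from collections import Counter

def memorize_predict(sample, x):
    """Predict a label for x with the Memorize rule: take the majority
    label among training instances equal to x; if x was never seen,
    fall back to the overall majority label of the sample."""
    matching = [y for (xi, y) in sample if xi == x]
    labels = matching if matching else [y for (_, y) in sample]
    return Counter(labels).most_common(1)[0][0]

# Toy sample of (instance, label) pairs.
S = [(1, 0), (1, 0), (1, 1), (2, 1), (3, 0)]
print(memorize_predict(S, 1))  # x seen in S: majority of its labels -> 0
print(memorize_predict(S, 5))  # x unseen: overall majority label -> 0
```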
This example makes it clear that consistency is too weak to capture "learning" (or, more precisely, generalization ability).
Comparison of Notions for Learning
The following table compares the three different notions of learning in terms of bounds and prior knowledge.
| | Bounds on the true error based on the empirical risk | Bounds on the quality of the output compared to the best hypothesis in the class, based on the sample size | Encodes prior knowledge |
| --- | --- | --- | --- |
| (agn.) PAC | $\checkmark$ | $\checkmark$ (in advance) | $\checkmark$ (by specifying an $\mathcal{H}$) |
| Nonuniform | $\checkmark$ (e.g. Thm 7.4 in the book) | $\checkmark$ (depends on the best $h \in \mathcal{H}$) | |