In machine learning and mathematical optimization, loss functions for classification are computationally feasible loss functions representing the price paid for inaccuracy of predictions in classification problems. Given $\mathcal{X}$ as the space of all possible inputs and $\mathcal{Y} = \{-1, 1\}$ as the set of labels, a typical goal of classification algorithms is to find a function $f : \mathcal{X} \to \mathbb{R}$ which best predicts a label $y$ for a given input $\vec{x}$. However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, it is possible for the same $\vec{x}$ to generate different $y$. As a result, the goal of the learning problem is to minimize the expected loss (the expected risk), defined as

$$I[f] = \int_{\mathcal{X} \times \mathcal{Y}} V(f(\vec{x}), y)\, p(\vec{x}, y)\, d\vec{x}\, dy$$
where $V(f(\vec{x}), y)$ is a given loss function, and $p(\vec{x}, y)$ is the probability density function of the process that generated the data, which can equivalently be written as

$$p(\vec{x}, y) = p(y \mid \vec{x})\, p(\vec{x}).$$
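In practice $p(\vec{x}, y)$ is usually unknown, so the expected loss is typically approximated by an average over a finite sample of observed pairs. The sketch below illustrates this idea only; the names `empirical_risk`, `predictor`, and `loss`, the choice of squared loss, and the toy data are hypothetical and not taken from this article.

```python
import numpy as np

def empirical_risk(loss, predictor, X, y):
    """Finite-sample stand-in for the expected loss I[f]:
    the average of loss(f(x_i), y_i) over observed pairs (x_i, y_i)."""
    scores = np.array([predictor(x) for x in X])
    return float(np.mean([loss(s, label) for s, label in zip(scores, y)]))

# Toy example with labels in {-1, +1} and a real-valued predictor f.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                   # inputs in R^2
y = np.where(X[:, 0] > 0.0, 1, -1)              # labels in {-1, +1}
predictor = lambda x: x[0]                      # a toy scoring function f
loss = lambda s, label: (1.0 - label * s) ** 2  # squared loss as an example V(f(x), y)
print(empirical_risk(loss, predictor, X, y))
```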
Within classification, several commonly used loss functions are written solely in terms of the product of the true label $y$ and the predicted label $f(\vec{x})$. Therefore, they can be defined as functions of only one variable $\upsilon = y f(\vec{x})$, so that $V(f(\vec{x}), y) = \phi(y f(\vec{x})) = \phi(\upsilon)$ with a suitably chosen function $\phi : \mathbb{R} \to \mathbb{R}$. These are called margin-based loss functions. Choosing a margin-based loss function amounts to choosing $\phi$. Selection of a loss function within this framework impacts the optimal $f_{\phi}^{*}$ which minimizes the expected risk.
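For illustration, here is a minimal sketch of two well-known margin-based losses written as functions of the margin $\upsilon = y f(\vec{x})$; the hinge and logistic losses are standard examples of $\phi$, though this section does not itself single out specific choices.

```python
import numpy as np

def hinge_loss(v):
    """Hinge loss: phi(v) = max(0, 1 - v), with v = y * f(x)."""
    return np.maximum(0.0, 1.0 - v)

def logistic_loss(v):
    """Logistic loss: phi(v) = log(1 + exp(-v))."""
    return np.log1p(np.exp(-v))

# Both depend on the input and label only through the margin v = y * f(x):
# small or negative margins are penalized, confident correct predictions are not.
margins = np.linspace(-2.0, 2.0, 5)
print(hinge_loss(margins))     # [3. 2. 1. 0. 0.]
print(logistic_loss(margins))  # roughly [2.13 1.31 0.69 0.31 0.13]
```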
In the case of binary classification, it is possible to simplify the calculation of expected risk from the integral specified above: because $\mathcal{Y} = \{-1, 1\}$ contains only two labels, the integral over $\mathcal{Y}$ reduces to a sum of two terms.
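As a sketch of how this simplification can be carried out, using the factorization $p(\vec{x}, y) = p(y \mid \vec{x})\, p(\vec{x})$ and a margin-based loss $\phi$ (the intermediate steps shown here are an assumption, not quoted from the article):

```latex
% Expected risk of a margin-based loss when \mathcal{Y} = \{-1, 1\}:
% the integral over the label y collapses to a two-term sum.
\begin{aligned}
I[f] &= \int_{\mathcal{X} \times \mathcal{Y}} \phi(y f(\vec{x}))\, p(\vec{x}, y)\, d\vec{x}\, dy \\
     &= \int_{\mathcal{X}} \Big[\, \phi(f(\vec{x}))\, p(1 \mid \vec{x})
        + \phi(-f(\vec{x}))\, p(-1 \mid \vec{x}) \,\Big]\, p(\vec{x})\, d\vec{x}.
\end{aligned}
```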