The likelihood function describes the joint probability of the observed data as a function of the parameters of the chosen statistical model. For each specific parameter value $\theta$ in the parameter space, the likelihood function $p(X \mid \theta)$ therefore assigns a probabilistic prediction to the observed data $X$. For an independent and identically distributed sample it is the product of the individual sampling densities, so the likelihood generally encapsulates both the data-generating process and the missing-data mechanism that produced the observed sample.
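As a concrete sketch of this product-of-densities view, consider a hypothetical i.i.d. sample assumed to come from a normal distribution with known unit variance, so that only the mean $\theta$ is unknown; the data values below are illustrative, not from the original text:

```python
import numpy as np
from scipy import stats

# Hypothetical i.i.d. sample, assumed drawn from N(theta, 1);
# only the mean theta is treated as unknown in this sketch.
X = np.array([1.8, 2.4, 2.1, 1.6, 2.9])

def likelihood(theta, data=X):
    """L(theta | X): the product of the individual sampling densities."""
    return np.prod(stats.norm.pdf(data, loc=theta, scale=1.0))

# The likelihood ranks parameter values by how well they predict the sample.
for theta in (1.0, 2.0, 3.0):
    print(f"L({theta} | X) = {likelihood(theta):.6f}")
```

Note that the raw product of densities underflows quickly for large samples, which is why the log-likelihood is usually preferred in practice.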
To emphasize that the likelihood is a function of the parameters, with the sample taken as given, it is often written as $\mathcal{L}(\theta \mid X)$. According to the likelihood principle, all of the information a given sample provides about $\theta$ is expressed in the likelihood function. In maximum likelihood estimation, the value that maximizes the probability of observing the given sample, $\hat{\theta} = \operatorname{argmax}_{\theta \in \Theta} \mathcal{L}(\theta \mid X)$, serves as a point estimate for the parameter of the distribution from which the sample was drawn. In Bayesian statistics, by contrast, the likelihood function is the conduit through which sample information influences the posterior probability $p(\theta \mid X)$ of the parameter, via Bayes' rule.
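Continuing the same hypothetical normal example, the sketch below computes $\hat{\theta}$ by numerically minimizing the negative log-likelihood (equivalent to maximizing $\mathcal{L}$, but numerically more stable), and then applies Bayes' rule on a grid. The optimizer bounds and the $N(0, 2^2)$ prior are illustrative assumptions, not prescriptions from the text:

```python
import numpy as np
from scipy import stats, optimize

X = np.array([1.8, 2.4, 2.1, 1.6, 2.9])  # same hypothetical sample as above

def neg_log_likelihood(theta):
    # Maximizing L(theta | X) is equivalent to minimizing -log L(theta | X).
    return -np.sum(stats.norm.logpdf(X, loc=theta, scale=1.0))

# Maximum likelihood estimate: theta_hat = argmax over theta of L(theta | X).
res = optimize.minimize_scalar(neg_log_likelihood, bounds=(-10, 10), method="bounded")
print(f"theta_hat = {res.x:.4f} (sample mean = {X.mean():.4f})")  # coincide here

# Bayesian update on a grid: posterior is proportional to likelihood * prior.
grid = np.linspace(-5, 5, 1001)
prior = stats.norm.pdf(grid, loc=0.0, scale=2.0)       # assumed N(0, 2^2) prior
like = np.exp([-neg_log_likelihood(t) for t in grid])  # L(theta | X) on the grid
posterior = like * prior
posterior /= np.trapz(posterior, grid)                 # normalize to a density
print(f"posterior mean = {np.trapz(grid * posterior, grid):.4f}")
```

For a normal sample with known variance, the MLE is the sample mean, and the assumed normal prior pulls the posterior mean slightly toward zero, illustrating how the likelihood carries the sample's information into the posterior.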