Probability theory reference
Probability theory quantifies uncertainty and makes predictions under conditions of incomplete information. The theory rests on three axioms due to Kolmogorov.
Consider a random experiment with a set of all possible outcomes $\Omega$, called the sample space; an event is a subset $A \subseteq \Omega$.
A probability measure $P$ assigns a number to each event such that $P(A) \geq 0$ for every event $A$, $P(\Omega) = 1$, and for any countable sequence of pairwise disjoint events $A_1, A_2, \dots$ we have $P\left(\bigcup_{i} A_i\right) = \sum_{i} P(A_i)$ (countable additivity).
The triple $(\Omega, \mathcal{F}, P)$, where $\mathcal{F}$ is the $\sigma$-algebra of measurable events, is called a probability space.
The complement rule states that for any event $A$, $P(A^c) = 1 - P(A)$.
The union of two events follows the inclusion-exclusion principle: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
This extends to multiple events: $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap \cdots \cap A_n)$.
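As a sanity check, inclusion-exclusion can be verified by brute-force enumeration. The sketch below (an illustration, not part of the reference proper) rolls two fair dice and compares both sides of the identity for a pair of example events:

```python
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    # Probability of an event (a predicate on outcomes) under the uniform measure.
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] % 2 == 0    # first die is even
B = lambda o: o[0] + o[1] > 8  # sum exceeds 8

lhs = prob(lambda o: A(o) or B(o))
rhs = prob(A) + prob(B) - prob(lambda o: A(o) and B(o))
assert lhs == rhs
print(lhs)  # 11/18
```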
Conditional probability captures how the occurrence of one event affects the probability of another. The conditional probability of $A$ given $B$ is $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.
The definition applies when $P(B) > 0$.
Two events $A$ and $B$ are independent if $P(A \cap B) = P(A)\,P(B)$.
This generalizes to multiple events: $A_1, \dots, A_n$ are mutually independent if, for any subset of indices $\{i_1, \dots, i_k\}$, $P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$.
The law of total probability expresses the probability of an event as a weighted sum over a partition.
For any partition $B_1, \dots, B_n$ of $\Omega$ with $P(B_i) > 0$: $P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$.
Bayes' theorem relates conditional probabilities: $P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A)}$.
Using the law of total probability, we expand the denominator: $P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j} P(A \mid B_j)\,P(B_j)}$.
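The classic application is base-rate reasoning with a diagnostic test. A minimal numeric sketch, with made-up prevalence and error rates chosen only for illustration:

```python
# Illustrative (hypothetical) numbers: 1% prevalence, 99% sensitivity,
# 5% false-positive rate.
p_d = 0.01                 # P(disease)
p_pos_given_d = 0.99       # P(+ | disease)
p_pos_given_not_d = 0.05   # P(+ | no disease)

# Law of total probability expands the denominator P(+).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(disease | +) = P(+ | disease) P(disease) / P(+).
print(p_pos_given_d * p_d / p_pos)  # ~0.167: most positives are false positives
```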
Random variables formalize the mapping from outcomes to numerical values. A random variable $X$ is a (measurable) function $X: \Omega \to \mathbb{R}$.
The cumulative distribution function (CDF) of a random variable $X$ is $F_X(x) = P(X \leq x)$.
The CDF completely characterizes the distribution of $X$: it is non-decreasing, right-continuous, and satisfies $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
Discrete random variables take countably many values. The probability mass function (PMF) is defined as $p_X(x) = P(X = x)$.
The PMF satisfies $p_X(x) \geq 0$ and $\sum_{x} p_X(x) = 1$.
Continuous random variables have a probability density function (PDF) $f_X$ such that $P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$.
The PDF satisfies $f_X(x) \geq 0$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
The relationship between CDF and PDF is $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$ and $f_X(x) = F_X'(x)$ wherever the derivative exists.
The expected value (mean) of a random variable is $E[X] = \sum_{x} x\,p_X(x)$ in the discrete case and $E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$ in the continuous case.
The variance measures the spread around the mean: $\operatorname{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$.
The moment generating function (MGF) encapsulates all moments of $X$: $M_X(t) = E[e^{tX}]$.
When it exists, the MGF uniquely determines the distribution. The $n$-th moment is recovered by differentiation: $E[X^n] = M_X^{(n)}(0)$.
The characteristic function always exists and is defined as $\varphi_X(t) = E[e^{itX}]$.
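To see the moment-extraction property in action, one can differentiate an MGF symbolically. A small sketch with sympy, using the standard Poisson MGF $M(t) = e^{\lambda(e^t - 1)}$ (stated here as an assumption rather than derived):

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)

# MGF of Poisson(lambda): M(t) = exp(lambda * (e^t - 1)).
M = sp.exp(lam * (sp.exp(t) - 1))

m1 = sp.diff(M, t).subs(t, 0)      # E[X]   = M'(0)
m2 = sp.diff(M, t, 2).subs(t, 0)   # E[X^2] = M''(0)

print(sp.simplify(m1))             # lambda
print(sp.simplify(m2 - m1**2))     # variance = lambda, matching the mean
```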
The Bernoulli distribution models binary outcomes with parameter $p$: $P(X = 1) = p$ and $P(X = 0) = 1 - p$.
Its mean is $p$ and its variance is $p(1 - p)$.
The binomial distribution $\mathrm{Bin}(n, p)$ counts the successes in $n$ independent Bernoulli trials: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ for $k = 0, 1, \dots, n$.
The mean is $np$ and the variance is $np(1 - p)$.
The geometric distribution models the number of trials until the first success. Its PMF is given by $P(X = k) = (1 - p)^{k - 1} p$ for $k = 1, 2, \dots$.
The mean is $1/p$ and the variance is $(1 - p)/p^2$.
The Poisson distribution $\mathrm{Pois}(\lambda)$ models counts of rare events: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$.
Both its mean and variance equal $\lambda$.
The uniform distribution $U(a, b)$ has PDF $f(x) = \frac{1}{b - a}$ for $x \in [a, b]$ and $0$ otherwise.
The mean is $(a + b)/2$ and the variance is $(b - a)^2 / 12$.
The normal distribution $N(\mu, \sigma^2)$ has PDF $f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$.
The mean is $\mu$ and the variance is $\sigma^2$.
The standard normal distribution is $N(0, 1)$; its CDF is denoted $\Phi$.
For any normal random variable $X \sim N(\mu, \sigma^2)$, the standardized variable $Z = (X - \mu)/\sigma$ follows $N(0, 1)$.
The exponential distribution $\mathrm{Exp}(\lambda)$ models waiting times: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$.
The mean is $1/\lambda$ and the variance is $1/\lambda^2$.
The gamma distribution $\mathrm{Gamma}(\alpha, \lambda)$ generalizes the exponential: $f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}$ for $x > 0$.
The mean is $\alpha/\lambda$ and the variance is $\alpha/\lambda^2$.
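These means and variances are easy to cross-check against scipy.stats; note that scipy parameterizes the exponential and gamma distributions by a scale equal to $1/\lambda$. A quick sketch:

```python
from scipy import stats

# Each .stats() call returns (mean, variance) for the frozen distribution.
print(stats.binom(n=10, p=0.3).stats())       # (3.0, 2.1):  np, np(1-p)
print(stats.geom(p=0.25).stats())             # (4.0, 12.0): 1/p, (1-p)/p^2
print(stats.poisson(mu=4.0).stats())          # (4.0, 4.0):  lambda, lambda
print(stats.uniform(loc=2, scale=3).stats())  # U(2, 5): (3.5, 0.75)
print(stats.expon(scale=2.0).stats())         # Exp(lambda=0.5): (2.0, 4.0)
print(stats.gamma(a=3, scale=0.5).stats())    # Gamma(3, lambda=2): (1.5, 0.75)
```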
Joint distributions describe the behavior of multiple random variables. For discrete random variables $X$ and $Y$, the joint PMF is $p_{X,Y}(x, y) = P(X = x, Y = y)$.
For continuous random variables, the joint PDF $f_{X,Y}$ satisfies $P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$.
Marginal distributions are obtained by summing or integrating out variables: $p_X(x) = \sum_{y} p_{X,Y}(x, y)$ or $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$.
Independence of random variables $X$ and $Y$ means the joint distribution factors: $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ for all $x, y$ (and likewise for PMFs and CDFs).
The covariance measures the linear relationship between random variables: $\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$.
The correlation coefficient normalizes the covariance: $\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$.
Values range from $-1$ to $1$, with $\rho = \pm 1$ indicating a perfect linear relationship.
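Sample covariance and correlation are one call each in numpy. A minimal sketch on synthetic data with a known linear relationship:

```python
import numpy as np

rng = np.random.default_rng(0)

# Y = 2X + noise, so Cov(X, Y) = 2 Var(X) = 2 and the correlation is high.
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(scale=0.5, size=10_000)

print(np.cov(x, y)[0, 1])       # sample covariance, close to 2.0
print(np.corrcoef(x, y)[0, 1])  # correlation, close to 2/sqrt(4.25) ~ 0.97
```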
Transformations of random variables follow specific rules. For a function $g$, the expectation of $g(X)$ is $E[g(X)] = \sum_{x} g(x)\,p_X(x)$ or $\int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$ (the law of the unconscious statistician).
For linear transformations, $E[aX + b] = a\,E[X] + b$ and $\operatorname{Var}(aX + b) = a^2\,\operatorname{Var}(X)$.
For a strictly monotonic function $g$, the density of $Y = g(X)$ is $f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|$.
This generalizes to multivariate transformations through the Jacobian determinant: $f_Y(y) = f_X(g^{-1}(y))\,\lvert \det J_{g^{-1}}(y) \rvert$.
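The monotonic change-of-variables formula can be checked by simulation. A sketch with $X \sim N(0, 1)$ and $g(x) = e^x$, where the formula predicts the lognormal density $f_Y(y) = \varphi(\ln y)/y$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = np.exp(rng.normal(size=200_000))  # Y = g(X) = e^X with X ~ N(0, 1)

# Formula: f_Y(y) = f_X(ln y) * |d/dy ln y| = phi(ln y) / y.
ys = np.array([0.5, 1.0, 1.5, 2.0])
predicted = stats.norm.pdf(np.log(ys)) / ys

# Empirical density from a histogram of the simulated sample.
hist, edges = np.histogram(y, bins=200, range=(0, 5), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.round(predicted, 3))
print(np.round(np.interp(ys, centers, hist), 3))  # should roughly agree
```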
Sums of independent random variables exhibit special properties. If $X$ and $Y$ are independent, then $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$, while $E[X + Y] = E[X] + E[Y]$ holds in general.
The MGF of their sum is the product of the individual MGFs: $M_{X+Y}(t) = M_X(t)\,M_Y(t)$.
The convolution relation gives the PDF of $Z = X + Y$: $f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$.
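For discrete variables the convolution integral becomes a sum, which numpy computes directly. A sketch recovering the familiar distribution of the sum of two dice:

```python
import numpy as np

die = np.full(6, 1 / 6)  # PMF of one fair die on {1, ..., 6}

# Discrete convolution: PMF of the sum of two independent dice, on {2, ..., 12}.
pmf_sum = np.convolve(die, die)
for total, p in enumerate(pmf_sum, start=2):
    print(total, round(float(p), 4))  # peaks at 7 with probability 6/36
```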
The central limit theorem (CLT) is one of the most important results in probability theory. Let $X_1, X_2, \dots$ be i.i.d. with mean $\mu$ and finite variance $\sigma^2$; then $\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$ as $n \to \infty$, where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
The symbol $\xrightarrow{d}$ denotes convergence in distribution.
The CLT explains the ubiquity of the normal distribution in natural phenomena that result from many small, independent effects.
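A simulation makes the CLT concrete even for a skewed source distribution. A sketch using exponential samples (mean and variance both 1), whose standardized sample means should behave like $N(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(42)

n, reps = 500, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))  # Exp(1): mu = sigma^2 = 1

# Standardize each sample mean: (mean - mu) / (sigma / sqrt(n)).
z = (samples.mean(axis=1) - 1.0) * np.sqrt(n)

# For N(0, 1), each tail beyond +/-1.96 holds about 2.5% of the mass.
print((z > 1.96).mean(), (z < -1.96).mean())
```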
The law of large numbers (LLN) comes in two forms. The weak LLN states convergence in probability: $\bar{X}_n \xrightarrow{P} \mu$, i.e., $P(|\bar{X}_n - \mu| > \varepsilon) \to 0$ for every $\varepsilon > 0$.
The strong LLN makes a stronger statement: $P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$, i.e., the sample mean converges almost surely.
Both forms formalize the intuition that the sample mean converges to the expected value as the sample size increases.
Conditional expectations extend the concept of conditional probability to random variables. For discrete random variables, $E[Y \mid X = x] = \sum_{y} y\,P(Y = y \mid X = x)$.
For continuous random variables, the definition is an integral: $E[Y \mid X = x] = \int_{-\infty}^{\infty} y\,f_{Y \mid X}(y \mid x)\,dy$, where $f_{Y \mid X}(y \mid x) = f_{X,Y}(x, y)/f_X(x)$.
The law of total expectation states that $E[Y] = E[E[Y \mid X]]$.
The conditional variance formula decomposes the variance: $\operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(E[Y \mid X])$.
Markov chains model stochastic processes with the Markov property: $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$.
The future depends on the past only through the present state.
A discrete-time Markov chain has transition probabilities defined by $p_{ij} = P(X_{n+1} = j \mid X_n = i)$.
The transition matrix $P = (p_{ij})$ is row-stochastic: each row sums to $1$.
The $n$-step transition probabilities are given by the matrix power: $P(X_n = j \mid X_0 = i) = (P^n)_{ij}$.
For an irreducible and aperiodic Markov chain, a unique stationary distribution $\pi$ exists, satisfying $\pi P = \pi$ with $\sum_{i} \pi_i = 1$.
The chain converges to this distribution regardless of the initial state.
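The stationary distribution can be found by power iteration, mirroring the convergence statement above. A sketch with a hypothetical 3-state chain:

```python
import numpy as np

# Hypothetical transition matrix (rows sum to 1); irreducible and aperiodic.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

# Power iteration: repeatedly apply pi <- pi P from an arbitrary start.
pi = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    pi = pi @ P

print(pi)           # stationary distribution
print(pi @ P - pi)  # ~0, confirming pi P = pi
```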
Order statistics examine the distributions of ranked random variables. Given i.i.d. random variables $X_1, \dots, X_n$ with CDF $F$, the order statistics $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$ are the sorted values; the maximum satisfies $P(X_{(n)} \leq x) = F(x)^n$.
The minimum satisfies $P(X_{(1)} \leq x) = 1 - (1 - F(x))^n$.
Bayesian statistics treats unknown parameters as random variables with prior distributions. Given data $x$, a likelihood $f(x \mid \theta)$, and a prior $\pi(\theta)$, Bayes' theorem yields the posterior.
The posterior distribution is $\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta')\,\pi(\theta')\,d\theta'} \propto f(x \mid \theta)\,\pi(\theta)$.
A conjugate prior yields a posterior in the same family, simplifying computation. For example, the Beta distribution is conjugate to the Bernoulli and binomial likelihoods.
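The Beta-Bernoulli case reduces the update to adding counts. A sketch with made-up prior pseudo-counts and data:

```python
from scipy import stats

# Beta(a, b) prior on the Bernoulli parameter p; observing k successes in
# n trials gives the Beta(a + k, b + n - k) posterior by conjugacy.
a, b = 2.0, 2.0  # hypothetical prior pseudo-counts
k, n = 7, 10     # hypothetical data: 7 successes in 10 trials

posterior = stats.beta(a + k, b + n - k)
print(posterior.mean())          # (a + k) / (a + b + n) = 9/14 ~ 0.643
print(posterior.interval(0.95))  # central 95% credible interval for p
```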
Sufficiency reduces data without losing information about parameters. A statistic $T(X)$ is sufficient for $\theta$ if the conditional distribution of the data given $T(X)$ does not depend on $\theta$.
The Fisher-Neyman factorization theorem provides a way to identify sufficient statistics: $T$ is sufficient if and only if the likelihood factors as $f(x; \theta) = g(T(x); \theta)\,h(x)$.
The functions $g$ and $h$ split the likelihood so that $\theta$ enters only through $T(x)$, while $h$ does not depend on $\theta$.
Maximum likelihood estimation (MLE) finds the parameter value $\hat{\theta}$ that maximizes the likelihood: $\hat{\theta} = \arg\max_{\theta} L(\theta)$, where $L(\theta) = f(x; \theta)$.
Often, we maximize the log-likelihood $\ell(\theta) = \log L(\theta)$ instead, since the logarithm is monotonic and turns products into sums.
For i.i.d. observations, the log-likelihood has the form $\ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$.
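For the exponential distribution the MLE has a closed form ($\hat{\lambda} = 1/\bar{x}$), which makes it a convenient check on numerical maximization. A sketch on simulated data:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
data = rng.exponential(scale=2.0, size=1_000)  # true rate lambda = 0.5

# Negative log-likelihood for Exp(lambda): -(n log(lambda) - lambda * sum(x)).
def neg_log_lik(lam):
    return -(len(data) * np.log(lam) - lam * data.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method='bounded')
print(res.x)            # numerical MLE
print(1 / data.mean())  # closed form, should match closely
```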
The method of moments estimates parameters by equating sample moments $\frac{1}{n}\sum_{i=1}^{n} x_i^k$ to the corresponding theoretical moments $E[X^k]$ and solving for the parameters.
Hypothesis testing evaluates evidence against a null hypothesis $H_0$ in favor of an alternative $H_1$.
Type I error occurs when rejecting a true $H_0$; its probability is the significance level $\alpha$. Type II error occurs when failing to reject a false $H_0$; its probability is $\beta$, and $1 - \beta$ is the power of the test.
The likelihood ratio test compares the maximized likelihoods under $H_0$ and the full model: $\Lambda = \frac{\sup_{\theta \in \Theta_0} L(\theta)}{\sup_{\theta \in \Theta} L(\theta)}$.
Under certain regularity conditions, $-2 \log \Lambda$ converges in distribution to a $\chi^2$ distribution whose degrees of freedom equal the difference in dimension between $\Theta$ and $\Theta_0$ (Wilks' theorem).
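A sketch of a likelihood ratio test for a binomial proportion, with made-up data, testing $H_0: p = 0.5$ against an unrestricted alternative:

```python
import numpy as np
from scipy import stats

k, n = 62, 100  # hypothetical data: 62 successes in 100 trials
p_hat = k / n   # unrestricted MLE

def log_lik(p):
    # Binomial log-likelihood up to a constant that cancels in the ratio.
    return k * np.log(p) + (n - k) * np.log(1 - p)

# Wilks: under H0, -2 log(Lambda) is asymptotically chi-squared with 1 df.
stat = -2 * (log_lik(0.5) - log_lik(p_hat))
print(stat, stats.chi2.sf(stat, df=1))  # ~5.82, p ~ 0.016
```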