LARGE DEVIATIONS THEORY

AUTHOR: Md Ashif Reza
MINI PROJECT: MA202
ROLL: 170101023
BRANCH: CSE

INTRODUCTION

In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. While some basic ideas of the theory can be traced to Laplace, the formalization started with insurance mathematics, namely ruin theory with Cramér and Lundberg. A unified formalization of large deviation theory was developed in 1966, in a paper by Varadhan [1]. Large deviations theory formalizes the heuristic ideas of concentration of measures and widely generalizes the notion of convergence of probability measures.
Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or tail events.

INTRODUCTORY EXAMPLES

An elementary example

Consider a sequence of independent tosses of a fair coin. The possible outcomes could be heads or tails. Let us denote the outcome of the i-th trial by X_i, where we encode head as 1 and tail as 0. Now let M_N denote the mean value after N trials, namely

    M_N = \frac{1}{N} \sum_{i=1}^{N} X_i.

Then M_N lies between 0 and 1.
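To make the setup concrete, here is a minimal Python sketch (not part of the original report; NumPy, the seed, and the sample sizes are illustrative choices) that simulates the coin tosses and prints the sample mean M_N for increasing N:

    import numpy as np

    rng = np.random.default_rng(seed=0)

    # Simulate N fair-coin tosses, encoding head = 1 and tail = 0,
    # and compute the sample mean M_N = (1/N) * sum_{i=1}^{N} X_i.
    for N in [10, 100, 1_000, 10_000]:
        X = rng.integers(0, 2, size=N)  # X_1, ..., X_N in {0, 1}
        M_N = X.mean()
        print(f"N = {N:6d}   M_N = {M_N:.4f}")

As the law of large numbers discussed next predicts, the printed means settle near 0.5 as N grows.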
From the law of large numbers (and also from our experience) we know that as N grows, M_N converges to 0.5 = E[X] (the expected value of a single coin toss) almost surely.

Moreover, by the central limit theorem, we know that M_N is approximately normally distributed for large N. The central limit theorem can provide more detailed information about the behaviour of M_N than the law of large numbers. For example, we can approximately find the tail probability P(M_N > x), the probability that M_N is greater than x, for a fixed value of x. However, the approximation by the CLT may not be accurate if x is far from E[X] unless N is sufficiently large. Also, it does not provide information about the convergence of the tail probabilities as N → ∞. The theory of large deviations can provide answers for such problems.

Let us make this statement more precise. For a given value 0.5 < x < 1, let us compute the tail probability P(M_N > x). Define

    I(x) = x \ln x + (1 - x) \ln(1 - x) + \ln 2.

Note that I is a convex, nonnegative function that is zero at x = 1/2 and increases as x moves towards 1. It is the negative of the Bernoulli entropy with p = 1/2; that it is appropriate for coin tosses follows from the asymptotic equipartition property applied to a Bernoulli trial. Then by Chernoff's inequality, it can be shown that

    P(M_N > x) < \exp(-N I(x)).

This bound is rather sharp, in the sense that I(x) cannot be replaced with a larger number that would yield a strict inequality for all positive N [3]. (However, the exponential bound can still be reduced by a subexponential factor on the order of 1/\sqrt{N}; this follows from the Stirling approximation applied to the binomial coefficient appearing in the Bernoulli distribution.) Hence, we obtain the following result:

    P(M_N > x) \approx \exp(-N I(x)).

The probability P(M_N > x) decays exponentially as N grows to infinity, at a rate depending on x. This formula approximates any tail probability of the sample mean of i.i.d. variables and gives its convergence as the number of samples increases.

LARGE DEVIATIONS FOR SUMS OF INDEPENDENT RANDOM VARIABLES

In the above example of coin-tossing we explicitly assumed that each toss is an independent trial and that the probability of getting head or tail is always the same.

Let X_1, X_2, ... be independent and identically distributed (i.i.d.) random variables whose common distribution satisfies a certain growth condition. Then the following limit exists:

    \lim_{N \to \infty} \frac{1}{N} \ln P(M_N > x) = -I(x).

Here M_N = (1/N) \sum_{i=1}^{N} X_i, as before. The function I is called the "rate function" or "Cramér function" or sometimes the "entropy function". The above-mentioned limit means that for large N,

    P(M_N > x) \approx \exp(-N I(x)),

which is the basic result of large deviations theory.

If we know the probability distribution of X, an explicit expression for the rate function can be obtained. This is given by a Legendre–Fenchel transformation,

    I(x) = \sup_{\theta > 0} [\theta x - \lambda(\theta)],

where \lambda(\theta) = \ln E[\exp(\theta X)] is called the cumulant generating function (CGF) and E denotes the mathematical expectation. If X follows a normal distribution, the rate function becomes a parabola with its apex at the mean of the normal distribution.

If {X_i} is a Markov chain, a variant of the basic large deviations result stated above may still hold.
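As a numerical check of the limit above, the coin-tossing example can be verified directly, since there the exact binomial tail is computable. The sketch below is illustrative only: the helper names rate and tail, and the choices x = 0.6 and the values of N, are ours, not from the report.

    import math

    def rate(x):
        # Rate function for fair-coin tosses:
        # I(x) = x ln x + (1 - x) ln(1 - x) + ln 2.
        return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

    def tail(N, x):
        # Exact P(M_N > x): sum the binomial probabilities of getting
        # more than N*x heads out of N fair tosses.
        k_min = math.floor(N * x) + 1
        return sum(math.comb(N, k) for k in range(k_min, N + 1)) / 2.0 ** N

    x = 0.6
    for N in [50, 200, 800]:
        p = tail(N, x)
        print(f"N = {N:4d}   (1/N) ln P = {math.log(p) / N:+.4f}"
              f"   -I(x) = {-rate(x):+.4f}")

The printed values approach -I(0.6) ≈ -0.0201 from below, slowly, reflecting the subexponential 1/\sqrt{N} correction noted earlier. Using the exact binomial tail rather than Monte Carlo avoids sampling noise, which would otherwise swamp probabilities as small as 10^{-9}.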
FORMAL DEFINITION

Given a Polish space \mathcal{X}, let {\mu_N} be a sequence of Borel probability measures on \mathcal{X}, let {a_N} be a sequence of positive real numbers such that \lim_{N} a_N = +\infty, and finally let I : \mathcal{X} \to [0, +\infty] be a lower semicontinuous functional on \mathcal{X}. The sequence {\mu_N} is said to satisfy a large deviation principle with speed {a_N} and rate I if, and only if, for each Borel measurable set E \subset \mathcal{X},

    -\inf_{x \in E^\circ} I(x) \le \liminf_{N} a_N^{-1} \log \mu_N(E) \le \limsup_{N} a_N^{-1} \log \mu_N(E) \le -\inf_{x \in \bar{E}} I(x),

where \bar{E} and E^\circ denote respectively the closure and interior of E.

CONCLUSIONS

APPLICATIONS

Principles of large deviations may be effectively applied to gather information out of a probabilistic model. Thus, the theory of large deviations finds applications in information theory and risk management. In physics, the best-known applications of large deviations theory arise in thermodynamics and statistical mechanics (in connection with relating entropy to the rate function).

Large deviations and entropy

The rate function is related to the entropy in statistical mechanics. This can be heuristically seen in the following way. In statistical mechanics the entropy of a particular macro-state is related to the number of micro-states which correspond to this macro-state. In our coin-tossing example the mean value M_N could designate a particular macro-state, and the particular sequence of heads and tails which gives rise to a particular value of M_N constitutes a particular micro-state. Loosely speaking, a macro-state having a higher number of micro-states giving rise to it has higher entropy, and a state with higher entropy has a higher chance of being realised in actual experiments. The macro-state with mean value 1/2 (as many heads as tails) has the highest number of micro-states giving rise to it, and it is indeed the state with the highest entropy. In most practical situations we shall indeed obtain this macro-state for large numbers of trials. The "rate function", on the other hand, measures the probability of appearance of a particular macro-state: the smaller the rate function, the higher the chance of a macro-state appearing. In our coin-tossing example the value of the rate function for mean value 1/2 is zero. In this way one can see the rate function as the negative of the entropy.

There is a relation between the rate function in large deviations theory and the Kullback–Leibler divergence; the connection is established by Sanov's theorem.

In a special case, large deviations are closely related to the concept of Gromov–Hausdorff limits.

REFERENCES

1. S. R. S. Varadhan, "Asymptotic probabilities and differential equations", Comm. Pure Appl. Math. 19 (1966), 261-286.
2. A. Shwartz and A. Weiss, Large Deviations for Performance Analysis: Queues, Communications, and Computing (Chapman & Hall, 1995).
3. S. R. S. Varadhan, "Large deviations", The Annals of Probability 36 (2008), no. 2, 397-419.
4. S. R. S. Varadhan, Large Deviations and Applications (SIAM, Philadelphia, 1984).
5. H. Touchette, "The large deviation approach to statistical mechanics", Physics Reports 478 (2009), no. 1-3, 1-69.