A draft version of the MS can be found here (last updated, Aug 23, 2011; note that the page numbering there differs from that of the published version). It is based primarily on these lecture notes.

Pre-errata (errors in the draft that were corrected in the published version):

p. 20: In Exercise 1.1.11(i), “if and only for” should be “if and only if for”.

p. 21: In Exercise 1.1.18, in the definition of convexity, should be .

p. 46: In Exercise 1.3.16, Weilandt should be Wielandt. Similarly on p. 47 after Exercise 1.3.9, in Exercise 1.3.22(v) on page 53, on page 137 before (2.82), on page 184 after (2.129), and on page 208 before 2.6.6. Also, before (1.66), the supremum should be over rather than .

p. 72: All occurrences of on this page should be .

p. 183: The formula (2.127) should be attributed to Dyson ( The three fold way, J. Math. Phys. vol. 3 (1962) pgs. 1199-1215) rather than to Ginibre. Similarly on pages 251, 259, and 265.

p. 225-226: U should be U_0 (several occurrences). Also, should be and should be .

p. 225, Section 2.8.2: right parenthesis should be added after “sufficient decay at infinity.”

Page 37: After (1.49), “uniform lower bound” should be “uniform upper bound”, and after the second line of the following display, should be .

Page 41: In the proof of Theorem 1.3.1, should strictly speaking be (though it makes no difference to the remainder of the argument).

Page 41: In Exercise 1.3.1, should be .

Page 49: After (1.72), should be , and “orthogonal” should be “orthogonal (using the real part of the inner product)”.

Page 51: In Section 1.3.6, the role of rows and columns should be reversed in “at least as many rows as columns”.

Page 53: In Exercise 1.3.22 (vi), (vii), the eigenvalues should be replaced by singular values .

Page 61: In the proof of Theorem 2.1.3, should be throughout (three occurrences), and should be . Also, the reference to (2.9) here may be replaced by (2.10). In the second last line of the proof of Lemma 2.1.2, a closing parenthesis is missing.

Page 68: In the last display of Proposition 2.1.9, should be .

Page 69: In Theorem 2.1.10, should be . In (2.15), should be .

Page 70: The footnote “Note that we must have …” should read “Note that we should take if we wish to allow the variance to actually be able to attain the value “.

Page 74: In the proof of Lemma 2.1.14, should be (two times).

Page 74: In the proof of Lemma 2.1.15, should be , and should be “one of or “.

Page 76: In the proof of Lemma 2.1.16, after (2.24), the expectation in the next two expressions should instead be conditional expectation with respect to .

Page 78: In the proof of Proposition 2.1.19, the definitions of and are missing absolute value signs (they should be and respectively).

Page 88: At and just before (2.4.1), should be . (Also to avoid having to deal with distributions, one should temporarily truncate to, say, and then let go to infinity at the end of the argument.) In the display after (2.40), should be . Just before the last display in the proof of Theorem 2.2.8, should be .

Page 90: In Theorem 2.2.9(i), k should range in 1,2,3,… rather than 0,1,2,… .

Page 95: In the first display of the proof of Theorem 2.2.11, should be .

Page 97: Near the end of Section 2.2.5: [TaVuKr2010] should be [TaVu2009b].

Page 98: In the proof of Theorem 2.2.13, should be assumed to be Lipschitz and not just continuous.

Page 99: In the final display, every term should have an expectation symbol attached to it.

Page 107: After (2.58), should be (two occurrences).

Page 113: Before (2.62), the symbol should be .

Page 114: In Proposition 2.3.10, should be (two occurrences).

Page 117: In item (ii) in the list after (2.69), the condition should be added.

Page 126, second line: the error term needs to be improved to .

Page 127: In the proof of Lemma 2.3.22, the first arrival can be either a fresh leg or a high multiplicity edge, not simply a fresh leg as stated in the text. However, this does not affect the rest of the argument.

Page 128: For each non-innovative leg, one also needs to record a leg that had already departed from the vertex that one is revisiting; this increases the total combinatorial cost here from to (and the first display should be similarly adjusted). However, the rest of the argument remains unchanged. In the last display and the first display of the next page, should be .

Page 130: The statement “(2.76) holds” should read “(2.76) fails”.

Page 157: In the discussion of classical independence in Section 2.5, “all of vanishes” should be “ both vanish”.

Page 170: Before Exercise 2.5.10, the constraint should be .

Page 174: In Exercise 2.5.15, the additional hypothesis that X and Y are self-adjoint should be added. In Exercise 2.5.16, add “Show more generally that and are freely independent for any polynomials .”

Page 175: In the second display, an extra right parenthesis should be added to the left-hand side.

Page 176: In the proof of Lemma 2.5.20, should be . Also, should be .

Page 181: The formulae for in Exercises 2.5.20 and Exercises 2.5.21 should be swapped with each other. Also, the formula for the third cumulant is incorrect; this quantity is in fact equal to the third free cumulant (but and are not equal in general).

Page 183: In (2.127), the factor is missing from the denominator.

Page 184: In the paragraph before (2.128), “eigenvalues of ” should be “eigenvalues of “.

Page 187-188: The derivation of the Ginibre formula requires modification, because the claim that the space of upper triangular matrices is preserved with respect to conjugation by permutation matrices is incorrect. Instead, the given data needs to be replaced by a pair consisting of the random matrix , together with a random enumeration of the eigenvalues of , and the factorisation is then subjected to the constraint that has diagonal entries in that order. (To put it another way, one works in an n!-fold cover of the space of matrices with simple spectrum.) One then performs the analysis in the text, with the enumeration of the eigenvalues of a perturbation of understood to be the one associated with the diagonal entries of . (Details may be found at the associated blog entry for this section.)

Page 191: In the last line in the paragraph after (2.137), should be .

Page 192: In Footnote 52 to Section 2.6.3, the exponent should be instead.

Page 203: In Exercise 2.6.6, a factor of is missing in the error term. Earlier in the eigenfunction equation for , should be .

Page 206: In Remark 2.6.8, the denominator in the first display should instead be in the numerator, and similarly for (2.169); the denominator two displays afterwards should similarly be .

Page 212: For the application of Markov’s inequality and through to the next page, all appearances of should be replaced by , and “for at least values of ” should be “for at least values of ”. Any appearance of should instead be .

Page 213: In Exercise 2.7.1, should be , the condition should be , the final bound should be rather than , and should be . The definition of incompressibility should be , with to be chosen later, in the next display should be , and “within … positions” on the next paragraph should be “within … positions”. Finally, in footnote 58, the summation should go up to rather than to in both occurrences.

Page 214: should be (two occurrences), and should be in Exercise 2.7.2.

Page 215: In the last line “Proposition 2.7.3” should be “Proposition 2.7.3 and (2.172)”, and on the next page, should be (two occurrences).

Page 217: In Exercise 2.7.3, should be . “” should be “ is comparable to .”

Page 218. After the first complete sentence, add “This of course contains the event that .”

p. 19, Examples 1.1.9: Cauchy distribution is referred to here but not defined anywhere. Also, numbered examples have the heading “Examples x.y.z”. Should it be “Example x.y.z”?

p. 20, Lemma 1.1.10, part (iii): The text before this lemma says this is a consequence of eq. (1.23), but this part, in contrast to parts (i) and (ii), seems to follow directly from the definition of sub-exponential tails.

p. 20, Exercise 1.1.4, line 2: “there exist C” -> “there exists C”

p. 20, Exercise 1.1.5, line 2: same as above

p. 20, sentence after eq. (1.26): “Upper bounds … are known as large deviation inequality.” Last word should be “inequalities”.

p. 20, 5th line from the bottom: missing space in “Chebyshev’s inequality(1.26)”

Further comments from p. 21 up to the end of Section 1.1 (p. 41 of the manuscript)

p. 24, Exercise 1.1.12, 2nd line: “subsapce” -> “subspace”

p. 26, eq. (1.29): missing period

p. 26, Remark 1.1.14, last line: “we will discuss in the next set of notes” Does this change to “next chapter” etc. now that the notes are being converted into a book?

p. 27, 5th line from the bottom: “underyling” -> “underlying”

p. 33, Example 1.1.20, last sentence: linearity of conditional expectation (and that it follows from linearity of unconditional expectation) is currently mentioned as part of this example. For unconditional expectation, linearity was emphasized and placed in the main text. Perhaps do the same here?

p. 36, Remark 1.1.27, 3rd line: extra semi-colon

[Thanks; these corrections will appear in the next revision of the ms.]


section 1.2 (Stirling’s formula): The build-up to the final result (1.49) is very enlightening and I thoroughly enjoyed every bit of it! The use of techniques of increasing sophistication to derive the successively better approximations (1.43,44), (1.45), (1.46) and (1.49) is very nice. The observations recorded in footnote 7 (“one can often get a non-terrible bound for a series by using the largest term”) and Remark 1.2.2 (“near these maxima, e^\phi(x) usually behaves like a rescaled Gaussian”) show that these techniques have wider applicability.

p. 45, two times in the paragraph right after Exercise 1.1.2: “gaussian” -> “Gaussian”

p. 46, line 6: “constrains” -> “constrain”

p. 46, third line from the bottom: It is clear that by matrix C you mean the sum A+B but this is not mentioned anywhere.

p. 47, paragraph in the middle of the page: missing space in “augmented matrix(2.80)”

p. 53, section 1.3.3: It is amazing how all these named inequalities (Weyl, Lidskii, Ky-Fan) all follow from the minimax formulae. Remark 1.3.6 whets the reader’s appetite for more!

p. 53, last line: two periods after the reference “[Klyachko]”

p. 55, Exercise 1.3.7: “p-Schatten norms are indeed a norm” -> “p-Schatten norm is indeed a norm” (or is it okay as is?)

p. 55, 2 lines above (1.67): “coeffiicents” -> “coefficients”

[Thanks, these corrections will appear in the next revision of the ms. – T.]

p. 69, footnote 1: As noted above, the covariance should be set to zero. Also, shouldn’t we take complex conjugates of the j-dependent terms?

p. 70, line 12 from the bottom: “ways one can assign” -> “ways one can choose” (?)

p. 71, Remark 2.1.1: “net effect of such care … unspecified constant C”
I wasn’t sure what this refers to. There is no unspecified constant in (2.7). Does this refer to the C in (2.8) below the remark?

p. 73, line 7 from the bottom: space missing in “Markov’s inequality(1.13)”

p. 74, Remark 2.1.4: “reliance on the identity e^{X+Y} = e^X e^Y”
Just a clarification: the later section on the Golden-Thompson inequality (Section 3.2) will show how to get around this barrier, right? Would it be good to refer to that section in this remark?

p. 80: last sentence in proof of strong law of large numbers: It could just be me but I couldn’t follow the instructions to complete the proof here. What does “lacunary nature of the n_m” mean?

p. 80, Proposition 2.1.9: “for some constants C_A, c_A depending on A,p,C,c”
There seems to be no “p” in the picture here.

p. 82, line 4 from the bottom: missing period after the math display “Y := …”

p. 85: proof of Theorem 2.1.12 (Gaussian concentration inequality for Lipschitz functions): “It is tempting to use the fundamental theorem of calculus along a line segment”
It is clear from the remaining argument that the circular arc is a very clever choice, since it allows one to use the Gaussian nature of X,Y to exploit independence between X_\theta and its derivative. But this does bring up a natural question: is there a less magical and more brute-force argument, not relying heavily on the Gaussian assumption, that just does the obvious thing and uses a line segment to apply the fundamental theorem of calculus? I see that the next result (Talagrand’s inequality) lifts the Gaussian assumption but adds the convexity assumption on F. Any pointers to a concentration inequality (if there is one) for the “jointly Lipschitz case” that neither requires the X_i’s to be Gaussian nor requires F to be convex would be great.

The lacunary nature of the n_m allows one to estimate and for any , whence the claims in the text.

I believe that the general concentration inequality fails if one does not make a convexity or Gaussian-type assumption (although there are substitute hypotheses, such as a log-Sobolev inequality, which can compensate for this). The book by Ledoux covers these topics in some detail.

There I tried to give a kind of historical overview of RMT with a brief mention of the main research directions, accompanied by a rather extensive bibliography. Obviously my account of the field is rather strongly biased by my personal research interests & experience, as well as by my background in Theoretical Physics. That is why I would be much obliged if you could bring to my attention any omissions from the list/discussion of important/influential mathematical papers which you may happen to notice.

Sure, any other corrections, amendments and suggestions are more than welcome – including those concerning the presentation style and/or grammar, as I am not a native English speaker.

Further comments from p. 86 through p. 93 (end of Section 2.1 “Concentration of Measure”) of the current manuscript:

p. 86, line 7 from the bottom: There is little possibility of confusion here but I’ll just point out that \Omega is being used for the unit disk here while it was used for the sample space in the foundational material.

p. 87, Lemma 2.1.14: After the assumption, “Let A be a convex set …”, the statement appears abruptly. Is some text required here?

p. 87, line 8 from the bottom: “coefficent” -> “coefficient”

p. 88, 2nd math display: missing power of 2 in the distances d_c inside the expectation

p. 91, line 11 from the bottom: missing space in “Markov’s inequality(1.13)”

p. 93, line 4: “The summands here are pairwise independent …”
This confused me: do you mean that (X_i,X_j) and (X_i’,X_j’) are independent for (i,j) \neq (i’,j’)? Probably not, for it doesn’t seem right (they might share variables). But I am not able to figure out the intended interpretation of the above statement. This means that I am unable to follow the variance calculation for d(X,V)^2.

For page 90, it is best to use an independent variable (such as x_n) to denote partial differentiation, as opposed to the random variable X_n, to avoid confusion. For page 93, it is true that one does not have independence when variables are shared, but as long as one is off the diagonal, one still has zero correlation between the quantities , and on the diagonal one of course has full independence. So one can compute the variances for the on-diagonal and off-diagonal terms, and then combine them by the triangle inequality (losing a factor of 2 or so, but this is an acceptable loss). I’ll amend the text accordingly.

p. 103, eq. (2.39): This is called sigma because of the ‘s’ in ‘sup’, right? It doesn’t look like the standard deviation of anything.

pp. 108-109, explicit computation of moment up to k=4: It was really nice that these initial cases were explicitly worked out before jumping into the calculation for general k. This makes the later discussion (pp. 109-110) much easier to understand (and even anticipate).

p. 111. line -12: “boudned” -> “bounded”

p. 112, middle of the page: What is mentioned is a beautiful way to understand what the Lindeberg trick does: it factors/decomposes the central limit theorem into two components. First, it shows that the limiting distribution of interest doesn’t change if we replace arbitrary RVs by gaussian RVs. Second, the distribution is explicitly calculated in the case of gaussian RVs (which turns out to be gaussian itself(!) in the CLT case).

p. 115. line after eq. (2.45): “we have lost an exponent of 1/4”
The exponent is 1 in Theorem 2.2.8 and is 1/4 here. So, haven’t we lost 3/4 in the exponent?

p. 118, line 11, “to in particular to give”: there seems to be an extra “to” here

p. 119, line right after proof of Theorem 2.2.14: “improvement already reduces the 1/4 loss in (2.45) to 1/2”
If my comment for p. 115 made any sense, then this should also say “reduces the 3/4 loss … to 1/2”

p. 121, beginning of section 2.2.7. (“Predecessor comparison”):
“Suppose one had never of the normal distribution, but one still suspected the existence of the central limit theorem. … Could one still work out what that limit was?”
Couldn’t think of a better start for a section meant to end the discussion of the CLT!!!

p. 123, footnote 11: there seems to be an extra opening parenthesis here

p. 124, first para:
“it is profitable to study the distribution of some random object … by comparing it with its predecessor”
“it may … be helpful to approximate a discrete process with a continuous evolution”
This is one of those pithy “summary” paragraphs that are sprinkled throughout the text and make the reading all the more rewarding!!!

pp. 138-141: It is really helpful to see the cases k=4,6 explicitly worked out before handling the general k case. Also seeing the upper bound improve from n to n^{3/4} to n^{2/3} for k=2,4,6 raises hopes that higher k’s might eventually lead to n^{1/2} upper bound. Indeed, we see at bottom of p. 142 that k as large as log(n) is enough to get n^{1/2} (up to log factors).

p. 141, last para: The equivalence symbol is used three times, but the second time it is 3 horizontal bars (probably \equiv) while the other two times it’s a “~”. Is this a typo or am I missing something?

p. 142, line 10 from the bottom: missing space in “Markov’s inequality(1.13)”

p. 143, 2nd line below prop. 2.3.13: was puzzled by what you mean by “half a logarithm”. Does it have something to do with the fact that log(sqrt{n}) = 1/2 log(n)?

p. 143, 3rd line below prop. 2.3.13: “…later in these notes” -> “… later in this chapter” (?)

[Thanks for the corrections! Entropy refers to metric entropy – the (logarithm of) the number of possible states in a configuration. “half of a logarithm” should just be “logarithm”. Convexity is used for emphasis (one occasionally talks about non-convex “norms”, such as the L^p norms for p<1, for instance). – T.]


Thanks Prof. Tao for the clarifications regarding my previous comments. I also see that you have uploaded an updated version (on Apr 11, 2011). Some comments from p. 143 through p. 159 (end of section 2.3) of the current (Apr 11, 2011 version) manuscript:

p. 144, Exercise 2.3.10: “draw a line segment between … whenever i_a = i_b”
For the cycle (a,b,a,c,a,d) in Example 2.3.14, this will give us line segments joining 1 to 3, 3 to 5 and 1 to 5 since i_1 = i_3 = i_5 = a. This is k/2 not k/2-1. I guess we should not connect 1 to 5. Perhaps you meant “i_a = i_b and i_c \neq i_a for a < c “formula derived in Section 2.2” (?)

p. 149, Proof of Theorem 2.3.21, line 10: “net contribution of the remaining cycles is O(… n^{k/2})”
It seems that n^{k/2} should be n^{k/2+0.98}.

p. 159, extra right parenthesis at the end of footnote 25.

[Corrected, thanks. Unfortunately, HTML processing deleted a portion of your comment; you may need to repost that portion, using &lt; and &gt; instead of < and > to avoid the HTML parser. -T]
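As a small illustration of the escaping suggested above, here is a minimal sketch using Python's standard `html.escape`. The sample string is borrowed from the later comment on Exercise 2.3.10 and is only illustrative; how the blog's own comment parser treats the result is not something this sketch can confirm.

```python
import html

# A comment fragment containing raw angle brackets, which an HTML parser may
# misread as the start of a tag and silently drop.
raw = r"i_a = i_b and i_c \neq i_a for a < c < b"

# html.escape replaces &, < and > (and, by default, quotes) with HTML entities.
escaped = html.escape(raw)
print(escaped)
```

Unescaping with `html.unescape` recovers the original string, so nothing is lost by posting the escaped form.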

I am studying stochastic control as an electrical engineer and I wish to explore more deeply the connections that seem to exist between stochastic stability and martingales. I do understand the use of Lyapunov functions in determining stability of deterministic dynamical systems; but I wish to understand more clearly the stochastic analogs of the definition of stability and the Lyapunov functions. I can intuitively see that classical Newtonian potential theory and martingales are very similar but I wish to read more precise formulations of these concepts. I’d be grateful if you could give pointers to your work or any other existing literature on stochastic stability

p. 169, line 7
You refer to the Harish-Chandra integration formula in connection with the analogue of the Fourier method. You also say that it will not be discussed in this book. Is there an accessible reference for the reader of this book that can help him/her understand this approach based on non-commutative Fourier analysis?

p. 169, line 13 from the bottom: missing sqrt over n in the subscript of s_\mu

p. 169, last math display:
Two things seem to be missing here. I think there should be a 1/n multiplying the first term -1/z. Second, all M_n’s should be M_n/sqrt{n}, right?

p. 173, line 9: “constaants” -> “constants”

p. 174, line 2: “concetration” -> “concentration”

p. 174, line just above 2nd math display: “from the linearity of trace” -> “from the linearity of expectation” (?)

p. 174, last line: Pointing out a minor notational inconsistency. Here, in “(X’)*”, prime denotes transpose and * denotes complex conjugate. However, later on pp. 175-176, the notation used is “X* R X” where * denotes both transpose and complex conjugate.

p. 178, eq. (2.102) and its discussion: “… this equation came by playing off two ways in which the spectral properties of a matrix M_n interacted with that of its minor M_{n-1}; firstly via Cauchy interlacing inequality, and secondly via the Schur complement formula.”
The basic idea is “predecessor comparison” but the way this is accomplished here is quite remarkable! Cauchy interlacing yields one key ingredient viz. eq. (2.96). But then, after the use of Schur complement formula, it wasn’t clear where things were going until you pointed out that in X* R X, R is independent of X and so we can bring in Talagrand’s inequality. Very nice!

p. 211: Proof of Proposition 2.5.21
I know that Exercise 2.5.19 asks the reader to expand the sketch into a full proof. Still, I would like to comment that it would have been very useful to see a full proof as part of the text (given that Prop. 2.5.21 connects freeness to the main theme of this book: random matrices).

p. 212: line 4 from the bottom
“one can that” -> “one can assume that” (?)

p. 214: R-transform
From the 3rd unnumbered equation on p. 214, it feels like one would define a transform by z(s) + 1/s. However, the R-transform is defined with s replaced by -s. Is there a particular reason for defining it this way?

p. 216: Remark 2.5.24: reference to [Sp]
In the bibliography (p. 327) [Sp] refers to a URL in which a tilde before “speicher” seems to be missing. It could be my pdf viewer’s problem or a problem in the pdf itself. If it’s the latter, the LaTeX package ‘url’ provides a command to typeset URLs containing tildes, etc.

p. 217: line 7
missing right parenthesis in “(or just the additivity property of R-transforms”

[Thanks for the corrections! These will appear in the next revision of the paper. The reason for the R-transform sign conventions is that the literature is based on the moment power series instead of the Stieltjes transform; the two essentially differ by a minus sign, see (2.119). -T]

p. 135, Proposition 2.3.10: In the inequality, the median operator $M$ should be in bold face.

p. 142: “Summign” should be “Summing”.

p. 143, second para from bottom: “each class of cycle” should be “each class of cycles”.

p. 144: In Exercise 2.3.10, I think that $a$ should be connected by a line segment to $b$ only if $i_a = i_b$ *and* $i_c \neq i_a$ for $a < c < b$. For instance, in the non-crossing partition consisting of just one part, just the first condition would give an edge between every pair of points, which is clearly too large. Is this correct?
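The difference between the two rules can be checked on the cycle (a,b,a,c,a,d) mentioned in an earlier comment on the same exercise. The sketch below is only an illustration of the two candidate conditions as stated in these comments, not of the exercise as it appears in the book; the function name `segments` and the flag `consecutive_only` are hypothetical.

```python
def segments(cycle, consecutive_only):
    """Return the pairs (a, b), 1-indexed with a < b, joined by a line segment.

    With consecutive_only=False, a and b are joined whenever i_a = i_b.
    With consecutive_only=True, one also requires i_c != i_a for a < c < b.
    """
    k = len(cycle)
    pairs = []
    for a in range(k):
        for b in range(a + 1, k):
            if cycle[a] != cycle[b]:
                continue
            # Extra condition: no intermediate position visits the same vertex.
            if consecutive_only and any(cycle[c] == cycle[a] for c in range(a + 1, b)):
                continue
            pairs.append((a + 1, b + 1))
    return pairs


cycle = ["a", "b", "a", "c", "a", "d"]  # k = 6
print(segments(cycle, consecutive_only=False))  # first condition only
print(segments(cycle, consecutive_only=True))   # with the extra condition
```

With only the first condition the cycle produces three segments (k/2), while the extra condition leaves two (k/2 - 1), which matches the count the earlier comment expected.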

p. 148, second para from bottom: In "$\epsilon$ to grow slowly in $n$", did you mean that $\epsilon$ goes to zero slowly in $n$?

p. 150, the paragraph following Equation 2.73: "apeparance" should be "appearance".

p. 159, last para of Section 2.5: "See this [So1999] for" should be "See [So1999] for".

p. 159, last para of Section 2.5: "There has also been" should be "There have also been".

p. 159, Footnote 23: Spurious closing parenthesis.

p. 161, Remark 2.4.1, just before the second equation: Missing "\emph{in}" in "converges the vague topology to $\mu$".

p. 206, 3rd line of subsection 2.6.5
Missing argument “z” in “P_0” while it is there in “P_{n-1}(z)”.

p. 209, Theorem 2.7.1 statement, 2nd line
extra “1” in “1p”

p. 213, line 1
“we prove the theorem” should be “we prove the proposition”

p. 215, Theorem 2.7.5 statement, line 4
lambda should appear in the subscript of small-oh (instead of epsilon)

p. 216, last line
“next set of lectures” should be “next section”

p. 218, discussion of “universality” in the 2nd para after Theorem 2.7.8
As you mention in the nice high-level discussion, Theorem 2.7.8 is proved by showing that the limiting distribution is the same whether we have the Bernoulli or Gaussian ensemble.
Is it known what the “universality class” is here? In other words, what is the set of distributions allowed for the entries of M such that we still get the same limit that Edelman worked out for the Gaussian ensemble?

Thanks for the corrections! Regarding the universality of the least singular value, this is currently known for all iid matrix ensembles of mean zero and variance 1 that obey a finite moment condition (the C_0^th moment needs to be bounded for some sufficiently large C_0; the optimal value of C_0 is not currently known).

Thanks for your reply to my question regarding the universality class for Theorem 2.7.8. Here are some comments for Section 2.8 (pp. 223–234) of the current (Aug 23, 2011) version of the manuscript.

p. 224, line 8 from the bottom
U_0 is referred to as U three times

p. 225, Section 2.8.2, line 5
unmatched left parenthesis in “(at least, when the measure … sufficient decay at infinity.”

p. 228, line 3
“g_n” should be “f_n”

pp. 229–231
The discussion starting from Girko’s strategy, followed by Bai’s approach and ultimately ending in a description of the techniques used by Tao-Vu-Krishnapur is really nice. Unlike other sections of the book, some of the details are omitted. But the discussion prepares the reader well and motivates him/her to read the original papers!

Does the proof of WLLN actually use the uniform integrability of sequences of i.i.d. integrable random variables? The proof doesn’t mention it but it seems to me necessary if you’re going to pick N to bound all tails for large n.

In my book I only consider the WLLN for identically distributed, absolutely integrable variables, which are automatically uniformly integrable, and for which selection of the truncation parameter N can be achieved directly from the monotone convergence theorem. The WLLN can be generalised to averages of uniformly integrable sequences by a similar argument to the one given in my book, though.

Mislabeled reference: the end of the Lindeberg individual swapping section of the CLT refers to Appendix D of TaVuKr10 for multidimensional Berry-Esseen. It should be TaVu2009b, or TaVu2010 for the uncorrelated case (still Appendix D in both). (On the blog it’s mislabeled but does link to TaVu2009b.)

Thanks a lot for these notes – I’m relatively new to concentration of measure, and this has helped in my research.

I did have one question/possible correction – in the proof of Proposition 2.1.9 on page 80, you perform a dyadic decomposition of the variables X_i. However, based on the indicator events you define, it seems that only the positive X_i events are decomposed, and all negative X_i events are grouped into X_i,0.

It seems like the obvious ways to decompose [-infinity,0] would be by either changing the existing random variables to X_i,m := X_i I(2^(m-1) < |X_i| < 2^m), or to define similar random variables as X_i I(-2^m < X_i < 2^(m-1)). Did you intend one of these decompositions?

Thanks,
G

[The former decomposition was intended; thanks for the correction! – T.]
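A minimal sketch of the symmetric decomposition, to make the point about negative values concrete. The function name `dyadic_pieces`, the cutoff `max_m`, the closed upper endpoint of each shell, and the convention that the m = 0 piece collects |x| <= 1 are all assumptions made here so that the shells partition the real line; the book's exact conventions may differ.

```python
def dyadic_pieces(x, max_m):
    """Split x into pieces by the size of |x|.

    pieces[0] = x * 1(|x| <= 1)
    pieces[m] = x * 1(2^(m-1) < |x| <= 2^m)   for m = 1, ..., max_m
    """
    pieces = [x if abs(x) <= 1 else 0.0]
    for m in range(1, max_m + 1):
        in_shell = 2 ** (m - 1) < abs(x) <= 2 ** m
        pieces.append(x if in_shell else 0.0)
    return pieces


samples = [-5.0, -0.3, 0.0, 1.0, 3.5, 8.0]
for x in samples:
    pieces = dyadic_pieces(x, max_m=4)
    # Each value lands in exactly one shell, regardless of its sign.
    print(x, [m for m, p in enumerate(pieces) if p != 0.0])
```

Because the shells are defined through |x|, a value such as -5 is placed in the shell m = 3 alongside +5, rather than being absorbed into the m = 0 piece.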

Exercise 1.1.9, p. 16: it was claimed that the Poisson distribution is sub-Gaussian. But it appears that its MGF grows double exponentially, which is way faster than exp(quadratic). Also, the tail probability scales like 1/factorial, which decays more slowly than a Gaussian tail.
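For reference, the growth rate mentioned here can be read off from the standard computation of the moment generating function. In the sketch below, $X$ is assumed to be Poisson with mean $\lambda > 0$ (the symbol $\lambda$ is introduced here for illustration), and $t$ is real:

```latex
\[
  \mathbf{E} e^{tX}
  = \sum_{k=0}^\infty e^{tk} \, \frac{e^{-\lambda} \lambda^k}{k!}
  = e^{-\lambda} \sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!}
  = \exp\left( \lambda (e^t - 1) \right).
\]
```

As $t \to +\infty$ this grows like $\exp(\lambda e^t)$, which eventually exceeds $\exp(C t^2)$ for any fixed $C$.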

Oh, I overlooked that $k$ was even, thank you.
Is there a reason why, in the proof of Hoeffding’s lemma on p. 61, you are emphasizing $X=O(1)$ *after* assuming that $b-a=1$? $X=O(1)$ should already follow from the hypothesis of $X$ being bounded (in which case the bounding constant is hidden inside the $O$). I feel like you wanted to convey some intuition here that I’m missing.

(A minor nitpick: I think in the proof of Theorem 2.1.3 on pp. 62 , when computing $\mathbf{E}\exp(tX)$, you’re referring to (2.10) instead of (2.9) – although since the former follows from latter one could call this a moot point.)

I’d be very grateful if you could indicate a short example or reference to such an application, since seeing hands-on what you mean would enable a better appreciation of the importance of the independence of the implied constants.
(I assume the case is similar for other inequalities, like Chernoff’s bound, where the implied constants also should be independent of e.g. $\sigma$.)

In the proof of Chernoff’s inequality you use a parameter , which is to be optimised later. I’m wondering about this restriction: Why do you allow ? In this case, since , you could only get the trivial bound, so one may as well omit .
Is there a specific reason to restrict to be at most ? The equality would be valid for all and everything else seems to work too (the final bound on the probability would actually be better, I think, since one wouldn’t need to distinguish by cases if the that minimizes the right-hand side of the last inequality lies in or not). Maybe it is of advantage later in the text to have a bound of the form (2.11)?

The condition is needed so that we may ignore the term in (2.9). In this particular case, t=0 is indeed never the optimal choice, and could be removed if desired. (But it is traditional to allow parameters to range in compact spaces such as [0,1] if there is no harm in doing so, if only because this automatically ensures that any continuous optimisation problem attains its extremum, which is then automatically finite.)

On p. 5 it is said that the relation $E \subset F$ should be understood as “$E$ is contained in $F$” or “$E$ implies $F$” or “$E$ holds only if $F$ holds”, but $E$ and $F$ should be reversed in these last two interpretations.

[Actually, things seem to be correct as stated. For instance, “ is a multiple of 4 implies is even”, or “ is a multiple of 4 only if it is even”, and the set of multiples of 4 is a subset of the set of even numbers. -T.]
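The example in this reply can be checked directly by modelling events as sets of outcomes. The finite range of integers used below is an assumption made purely so that the sets can be enumerated.

```python
# Events modelled as sets of outcomes; the finite range is an illustrative choice.
outcomes = range(-20, 21)
E = {n for n in outcomes if n % 4 == 0}  # "n is a multiple of 4"
F = {n for n in outcomes if n % 2 == 0}  # "n is even"

# "E is contained in F"
print(E <= F)

# "E implies F", equivalently "E holds only if F holds":
# every outcome in E is also in F.
print(all((n not in E) or (n in F) for n in outcomes))

# The reversed containment fails, e.g. 2 is even but not a multiple of 4.
print(F <= E)
```

The first two checks agree because they are the same statement in different words, while the third shows that reversing the roles of E and F gives a false claim.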

Dear Prof. Tao,
as a newcomer in probability theory (coming from functional analysis), I enjoy reading your book Topics in RMT very much.
However, I was a bit confused while doing Exercise 2.5.16.
I was able to verify the free independence (according to Definition 2.5.18.) of and but I also noted that the same argument can be used to show the free independence of and if we consider almost the same situation but we replace the free group (generated by ) by the free Abelian group (generated by ).
I was surprised as I thought that the non-commutativity of the underlying group is essential. Moreover, it turns out that (according to Def. 2.5.18.) the shift operator on is freely independent from itself, or (which is more or less the same) the classical random variable drawn uniformly at random from the unit circle of the complex plane (let us denote it by for further use) is freely independent from itself.
This latter example also shows that Exercise 2.5.15. can not be done with the current definition of free independence, because is a faithful non-commutative probability space (which is, in fact, commutative), and and are freely independent if They also commute, but none of them is equal to a scalar.
I guess, all of these problems can be solved if in the definition of free independence (Def. 2.5.18.) all the expressions are replaced by that is, if we require that the equation holds not only for one variable polynomials, but also for two-variable polynomials with arguments
(And in Exercise 2.5.15. we should then consider after normalization.)

Thanks for this. There was an existing erratum to Exercise 2.5.15 adding the hypothesis of self-adjointness, but I have added an erratum for Exercise 2.5.16 as well. (The definition of free independence is standard, so it would be difficult to fix the exercises by changing the definition.)
