Learning Statistics
Being a physicist at the Rockefeller University — and anyone who isn’t a biologist or a biochemist is a physicist around here — means that any mathematical problem that arises lands in your lap. This is both enlivening and rather uncomfortable. Of late, many of the problems have been statistical, which is particularly uncomfortable: I’m extremely good with probability in all the gory, measure-theoretic details, and even somewhat conversant with random processes, but I never did any statistics.
So I’ll learn. First, I need decent books, but I don’t know what books are decent until I know something about the field. To pull myself up by my bootstraps, I go to the relevant section of the library, and start pulling books off the shelves, flipping through them, and trying to figure out what’s going on. Iteratively, I get a slightly clearer view, and discard a large number of the books as inappropriate. I will generally find a book in the first round of this and go off and read part of it. In this case it was Jean-René Barra’s Mathematical Basis of Statistics, a lovely, Bourbakist book.
In the next iteration, I found Jack Carl Kiefer’s Introduction to Statistical Inference, which has a beautifully clear view of the mathematical structure of inference and how the various schemes such as Bayesian statistics fit into it.
The next day I was in the subbasement again — that’s where the math books are here — and this time unearthed John Tukey’s Data Analysis and Regression. Tukey steps back even further and spends a lot of time addressing bad stuff happening in the tails of distributions, situations when your data or techniques aren’t up to inference, robustness, and all the unpleasantness that reality can dish up.
And in parallel, I’ve been charmed by Edward Tufte’s inspiring The Visual Display of Quantitative Information. Don’t read it if you don’t want to spend the rest of your life foaming at the mouth when people put up bar charts or pie charts, or trying to make your graphs meaningful.
And then I presented this paper, which gives a few basic rules on error bars, to my lab in journal club, and I desperately wish I hadn’t. In reaction, I now proclaim: you cannot understand statistics without mathematical sophistication, and I’m not willing to try again.
So I’ll post gems as I find them, but I’m not going to write for nonmathematicians. Remember, there is no such thing as nonmathematical science.
Stuart Wray:
I like “Information Theory, Inference, and Learning Algorithms” by MacKay and “Probability Theory” by Jaynes.
9 July 2007, 1:58 pm
madhadron:
Jaynes isn’t really relevant in this case. His book is on probability, and when he dips into statistics, it’s as a Bayesian. Bayesian statistics is fine if you have a prior which you can measure in some way, and Bayesian procedures are a really useful mathematical tool for asking questions about all possible procedures, but I have yet to see any convincing argument for using Bayesian inference for things such as basic science.
11 July 2007, 11:16 am