To strive, to seek, to find...: January 2008

Thursday, January 31, 2008

Our intuition about probability (part 2)

In a previous post, I described how our intuition about probability can lead us astray, using an example dealing with the average number of tosses needed of a fair coin to generate specific sequences of heads and tails. I provided an informal argument as to why the pattern 'HTH' took more tosses, on average, to generate than 'HTT', and I also provided the results of a simulation demonstrating this. In this post, I'll discuss the mathematical proof of this, as described in a paper by V. C. Hombas, though in a manner that is (hopefully) more suitable for a lay audience.

First, some definitions and notations. Let X be the number of tosses until the pattern of interest (either 'HTH' or 'HTT') appears; the average number of tosses for this pattern to appear (or more formally, the "expectation value of X") is then written as 'E[X]'. The concept of conditional probability is also important; written 'P(A|B)', it asks what the probability of event A is given the fact that event B has already occurred. In this discussion, then, the notation 'E[X|H]' asks what the average number of tosses for generating pattern X is, given the fact that the first toss was a head. Similarly, the notation 'E[X|HT]' asks what the average number of tosses for generating pattern X is, given the fact the first two tosses were a head and a tail, and so on.

With that, let's calculate what the E[X] is for the pattern X = 'HTH'. With a fair coin, we are equally likely to get a head as a tail on any toss (i.e., P(H) = P(T)), giving us an equation for the first toss of our search:

E[X] = 0.5 E[X|H] + 0.5 E[X|T]

If we is toss a tail, then we count a one toss "penalty" and restart our search for the 'HTH' pattern; mathematically speaking, E[X|T] = 1 + E[X]. Substituting this expression into the last one, we get our first numbered equation:

(1) E[X] = 0.5 E[X|H] + 0.5 (1 + E[X])

The next step is to evaluate E[X|H], which is the expectation value of X once we have the first head in the 'HTH' pattern. Again, because P(H) = P(T),

E[X|H] = 0.5 E[X|HT] + 0.5 E[X|HH]

If we toss 'HH', we must start the search over again; however, we can use the second head we tossed as the beginning of the 'HTH' pattern and only have to pay a one toss "penalty". Thus, E[X|HH] = (1 + E[X|H]), and we get our second numbered equation:

(2) E[X|H] = 0.5 E[X|HT] + 0.5 (1 + E[X|H])

Again using the fact that P(H) = P(T), we expand E[X|HT] as

E[X|HT] = 0.5 E[X|HTH] + 0.5 E[X|HTT]

Since 'HTH' is what we're looking for, we count the number of tosses since we started searching and say E[X|HTH] = 3. If we toss 'HTT', we have to start searching for the pattern from the beginning, paying a three toss "penalty" in the process; so, E[X|HTT] = 3 + E[X], and

(3) E[X|HT] = 1.5 + 0.5 (3 + E[X])

Substituting equation (3) into equation (2) into equation (1) and working through the algebra, we find E[X] = E[HTH] = 10, which matches the result we got from the simulation performed in the previous post.

Finding E[X] for X = 'HTT' is the same up to the last step of expanding E[X|HT]. Here, E[X|HTT] = 3 since we are looking for 'HTT'. E[X|HTH] forces a restart of the search from the last head along with a two toss "penalty" for the initial 'HT'; so, E[X|HTH] = 2 + E[X|H], and

E[X|HT] = 1.5 + 0.5 (2 + E[X|H])

Making the substitutions into equations (1) and (2) and crunching the algebra, we find E[X] = E[HTT] = 8, again matching the results from the simulation.

Since E[HTT] < E[HTH], we have proved that on average, the pattern 'HTT' should appear before 'HTH' when tossing a fair coin.

Monday, January 28, 2008

Multiple Inspector windows

From the unofficial Apple blog, a great tip on how to open multiple Inspector windows in applications like Keynote and Pages. Inspector is the little toolbox that allows you to tweak things like object size and alignment; producing multiple Inspector windows is ridiculously simple: hold down the Option key and click on the tab you want.

Wednesday, January 23, 2008

Our intuition about probability (part 1)

Peter Donnelly's TED talk highlights some common mistakes made in interpreting probabilities and statistics as well as the consequences of such mistakes.

One of the interesting questions he poses is this: Suppose we toss a fair coin repeatedly, counting the number of tosses made before getting the pattern 'HTT' (head, tail, tail); for example, if we toss 'HHTHHTTH', the count would be 7. We perform a large number of trials, resetting the count of tosses between trials, to get the average number of tosses required to produce 'HTT'. We then repeat this experiment (counting number of tosses requried to produce a pattern across a large number of trials), but instead of looking for 'HTT', we look for 'HTH'. We then ask which, if either, is larger: the average number of tosses needed to produce 'HTT' or the average number of tosses needed to produce 'HTH'?

If you are like most people (including me), your reasoning goes something like this: For a fair coin, the probability of a head P(H) = 1/2, and each toss is independent of the one before it. Thus, P(HTT) = P(HTH) = (1/2)(1/2)(1/2) = 1/8, and the average number of tosses required to produce 'HTT' is equal to the average number of tosses required to produce 'HTH'.

However, when you actually perform this experiment, you discover that the average number of tosses required to produce 'HTH' is larger than the average number of tosses required to produce 'HTT'. We can demonstrate this using the following R code:

# Number of tosses per trial; this should be fairly large to ensure (so
# far as we can) that the 'HTH' and 'HTT' patterns occur in each trial
n.tosses <- 50;

# Number of trials to run; this should be large so we get a good
# estimate for the average number of tosses required to generate each
# pattern ('HTH' or 'HTT')
n.trials <- 10000;

# Perform n.trials of coin tosses; the value of each toss is either
# '1' (head) or '0' (tail), as drawn from a binomial distribution with
# one trial and a probability of success of 0.5
generate.string <- function(x)
    { paste(rbinom(n.tosses, 1, 0.5), collapse=''); }
data <- sapply(1:n.trials, generate.string);

# Number of tosses for the patterns of interest in the data.
# We need the '+ 2' correction factor since the number of tosses is
# counted at the end of the pattern, but regexpr returns the index of
# where the pattern starts.
hth <- regexpr('101', data) + 2;
htt <- regexpr('100', data) + 2;

# Print the average number of tosses required to produce each pattern
cat('HTH:', mean(hth), 'tosses\n');
cat('HTT:', mean(htt), 'tosses\n');

Running this simulation, we see that 'HTH' takes an average of ~10 tosses to produce while 'HTT' only takes an average of ~8 tosses. The explanation for this lies in the fact that the 'HTT' pattern allows for some overlap in our search while the 'HTH' pattern does not.

Borrowing from Donnelly's talk, suppose we have already tossed the 'HT' pattern, and the next toss is a 'H'. If we are looking for 'HTH', we are done. If we are looking for 'HTT', we have to continue looking, but we can use the 'H' that was just produced as the possible start of the 'HTT' pattern we are looking for.

On the other hand, suppose we have already tossed the 'HT' pattern, and the next toss is a 'T'. If we are looking for 'HTT', we are done. If we are looking for 'HTH', we have to continue looking; unfortunately, unlike the case in the last paragraph, we cannot use our most recent toss (a 'T') as the possible start of the 'HTT' pattern we are looking for. We are forced to wait for the next 'H' to show up, and thus, are now behind by at least one toss. This explains why the 'HTH' pattern takes more tosses, on average, to generate than the 'HTT' pattern.

Obviously, a formal, mathematical proof of this exists, but that is a topic for another post.

EDIT [31 Jan. 2008]: I have now added another post with the mathematical proof of the situation.

Tuesday, January 22, 2008

The Plague!

Shelley Batts has a great series on the Black Death, discussing what medieval medical plague doctors wore in an attempt to prevent infection, the most likely causative organism and mode of transmission (Yersinia pestis [AKA, bubonic plague] and fleas on rats, respectively), and alternatives to the Y. pestis theory. She also notes that plague is still a problem, as discussed in this PLoS Medicine paper; it has even led to at least one death in the United States.

Tara Smith provides more information on criticisms of Y. pestis as the causative agent of the Black Death, examines the validity of those criticisms, and discusses what the discovery of Y. pestis DNA in medieval mass graves tells us about the etiology of the Black Death.

Friday, January 18, 2008

Notes on standard deviation

I picked up these facts while working on a derivation for work, mainly from a discussion of standard deviation on Mark Chu-Carroll's blog. When most of us learn basic statistics, we are taught the 68/95/99.7% rule; that is, for a Gaussian (normal) distribution, ~68% of the data lie within 1 standard deviation of the mean (μ ± 1σ), ~95% within μ ± 2σ, and ~99.7% within μ ± 3σ. However, out in the "real" world, we rarely come across perfect Gaussian distributions or even distributions we can approximate as being Gaussian. Thus, we can't apply this rule to make sensible statements about how data cluster around the mean. Fortunately, there are two theorems (given below without proof) that do allow us to make such statements, regardless of the distribution¹.

Chebyshev's inequality tells us that for any distribution, no more than 1/λ² of the data are more than λ standard deviations away from the mean; equivalently, at least (1 - 1/λ²) of the data are within λ standard deviations from the mean. So, regardless of the distribution, at least 75% of the data are within μ ± 2σ, and ~88.9% are within μ ± 3σ.

Further, if the distribution is unimodal, we can refine our statements regarding the distribution of data around the mean, yielding the Vysochanskiï-Petunin inequality; it states that for all λ > √(8/3) ≈ 1.633, no more than 4/(9λ²) of the data are more than λ standard deviations away from the mean. So, for any unimodal distribution, ~88.9% of the data are within μ ± 2σ, and ~95.1% are within μ ± 3σ. Note that the λ > √(8/3) limit is important; without it, the distribution X must be symmetric about the mean (i.e., X(μ + x₀) = X(μ - x₀)) in order for the (4/9) factor to be correct.

Other statements about the distribution of data about a mean or mode do exist (e.g., the Camp-Meidell inequality or the more general Gauss-Winckler inequality). However, these have more complicated assumptions about the data distribution, and won't be discussed here (for now).

[1] Actually, that's not entirely true. Certain distributions, such as the Cauchy distribution, either have an undefined or infinite variance; in these cases, none of the theorems discussed above apply, and we can't say anything about the distribution of data around the mean. For the purposes of this post, then, "distribution" should be read as "distribution with a finite variance".

Wednesday, January 16, 2008

So much interesting science

PZ Myers has a couple posts reviewing the evolution of eyes within the animal kingdom and within vertebrates specifically.

From the cognitive science bag, posts on why songs get stuck in our heads, why being too happy may be a bad thing, how odors affects our judgments of people, and how language affects our thoughts.

In (slightly) off-beat news, discussions on what causes sphenopalatine ganglioneuralgia, how traffic jams happen, and how effective aluminum foil hats are in stopping mind control rays.

Finally, a link to the collected papers of R. A. Fisher.

Tuesday, January 15, 2008

Something to try on your next flight

Pour yourself a drink while the plane does a barrel roll, like this guy:

James Fallows provides an explanation. This is the same physics that causes you not to (usually) notice when your plane is executing a turn.

Wednesday, January 9, 2008

Life is not a Disney movie

Along the lines of my post of the "Circle of Life" video, Darren Naish discusses the ugliness of predation, describing how "kills are often not clean and simple, but may involve the prey animal getting eaten when still alive." It's a reminder that what we see on TV nature shows don't fully reflect the brutality that actually occurs out in the wild.

I don't understand this need to believe that Nature is (almost by definition) warm and fuzzy, especially among "animal rights" activists, who seem to believe that if only we humans stopped being "cruel" to animals, lions and gazelles will finally get along. As I once quipped, the only reason we think Nature is beautiful is that we've killed enough of it to make it that way.