Wednesday, November 25, 2009

Statistics 101

1. Racial Bias

When officers reported knowing the race of the driver in advance, 66 percent of the drivers stopped were black, compared with 45 percent when officers reported not knowing the race of the driver in advance, according to a RAND study. (The report, on the Oakland Police Department in the USA, was done in 2004.)

2. Hit-and-Run Accident (Fictional example proposed by the psychologists Amos Tversky and Daniel Kahneman in the early 1970s)

A certain town has two taxi companies, Blue Cabs and Black Cabs. Blue Cabs has 15 taxis, Black Cabs has 85 (slight variations do exist, e.g. some use 75 instead of 85). Late one night, there is a hit-and-run accident involving a taxi. All of the town's 100 taxis were on the streets at the time of the accident. A witness sees the accident and claims that a blue taxi was involved. At the request of the police, the witness undergoes a vision test under conditions similar to those on the night in question. Presented repeatedly with a blue taxi and a black taxi, in random order, he shows he can successfully identify the color of the taxi 4 times out of 5. (The remaining 1/5 of the time, he misidentifies a blue taxi as black or a black taxi as blue.) If you were investigating the case, which company would you think is most likely to have been involved in the accident?
Faced with eyewitness evidence from a witness who has demonstrated that he is right 4 times out of 5, you might be inclined to think it was a blue taxi that the witness saw. You might even think that the probability of it being a blue taxi was exactly 4 out of 5 (i.e., 0.8), since that is the probability of the witness being correct on any one occasion.

Do you think the witness was reliable?

However, the facts are quite different. Based on the data supplied, the probability that the accident was caused by a blue taxi is only 0.41. That's right, the probability is less than half. It was more likely to have been a black taxi.

How do you arrive at such a figure? Use Bayes' theorem.
The probability that the taxi was blue, given that the witness says it was blue, is the product
P(blue taxi) x P(witness is right),
divided by the sum
[P(blue taxi) x P(witness is right) + P(black taxi) x P(witness is wrong)].
Putting in the various figures, this becomes the product 0.15 x 0.8 divided by the sum [0.15 x 0.8 + 0.85 x 0.2], which works out to be 0.12/[0.12 + 0.17] = 0.12/0.29 ≈ 0.41.
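The same calculation can be written as a small Python function. This is a minimal sketch: the function name `posterior_blue` and its parameters are illustrative, not something from the original puzzle. It uses `fractions.Fraction` so that the 12/29 comes out exactly before any rounding.

```python
from fractions import Fraction

def posterior_blue(p_blue, p_right):
    """P(taxi is blue | witness says blue), by Bayes' theorem."""
    p_black = 1 - p_blue          # prior probability of a black taxi
    p_wrong = 1 - p_right         # probability the witness gets the color wrong
    numerator = p_blue * p_right  # blue taxi, correctly reported as blue
    evidence = numerator + p_black * p_wrong  # every way the witness can say "blue"
    return numerator / evidence

p = posterior_blue(Fraction(15, 100), Fraction(4, 5))
print(p, round(float(p), 2))  # 12/29 0.41
```

Passing the witness's accuracy as a parameter makes it easy to see how the answer moves: only a witness far more reliable than 4 in 5 would push the probability of a blue taxi well above one half.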


To look at the problem from another angle:
For the 15 blue taxis, he would (correctly) identify 80% of them as being blue, namely 12. (In this hypothetical argument, we are assuming that the actual numbers of taxis accurately reflect the probabilities.)
For the 85 black taxis, he would (incorrectly) identify 20% of them as being blue, namely 17.
So, in all, he would identify 29 of the taxis as being blue.
Thus, on the basis of the witness's evidence, we find ourselves looking at a group of 29 taxis.
Of the 29 taxis we are looking at, 12 are in point of fact blue.
Consequently, the probability of the taxi in question being blue, given the witness's testimony, is 12/29, or about 0.41.
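The frequency argument can also be checked by simulation, a Monte Carlo approach that goes beyond the original post. The sketch below replays the night many times, assuming (as the argument above does) that every taxi is equally likely to be the one involved and that the witness is right 4 times out of 5 whatever the color. It keeps only the runs where he says "blue" and asks how often the taxi really was blue. The function name, trial count and seed are arbitrary choices for illustration.

```python
import random

def simulate(n_blue=15, n_black=85, p_right=0.8, trials=200_000, seed=1):
    """Estimate P(taxi is blue | witness says blue) by repeated random trials."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    fleet = ["blue"] * n_blue + ["black"] * n_black
    says_blue = 0              # runs where the witness reports a blue taxi
    says_blue_and_is_blue = 0  # ...and the taxi really was blue
    for _ in range(trials):
        taxi = rng.choice(fleet)  # each taxi equally likely to be involved
        if rng.random() < p_right:
            report = taxi  # witness identifies the color correctly
        else:
            report = "black" if taxi == "blue" else "blue"  # witness gets it wrong
        if report == "blue":
            says_blue += 1
            if taxi == "blue":
                says_blue_and_is_blue += 1
    return says_blue_and_is_blue / says_blue

# An estimate near 12/29 (about 0.414); the exact digits depend on the seed.
print(round(simulate(), 3))
```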


You see, sometimes, our intuitions can be so wildly misleading!

For the variant with 15 blue taxis and 75 black taxis, the same witness would identify 12 + 15 = 27 taxis as blue (20% of 75 is 15), so the probability that the taxi was blue, given his testimony, is 12/27 ≈ 0.44.
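The counting argument works for any fleet size. The hypothetical helper below (its name and signature are illustrative) counts how many taxis the witness would call blue, under the post's assumption that the counts match the probabilities exactly, and reproduces both the 12/29 and the 12/27 figures.

```python
from fractions import Fraction

def blue_given_says_blue(n_blue, n_black, p_right=Fraction(4, 5)):
    """Return (taxis correctly called blue, taxis called blue, P(blue | says blue))."""
    true_blue = n_blue * p_right           # blue taxis correctly called blue
    false_blue = n_black * (1 - p_right)   # black taxis wrongly called blue
    called_blue = true_blue + false_blue   # everything the witness calls blue
    return true_blue, called_blue, true_blue / called_blue

for n_black in (85, 75):
    true_blue, called_blue, prob = blue_given_says_blue(15, n_black)
    print(f"15 blue, {n_black} black: {true_blue}/{called_blue} ≈ {float(prob):.2f}")
# 15 blue, 85 black: 12/29 ≈ 0.41
# 15 blue, 75 black: 12/27 ≈ 0.44
```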
