Sunday, November 22, 2009

The Numbers Behind Numb3rs - DNA Profiling

DNA profiling (also called genetic fingerprinting) is a technique for identifying individuals on the basis of their DNA profiles. Although 99.9% of human DNA sequences are the same in every person, enough of the DNA is different to distinguish one individual from another. DNA profiling uses highly variable repetitive ("repeat") sequences called variable number tandem repeats (VNTRs). VNTR loci are very similar between closely related humans, but so variable that unrelated individuals are extremely unlikely to have the same VNTRs.

a. The FBI's CODIS System: based on 13 loci (including Virginia's 8 loci); total of 3,000,000 profiles in the database;
b. Virginia Division of Forensic Science: based on 8 loci; total of 101,905 offender profiles (as of November 1999) in the database;
c. Arizona: based on 13 loci; total of 65,000 offender profiles in the database.

The probability that someone would match a random DNA sample at any one site (locus, pl. loci) is roughly 1 in 10.

The probability that someone would match a random DNA sample at 8 loci is __________.

The probability that someone would match a random DNA sample at 9 loci is __________.

The probability that someone would match a random DNA sample at 13 loci is __________.
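
The three blanks above can be checked with a short calculation. This is a minimal sketch that assumes, beyond the 1-in-10 figure given in the text, that the loci are independent so that the per-locus probabilities multiply; the function name is only illustrative.

```python
from fractions import Fraction

# From the text: a random match at any one locus has probability about 1/10.
P_LOCUS = Fraction(1, 10)


def match_probability(num_loci: int) -> Fraction:
    """Probability of a random match at all `num_loci` loci.

    Assumes the loci are independent, so the probabilities multiply.
    """
    return P_LOCUS ** num_loci


for k in (8, 9, 13):
    p = match_probability(k)
    print(f"{k} loci: 1 in {p.denominator:,}")
```

Exact fractions are used rather than floats so that the "1 in N" figures print without rounding noise.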

Case Study: USA v. Raymond Jenkins
At first, the DNA profiling was done against Virginia's 8-locus database. The likelihood of an accidental match is roughly 101,905 x 1/100,000,000 = 1/1,000. This was considered quite high, and Jenkins was released from custody.
Later, more evidence was found and another DNA profiling was performed against the FBI's 13-locus database. The likelihood of an accidental match is roughly 3,000,000 x 1/10,000,000,000,000 = 1/3,000,000. This was considered extremely rare, and Jenkins was then arrested and charged with second-degree murder.
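
The two estimates in the case study follow the same pattern: database size times the per-profile match probability. The sketch below reproduces that arithmetic, assuming independent loci with a 1-in-10 match chance each; the function name is hypothetical and the figures are only as good as those assumptions.

```python
from fractions import Fraction


def accidental_match_estimate(database_size: int, num_loci: int) -> Fraction:
    """Database size times the chance of a random match at `num_loci` loci.

    Assumes independent loci, each matching with probability 1/10.
    """
    return database_size * Fraction(1, 10) ** num_loci


virginia = accidental_match_estimate(101_905, 8)
fbi = accidental_match_estimate(3_000_000, 13)

print(f"Virginia (8 loci): about 1 in {round(1 / virginia):,}")
print(f"FBI (13 loci):     about 1 in {round(1 / fbi):,}")
```

The printed values are the unrounded versions of the "1 in 1,000" and "1 in 3,000,000" figures quoted in the case study.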

The Database Match Calculation:
Suppose a DNA sample is sent to Arizona for 13-locus DNA profiling against the 65,000 profiles in the database.
What is the probability of there being a 9-locus match?
Assume, as before, that the probability that someone would match a random DNA sample at any one locus is roughly 1 in 10.

(1) For 9-locus DNA profiling, the probability for a 9-locus match is _________________.

(2) Out of 13 loci, there are ________________ different ways of having a 9-locus match.

(3) Hence, the probability of finding a match on any 9 of the 13 loci is 715 / 1,000,000,000.

(4) If any one profile is picked in the database, the probability of a second profile not matching on 9 loci is 1 - 715/10^9.

(5) The probability of all 65,000 entries not matching on 9 loci is (1 - 715/10^9)^65,000. Using the Binomial Theorem, this is approximately 1 - 65,000 x 715/10^9.

(6) The probability of there being a 9-locus match is thus 1 - (1 - 65,000 x 715/10^9) = 65,000 x 715/10^9 ≈ 0.05 = 5%.
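
Steps (1) to (6) can be traced numerically. The sketch below follows the same simplified model as the steps (independent loci, 1-in-10 match chance per locus) and compares the exact expression from step (5) with its first-order binomial approximation; the variable names are illustrative.

```python
import math

DATABASE_SIZE = 65_000
TOTAL_LOCI = 13
MATCHED_LOCI = 9
P_LOCUS = 0.1  # from the text: roughly 1 in 10 per locus

# Step (2): number of ways to choose which 9 of the 13 loci match.
ways = math.comb(TOTAL_LOCI, MATCHED_LOCI)

# Step (3): chance that one profile matches on some set of 9 loci.
p_single = ways * P_LOCUS ** MATCHED_LOCI

# Step (5): exact chance that at least one of the entries matches.
p_exact = 1 - (1 - p_single) ** DATABASE_SIZE

# Step (6): first-order binomial approximation.
p_approx = DATABASE_SIZE * p_single

print(f"ways to pick 9 of 13 loci: {ways}")
print(f"exact probability:         {p_exact:.4f}")
print(f"approximate probability:   {p_approx:.4f}")
```

The two results agree to about two decimal places, which is why the shortcut in step (5) is acceptable here: 65,000 x 715/10^9 is small, so the higher-order terms of the expansion contribute little.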

In fact, an actual analysis of the Arizona database uncovered 144 individuals whose DNA profiles matched at 9 loci, or 144/65,000 = 0.22% (this differs from the theoretical 5%; why?). One pair matched at 11 loci and one pair matched at 12 loci. The 11- and 12-locus matches turned out to be siblings, hence not random.

Extension: Birthday Problem (with 23 people, the probability that two share a birthday is already over 50%). In a class of 40, the probability that two students share the same birth date is as high as 89%. Surprised?!
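
The birthday figures can be checked the same way as the database match: compute the chance that nobody matches, then subtract from 1. This is a minimal sketch assuming 365 equally likely birthdays and ignoring leap years; the function name is illustrative.

```python
def shared_birthday_probability(group_size: int, days: int = 365) -> float:
    """Chance that at least two people in the group share a birthday.

    Assumes all `days` birthdays are equally likely and independent.
    """
    p_all_distinct = 1.0
    for i in range(group_size):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


for n in (23, 40):
    print(f"{n} people: {shared_birthday_probability(n):.1%}")
```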
