Reflections of a Data Scientist: (R) Binomial Distribution

What R lacks in graphical capability, it makes up for in analysis, which, in my opinion, is of far greater importance. Today we will discuss how R streamlines binomial distribution analysis.

For our example, let's say that you have five dice. Each die has six faces, and each of those faces contains a number (1, 2, 3, 4, 5, 6).

Now, for our example to be modeled by a binomial probability distribution, it must meet ALL of the following requirements:

Requirements

1. The procedure has a fixed number of trials.

2. The trials must be independent.

3. Each trial must have all outcomes classified into two categories (success or failure).

4. The probability of a success remains the same in all trials.

Notation

Notation for binomial probability distributions is as follows:

p = Probability of success (one trial).

q = Probability of failure (one trial), so q = 1 - p.

n = Fixed number of trials.

x = The number of successes in 'n' trials.

P(x) = The probability of getting exactly 'x' successes among the 'n' trials.
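Putting the notation together, the probability of exactly x successes in n trials is P(x) = C(n, x) * p^x * q^(n - x), where q = 1 - p and C(n, x) counts the ways to choose which x of the n trials are the successes. The sketch below builds that formula by hand with base R's `choose()` and compares it with the built-in `dbinom()`. The helper name `binom_pmf` is hypothetical and not part of R:

```r
# Hypothetical helper (not a base R function): probability of exactly x
# successes in n trials, built directly from the formula
# P(x) = choose(n, x) * p^x * q^(n - x), with q = 1 - p.
binom_pmf <- function(x, n, p) {
  q <- 1 - p
  choose(n, x) * p^x * q^(n - x)
}

# Compare the hand-built formula with base R's dbinom() for five dice,
# counting a "6" as a success (p = 1/6), for every possible x = 0..5.
manual  <- binom_pmf(0:5, n = 5, p = 1/6)
builtin <- dbinom(0:5, size = 5, prob = 1/6)
all.equal(manual, builtin)
# [1] TRUE
```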

* Source for the above material: https://www.youtube.com/watch?v=BR1nN8DW2Vg

User: DrCraigMcBridePhd Video: "Statistics - Binomial & Poisson Distributions"

So, for the sake of our example, let's say that we want to know the probability of rolling all five dice, one after the other, and having every die land with the "6" face showing.

Therefore:

p = 1/6 (1 out of 6 chance that a roll of "6" will occur on one die.)

q = 5/6 (5 out of 6 chance that it will not.)

n = 5 (5 rolls will be made.)

x = 5 (5 rolls of "6" are needed.)

P(x=5) = ??? (What is the probability that all 5 rolls will show a face of "6"?)

In R, this would be expressed in the code below:

dbinom(x=5, size=5, prob=1/6)

Which would generate the result of:

[1] 0.0001286008
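Because all five rolls must succeed, there is only one arrangement (choose(5, 5) = 1), so the formula collapses to (1/6)^5. Equivalently, there are 6^5 = 7776 equally likely outcomes and exactly one of them is "6, 6, 6, 6, 6". A quick arithmetic check in R:

```r
# All five dice must show "6": choose(5, 5) = 1, so P(x = 5) = (1/6)^5.
p_all_sixes <- (1/6)^5
p_all_sixes
# [1] 0.0001286008

# The same value as 1 out of 6^5 = 7776 equally likely outcomes,
# and the same value dbinom() returns.
all.equal(p_all_sixes, 1 / 6^5)
all.equal(p_all_sixes, dbinom(x = 5, size = 5, prob = 1/6))
```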

Let's try the same experiment with the same parameters, except that this time we want to know the probability that every die shows a "5" or a "6". Again, each of the 5 dice is rolled once.

# p = 2/6 (2 out of 6 chance that a roll of "6" or "5" will occur on one die.) #

# q = 4/6 (4 out of 6 chance that it will not.) #

# size (n) = 5 (5 rolls will be made.) #

# x = 5 (5 rolls of "6" or "5" are needed.) #

# P(x=5) = ??? (What is the probability that all 5 rolls will show a face of "5" or "6"?) #

# In R, this would be expressed in the code below: #

dbinom(x=5, size=5, prob=2/6)

Which would generate the result of:

[1] 0.004115226
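As an optional sanity check, you can simulate the experiment many times with `rbinom()` and compare the observed share of "all five dice show a 5 or 6" outcomes with the exact value from `dbinom()`. The seed and the number of repetitions below are arbitrary choices, so the simulated estimate will be close to, but not identical to, 0.004115226:

```r
# Reproducible random numbers (the seed value is arbitrary).
set.seed(42)

n_reps <- 100000
# Each draw is the number of dice (out of 5) showing a "5" or "6".
successes <- rbinom(n = n_reps, size = 5, prob = 2/6)

# Share of repetitions in which all five dice showed a "5" or "6".
simulated <- mean(successes == 5)
exact     <- dbinom(x = 5, size = 5, prob = 2/6)

c(simulated = simulated, exact = exact)
```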

# Now let's check the probability of not rolling a "6", starting with one die rolled once and working up to five dice. #

# Here, "success" is defined as NOT rolling a "6", so prob = 5/6. #

# The probability of NOT rolling a "6" on one die, one roll #

dbinom(x=1, size=1, prob=5/6)

Which would generate the result of:

[1] 0.8333333

# Alternatively, you could run the following code in each instance: #

# Here, prob = probability of rolling a "6", and the result is subtracted from 1. #

1 - dbinom(x=1, size=1, prob=1/6)
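A third equivalent form keeps "a 6" as the success and asks `dbinom()` for zero successes. All three expressions below describe the same event, so they should agree:

```r
# Success defined as "not a 6" (prob = 5/6): one success in one roll.
a <- dbinom(x = 1, size = 1, prob = 5/6)

# Success defined as "a 6" (prob = 1/6): zero successes means no "6".
b <- dbinom(x = 0, size = 1, prob = 1/6)

# Complement rule: q = 1 - p.
c_val <- 1 - 1/6

c(a, b, c_val)
# [1] 0.8333333 0.8333333 0.8333333
```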

# The probability of NOT rolling a "6" on either of 2 dice, each rolled once. #

dbinom(x=2, size=2, prob=5/6)

# The probability of NOT rolling a "6" on any of 3 dice, each rolled once. #

dbinom(x=3, size=3, prob=5/6)

# The probability of NOT rolling a "6" on any of 4 dice, each rolled once. #

dbinom(x=4, size=4, prob=5/6)

# The probability of NOT rolling a "6" on any of 5 dice, each rolled once. #

dbinom(x=5, size=5, prob=5/6)
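The calls for one through five dice differ only in the number of dice, so they can be collapsed into a single vectorized call, since `dbinom()` pairs up its `x` and `size` arguments element by element. The results also equal (5/6)^k for k dice, and equivalently the probability of zero sixes:

```r
k <- 1:5  # number of dice, each rolled once

# Probability that none of the k dice shows a "6", for k = 1..5.
no_six <- dbinom(x = k, size = k, prob = 5/6)
no_six
# [1] 0.8333333 0.6944444 0.5787037 0.4822531 0.4018776

# The same numbers two other ways: the closed form (5/6)^k, and
# "zero sixes" with success defined as rolling a "6".
all.equal(no_six, (5/6)^k)
all.equal(no_six, dbinom(x = 0, size = k, prob = 1/6))
```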

# Finally, let's say that you want to know the probability of rolling two or fewer "6"s when five dice are each rolled once. #

# p = 1/6 (1 out of 6 chance that a roll of "6" will occur on one die.) #

# q = 5/6 (5 out of 6 chance that it will not.) #

# n = 5 (5 rolls will be made.) #

# x <= 2 (at most 2 rolls of "6" are allowed.) #

# P(x<=2) = ??? (What is the probability that two or fewer dice show the face "6"?) #

# In R, this would be expressed in the code below: #

sum(dbinom(x=0:2, size=5, prob=1/6))

# or #

dbinom(x=0, size=5, prob=1/6) + dbinom(x=1, size=5, prob=1/6) + dbinom(x=2, size=5, prob=1/6)

# or: #

pbinom(q=2, size=5, prob=1/6, lower.tail = TRUE)

Each method would produce the result of:

[1] 0.9645062
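To see where 0.9645062 comes from, it can help to print the whole distribution for five dice: the probability of each possible number of sixes (`dbinom()`) next to the running total (`pbinom()`). The cumulative value in the row for two sixes is the result above. The column names here are illustrative choices:

```r
x <- 0:5  # possible number of sixes among five dice

dist <- data.frame(
  sixes   = x,
  exactly = dbinom(x, size = 5, prob = 1/6),  # P(X = x)
  at_most = pbinom(x, size = 5, prob = 1/6)   # P(X <= x)
)
print(dist, digits = 7)

# The cumulative column is the running sum of the exact column.
all.equal(dist$at_most, cumsum(dist$exactly))
```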

# What is the probability that 3 dice or more show the face "6"? #

dbinom(x=3, size=5, prob=1/6) + dbinom(x=4, size=5, prob=1/6) + dbinom(x=5, size=5, prob=1/6)

# or #

pbinom(q=2, size=5, prob=1/6, lower.tail = FALSE)

# or #

1 - pbinom(q=2, size=5, prob=1/6, lower.tail = TRUE)

# or #

1 - sum(dbinom(x=0:2, size=5, prob=1/6))

Each method would produce the result of:

[1] 0.03549383
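"Two or fewer sixes" and "three or more sixes" cover every possible outcome without overlapping, so their probabilities must add up to 1. A short check:

```r
at_most_two    <- pbinom(q = 2, size = 5, prob = 1/6)
at_least_three <- pbinom(q = 2, size = 5, prob = 1/6, lower.tail = FALSE)

at_least_three
# [1] 0.03549383
at_most_two + at_least_three
# [1] 1
```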

You could also compute the probability that exactly 2 of the five dice show a "6".

# In R, this would be expressed in the code below: #

dbinom(x=2, size=5, prob=1/6)

Which would produce the result of:

[1] 0.160751
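The `dbinom()` calls model the dice indirectly. As a rough cross-check, the sketch below rolls five virtual dice with `sample()`, counts the sixes, repeats this many times, and compares the share of rolls with exactly two sixes to `dbinom(x = 2, size = 5, prob = 1/6)`. The seed and repetition count are arbitrary, so expect agreement to roughly two decimal places rather than an exact match:

```r
set.seed(2017)  # arbitrary seed for reproducibility

n_reps <- 50000
# Each repetition: roll five fair dice and count how many show a "6".
sixes <- replicate(n_reps, sum(sample(1:6, size = 5, replace = TRUE) == 6))

simulated <- mean(sixes == 2)
exact     <- dbinom(x = 2, size = 5, prob = 1/6)

c(simulated = simulated, exact = exact)
```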

In the next article, I will discuss a related concept, known as the "Poisson Distribution". Stay tuned, data enthusiasts.


Monday, August 21, 2017
