Saturday, February 24, 2024

(R) Ethiopian Multiplication


It is unfortunate that Africa’s pre-colonial achievements are so often overlooked. In today’s article, we’re going to examine the manner in which Ethiopians once performed multiplication.

Interestingly enough, this method seems to almost purposely avoid the inclusion of fractional figures within its algorithmic components. I’m not sure whether this was the result of a greater aversion to non-whole numbers, or simply a consequence of the method’s design.

The methodology which I will be referencing throughout this entry can be found below:

“Ethiopian Binary Math | the Engines of Our Ingenuity.” Engines.egr.uh.edu, engines.egr.uh.edu/episode/504. Accessed 24 Feb. 2024.

Example of Method Application

For our example, we will multiply the values of 9 and 115.

If we were to perform this multiplication exercise through the utilization of our modern multiplication algorithm, we would take a far different approach. However, if we were to approach this mathematical inquiry from the Ethiopian perspective, we would take the following steps.


We would begin by recording the first initial value (9), and then record the quotient of that value divided by 2, discarding any remainder (4). Next, we would divide that value (4) by 2 in the same manner, recording the quotient after discarding any remainder (2). We would continue this process until our final recorded value was 1.

After having dealt with the first value, we would record the second initial value (115), and then record the product of that value multiplied by 2 (230). Next, we would record that value’s (230) product after again multiplying by 2 (460). We would continue this process until the number of values within the second column equals the number of values contained within the first column.

To come to the final sum, we would add all of the values within the Value 2 column which possess a corresponding odd number entry within the Value 1 column.

9 and 1 are both odd numbers, and their corresponding entries are 115 and 920. Therefore, by summing 115 and 920, we reach the value of 1035.
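Laid out side by side, the two columns from our example would appear as follows (the rows retained for the final addition are those with odd Value 1 entries):

Value 1     Value 2
9           115
4           230
2           460
1           920

115 + 920 = 1035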

If we were to reverse the order of multiplication (115 multiplied by 9, as opposed to 9 multiplied by 115), our outcome would not change, although the column values would differ.


########################################################################### 

# The first value within the multiplication equation #


val1 <- 115

# The second value within the multiplication equation #

val2 <- 9

########################################################################### 

vec1 <- c()

vec2 <- c()


while (val1 != 0)

{

vec1 <<- c(vec1, val1)

print(val1)

val1 <- val1 - ceiling(val1/2)

}


# Number of observational entries for val1 #

n <- length(vec1)


while (n != 0)

{

vec2 <<- c(vec2, val2)

print(val2)

val2 <- val2 * 2

n <<- (n - 1)

}


# Create data frame #

Eth_Frame <- data.frame(vec1, vec2)


# Function which identifies odd elements within a vector #

odds <- function(x) subset(x, x %% 2 != 0)


# Identify odd elements #

odd_values <- odds(vec1)


# Match odd elements within odd_values vector to Eth_Frame variable 'vec1' #

Eth_Frame$odd_vec <- odd_values[match(Eth_Frame$vec1, odd_values)]


# Keep only the observations in which vec1 = odd_vec within 'Eth_Frame' (i.e., the rows with odd vec1 entries) #

Eth_Frame <- subset(Eth_Frame, Eth_Frame$vec1 == Eth_Frame$odd_vec)


# Sum all remaining elements within variable 'vec2', within data frame 'Eth_Frame' #

sum(Eth_Frame$vec2)

########################################################################### 

Corresponding output:

> sum(Eth_Frame$vec2)
[1] 1035

########################################################################### 

If you would like to see the contents of each column prior to the calculation being completed, print Eth_Frame to the console prior to defining the odds function. Also, try reversing each value’s entry within the code (val1 <- 9; val2 <- 115), to verify that order does not play a role in determining the algorithm’s outcome.
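For those who prefer a compact functional form, the sketch below condenses the same halve-and-double logic into a single R function using the integer division (%/%) and modulus (%%) operators. This is my own condensed adaptation, not the code utilized above.

ethiopian_multiply <- function(a, b) {
  total <- 0
  while (a >= 1) {
    # Keep the doubled value whenever the halved value is odd #
    if (a %% 2 != 0) { total <- total + b }
    # Halve the first value, discarding any remainder; then double the second value #
    a <- a %/% 2
    b <- b * 2
  }
  total
}

ethiopian_multiply(9, 115)
# [1] 1035

ethiopian_multiply(115, 9)
# [1] 1035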

Final Thoughts

I’m not going to pontificate too deeply on the concept detailed within this entry. However, some scholars have found similarities when comparing this method of multiplication to the way in which digital computing methodologies handle similar tasks. I do find it fascinating that the genesis of many of the concepts which empower modernity seems to find its un-actualized origins within prior historic periods.

It is also thought-provoking that the aversion to zero and irrationality in mathematics which began in Greece continued to haunt the West, and some Western-adjacent societies, for an absurd duration of time.

For more on that topic, please refer to the link below: 

Matson, John. “The Origin of Zero.” Scientific American, 21 Aug. 2009, www.scientificamerican.com/article/history-of-zero/.

Until next time.

-RD

Monday, February 12, 2024

(R) Is Taylor Swift’s Presence Significantly Impacting NFL Game Outcomes?

I hope that everyone enjoyed Super Bowl 58. Regardless of the outcome, I hope that we can all agree that the athleticism displayed by both the 49ers and the Chiefs made this particular game one for the ages.

I was planning on writing this article even before the contest was decided. However, I feel that it’s more relevant now, given the game’s results.

So, let’s ask the question:

Is Taylor Swift’s Presence Significantly Impacting NFL Game Outcomes (for Kansas City)?

To answer this question, we’ll be employing two separate tests and hypotheses.

Both assessments will utilize the Welch Two Sample T-Test methodology.

Our hypotheses are:

H0: (Null) – There was not a significant difference as it pertains to Kansas City’s total offensive yards per game with Taylor Swift in attendance, as compared to games without her in attendance.

HA: (Alternative) – There was a significant difference as it pertains to Kansas City’s total offensive yards per game with Taylor Swift in attendance, as compared to games without her in attendance.

~ AND ~

H0: (Null) – There was not a significant difference as it pertains to Kansas City’s total yards allowed per game with Taylor Swift in attendance, as compared to games without her in attendance.

HA: (Alternative) – There was a significant difference as it pertains to Kansas City’s total yards allowed per game with Taylor Swift in attendance, as compared to games without her in attendance.



Hogan, Kate. “Every Time Taylor Swift Went to a Kansas City Chiefs Game - and Whether or Not the Team Won.” Peoplemag, PEOPLE, 12 Feb. 2024, people.com/every-time-taylor-swift-went-to-kansas-city-chiefs-game-if-they-won-8410511. 

“2023 Kansas City Chiefs Rosters, Stats, Schedule, Team Draftees, Injury Reports.” Pro-Football-Reference.com, www.pro-football-reference.com/teams/kan/2023.htm. Accessed 12 Feb. 2024.

###########################################################################

CSV File:

,KC - 2023 NFL Season - Team Statistics,,,
Week,Offense (TotYd),Defense (TotYd),Opponent,Taylor Swift (Y/N)
1,316,368,DET,N
2,399,271,JAX,N
3,456,203,CHI,Y
4,401,336,NYJ,Y
5,333,329,MIN,N
6,389,197,DEN,Y
7,483,358,LAC,Y
8,274,240,DEN,N
9,267,292,MIA,N
10,Bye Week,,,
11,336,238,PHI,N
12,360,358,LVR,N
13,337,382,GNB,Y
14,346,327,BUF,Y
15,326,206,NEW,Y
16,308,205,LVR,Y
17,373,263,CIN,Y
18,268,353,LAC,N

###########################################################################

## Code for Testing and Analysis ##

Offense_TotYd <- c(316, 399, 456, 401, 333, 389, 483, 274, 267, 336, 360, 337, 346, 326, 308, 373, 268)

Defense_TotYd <- c(368, 271, 203, 336, 329, 197, 358, 240, 292, 238, 358, 382, 327, 206, 205, 263, 353)

Offense_TotYd_No <- c(316, 399, 333, 274, 267, 336, 360, 268)

Defense_TotYd_No <- c(368, 271, 329, 240, 292, 238, 358, 353)

Offense_TotYd_Swift <- c(456, 401, 389, 483, 337, 346, 326, 308, 373)

Defense_TotYd_Swift <- c(203, 336, 197, 358, 382, 327, 206, 205, 263)

t.test(Offense_TotYd_No, Offense_TotYd_Swift, paired = FALSE, conf.level = 0.95)

t.test(Defense_TotYd_No, Defense_TotYd_Swift, paired = FALSE, conf.level = 0.95)

sd(Offense_TotYd_No)

sd(Offense_TotYd_Swift)

sd(Defense_TotYd_No)

sd(Defense_TotYd_Swift)

###########################################################################
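As an aside, rather than hand-typing the vectors above, the same splits can be derived directly from the CSV file. The sketch below assumes that the file has been saved locally as "KC_2023.csv"; the file name is a placeholder of my own.

kc <- read.csv("KC_2023.csv", skip = 1, check.names = FALSE)

# Drop the bye week row, which carries no yardage or attendance values #
kc <- subset(kc, `Taylor Swift (Y/N)` %in% c("Y", "N"))

kc$`Offense (TotYd)` <- as.numeric(kc$`Offense (TotYd)`)
kc$`Defense (TotYd)` <- as.numeric(kc$`Defense (TotYd)`)

Offense_TotYd_No <- kc$`Offense (TotYd)`[kc$`Taylor Swift (Y/N)` == "N"]
Offense_TotYd_Swift <- kc$`Offense (TotYd)`[kc$`Taylor Swift (Y/N)` == "Y"]
Defense_TotYd_No <- kc$`Defense (TotYd)`[kc$`Taylor Swift (Y/N)` == "N"]
Defense_TotYd_Swift <- kc$`Defense (TotYd)`[kc$`Taylor Swift (Y/N)` == "Y"]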


Findings


There was a significant difference as it pertains to Kansas City’s total offensive yards per game (2023) with Taylor Swift in attendance (n = 9), as compared to total offensive yards per game with Taylor Swift not in attendance (n = 8). Kansas City’s total offensive yards per game with Taylor Swift in attendance (M = 379.89, SD = 59.23), Kansas City’s total offensive yards per game with Taylor Swift not in attendance (M = 319.13, SD = 47.67), Conditions; t(14.878) = -2.341, p = 0.03.

There was not a significant difference as it pertains to Kansas City’s total yards allowed per game (2023) with Taylor Swift in attendance (n = 9), as compared to total yards allowed per game with Taylor Swift not in attendance (n = 8). Kansas City’s total yards allowed per game with Taylor Swift in attendance (M = 275.22, SD = 75.69), Kansas City’s total yards allowed per game with Taylor Swift not in attendance (M = 306.13, SD = 53.03), Conditions; t(14.294) = 0.98307, p = 0.34.

Conclusions

While Tay Tay’s presence was generally beneficial to the Kansas City Chiefs as it pertains to the assessed metrics (Offensive Yards per Game, Yards Allowed per Game), only the offensive side of the ball saw a significant differentiation in performance (p = .03). Perhaps T. Swizzle provided the Kansas City Chiefs with some extra oomph via her top 40 mojo. Whatever the case may be, Tayla Swiff continues to be a good luck charm for the Chiefs whenever she is in attendance. 

Monday, January 29, 2024

(R) Is Motion Prior to the Snap Correlated with Wins per Season?

The year of our Lord 2023 has been an anemic year as it pertains to NFL offensive prowess. Numerous articles have been written as to why this may be the case. I personally believe that it may be due to an over-reliance on the passing game. While this approach was optimal throughout prior years, recent defensive innovations devised to temper this phenomenon have since severely limited its present effectiveness.

There are many methodologies which can be utilized to counteract the effectiveness of newly emergent algorithmic football defenses. The most tried and true among them is the application of motion prior to the snap.

In doing research for this article, I stumbled upon data collected by ESPNStatsInfo detailing each team’s rate of pre-snap motion through week 6 (reproduced within the code below).



So, in the spirit of the upcoming Super Bowl LVIII (49ers vs Chiefs), let’s analyze this information in order to discern whether team wins are correlated with pre-snap motion.

###########################################################################

# Getting the Libraries in Order (for Graphical Output) #

library(ggpubr)

library(tseries)

# Motion Prior to Snap (through Week 6) #

# Data Collected by ESPNStatsInfo #

# Populate Data Frame #

Team <- c('Dolphins', 'Rams', '49ers', 'Lions', 'Packers', 'Chargers', 'Seahawks', 'Falcons', 'Ravens', 'Titans', 'Giants', 'Bears', 'Chiefs', 'Colts', 'Steelers', 'Texans', 'Jaguars', 'Jets', 'Broncos', 'Vikings', 'Washington', 'Bengals', 'Patriots', 'Buccaneers', 'Cardinals', 'Bills', 'Browns', 'Panthers', 'Saints', 'Raiders', 'Eagles', 'Cowboys')

Motion_Percentage <- c(80.2, 65.4, 77.5, 62.3, 58.3, 55.6, 48.9, 61.1, 49.5, 48, 44.3, 56.1, 68.3, 45.4, 45.4, 52.3, 44.9, 35.5, 38.1, 46.3, 53.2, 44.6, 51.1, 43.9, 39.9, 48.7, 47.3, 35.4, 28.5, 52.9, 21.7, 42.1)

Wins_2023 <- c(11, 10, 12, 12, 9, 5, 9, 7, 13, 6, 6, 7, 11, 9, 10, 10, 9, 7, 8, 7, 4, 9, 4, 9, 4, 11, 11, 2, 9, 8, 11, 12)

Motion_Report <- data.frame(Team, Wins_2023, Motion_Percentage)

# Derive Mean and Standard Deviation #

mean(Motion_Report$Motion_Percentage)

sd(Motion_Report$Motion_Percentage)

mean(Motion_Report$Wins_2023)

sd(Motion_Report$Wins_2023)

# Apply Correlation Methodology #

cor.test(Motion_Report$Motion_Percentage, Motion_Report$Wins_2023)

# Create Graphic Visualizations #

data <- data.frame(Motion_Percentage, Wins_2023)

ggscatter(data, x = "Motion_Percentage", y = "Wins_2023",

add = "reg.line", conf.int = TRUE,

cor.coef = TRUE, cor.method = "pearson",

xlab = "Rate of Motion (Through Week 6)", ylab = "Season Wins (2023)")

###########################################################################


Findings


There was a positive correlation between the two variables: Season Wins (2023) (n = 32) and Rate of Motion (Through Week 6) (n = 32). Season Wins (2023) (M = 8.5, SD = 2.747), Rate of Motion (Through Week 6) (M = 49.772, SD = 12.526), Conditions; t(30) = 1.4809, p = .15. Pearson Product-Moment Correlation Coefficient: r = .26.

###########################################################################

Conclusions

While the p-value findings (p = .15) can be viewed as non-significant at the alpha level of .05, we must take into account that there are certain experimental limitations which will innately confound our results. For one, wins per season is zero sum, meaning that a win for one team is always a loss for another. Also, as both motion and wins per season are discrete variables, there is a limited predefined range of differentiation which exists between each variable. This, combined with wins per team being non-independent, reduces the test result to a generalization.

Success of motion implementation is being assessed solely on wins, and the mechanism for generating such is being assessed by offensive motion alone. Our assessment does not account for strength of schedule, team defensive prowess, player fundamentals, etc.

However, that being said, with a p = .15, and a correlation coefficient value of r = .26, it is likely more advantageous, all things being equal, to implement an offense which possesses pre-snap motion. There certainly are many other factors which can determine outcomes that are not assessed within this model, but in all likelihood, they will not have a large impact upon the overall findings.
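As a supplementary check of my own (not part of the original analysis), the discreteness and ranking concerns described above can be partially addressed by re-running the assessment as a rank-based (Spearman) correlation, which the same cor.test() function supports:

cor.test(Motion_Report$Motion_Percentage, Motion_Report$Wins_2023, method = "spearman", exact = FALSE)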

Monday, October 16, 2023

The Friendship Paradox

Like many of the critical attributes of life, that which is most evident, lies obscured by monotony. This is especially true as it pertains to mathematical paradoxes, as the most enlightening insights within the field, have the habit of appearing both obvious and universally evident after discovery. Like many mystical traditions, these insights are best discovered through contradiction and reduction.

The Paradox

The Friendship Paradox, in simpler terms, identifies the common phenomenon in which an individual typically possesses fewer friends than his friends do. Additionally, the sum of friends which his friends possess will be greater than his own total number of friends.

This paradox possesses wide reaching implications, as it describes events which are self-arising and irrefutable. However, before we can detail applicability, we must demonstrate the paradox as it was initially discerned.

First, let’s get some terminology down.

In graph theory, the circular figures are known as nodes, or vertices. The lines which illustrate relationships between the nodes are known as edges.


Now, let’s utilize this style of graphical representation to demonstrate the relationships between 5 individuals.


The chart below represents the above relationships, but in a different format.


As each relationship is symmetric, if one friend considers himself to be a friend of another individual, that individual also considers the initial individual to be his friend as well. As shown above, A is friends with B and E. B is friends with both A and E, and also friends with C.

If we derive the mean number of friends that an individual possesses within our experiment, we come to the value of 2.8.


In this instance, E possesses the most friends, and every individual who is friends with E possesses fewer. Therefore, the average number of friends that an individual within a group possesses (2.8) will likely be greater than the actual number of friends that a singular individual possesses.

The Philosophical Implications   

If a single individual begins to quantify a particular phenomenon as it relates to his person on an individualized basis, or even as it relates to a novel phenomenon, then the natural consequence of this endeavor is that this individual, from the outset, will find himself at a disadvantage.

For example, a new creation upon its genesis, possessing autonomy, will immediately be concerned with attaining sustenance. This was not a concern which was possessed within the prior state of non-being. In contemplating one’s beauty, a young woman immediately begins to compare herself to those whom she perceives as being more beautiful. We would never anticipate the inverse to occur.

This is the paradox of living, striving to possess more while the value of that which we possess becomes diminished. This is due to the passage of time, but also due to singular possession of a resource also diminishing in value. Something within our possession loses value from the moment of possession, as both the individual and the possession are diminished by the natural passage of time.

Example (2):

Here is another example. If an individual walks into a crowded elevator filled with random strangers, then there is a greater probability that this individual will have the same number, or a greater number, of friends than each stranger within the elevator. However, if the same individual were invited to a party hosted by a friend, then there is a lesser probability of this individual possessing more friends, or a similar number of friends, as compared to each of the other party attendees. In the elevator scenario, there is no guarantee that any individual within the elevator possesses a single friend. This also includes the individual entering the already crowded elevator. However, in the party scenario, each party-goer has at least one friend, that being the party’s host. In this case, the count begins at the neutral value of 1, except in the case of the party’s host, who is friends with every individual in attendance.

As described above, the friendship paradox also seeks to demonstrate, “the sum of friends which his friends possess, will be greater than the sum of his total number of friends.”

In the case of our first example, this value would be calculated as follows:


To better illustrate this phenomenon, I’ve constructed a new example relationship diagram below:


In this instance, E has more friends than A, B, C, D.

E has 4 friends, while A, B, C, and D each have 1 friend (E).

In total, A, B, C, and D together possess the same number of friends as E (1 + 1 + 1 + 1 = 4).

If any one of A, B, C, or D possessed one additional friend – F, then together they would possess more friends in sum than E (1 + 1 + 1 + 2 = 5).

If this were the case, the paradox would hold, as E would have a total of 4 friends, but the total number of his friends of friends would be greater (5).
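The arithmetic above can be checked with a short, self-contained R sketch. The edge list below is my own reconstruction of the second example, with the additional friend F attached to D (attaching F to A, B, or C would produce the same totals).

# Symmetric friendships from the second example #
edges <- rbind(c("E","A"), c("E","B"), c("E","C"), c("E","D"), c("D","F"))

# Number of friends per person (each name appears once per friendship) #
degree <- table(c(edges))
degree
# A B C D E F
# 1 1 1 2 4 1

# E's friends, and the total number of friends which E's friends possess #
E_friends <- setdiff(unique(c(edges[edges[, 1] == "E", 2], edges[edges[, 2] == "E", 1])), "E")
sum(degree[E_friends])
# [1] 5   (1 + 1 + 1 + 2, which exceeds E's own 4 friends)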

Conclusion

That's all for today.

I hope that you enjoyed this entry and will visit again soon.

-RD

(R) Utilizing Crowd Prediction Methodologies to Draft the Optimal Fantasy Football Team (II)

What would an application be without proof? A notion?

To prove that the ADP drafting strategy is superior to other ranking methodologies, I performed the following analysis.

#############################################################################

Data Source(s):

#############################################################################


ADP

https://fantasydata.com/nfl/fantasy-football-leaders?position=1&season=2022&seasontype=1&scope=1&subscope=1&scoringsystem=2&startweek=1&endweek=1&aggregatescope=1&range=1

Offense

https://fantasydata.com/nfl/ppr-adp?season=2022&leaguetype=2&type=ppr

Kickers

https://fantasydata.com/nfl/fantasy-football-leaders?position=6&season=2022&seasontype=1&scope=1&subscope=1&scoringsystem=2&startweek=1&endweek=1&aggregatescope=1&range=1

DST

https://fantasydata.com/nfl/fantasy-football-leaders?position=7&season=2022&seasontype=1&scope=1&subscope=1&scoringsystem=2&startweek=1&endweek=1&aggregatescope=1&range=1

#############################################################################

The Analysis (n = 300)

#############################################################################


ADP <- c(1.4, 2.4, 2.7, 4.2, 4.9, 5.8, 7, 7.2, 8.8, 10, 10.6, 12.4, 13.1, 14, 15.3, 16, 16.6, 17.9, 18.5, 19.1, 19.6, 19.8, 21.8, 22.7, 24.7, 25.9, 26.1, 27.3, 27.9, 29.3, 30.5, 31.6, 32.4, 33.2, 33.2, 34.1, 36.1, 38, 39.1, 39.3, 39.4, 39.7, 42.3, 43.2, 43.5, 44.2, 45.1, 46.5, 46.8, 47.3, 48.4, 50.2, 50.9, 51.5, 52.2, 53.9, 54.1, 56, 56.7, 57.9, 59.4, 61.2, 61.5, 62.1, 62.9, 63, 63.8, 63.9, 68, 68.1, 69.2, 69.9, 70.1, 71, 71.4, 71.7, 72, 72.9, 75.3, 75.4, 77.1, 77.9, 82.3, 83.8, 84.4, 84.8, 85, 86.7, 87.9, 89.5, 90.2, 90.2, 90.7, 91.3, 91.7, 91.8, 94.4, 95.9, 96.2, 98.5, 98.7, 99.1, 101.9, 102.6, 102.7, 103.6, 103.7, 105.4, 106.5, 106.7, 107.8, 108.6, 109.7, 109.9, 110.6, 111.6, 113.1, 113.6, 113.9, 114.9, 117.3, 117.7, 118.5, 119, 119.6, 120.5, 121, 121.2, 121.3, 122.7, 124.2, 125, 128.9, 129.8, 130.2, 130.2, 130.3, 130.8, 131.2, 132, 135.1, 135.9, 136.1, 136.3, 137.9, 138.9, 140.1, 140.9, 141.8, 142.7, 143.8, 144.8, 145.3, 145.4, 145.7, 147, 147, 148.1, 148.4, 148.9, 149, 149.4, 151.3, 151.4, 152.1, 152.5, 152.6, 153.8, 154.9, 155.6, 155.7, 156.7, 156.8, 156.8, 158, 158.5, 158.8, 158.9, 159.4, 160.1, 160.5, 160.7, 161, 161.5, 162, 163, 163.5, 164, 164.6, 164.6, 165, 165.2, 165.6, 166, 167, 168, 169, 169.5, 170, 171, 172, 173, 174, 175, 176, 177, 178, 178, 179, 180, 180, 181, 182, 183, 184, 185, 186, 187, 187, 188, 189, 190, 190, 191, 192, 192, 193, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265)


PPR <- c(146.4, 356.36, 372.7, 302.76, 368.66, 201.4, 223.46, 237.8, 242.4, 239.5, 211.7, 335.5, 191.1, 316.6, 248.6, 284, 316.3, 301.6, 281.4, 395.52, 168.4, 42, 226.1, 347.2, 216.5, 190.5, 225.4, 185.8, 200.2, 164, 299.6, 75.6, 220.9, 281.26, 141.3, 205.1, 199.1, 229, 417.4, 177.7, 81.2, 176.5, 43.6, 200.5, 180.7, 115.1, 159.4, 259.2, 350.7, 328.3, 167.6, 84.8, 226.8, 236.08, 51.1, 84.9, 98.3, 378.04, 222.8, 145.6, 90.9, 204.2, 216.7, 200.52, 142.7, 52.2, 171.6, 180, 156, 101.5, 126.8, 248.8, 74.2, 166.4, 79, 177.9, 225.76, 267.6, 141.2, 185.3, 249.1, 239.2, 53.5, 246, 87.1, 198.6, 254.6, 271.66, 88.1, 115.6, 227.8, 151.7, 174.8, 105.7, 108.38, 148.7, 69.4, 202.5, 241.9, 88.6, 148.2, 12.46, 168.3, 219.08, 115.7, 178.6, 147.3, 57.3, 87.9, 159.4, 291.58, 55.8, 198.2, 112.7, 237.3, 126, 98.2, 73.5, 135.7, 165.9, 150, 43.4, 135, 105.4, 215.4, 225.9, 70.4, 230.92, 167.12, 139.1, 166.5, 87.7, 88.4, 102, 122.4, 7, 94.1, 114, 105.04, 155.28, 102.9, 160, 103.9, 101.6, 159, 102, 98, 161, 295.98, 295.62, 115, 87.9, 112, 133, 101, 103, 55.2, 43.92, 116, 97, 131, 180.3, 121, 130.6, 143, 142, 119.8, 215.7, 25.5, 125, 123.6, 99, 39, 110, 97.1, 130, 129, 97, 57, 18, 118, 154, 168, 100, 116.9, 20.1, 104, 98.2, 186, 97, 60.2, 110, 101, 52.2, 155.6, 93.8, 3.2, 110, 169.3, 97.6, 34.4, 161.1, 78.9, 25.8, 51.6, 112.3, 115, 176.3, 198.1, 106, 26.4, 82.7, 149.1, 117.8, 88.3, 81.7, 110.1, 83, 4.5, 50.1, 164.1, 73, 12, 57.68, 139, 61.6, 112, 77.4, 45.4, 46.5, 15.1, 35.5, 72.7, 75.2, 84.5, 110, 0.2, 142.1, 34, 12.2, 83, 82.6, 21.1, 77.2, 196.3, 11.4, 51.7, 8.4, 54.1, 161.24, 46.2, 8.6, 8.4, 13.8, 289, 37.8, 170.08, 128.4, 89.6, 112.8, 104.1, 284.32, 154.16, 59.3, 24.9, 114.5, 121.42, 158.5, 59.8, 98.92, 0, 115.1, 10.2, 181.52, 14.8, 4.2, 12.8, 18.8, 103.9, 196.56, 1.3, 11.8, 16.2, 26, 39.1, 43.1, 53.6, 103.8, 27.3, 303.88, 30.8, 64.5, 3.5, 73.88, 110.2, 64.7, 84.3, 30.5, 10.2, 70.3)

cor.test(ADP, PPR)

Which produces the output:

Pearson's product-moment correlation

data: ADP and PPR

t = -13.394, df = 298, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

-0.6791003 -0.5370390

sample estimates:

cor

-0.6130004


#############################################################################

Creating the Visual Output

#############################################################################


my_data <- data.frame(ADP, PPR)

library("ggpubr")

ggscatter(my_data, x = "ADP", y = "PPR",

add = "reg.line", conf.int = TRUE,

cor.coef = TRUE, cor.method = "pearson",

xlab = "ADP (2022)", ylab = "Player Score (PPR)")

#############################################################################

Conclusion

#############################################################################


There was a negative correlation between the two variables: ADP (n = 300), and Player Score (PPR). ADP (M = 134.81, SD = 73.31), Player Score (PPR) (M = 133.33, SD = 86.66), Conditions; t(298) = -13.394, p < .01. Pearson Product-Moment Correlation Coefficient: r = -0.61.
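The descriptive figures reported above can be re-derived directly from the two vectors defined within the code:

mean(ADP)
sd(ADP)
mean(PPR)
sd(PPR)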

#############################################################################

As the findings indicate, there is a significant negative correlation as it pertains to ADP and player performance (PPR - 2022 Season). In plain terms, this means that we should typically expect to see better fantasy performance from players with lower ADP rankings. I hope that everyone enjoyed this article and my adherence to APA reporting standards. <3

Until next time. 

Stay cool, Data Heads.

-RD

Monday, October 9, 2023

Utilizing Crowd Prediction Methodologies to Draft the Optimal Fantasy Football Team

To win your league in Fantasy Football, or at least qualify for the playoffs, you don’t need to buy magazines, study tape, or watch ESPN. All that is required is a general understanding of crowd psychology.

What is ADP?

ADP represents the average draft position for players in fantasy drafts. Each league, for each fantasy sport, typically displays this player value during the drafting process. This value, as the name suggests, is derived from where a particular player was selected by other “fantasy team owners”, during prior drafts.

As was discussed within a previous article demonstrating draft order and its impact on a fantasy team’s final placement, if each fantasy team owner drafted optimally within their respective position, we would expect the final standings to directly reflect the initial draft order.

However, how can an individual be sure that he is drafting optimally? The answer is simpler than one might assume. To achieve optimal drafting potential, one must adhere to drafting players with the best available ADP value throughout the drafting process.
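As a minimal sketch of the rule being described (the player names and ADP values below are hypothetical), “best available” simply means taking the lowest remaining ADP at every turn:

# Hypothetical remaining player pool #
pool <- data.frame(Player = c("Player_A", "Player_B", "Player_C"),
                   ADP = c(2.4, 1.4, 2.7))

# Strict ADP rule: select the player with the lowest remaining ADP #
pool[which.min(pool$ADP), ]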

Why does it Work?

Following the crowd consensus should provide a fantasy participant with their best opportunity for victory. The strictest adherent to this methodology will benefit most from the number of non-adherents within their league. Let’s consider why this is the case.

While pundit or site rankings of players are often determined by a single individual, or a group of informed individuals, ADP rankings are determined by draft consensus. Meaning, there are more minds at work as it pertains to determining a player’s draft value. These ranks are also assigned through the drafting process. This differs from other ranking processes in that the act of drafting establishes the value. This is similar to how market participants set prices through buying and selling assets, whereas the ranking process is more akin to the way in which planned economies function.

We’ll assume that ADP perfectly correlates with the eventual points scored by each player within the league. Therefore, with this assumption in place, and also assuming that each league participant drafts optimally, we should expect to see point distributions resemble something like the graphic below. 


(In our example scenario, each subsequent player is valued at one point less than the previous player.)

Therefore, all things being equal, we would expect the final point totals for each league participant to be:


Gaining the Edge

In every instance, the largest advantage belongs to the team which drafts first, with the advantage diminishing sequentially throughout the remaining draft order. To compensate for this diminishment, or to expand one’s edge regardless of draft position, a league participant should strongly adhere to the ADP value ranking system while drafting. By not attempting to gain an edge through self-perceived insight, opportunities will arise as a result of opponents who attempt otherwise.

Every draft misstep is the micro-process of reallocating points from your team to another team within your league. In the example below, the teams highlighted in green are adhering to a strict ADP drafting strategy. The teams highlighted in red are instead taking a less strict approach.


As is shown in the graphic, the ADP adhering teams were able to benefit from the mistakes made by their opponents. In each instance, the green teams were able to draft the players which were passed upon by their red counterparts. Thus, the ADP adhering teams increased their edge at the expense of the non-adhering teams.


I Know that I Know Nothing


The above strategy functions on the foundation of two reinforcing cognitive biases: the overestimation of one’s own abilities and talents, and the discounting of the abilities and talents of others.

As far as football is concerned, I have personally witnessed friends who watch far more football than I do, who know far more about the players than I do, blow out their drafts, and fail to make their league’s playoffs in complex and interesting ways. In almost every case, the culprit tends to be impatience and exotic maneuvering. What’s also strange about this cohort of individuals, is that they tend to quickly abandon the strange individualized strategies which initially required a high level of conviction to attempt. This phenomenon itself might warrant an article in the future. 

Be sure to watch the waiver wire, as further edge can be gained from managers who prematurely release underperforming players. Also, it should be noted that ADP rankings as a drafting criterion are only applicable in leagues which do not utilize custom rule sets.

Sunday, July 23, 2023

Odd Man Out: The Problem with Serpentine Drafts

In this article, we’re going to continue the trend of discussing topics related to Fantasy Sports. Specifically, the innate problem which I see within the serpentine draft format. I feel that this topic is particularly appropriate for this time of year, as football fans are gearing up for their own fantasy league drafts.

If you’re unfamiliar with the serpentine draft format, it is best described as:

A serpentine draft, sometimes referred to as a "Snake" draft, is a type in which the draft order is reversed every round (e.g. 1..12, 12..1, 1..12, 12..1, etc.). For example, if you have the first pick in the draft, you will pick first in round one, and then last in round two.

Source: https://help.fandraft.com/support/solutions/articles/61000278703-draft-types-and-formats
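As a quick illustration of the definition above (a sketch of my own, not part of the quoted source), the pick order for a hypothetical 12-team league over 4 rounds can be generated as follows:

teams <- 1:12
rounds <- 4

# Odd-numbered rounds run 1..12, even-numbered rounds run 12..1 #
pick_order <- unlist(lapply(seq_len(rounds), function(r) if (r %% 2 == 1) teams else rev(teams)))

pick_order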

I’ve created a few examples of this draft type below. The innate issue which I see within this draft format, pertains to the differentiation between the projected point value of each draft selection, as determined by a team’s draft position. The more teams present within a league, the greater the point disparity between teams.

Assuming that each team executed an optimized drafting strategy, we would expect the outcome to resemble something like the illustration below.

Each number within a cell represents the best player value available to each team, each round. The green cells contain starting player values, and the grey cells contain back-up player values.


As you can observe from the summed values below each outcome column, each team possesses a one-point advantage over the team which selected subsequently, and a one-point disadvantage against the team which selected previously. The greatest differentiation occurs between the team which made the first selection within the draft order and the team which made the last selection within the draft order: 11 (1026 – 1015).

As previously mentioned, the fewer the teams within a league, the fewer the selection rounds. As a result, there is less of a disparity between the teams which pick earlier within the order and the teams which pick later within the order.

Below is the optimal outcome of a league comprised of ten teams.


While the single point differentiation persists between consecutive teams within the draft order, the differentiation between the first selector, and the last selector, has been reduced to: 9 (856 – 847).

This trend continues across ever smaller league sizes: 7 (1024 – 1017).


In each instance, we should expect the total differentiation of points between the first draft participant and the last draft participant (if optimal drafting occurred) to be equal to N – 1, where N = the total number of draft participants within the league.

All things being equal, if each team is managed optimally, we should expect the first team within each draft to finish first within each league. Second place would belong to the team which drafted second. Third place belonging to the team which drafted third, and so on, etc.

If all players are equally at risk of being injured on each fantasy team, then this occurrence does little to upset the overall ranking of teams by draft order. It must be remembered that teams which drafted earlier within the order, will also possess better replacement players as compared to their competitors. Therefore, when injuries do occur, later drafting teams will be disproportionately impacted.

I would imagine that as AI integration begins to seep into all aspects of existence, the opportunity for each team owner to draft with consistent optimization will further stratify the inherent edge attributed to serpentine draft position. As it stands currently, there is still an opportunity for lower draft order teams to compete if one or more of their higher order competitors blunder a selection.

In any case, I hope that what I have written in this article helped to describe what I like to refer to as the, “Odd Man Out” phenomenon. I hope to see you again soon, with more of the statistical content which you crave.

-RD

Monday, July 17, 2023

(R) Daily Fantasy Sports Line-up Optimizer (Basketball)

I’ve been mulling over whether or not I should give away this secret sauce on my site, and I’ve come to the conclusion that anyone who seriously contends within the Daily Fantasy medium is probably already aware of this strategy.

Today, through the magic of R software, I will demonstrate how to utilize code to optimize your daily fantasy sports line-up. This particular example will be specific to the Yahoo daily fantasy sports platform, and to the sport of basketball.

I also want to give credit, where credit is due.

The code presented below is a heavily modified variation of code initially created by: Patrick Clark.

The original code source can be found here: http://patrickclark.info/Lineup_Optimizer.html

Example:

First, you’ll need to access Yahoo’s Daily Fantasy page. I’ve created an NBA Free QuickMatch, which is a 1 vs. 1 contest against an opponent where no money changes hands.



This page will look a bit different during the regular season, as the NBA playoffs are currently underway. That aside, our next step is to download all of the current player data. This can be achieved by clicking on the “i” bubble icon.



Next, click on the “Export players list” link. This will download the previously mentioned player data.

The player data should resemble the (.csv) image below:



Prior to proceeding to the subsequent step, we need to do a bit of manual data clean up.

I removed any player who is injured or not starting from the data set. I also concatenated the First Name and Last Name fields, and placed that concatenation within the ID variable. Next, I removed all variables except for the following: ID (newly modified), Position, Salary, and FPPG (Fantasy Points Per Game).

The results should resemble the following image:



(Specific player data and all associated variables will differ depending on the date of download)
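For those who would rather script the clean-up than perform it by hand, a rough dplyr sketch is given below. The file name and the column headers ("First Name", "Last Name", "Injury Status") are assumptions of my own and may not match the actual Yahoo export; players who are listed as not starting would still need to be removed manually.

library(tidyverse)

# Hypothetical file name - adjust to match the downloaded export #
raw <- read_csv("Yahoo_DF_player_export.csv")

PlayerPool <- raw %>%
  filter(is.na(`Injury Status`)) %>%                  # drop players carrying an injury designation
  mutate(ID = paste(`First Name`, `Last Name`)) %>%   # concatenate the name fields into ID
  select(ID, Position, Salary, FPPG)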

Now that the data has been formatted, we’re ready to code!

###################################################################

library(lpSolveAPI)

library(tidyverse)

# It is easier to input the data as an Excel file if possible #

# Player names (ID) have the potential to upset the .CSV format #

library(readxl)

# Be sure to set the played data file path to match your directory / file name #

PlayerPool <- read_excel("C:/Users/Your_Modified_Players_List.xlsx")

# Create some positional identifiers in the pool of players to simplify linear constraints #

# This code creates new position column variables, and places a 1 if a player qualifies for a position #

PlayerPool$PG_Check <- ifelse(PlayerPool$Position == "PG",1,0)

PlayerPool$SG_Check <- ifelse(PlayerPool$Position == "SG",1,0)

PlayerPool$SF_Check <- ifelse(PlayerPool$Position == "SF",1,0)

PlayerPool$PF_Check <- ifelse(PlayerPool$Position == "PF",1,0)

PlayerPool$C_Check <- ifelse(PlayerPool$Position == "C",1,0)

PlayerPool$One <- 1

# This code modifies the position columns so that each variable is a vector type #

PlayerPool$PG_Check <- as.vector(PlayerPool$PG_Check)

PlayerPool$SG_Check <- as.vector(PlayerPool$SG_Check)

PlayerPool$SF_Check <- as.vector(PlayerPool$SF_Check)

PlayerPool$PF_Check <- as.vector(PlayerPool$PF_Check)

PlayerPool$C_Check <- as.vector(PlayerPool$C_Check)

# This code orders each player ID by position #

PlayerPool <- PlayerPool[order(PlayerPool$PG_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$SG_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$SF_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$PF_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$C_Check),]

# Appropriately establish variables in order to perform the "solver" function #

Num_Players <- length(PlayerPool$One)

lp_model = make.lp(0, Num_Players)

set.objfn(lp_model, PlayerPool$FPPG)

lp.control(lp_model, sense= "max")

set.type(lp_model, 1:Num_Players, "binary")

# Total salary points available to the player #

# In the case of Yahoo, the salary points are set to ($)200 #

add.constraint(lp_model, PlayerPool$Salary, "<=",200)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$PG_Check, "<=",3)

add.constraint(lp_model, PlayerPool$PG_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$SG_Check, "<=",3)

add.constraint(lp_model, PlayerPool$SG_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$SF_Check, "<=",3)

add.constraint(lp_model, PlayerPool$SF_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$PF_Check, "<=",3)

add.constraint(lp_model, PlayerPool$PF_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type (only require one (C)enter) #

add.constraint(lp_model, PlayerPool$C_Check, "=",1)

# Total Number of Players Needed for the entire Fantasy Line-up #

add.constraint(lp_model, PlayerPool$One, "=",8)

# Perform the Solver function #

solve(lp_model)

# Projected_Score provides the projected score summed from the optimized projected line-up (FPPG) #

Projected_Score <- crossprod(PlayerPool$FPPG,get.variables(lp_model))

get.variables(lp_model)

# The optimal_lineup data frame provides the optimized line-up selection #

optimal_lineup <- subset(data.frame(PlayerPool$ID, PlayerPool$Position, PlayerPool$Salary), get.variables(lp_model) == 1)


If we take a look at our:

Projected_Score

We should receive an output which resembles the following:

> Projected_Score
    [,1]
[1,] 279.5

Now, let’s take a look at our:

optimal_lineup

Our output should resemble something like:

PlayerPool.ID PlayerPool.Position PlayerPool.Salary
3 Marcus Smart PG 20
51 Bradley Beal SG 43
108 Tyrese Haliburton SG 16
120 Jerami Grant SF 27
130 Eric Gordon SF 19
148 Brandon Ingram SF 36
200 Darius Bazley PF 19
248 Steven Adams C 20

With the above information, we are prepared to set our line up.

You could also run this line of code:

optimal_lineup <- subset(data.frame(PlayerPool$ID, PlayerPool$Position, PlayerPool$Salary, PlayerPool$FPPG), get.variables(lp_model) == 1)

optimal_lineup


Which provides a similar output that also includes point projections:

PlayerPool.ID PlayerPool.Position PlayerPool.Salary PlayerPool.FPPG
3 Marcus Smart PG 20 29.8
51 Bradley Beal SG 43 50.7
108 Tyrese Haliburton SG 16 26.9
120 Jerami Grant SF 27 38.4
130 Eric Gordon SF 19 30.7
148 Brandon Ingram SF 36 43.2
200 Darius Bazley PF 19 29.7
248 Steven Adams C 20 30.1

Summing up PlayerPool.FPPG, we reach the value: 279.5. This was the same value which we observed within the Projected_Score matrix.

Conclusion:

While this article demonstrates a very interesting concept, I would be remiss if I did not advise you to NOT gamble on daily fantasy. This post was all in good fun, and for educational purposes only. By all means, defeat your friends and colleagues in free leagues, but do not turn your hard-earned money over to gambling websites.

The code presented within this entry may provide you with a minimal edge, but shark players are able to make projections based on far more robust data sets as compared to league FPPG. 

In any case, the code above can be repurposed for any other daily fantasy sport (football, soccer, hockey, etc.). Remember, only to play for fun and for free. 

-RD 

Sunday, July 9, 2023

(R) Benford's Law

In today’s article, we will be discussing Benford’s Law, specifically as it is utilized as an applied methodology to assess financial documents for potential fraud:

First, a bit about the phenomenon which Benford sought to describe:

The discovery of Benford's law goes back to 1881, when the Canadian-American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages (that started with 1) were much more worn than the other pages. Newcomb's published result is the first known instance of this observation and includes a distribution on the second digit, as well. Newcomb proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N).

The phenomenon was again noted in 1938 by the physicist Frank Benford, who tested it on data from 20 different domains and was credited for it. His data set included the surface areas of 335 rivers, the sizes of 3259 US populations, 104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of Reader's Digest, the street addresses of the first 342 persons listed in American Men of Science and 418 death rates. The total number of observations used in the paper was 20,229. This discovery was later named after Benford (making it an example of Stigler's law).


Source: https://en.wikipedia.org/wiki/Benford%27s_law

So what does this actually mean in layman’s terms?

Essentially, given a series of numerical elements from a similar source, we should expect certain leading digits to occur with frequencies that correspond to a particular distribution pattern.



If a series of elements perfectly corresponds with Benford’s Law, then the elements within the series should follow the above pattern as it pertains to leading digit frequency. For example, numbers which begin with the digit “1” should occur 30.1% of the time. Numbers which begin with the digit “2” should occur 17.6% of the time. Numbers which begin with the digit “3” should occur 12.5% of the time.

The distribution is derived as follows:
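In its standard form, the probability of a leading digit d is P(d) = log10(1 + 1/d), for d = 1 through 9 (the same formulation Newcomb proposed above). In R, these expected proportions can be generated directly:

round(log10(1 + 1 / (1:9)), 3)
# [1] 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046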



The utilization of Benford’s Law is applicable to numerous scenarios:

1. Accounting fraud detection

2. Use in criminal trials

3. Election data 

4. Macroeconomic data

5. Price digit analysis

6. Genome data

7. Scientific fraud detection

As it relates to screening for financial fraud, if the application of the Benford’s Law distribution returns a result in which the sample elements do not correspond with the distribution, fraud is not necessarily the conclusion which we would immediately draw. However, the findings may indicate that additional data scrutinization is necessary.

Example:

Let’s utilize Benford’s Law to analyze Cloudflare’s (NET) Balance Sheet (12/31/2021).



Even though it’s an unnecessary step as it relates to our analysis, let’s first discern the frequency of each leading digit. These digits are underlined in red within the graphic above.
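One way to tally these frequencies in R (using the NET vector defined within the code further below) is:

first_digits <- as.integer(substr(format(NET, scientific = FALSE, trim = TRUE), 1, 1))

table(factor(first_digits, levels = 1:9))
# 1 2 3 4 5 6 7 8 9
# 6 1 2 0 0 0 2 3 0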



What Benford’s Law seeks to assess is the comparison of the leading digits as they occurred within our sample, against our expectations as they exist within the Benford’s Law distribution.



The above table illustrates the frequency of occurrence of each leading digit within our analysis, versus the expected percentage frequency as stated by Benford’s Law.

Now let’s perform the analysis:

# H0: The first digits within the balance sheet entries follow Benford's law #

# H1: The first digits within the balance sheet entries do not follow Benford's law #

# requires benford.analysis #

library(benford.analysis)

# Element entries were gathered from Cloudflare’s (NET) Balance Sheet (12/31/2021) #

NET <- c(2372071.00, 1556273.00, 815798.00, 1962675.00, 815798.00, 134212.00, 791014.00, 1667291.00, 1974792.00, 791014.00, 1293206.00, 845217.00, 323612.00, 323612.00)

# Perform Analysis #

trends <- benford(NET, number.of.digits = 1, sign = "positive", discrete=TRUE, round=1)

# Display Analytical Output #

trends

# Plot Analytical Findings #

plot(trends)


Which provides the output:

Benford object:

Data: NET
Number of observations used = 14
Number of obs. for second order = 10
First digits analysed = 1

Mantissa:

Statistic Value
    Mean 0.51
    Var 0.11
Ex.Kurtosis -1.61
    Skewness 0.25

The 5 largest deviations:

digits absolute.diff
1 8 2.28
2 1 1.79
3 2 1.47
4 4 1.36
5 7 1.19

Stats:


Pearson's Chi-squared test

data: NET
X-squared = 14.729, df = 8, p-value = 0.06464


Mantissa Arc Test

data: NET
L2 = 0.092944, df = 2, p-value = 0.2722

Mean Absolute Deviation (MAD): 0.08743516
MAD Conformity - Nigrini (2012): Nonconformity
Distortion Factor: 8.241894

Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!

~ Graphical Output Provided by Function ~



(The most important aspects of the output are bolded)

Findings:

Pearson's Chi-squared test

data: NET
X-squared = 14.729, df = 8, p-value = 0.06464
Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!




A chi-square goodness of fit test was performed to examine whether the first digits of balance sheet items from the company Cloudflare (12/31/2021) adhere to Benford's law. Entries were found to be in adherence, with non-significance at the p < .05 level, χ2 (8, N = 14) = 14.73, p = .06.

As it relates to the graphic, in ideal circumstances, each blue data bar should have its uppermost portion touching the broken red line.

Example(2):

If you’d prefer to instead run the analysis simply as a chi-squared test which does not require the “benford.analysis” package, you can effectively utilize the following code. The image below demonstrates the concept being employed.



Model <- c(6, 1, 2, 0, 0, 0, 2, 3, 0)

Results <- c(0.30102999566398100, 0.17609125905568100, 0.12493873660830000, 0.09691001300805650, 0.07918124604762480, 0.06694678963061320, 0.05799194697768670, 0.05115252244738130, 0.04575749056067510)


chisq.test(Model, p=Results, rescale.p = FALSE)

Which provides the output:

    Chi-squared test for given probabilities

data: Model
X-squared = 14.729, df = 8, p-value = 0.06464


Which are the same findings that we encountered while performing the analysis previously.

That’s all for now! Stay studious, Data Heads! 

-RD

Tuesday, December 13, 2022

(R) Stein’s Paradox / The James-Stein Estimator


Imagine a situation in which you were provided with data samples from numerous independent populations. Now what if I told you that combining all of the samples into a single equation is the best methodology for estimating the mean of each population.

Hold on.

Hold on.

Wait.

You’re telling me, that combining independently sampled data into a single pool, from independent sources, can provide assumptions as it pertains to the source of each sample?

Yes!

And this methodology provides a better estimator than other available conventional methods?

Yes again.

This was the conversation which divided the math world in 1956.

Here is an article detailing the phenomenon and the findings of Charles Stein, from Scientific American (.PDF warning):

https://efron.ckirby.su.domains//other/Article1977.pdf

Since we have computers, let’s give the James-Stein Estimator a little test-er-roo. In the digital era, we are no longer forced to accept hearsay proofs.

(The code below is a heavily modified and simplified version of code which was originally queried from: https://bookdown.org/content/922/james-stein.html)
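For reference, the textbook grand-mean form of the estimator, which the code below loosely follows (with k group means x̄_i, grand mean x̄, and a variance estimate σ̂²), can be written as:

$$\hat{\theta}_i = \bar{x} + \left(1 - \frac{(k - 3)\,\hat{\sigma}^2}{\sum_{j=1}^{k}\left(\bar{x}_j - \bar{x}\right)^2}\right)\left(\bar{x}_i - \bar{x}\right)$$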

##################################################################################

### Stein’s Paradox / The James-Stein Estimator ###

## We begin by creating 4 independent samples generated from normally distributed data sources ##

## Each sample is comprised of random numbers ##

# 100 Random Numbers, Mean = 500, Standard Deviation = 155 #

Ran_A <- rnorm(100, mean=500, sd=155)

# 100 Random Numbers, Mean = 50, Standard Deviation = 22 #

Ran_B <- rnorm(100, mean=50, sd= 22)

# 100 Random Numbers, Mean = 1, Standard Deviation = 2 #

Ran_C <- rnorm(100, mean=1, sd = 2)

# 100 Random Numbers, Mean = 1000, Standard Deviation = 400 #

Ran_D <- rnorm(100, mean=1000, sd=400)

# I went ahead and sampled a few of the elements from each series which were generated by my system  #

testA <- c(482.154, 488.831, 687.691, 404.691, 604.8, 639.283, 315.656)

testB <- c(53.342841, 63.167245, 47.223326, 44.532218, 53.527203, 40.459877, 83.823073)

testC <-c(-1.4257942504, 2.2265732374, -0.6124066829, -1.7529138598, -0.0156957983, -0.6018709735 )

testD <- c(1064.62403, 1372.42996, 976.02130, 1019.49588, 570.84984, 82.81143, 517.11726, 1045.64377)

# We now must create a series which contains all of the sample elements #

testall <- c(testA, testB, testC, testD)

# Then we will take the mean measurement of each sampled series #

MLEA <- mean(testA)

MLEB <- mean(testB)

MLEC <- mean(testC)

MLED <- mean(testD)

# Next, we will derive the mean of the combined sample elements #

p_ <- mean(testall)

# We must assign to ‘N’, the number of sets which we are assessing #

N <- 4

# We must also derive the median of the combined sample elements #

medianden <- median(testall)

# sigma2 = mean(testall) * (1 - mean(testall)) / medianden #

sigma2 <- p_ * (1-p_) / medianden

# Now we’re prepared to calculate the assumed population mean of each sample series #

c_A <- p_+(1-((N-3)*sigma2/(sum((MLEA-p_)^2))))*(MLEA-p_)

c_B <- p_+(1-((N-3)*sigma2/(sum((MLEB-p_)^2))))*(MLEB-p_)

c_C <- p_+(1-((N-3)*sigma2/(sum((MLEC-p_)^2))))*(MLEC-p_)

c_D <- p_+(1-((N-3)*sigma2/(sum((MLED-p_)^2))))*(MLED-p_)

##################################################################################

# Predictive Squared Error #

PSE1 <- (c_A - 500) ^ 2 + (c_B - 50) ^ 2 + (c_C - 1) ^ 2 + (c_D - 1000) ^ 2

########################

# Predictive Squared Error #

PSE2 <- (MLEA- 500) ^ 2 + (MLEB - 50) ^ 2 + (MLEC - 1) ^ 2 + (MLED - 1000) ^ 2

########################

1 - 28521.5 / 28856.74

##################################################################################

1 - 28521.5 / 28856.74 = 0.01161739

So, we can conclude, through the utilization of predictive squared error (PSE) as an accuracy assessment technique, that Stein’s methodology (AKA the James-Stein Estimator) provided a 1.16% better estimation of the population mean for each series, as compared to the mean of each sample series assessed independently.

Charles Stein really was a pioneer in the field of statistics as he discovered one of the first instances of dimension reduction.

If we consider our example data sources below: 


Applying the James-Stein Estimator to the data samples from each series’ source reduces the innate distance which exists between each sample. In simpler terms, this essentially equates to all elements within each sample being shifted towards a central point.

                     

Series elements which were already in close proximity to the mean, now move slightly closer to the mean. Series elements which were originally far from the mean, move much closer to the mean. These outside elements still maintain their order, but they are brought closer to their fellow series peers. This shifting of the more extreme elements within a series, is what makes the James-Stein Estimator so novel in design, and potent in application.

This one really blew my noggin when I first discovered and applied it.

For more information on this noggin blowing technique, please check out:

https://www.youtube.com/watch?v=cUqoHQDinCM


That's all for today.

Come back again soon for more perspective altering articles.

-RD