Does Wealth Matter in Tennis?

Sidharth Hejamadi
May 17, 2022

A Statistical Analysis — By Sidharth Hejamadi

My Connection to Tennis

As a little kid, I was always fascinated by tennis. I picked up the sport while playing with my friends. As I got better, I seriously considered trying out for my school’s tennis team. With lots of hard work and a bit of luck, I made my high school’s junior varsity team. As the season progressed, we played many different teams throughout the Chicagoland area. We won some matches and lost others, but as we played more, we noticed a pattern: we always lost to schools in areas with fancy houses. Whenever we lost by a large margin, the other school’s area boasted a large median household income. This sparked an interesting question: does wealth affect tennis?

Test for Regression Slope

To test this hypothesis, we will use a method called the Test for Regression Slope. This test lets us check whether there is a statistically significant linear association between two quantitative variables. The variables we will use are the rank a school achieved and the median income of the school’s city. This will let us see whether a city’s median income can help predict a school’s position in the IHSA State Tennis Tournament.

This test needs two competing hypotheses. The first is the null hypothesis, the default position that we assume to be true until the data give us reason to doubt it. For a Test for Regression Slope, the null hypothesis is generally that the slope of the population regression line equals 0.

H₀: β₁ = 0

If two variables have no linear association, the slope of the regression line through the points will be 0. The alternative hypothesis in this study is that the slope of the least squares regression line is not 0.

Hₐ: β₁ ≠ 0

In addition to these hypotheses, we have to set an alpha level (significance level) in advance. The test produces a p-value: the probability of seeing a slope at least as extreme as ours if the null hypothesis were true. When the p-value is higher than the alpha level, we fail to reject the null hypothesis; when it is lower, we reject the null hypothesis in favor of the alternative. Traditionally, the alpha level chosen for most studies is 0.05.

α=0.05

If the p-value given by the test is lower than this value, we reject the null hypothesis.
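To make the mechanics of the test concrete, here is a minimal sketch of how the slope and its t statistic can be computed. The data below are made up for illustration (they are not the IHSA data), and the function name `slope_t_test` is our own. Instead of computing a p-value directly, the sketch compares the t statistic against the two-sided critical value for α = 0.05 with 4 degrees of freedom (2.776), which leads to the same reject or fail-to-reject decision.

```python
import math

def slope_t_test(x, y):
    """Fit a least-squares line and return (slope, intercept, t_statistic, df)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s = math.sqrt(sse / df)            # standard deviation of the residuals
    se_slope = s / math.sqrt(sxx)      # standard error of the slope
    t_stat = slope / se_slope          # tests H0: slope = 0
    return slope, intercept, t_stat, df

# Hypothetical example data: rank (x) and median income in thousands (y)
ranks = [1, 2, 3, 4, 5, 6]
incomes = [150, 148, 131, 127, 118, 104]

slope, intercept, t_stat, df = slope_t_test(ranks, incomes)

# Two-sided critical value for alpha = 0.05 and df = 4, taken from a t table
T_CRITICAL = 2.776
reject_null = abs(t_stat) > T_CRITICAL

print(f"slope = {slope:.3f}, t = {t_stat:.2f}, df = {df}, reject H0: {reject_null}")
```

With a different sample size, the critical value has to be looked up again for the new degrees of freedom (n − 2).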

Sampling Methods

To come to a conclusion, we have to sample from the population of tennis-playing schools in our home state of Illinois. Additionally, we have to find a reliable source for the median income associated with each school.

For the rankings, we will use the IHSA tennis website to record the results of all the schools. This source lets us see where each school placed.

For median income, we used US Census data to find the median income of the city each school is in. We treat the city’s median income as a stand-in for the school’s, since each school’s students are drawn from its city.

Can we use this test?

There are several conditions that we have to check before assessing the association between the two quantitative variables. Checking them helps ensure that our answer is not distorted by bias or a poorly fitting model.

Linear

The first condition that we have to check is linearity. If the data set doesn’t look linear, or follows some other pattern, this hypothesis test won’t give accurate results.

Graph Without Trendline
Graph With Trendline

The scatterplot looks roughly linear, and the trendline hints at a negative association: as the rank number goes up (a worse finish), median income tends to go down. This is enough to treat the relationship as linear.

Residual Plot

A residual is the difference between an observed value and the value predicted by the regression line. When testing for regression slope, we have to make sure that the residual plot of the sample looks randomly scattered. If a clear pattern shows up, a linear model is not appropriate and the test’s results can’t be trusted.
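As a rough illustration of what goes into a residual plot, the sketch below fits a least-squares line and lists each residual (observed minus predicted). The data are made up, not the IHSA data, and the helper names `fit_line` and `residuals` are our own.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for paired lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Observed y minus predicted y for every data point."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical example data: rank (x) and median income in thousands (y)
ranks = [1, 2, 3, 4, 5, 6]
incomes = [150, 148, 131, 127, 118, 104]

res = residuals(ranks, incomes)
for rank, r in zip(ranks, res):
    print(f"rank {rank}: residual {r:+.2f}")
```

Residuals from a least-squares line always sum to (approximately) zero, so what we look for is not their total but whether their signs and sizes follow a pattern along the x-axis.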

Independence

The second condition that we have to assess is independence: the observations in our sample should be independent of one another. Because we sample without replacement, we check this with the 10% condition, which requires that 1/10 of all participating schools is greater than the sample size.

(1/10)(All participating schools) > 19

(All participating schools in the IHSA Tournament) > 190
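The arithmetic behind this condition can be sketched in a few lines. The sample size of 19 comes from our data; the population counts passed in at the end are made-up values used only to show the check passing and failing, and the function name is our own.

```python
def meets_ten_percent_condition(sample_size, population_size):
    """True when one tenth of the population is greater than the sample size."""
    return population_size / 10 > sample_size

sample_size = 19
# Smallest population threshold implied by the condition: 10 * 19
min_population = 10 * sample_size
print(f"Population must exceed {min_population}")

# Hypothetical population counts, for illustration only
print(meets_ten_percent_condition(sample_size, 200))  # passes
print(meets_ten_percent_condition(sample_size, 150))  # fails
```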

Normality

The third condition we have to check is normality. With a large sample, we could lean on the Central Limit Theorem, whose usual rule of thumb is that the sampling distribution is approximately normal when the sample size is at least 30. Sadly, we only have 19 data entries. This prompts us to use a histogram of the residuals to see if they are roughly normal.

The histogram shows no strong outliers or strong skew, which allows us to reasonably assume that the residuals are roughly normal.

Equal Standard Deviations of Residuals

The standard deviation of the residuals measures how much the data points spread around the regression line, and so indicates the typical size of the line’s prediction errors. Since we plotted the residuals beforehand, we can see that the residual plot displays similar scatter throughout. This allows us to assume that the standard deviation of the residuals is roughly equal across the plot.
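For readers who want to see the calculation, here is a small sketch of the standard deviation of the residuals, s = √(Σe² / (n − 2)). The residual values are made up for illustration, and the function name is our own; dividing by n − 2 reflects the two quantities (slope and intercept) estimated from the data.

```python
import math

def residual_standard_deviation(residuals):
    """Standard deviation of regression residuals, using n - 2 degrees of freedom."""
    n = len(residuals)
    sum_of_squares = sum(e ** 2 for e in residuals)
    return math.sqrt(sum_of_squares / (n - 2))

# Hypothetical residuals, for illustration only
example_residuals = [2, -1, -3, 4, -2]
s = residual_standard_deviation(example_residuals)
print(f"s = {s:.3f}")
```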

Random

The last condition, and certainly not the least, is that the sample is random. Unfortunately, due to the limited data available in the IHSA Tennis Bracket, the sample selected isn’t random. That being said, we will PROCEED WITH CAUTION: we can still look for trends, but we should view our results with some skepticism.

Data

These might look like random numbers, but they hold the information we need to come to a conclusion. First, we can use the a and b values to write an equation that predicts the median income of a team’s city given the rank the team achieved. This equation turns out to be

ŷ = 169864.5 − 5295x

Here ŷ is the predicted median income and x is the rank that the team achieved. According to this model, each one-place drop in rank corresponds to a predicted median income about $5,295 lower.
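As a quick worked example, the equation can be plugged into a small function. The intercept and slope are the values from our regression output; the function name and the ranks chosen (1 and 10) are only for illustration.

```python
INTERCEPT = 169864.5   # a value from the regression output
SLOPE = -5295.0        # b value from the regression output

def predicted_median_income(rank):
    """Predicted median income for a team that finished at the given rank."""
    return INTERCEPT + SLOPE * rank

print(predicted_median_income(1))   # 164569.5
print(predicted_median_income(10))  # 116914.5
```

These are predictions from the fitted line, not observed incomes, and they are only meaningful for ranks within the range of our sample.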

So…

Using the p-value given above, which falls below our alpha level of 0.05, we reject the null hypothesis: there is an association between the rank of a team and its city’s median income. Because our sample was not random, we should treat this result with some caution, but the trend suggests that schools in wealthier cities generally do better in tennis tournaments.

History

Tennis, from the beginning, has been a sport played by royalty. In the 16th century, tennis spread across the world through European royalty, and this trend continued for many years. Only recently has the sport reached ordinary people. This has left a disparity between the wealthy and the not so wealthy. With coaching costing thousands of dollars, tennis is a sport that can be mastered by people who have the time and the money to afford it.

Change

Although it may seem hard, there are things we can do to fight this disparity. Donating to non-profit organizations such as the National Tennis Foundation can help underprivileged kids get proper training. Sponsoring local tennis players so they can reach a higher level is another way people can help.

Want to learn more?

This video is a great resource to learn about the fundamentals of this test.

Sidharth Hejamadi

Hi, my name is Sidharth Hejamadi, and I am an aspiring engineer working in many robotics-related fields. I also have a passion for math and statistics.