DataScience310

For this project, I will be exploring the world of baseball. I hope to examine the role that launch angle and exit velocity play in the outcomes of at-bats. The goal of this project is to find the launch angle and exit velocity that give a batter the best possible outcome. Once these results are found, I hope to explore the predictive power of multiple machine learning techniques to see which is best at predicting the outcome of an at-bat. This will not be easy with the data I have selected, but I hope to find a way to make these predictions; I will explore this more in the data analysis section. This exploration is important because launch angle and exit velocity data have only recently become available in Major League Baseball, so there is still a lot of practical knowledge to be gained from studying them. My central research question is “What is the optimal launch angle and exit velocity for a batter, and which machine learning model is best at predicting the outcomes of at-bats?”

The data I am using for this project comes from baseballsavant.mlb.com, a leading source of launch angle and exit velocity statistics. I will be using two main tables: one with all the launch angles from 2019 and the at-bat outcomes, and one with all the exit velocities from 2019 and the outcomes. With these I hope to determine the optimal launch angle and exit velocity. Each table has the different hit types (single, double, triple, and home run), along with batting average and weighted on-base average. In the launch angle table, the possible launch angles range from -89 to +89 degrees, though some angles may be missing because no one hit a ball at that specific angle. In the exit velocity table, each row is a different speed in miles per hour. My goal for the models is to split the rows into training and testing sets and see how well the models predict the quantities of each hit type.
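The train/test split described above could look something like the following. This is only a minimal sketch using the Python standard library: the function name `split_rows`, the 20% test fraction, the fixed seed, and the column names are all my own assumptions, and the rows here are empty placeholders, not the actual Baseball Savant export.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists.

    test_fraction and seed are illustrative defaults, not values
    taken from the project itself.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows, one per launch angle from -89 to +89 degrees.
# The column names are hypothetical and the counts are left at zero.
rows = [
    {"launch_angle": angle, "singles": 0, "doubles": 0,
     "triples": 0, "home_runs": 0}
    for angle in range(-89, 90)
]

train_rows, test_rows = split_rows(rows)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```

Shuffling before splitting keeps the test set from being made up only of the steepest launch angles, which would happen if the last rows of the table were simply held out.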

Because I will be comparing models on their effectiveness, I plan on using three different models: a simple linear regression model, a random forest, and a neural network. I hope these give me a range of model types with different predictive strengths. Each will be compared on how well it predicts slugging percentage, (1B + 2 * 2B + 3 * 3B + 4 * HR) / AB. This will be an important measure of how well the models are doing and which of them is best.
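To make the comparison concrete, here is a rough sketch of the slugging percentage formula together with one possible way to score a model's predictions. The error measure (root mean squared error) is my own assumption, since I have not settled on a metric yet, and the function names `slugging` and `rmse` are just illustrative.

```python
import math


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Slugging percentage: (1B + 2 * 2B + 3 * 3B + 4 * HR) / AB."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences.

    RMSE is an assumed choice of metric; lower means the model's
    predicted slugging percentages are closer to the actual ones.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Each of the three models would produce its own list of predicted slugging percentages for the testing rows, and the model with the lowest error against the actual values would be considered the best under this measure.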