Analyzing MLB Savant Data with Google Colab and Python Programming, Introduction
The best dataset in the history of sports is undoubtedly the Baseball Savant pitch-by-pitch dataset that MLB updates daily.
This massive dataset lets us derive countless insights about the game.
In this series of posts, I want to share some Google Colab notebooks to help anyone interested learn how to use this dataset themselves.
What Is Google Colab?
https://research.google.com/colaboratory/
In their own words,
Colaboratory, or “Colab” for short, allows you to write and execute Python in your browser, with zero configuration required, free access to GPUs, and easy sharing.
Whether you’re a student, a data scientist or an AI researcher, Colab can make your work easier. Watch Introduction to Colab to learn more, or just get started below!
Google Colab gives us a free environment to code in, and if you map your Google Drive, you also get the 15 gigabytes of free storage that comes with a Google account to save data to. It’s super simple to get going and requires no technical knowledge or installation.
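As a rough sketch of what mapping your Drive can look like in a notebook: `drive.mount` is Colab's own helper and only exists inside a Colab runtime, so the snippet falls back to a local folder anywhere else. The folder name `savant_data` and the helper `get_data_dir` are made up for illustration.

```python
from pathlib import Path


def get_data_dir(folder_name="savant_data"):
    """Return a folder for saved data, on Google Drive when run inside Colab.

    `folder_name` is a hypothetical name chosen for this example.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        # Not in Colab: fall back to the current working directory.
        base = Path.cwd()
    else:
        # Asks for authorization, then exposes your Drive under /content/drive.
        drive.mount("/content/drive")
        base = Path("/content/drive/MyDrive")

    data_dir = base / folder_name
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


data_dir = get_data_dir()
print(f"Data will be saved under: {data_dir}")
```

Anything written under that folder while in Colab should then show up in your Drive and survive after the Colab session ends.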
What Is the Baseball Savant Dataset?
Basically, it is just one giant table of data. Each row represents one pitch thrown in MLB, and every pitch is recorded. At its base, it gives us 93 pieces of information about each pitch. You can find all of the columns listed, with a short description of each, at this link:
https://baseballsavant.mlb.com/csv-docs
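To make the "one row per pitch" idea concrete, here is a minimal sketch that reads a tiny, made-up sample shaped like a Savant CSV export. The column names (`pitch_type`, `game_date`, `release_speed`, `batter`, `pitcher`, `events`, `description`) come from the CSV documentation, but every value below is invented for illustration, and a real export has far more columns. It uses only the standard library; a DataFrame library would be the more common choice for the full dataset.

```python
import csv
import io

# Invented sample rows for illustration only; not real pitch data.
SAMPLE_CSV = """pitch_type,game_date,release_speed,batter,pitcher,events,description
FF,2021-04-01,95.2,1001,2001,,called_strike
SL,2021-04-01,86.4,1001,2001,,ball
FF,2021-04-01,96.0,1001,2001,single,hit_into_play
CH,2021-04-01,84.1,1002,2001,,swinging_strike
"""

# Each dict is one pitch, keyed by column name.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

print(f"{len(rows)} pitches, {len(rows[0])} columns")
for pitch in rows:
    # 'events' is typically filled in only on the pitch that ends a plate
    # appearance; otherwise fall back to the per-pitch 'description'.
    outcome = pitch["events"] or pitch["description"]
    print(pitch["game_date"], pitch["pitch_type"], pitch["release_speed"], outcome)
```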
The table is too wide to show all 93 columns here, but here’s a subset of the data. This shows you Shohei Ohtani’s first ten homers of the 2021 season:
It will be very helpful to read through that CSV documentation link above just so you have a general idea of all the columns we have to play with.
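A subset like "one hitter's first few homers" boils down to a filter, a sort, and a slice. The sketch below shows one way to do that, assuming the `batter`, `events`, and `game_date` columns from the CSV documentation. The rows and the batter id `1001` are invented, and `first_home_runs` is a hypothetical helper, not something taken from the notebook.

```python
import csv
import io

# Invented sample rows for illustration only; not real pitch data.
SAMPLE_CSV = """game_date,batter,events,pitch_type,launch_speed
2021-04-04,1001,home_run,FF,110.5
2021-04-02,1001,home_run,SL,104.3
2021-04-02,1002,home_run,CH,101.0
2021-04-03,1001,strikeout,FF,
2021-04-09,1001,home_run,CU,108.8
"""


def first_home_runs(rows, batter_id, limit=10):
    """Return the earliest `limit` home runs hit by `batter_id`."""
    homers = [
        row for row in rows
        if row["batter"] == str(batter_id) and row["events"] == "home_run"
    ]
    # ISO-style dates (YYYY-MM-DD) sort chronologically as plain strings.
    homers.sort(key=lambda row: row["game_date"])
    return homers[:limit]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
homers = first_home_runs(rows, batter_id=1001, limit=2)
for hr in homers:
    print(hr["game_date"], hr["pitch_type"], hr["launch_speed"])
```

With real data you would swap in the hitter's actual MLB player id for `batter_id` and leave `limit` at 10.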
Sample Notebook
Here’s the notebook I pieced together. It covers much more, including some basics of Python and Google Colab as well as how to load and view the dataset. You can save the notebook to your own Google Drive and then open it from there to run the code and save the data to your own account. Do that with this button:
You can also use this option to download the file:
and then you can upload that file into your own Google Colab from the intro screen (URL above) with this button:
Here is the link to the notebook:
Contact Me
The easiest way to contact me is on Twitter @JonPGH. My DMs should be open, so feel free to reach out!