Sports Data Sets

Football

NFL

pro-football reference Pro-football reference includes NFL data dating back to 1967. This data includes player statistics, all-time leaders, draft history, coaches, and much more. Statistics are updated every week, no later than Tuesday at 6pm. Additional data can be found behind a paid subscription.

NFLSavant This website provides a csv of NFL play-by-play data from 2013-2022. The data is divided into several categories including team, play, down, formation, playtype, etc. This data can be very useful if you’re looking to do an NFL-related project.

NFL Concussions This website provides a csv data set on concussions in the NFL from 2012-2014. This data includes the player, the game, position, how many weeks missed, etc. This information is very useful in researching NFL concussion data. This data could be compared with youth football participation rates to see how concussions affect youth participation in football.

NFL and betting This website provides a data set comparing NFL game scores to the betting projections. This data shows the betting favorite, as well as the final score. This data set would be useful for comparing game outcomes with the betting odds.
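A comparison of outcomes against the betting favorite could be sketched as follows. This is a minimal illustration only: the sample rows and the column names (`home_team`, `away_team`, `home_score`, `away_score`, `favorite`) are hypothetical and will likely differ from the actual file.

```python
import csv
import io

# Hypothetical sample rows; the real data set's columns may be named differently.
SAMPLE_CSV = """home_team,away_team,home_score,away_score,favorite
Team A,Team B,17,24,Team B
Team C,Team D,20,13,Team D
Team E,Team F,27,21,Team E
Team G,Team H,30,28,Team H
"""

def favorite_win_rate(csv_text):
    """Return the fraction of non-tied games won outright by the favorite."""
    games = 0
    favorite_wins = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        home, away = int(row["home_score"]), int(row["away_score"])
        if home == away:
            continue  # a tie has no winner, so it is skipped
        winner = row["home_team"] if home > away else row["away_team"]
        games += 1
        if winner == row["favorite"]:
            favorite_wins += 1
    return favorite_wins / games if games else 0.0

rate = favorite_win_rate(SAMPLE_CSV)
print(f"Favorite won {rate:.0%} of games")
```

To run this against a downloaded file, the text could be read with `open(path, newline="")` and passed to the same function after adjusting the column names.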

NFL PBP 2009-2016 This website provides csv data sets on play-by-play data in the NFL from 2009-2016. This data includes the basic statistics such as down, quarter, and yard line. It also includes more specific data like QB hit, expected points result, and air yards. This data would be useful in looking at NFL play-by-play trends from 2009-2016.

NFLscrapR This GitHub repository leads to an R package that makes scraping in-game NFL data much easier. A lot of this data is based on play-by-play data. This package has data on almost every statistic tracked during an NFL game.

NFL Data Analytics This website provides a few basic csv data sets for the NFL. This data includes games, plays, players, and weekly data. This data is fairly basic, so it is best suited to introductory NFL research.

NFL Play This website provides NFL data since 2004. This data includes combine, tackles, kicks, and passer. This detailed data set allows you to research a wide range of NFL statistics since 2004.

College

sports-reference college Sports-reference college has college football data going back to 1956. This data includes statistics, Heisman winners, all-time leaders, bowl history and more. This website allows you to filter for the individual schools you want to look at.

College Football Data This website provides data for college football. This data consists of play-by-play data, drive results, and historical ratings, as well as predictive statistics like EPA and WPA. This website is very useful in providing a variety of college football data.

cfbscrapR This is an R package that provides data on college football. This data includes play-by-play data, as well as team statistics and records. This package also includes predictive statistics like EPA and WPA. ‘cfbscrapR’ also includes general data on teams and conferences.

Baseball

Baseball Reference Baseball Reference is a source of baseball data dating back to 1871. This data includes players, teams, statistics, all-time leaders, and much more. Baseball Reference has a feature where you can filter for specific players or teams to look at their individual data. Additional data can be found behind a paid subscription.

Baseball Savant Baseball Savant is a source for various baseball data. This includes more advanced statistics such as xwOBA, barrel%, and much more. Baseball Savant also allows you to create a csv file and a visual with select statistics of your choice.

BaseballOdds This website provides data on baseball games in comparison to the betting odds entering the game. The data provided includes the opening and closing lines. This is a really interesting data source if you want to compare baseball scores with what the projected result was.

Baseball Databank This website provides csv data sets on baseball. This data covers a wide range of baseball topics. It includes data specifically for a player’s performance in the postseason. This would be very useful for any research related to baseball.

2016 Sportradar This website provides a csv data set for the 2016 MLB season. This data includes every pitch, steal, and lineup event for the regular and postseason in 2016. This data would be very useful if you wanted to research baseball events from the 2016 season.

Retrosheet This website provides data for several different baseball events. This includes postponed games, ejections, protested games, and no-hitters. There is a lot of data here, covering a variety of very specific topics. This would be a great resource to research specific events that have happened in baseball.

2016 Baseball This website provides csv data sets from the 2016 MLB season. This data is very broad. It mostly includes general information on each game such as teams, attendance, and duration. MLB is planning to implement a pitch clock, so it would be useful to compare this data with more modern data to see the difference in game duration.

KBO Pitching Data This website provides csv data sets of the KBO from 1982-2021. The statistics used are very similar to those used in MLB, such as ERA, BB, and K. The KBO became more popular in the United States during 2020, while MLB was not being played due to the pandemic. This data set could be used to research and collect data about the KBO.

KBO Batting Data This website provides a csv data set on KBO batting data from 1982-2021. These statistics are very similar to those used in MLB. These include runs, hits, home runs, and OPS. The KBO became more popular in the United States during 2020, while MLB was not being played due to the pandemic. This data set could be used to research and collect data about the KBO.

Baseball Height/Weight This website provides a csv data set on the heights and weights of over 1000 MLB players. This data also includes their position. This allows us to look for a relationship between height/weight and position.
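One way to look for a relationship between position and height/weight is to average the measurements within each position. The sketch below assumes rows of (position, height in inches, weight in pounds); the values are made up for illustration and are not taken from the data set.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (position, height in inches, weight in pounds).
players = [
    ("Catcher", 72, 210),
    ("Catcher", 74, 220),
    ("Shortstop", 71, 180),
    ("Shortstop", 73, 190),
    ("Pitcher", 76, 215),
    ("Pitcher", 74, 205),
]

def averages_by_position(rows):
    """Map each position to its (average height, average weight)."""
    grouped = defaultdict(list)
    for position, height, weight in rows:
        grouped[position].append((height, weight))
    return {
        position: (
            mean(h for h, _ in values),
            mean(w for _, w in values),
        )
        for position, values in grouped.items()
    }

for position, (avg_height, avg_weight) in averages_by_position(players).items():
    print(f"{position}: {avg_height:.1f} in, {avg_weight:.1f} lb")
```

Clear differences between the per-position averages would suggest that body size varies by position, although a formal test would be needed to confirm it.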

Fangraphs Fangraphs is a website that provides baseball data. This data includes standings, projections, scores, and teams. Fangraphs also has more player-based data such as AVG, K%, and wOBA. This is one of the most well-known baseball data sources. This source would be very helpful in researching any baseball data.

PyBaseball This website provides data for baseball that can be used in Python. This source uses statistics from Statcast to analyze baseball. This data includes pitch type, launch angle, and wOBA. This data source uses advanced baseball analytics. This website would be helpful if you are a Python user and are very familiar with baseball analytics.

Baseballr Baseballr is an R package that can be used to analyze baseball data. Baseballr has scraped data from various sources and bundled it into an R package so it can easily be used. This data includes box scores, standings, sabermetrics, etc.

Hitters This website provides a csv data set of MLB hitters. This data includes hits, home runs, RBIs, years, and more. This data set only covers more basic MLB statistics. This website would be good if you’re researching MLB hitter statistics.

Basketball

Professional

Basketball Reference Basketball-reference has data from basketball, both NBA and WNBA. The data has been tracked since 1946. This data includes statistics, leaders, teams, and draft history. Additional data can be found behind a paid subscription.

Basketball Dataset This website provides data sets on basketball. This data includes drafts, players, salary, game officials, etc. This website also provides data on the draft combine, so you can compare a player’s combine score to their draft position. This source would be great to research basketball data and statistics.

NBA Play-by-Play This website provides individual data sets for NBA play-by-play data from 2000-2020. This data is very detailed in going over each play from each of these seasons. This data would be useful if you were trying to compare shot and score data trends over the last 20 years. These data sets would also allow you to compare a player’s trends over the last 20 years.

NBA Games Data This website provides csv data sets on NBA game data. This data includes teams, rankings, games, and players. These data sets provide details about positions, minutes played, and conference of the teams. This data would be helpful in researching NBA game data.

NBA Player Stats This website provides data for NBA players since 1999. This data can be separated into regular season or playoffs. The data set provides the player’s name, team, field goals made, three-pointers made, etc. This data would be useful in comparing a player’s statistics over their career.

Player Statistics This website provides data for NBA player statistics since 2008. This data includes the basic statistical data such as PPG, APG, and RPG. This data also includes more advanced statistics such as eFG%, USG%, and VI. This source would be great to research NBA player data.

MJ, Kobe, and LeBron This website provides csv data sets on Michael Jordan, LeBron James, and Kobe Bryant. This data compares these players’ statistics based on their age. Some of these statistics include TS%, USG%, PER, etc. This source would be very useful if you wanted to compare some of the all-time great NBA players.

Steph Curry This website provides data on Steph Curry since 2009. He is widely considered the best shooter of all time. The data is divided into preseason, regular season, and postseason. This source allows you to see his progression as a player.

Player Statistics This website provides csv data sets for NBA players’ individual statistics. This data mainly consists of more basic statistics such as points, rebounds, assists, steals, blocks, and fouls. While these statistics aren’t very advanced, they’re very useful in evaluating NBA players at a basic level.

WeHoop This website provides data sets for the WNBA and women’s CBB. This data includes play-by-play data, which can be very useful in analyzing WNBA or women’s CBB. This data would be useful if you are researching women’s basketball.

NBA Travel Data This website provides data on NBA teams’ travel schedules. This data includes time zones, cities, and rest days. NBA schedules have been a growing issue over the past few years, and this data would allow a researcher to analyze the NBA travel schedule.

Basketball This website provides data on games and players within the NBA. This includes games, draft, players, and teams. This data is mostly information based, not as analytically based. This website would be helpful if you’re researching NBA information.

NBA 1991-2021 This website provides csv data sets on NBA data since 1991. This data includes MVPs, teams, and players. This data set also includes player statistics for their MVP season. This website would allow you to research NBA data since 1991.

College

Sports-reference CBB Sports-reference CBB has data from college basketball since 1892. This data includes scores, leaders, tournament history, awards, and more. This page allows you to filter by the conference and school you want to look at.

NCAA Basketball This website provides data on NCAA men’s basketball teams. This data includes mascots, teams, play-by-play data, historical games, etc. This data would be very useful in researching information about men’s college basketball.

Soccer

Sports-reference Soccer Sports-reference Soccer has soccer data going back to the 1800s. This website has data from all the major soccer leagues in the world and it allows you to filter for the league you want to view. This data includes statistics, standings, all-time leaders and more.

Who Scored Who Scored is a source that provides soccer based data. This includes live scores, offensive statistics, defensive statistics, their own player grades, and more. This source also provides information on upcoming games and events.

World Cup This source provides datasets on FIFA World Cup tournaments from 1930-2014. This data includes game stadium, result, city, etc. This data can be useful for general research about the history of World Cup games.

2022 FIFA World Cup This website provides csv data sets for the 2022 FIFA World Cup. These statistics are provided at both the team and individual level. The data tracked includes goals, assists, yellow cards, shots, etc. This data would be helpful in researching or analyzing games and statistics from the 2022 FIFA World Cup.

Metrica Sports This GitHub source provides tracking and event data for soccer. The data comes in the form of a csv, along with a glossary of the definitions for the data labels. The data can come in the form of an entire game summary, or isolated by team.

World FootballR This source provides information on an R package commonly used for soccer data. The package is titled ‘worldfootballR’. You can install the CRAN version of ‘worldfootballR’ or download the version of the package that JaseZiv has created.

Tyrone Mings Tyrone Mings uses this github source to provide data in an R package he created. The goal of this package is to help make data more easily accessible. The information on this package includes players, clubs, leagues, and market value.

Euro Soccer This website provides data sets for European Soccer. This data includes country, league, match, and team. This website also provides basic information on the players. This website would be good if you’re researching European Soccer data.

Indian Premier League This website provides csv data sets on the Indian Premier League. These data sets include match, player, season, and team. This data set has data ranging from 2008-2006. This website would be helpful if you’re researching Indian Premier League Data.

Hockey

Hockey Reference Hockey reference provides hockey data since 1917. This data includes players, statistics, records, awards and more. Hockey reference provides a way to filter out specific teams and players if you want to look at their individual data. Additional data can be found behind a paid subscription.

Hockey database This website provides several csv data sets for professional hockey. These data sets include teams, scoring, goalies, hall of fame, etc. This data covers a wide range of hockey topics. This website would be very useful in researching professional hockey data.

NHL Salaries Predictions This website provides csv data sets on NHL players used to predict their salary. This data set uses several pieces of data including goals, assists, and position to predict their salary. This website would be great to use if you’re researching NHL players’ salaries. These data sets can allow you to see if there is a correlation between player statistics and their salaries.
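Checking for a correlation between a player statistic and salary can be sketched with a hand-written Pearson correlation coefficient. The point totals and salaries below are invented for illustration; they are assumptions and do not come from the data set.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical season point totals and salaries (in millions of dollars).
points = [10, 25, 40, 55, 70]
salary_millions = [0.9, 1.5, 3.0, 5.5, 8.0]

r = pearson(points, salary_millions)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 1 would indicate that salary tends to rise with points, while a value near 0 would indicate little linear relationship.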

NHL Playoffs This website provides a csv data set on the Stanley Cup Playoffs from 1918-2022. These include the team, playoff win %, year, and goal differential. This website would be helpful in researching data on the Stanley Cup Playoffs.

NHL Draft This website provides a csv data set of NHL draft data. This data includes team, year, overall pick, etc. This data would be helpful if you are researching NHL draft data.

NHL Play-by-Play data This website provides csv data sets on NHL play-by-play data since 2007. This data includes a description of the play to better envision what happened. This data also includes all the players on the ice from each team. This data would be good if you’re researching NHL play-by-play data.

Money Puck This website provides a wide variety of hockey data. This data includes playoff odds, teams, and players. This data is from 2008-2023. All of this data comes in a csv file. This website would be useful if you’re researching any hockey data over the past 15 years.

Stat Trick This website provides data on NHL games as they occur. This data includes scores, shots, and expected goals. This website also tracks high danger chances. You can also view more in depth data of each game. This website would be good if you’re researching NHL data each day.

NHL Game Data This website provides csv data sets on NHL game data. This data includes team, player, and game statistics. This data also includes venue of the game. This website would be useful if you’re researching NHL data.

Tennis

ATP World Tour This website provides several datasets related to ATP since 1877. This data includes rankings, match statistics, match scores, and tournaments. The dataset also provides information on each match such as surface and tournament. This source would be great to analyze tennis matches and tournaments.

ATP Matches This website provides csv data sets on ATP matches from 2000-2017. This data includes the tournament, seeds, winner, rank, etc. This data also includes more specific data such as games won and 1st serve %. This data would be useful if you’re researching ATP match data.

WTA Matches This website provides csv data sets on WTA matches from 2000-2016. This data includes tournament, surface, and winner. This data is very useful if you’re researching WTA match data.

Australian Open 2019 This website provides csv data sets on the 2019 Australian Open. This data includes statistics from every rally for the tournament. While this data set only includes data from the 2019 tournament, this data is very detailed. This data would be very useful if you’re researching data from the 2019 Australian Open.

Tennis Betting Odds This website provides csv data sets for men’s and women’s tennis. These data sets provide the betting odds entering the match. These data sets would be useful in comparing tennis match results to the betting odds.

Motor Sports

F1 Race Data This website provides csv datasets on Formula 1 races. This data includes circuits, drivers, races, results, seasons, etc. This data is very precise and includes details such as pit stop times and drivers’ birth dates. These datasets are very useful if you’re doing research on F1 data.

F1 World Championships This website provides csv datasets on the F1 World Championships since 1950. These datasets include drivers, lap times, results, etc. These datasets can be used for a variety of reasons involving F1 research.

MMA

Ultimate UFC Datasets This website provides csv data on UFC fights. This data includes the fighters, betting odds, winner, etc. There is also a data set that compares the winner to the betting odds. These data sets can be used for research on UFC results.

UFC Fight Statistics This website provides 3 csv datasets on UFC. One of the data sets pairs UFC fighters with their nickname. This can make it easier for newer fans to identify the fighters. The other data sets provide information on statistics for each specific fight.

MMA Grappler Github This GitHub repository provides csv data sets on MMA fighters. This data includes fighters, rankings, and data on each fight. These data sets would be useful for someone doing research on MMA fighters, their rankings, and their fight results.

MMA Predictions This website provides a csv data set on MMA fights compared to the closing betting odds. The data set doesn’t provide a lot of data. It just names the winner, loser, date, odds, bookmaker, and if the bookmaker was correct or not. This data would be helpful if you’re researching MMA fights compared to the betting odds.

Conor McGregor This website provides a csv data set on Conor McGregor. This data includes time, round, fighter, etc. This data set only includes 10 variables. This data set can be used to research Conor McGregor’s fight history.

UFC Refactored This website provides a csv data set of UFC fights. There are over 400 variables with this data set. This data set provides many advanced UFC statistics. You can use all of this data to find a correlation of the winner of the fight based on statistics during the fight.

UFC Data This website provides UFC data from 1993-2021. This data set has 144 variables covering a wide range of topics, including date, weight class, fighters, and referee. This data would be useful if you’re researching UFC data.

Golf

PGA Tour 2015-2022 This website provides a csv data set for the PGA Tour from 2015-2022. This data includes the golfer’s name, hole, strokes, etc. This data can be useful if you’re hoping to research golfers on the PGA Tour.

PGA Tour Data This website provides a csv data set for the PGA Tour. This data is mostly based around golf statistics. This data includes fairway percentage, average score, and wins. This data is very useful if you are trying to research golfers from the PGA Tour based on their statistics.

PGA Tour Rankings This website provides a csv data set on PGA Tour golfers based on their rankings. This data shows rankings, events played, and points gained and lost. This is a great data set to look at trends of golfers and how their ranking has moved.

Outside the Game

NSASS National survey on sports and society

EADA EADA provides equity data for school athletics. This source allows you to compare multiple schools, view trend data, get data for a specific school, and download custom data. The data includes number of participants, number of teams, as well as financial information for the sports.

HS Participation This source provides data on high school participation in sports. The data goes back to 1969. This data includes every single sport. The data is divided into several categories including state, gender, and number of schools that offer each sport.

NBA vs. WNBA salary This GitHub repository provides data comparing NBA and WNBA players’ salaries. The data comes as a csv file, so it can easily be loaded into RStudio or read from any programming language. This allows you to compare how NBA players are paid relative to WNBA players.
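Loading a salary csv and comparing the two leagues could look like the sketch below. The sample rows and the column names (`league`, `player`, `salary`) are assumptions made for illustration; the real file may be organized differently.

```python
import csv
import io
from statistics import median

# Hypothetical sample rows; actual column names and values may differ.
SAMPLE_CSV = """league,player,salary
NBA,Player A,9000000
NBA,Player B,3000000
NBA,Player C,1500000
WNBA,Player D,200000
WNBA,Player E,120000
WNBA,Player F,70000
"""

def median_salary_by_league(csv_text):
    """Map each league to the median salary of its listed players."""
    salaries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        salaries.setdefault(row["league"], []).append(int(row["salary"]))
    return {league: median(values) for league, values in salaries.items()}

medians = median_salary_by_league(SAMPLE_CSV)
ratio = medians["NBA"] / medians["WNBA"]
print(f"Median NBA salary is {ratio:.0f}x the median WNBA salary")
```

The median is used here instead of the mean so that a handful of very large contracts does not dominate the comparison.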

NIL Data This website provides data for colleges based on NIL revenue. NIL allows college athletes to profit off of their name without suffering on-field consequences. This data is separated into categories by sport, method of compensation, and position.

Highest Paid NIL This website provides data on the most valuable college athletes. The data provided includes their school, sport, and endorsement potential. NIL allows for college athletes to profit off of their name. This data would be useful if you are researching the most valuable college athletes.

Injuries in Sports This website provides data on sports-related injuries. The data is divided into a few categories: injury rates, where the injury takes place, and sport. This data was mostly focused on children ages 5-14.

Participation This website provides data on sports participation. These sports include football, baseball, basketball, cross country, and volleyball. Data was also gathered on how many years the sport has been played and how many hours a week were spent on the sport. This website would be great to use for researching participation in various sports.

Covid Participation This website provides data for sports participation during 2020. This data was mostly compared to 2019. This data allows us to see the negative impact the pandemic had on sports. The only sport to see an increase in participation was ultimate frisbee. This data would be great to use in researching sports and the pandemic.

Olympic Athletes This website provides data on all the athletes who have participated in the Olympics over 120 years. The data is very broad; it just names the person, country, event, medal, etc. Nothing very specific is mentioned. This data would be useful in researching Olympic athletes.

Injury Analytics This website provides data sets on injuries in sports. This website takes into account workload when providing data on the injuries. The analysis of the injuries is very advanced and includes data such as hip mobility, groin squeeze, and rest period. This website would be good to use for advanced sports injury research analysis.

Injuries by Sport This website provides data on injuries by each sport. These sports included cycling and softball. The data also showed which body part was injured most often by each sport. This data mostly focused on people over the age of 25. This website would be helpful if you’re researching injuries in sports.

Injuries by age This website provides data on injuries by sport and by age. This data includes a wide range of sports such as ATVs, fishing, and trampolines. The age range begins at 5 and younger, and it goes up to 65 and older. This data set would be very useful if you’re trying to research sports injuries based on sport and age.

Stadiums This website provides data on MLB, NBA, NHL, NFL, and MLS stadiums. This data includes team, league, division, latitude, and longitude. This data would be helpful if you’re researching stadium locations.

NCAA Academics This website provides data on NCAA athletes from 2004-2014. This data includes their sport, school, and conference. This data also provides academic scores. This data set looks at the athletes’ academic progress rate. This data would be useful if you’re comparing schools and their academic scores.

High School Women Soccer This website provides data on women’s high school soccer participation. This data includes sport, state, year, and participation. This data allows you to compare trends of girls’ high school soccer participation over the years.

Female Olympians This website provides data on female Olympians. This data includes sports, events, and percent of participants who are women. This data would be useful if you’re researching female Olympians.

Biathlon This website provides data on Olympic Biathlon from 1960-2022. The Biathlon is an Olympic event that combines skiing and rifle shooting. This data includes athlete, country, medal, and year. This data would be helpful if you’re researching Biathlon data.