Pandas example

In [1]:
import pandas as pd
%matplotlib inline

Pandas can load data from many sources, such as text files, csv, excel. Here we load data from a web page.

In [2]:
tables = pd.read_html("https://www.basketball-reference.com/leagues/NBA_2018_games-november.html")

tables contains a list with all the HTML tables found in the web page (in this case just one).

In [3]:
games = tables[0]
# games.to_csv("games.csv")  # <- this can save the table in CSV format
games.head()
Out[3]:
Date Start (ET) Visitor/Neutral PTS Home/Neutral PTS.1 .1 Attend. Notes
0 Wed, Nov 1, 2017 7:30 pm Sacramento Kings 86 Boston Celtics 113 Box Score NaN 18624 NaN
1 Wed, Nov 1, 2017 7:00 pm Milwaukee Bucks 121 Charlotte Hornets 126 Box Score NaN 15655 NaN
2 Wed, Nov 1, 2017 7:00 pm Indiana Pacers 124 Cleveland Cavaliers 107 Box Score NaN 20562 NaN
3 Wed, Nov 1, 2017 9:00 pm Toronto Raptors 111 Denver Nuggets 129 Box Score NaN 14072 NaN
4 Wed, Nov 1, 2017 10:30 pm Dallas Mavericks 98 Los Angeles Clippers 119 Box Score NaN 13487 NaN
In [4]:
# Columns renaming
games.columns = ['date', 'start', 'away_team', 'away_points', 'home_team', 'home_points',
                'd1', 'd2', 'attend', 'd3']
games.head()
Out[4]:
date start away_team away_points home_team home_points d1 d2 attend d3
0 Wed, Nov 1, 2017 7:30 pm Sacramento Kings 86 Boston Celtics 113 Box Score NaN 18624 NaN
1 Wed, Nov 1, 2017 7:00 pm Milwaukee Bucks 121 Charlotte Hornets 126 Box Score NaN 15655 NaN
2 Wed, Nov 1, 2017 7:00 pm Indiana Pacers 124 Cleveland Cavaliers 107 Box Score NaN 20562 NaN
3 Wed, Nov 1, 2017 9:00 pm Toronto Raptors 111 Denver Nuggets 129 Box Score NaN 14072 NaN
4 Wed, Nov 1, 2017 10:30 pm Dallas Mavericks 98 Los Angeles Clippers 119 Box Score NaN 13487 NaN
In [5]:
# Drop unused columns and convert dates from string to datetime
games = (games.drop(columns=['d1', 'd2', 'd3'])
        .assign(date=lambda x: pd.to_datetime(x['date'], format="%a, %b %d, %Y")))
games.head()
Out[5]:
date start away_team away_points home_team home_points attend
0 2017-11-01 7:30 pm Sacramento Kings 86 Boston Celtics 113 18624
1 2017-11-01 7:00 pm Milwaukee Bucks 121 Charlotte Hornets 126 15655
2 2017-11-01 7:00 pm Indiana Pacers 124 Cleveland Cavaliers 107 20562
3 2017-11-01 9:00 pm Toronto Raptors 111 Denver Nuggets 129 14072
4 2017-11-01 10:30 pm Dallas Mavericks 98 Los Angeles Clippers 119 13487
In [6]:
# Univariate analysis: histogram of away_points
# Note that pandas uses matplotlib under the hood
games.hist(column='away_points')
Out[6]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fcaa253c9e8>]], dtype=object)
In [7]:
# Distributions of home_points vs. away_points
games.boxplot(['home_points', 'away_points'])
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcaa256c710>
In [8]:
# Attendees over time for the home games of the Dallas Mavericks
games.loc[games['home_team'] == "Dallas Mavericks"].plot(x='date', y='attend')
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcaa258eba8>
In [9]:
# Group-by example
games.groupby(['home_team']).mean()
Out[9]:
away_points home_points attend
home_team
Atlanta Hawks 109.000000 105.875000 14153.750000
Boston Celtics 96.125000 103.875000 18624.000000
Brooklyn Nets 115.250000 114.000000 16301.250000
Charlotte Hornets 109.166667 111.333333 17223.000000
Chicago Bulls 105.000000 98.400000 20683.000000
Cleveland Cavaliers 111.142857 113.000000 20562.000000
Dallas Mavericks 99.625000 98.750000 19898.000000
Denver Nuggets 105.888889 114.666667 16452.666667
Detroit Pistons 103.142857 109.857143 17258.142857
Golden State Warriors 99.142857 118.000000 19596.000000
Houston Rockets 105.625000 119.375000 17517.250000
Indiana Pacers 109.333333 106.666667 15518.333333
Los Angeles Clippers 110.833333 109.333333 16038.833333
Los Angeles Lakers 111.571429 115.142857 18997.000000
Memphis Grizzlies 102.500000 94.833333 15918.000000
Miami Heat 102.750000 97.250000 19626.000000
Milwaukee Bucks 96.750000 98.750000 15669.250000
Minnesota Timberwolves 100.750000 106.000000 16409.750000
New Orleans Pelicans 107.714286 107.714286 16079.428571
New York Knicks 100.909091 108.090909 18854.363636
Oklahoma City Thunder 96.666667 104.000000 18203.000000
Orlando Magic 107.666667 97.666667 18316.333333
Philadelphia 76ers 105.875000 112.875000 20607.625000
Phoenix Suns 115.222222 106.333333 16593.111111
Portland Trail Blazers 97.125000 100.125000 19253.625000
Sacramento Kings 100.142857 97.428571 17583.000000
San Antonio Spurs 99.200000 107.100000 18289.500000
Toronto Raptors 104.500000 111.666667 19800.000000
Utah Jazz 97.777778 103.555556 17053.666667
Washington Wizards 105.625000 108.000000 17196.250000