In [1]: import pandas as pd

In [2]: import matplotlib.pyplot as plt

# use the pandas main colors
In [3]: import matplotlib as mpl

In [4]: mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=["#150458", "#FFCA00", "#E70488"])
This tutorial uses air quality data about \(NO_2\), made available by openaq and obtained using the py-openaq package. The air_quality_no2.csv data set provides \(NO_2\) values for the measurement stations FR04014, BETR801 and London Westminster in Paris, Antwerp and London, respectively.
In [5]: air_quality = pd.read_csv("data/air_quality_no2.csv",
   ...:                           index_col=0, parse_dates=True)
   ...:

In [6]: air_quality.head()
Out[6]:
                     station_antwerp  station_paris  station_london
datetime
2019-05-07 02:00:00              NaN            NaN            23.0
2019-05-07 03:00:00             50.5           25.0            19.0
2019-05-07 04:00:00             45.0           27.7            19.0
2019-05-07 05:00:00              NaN           50.4            16.0
2019-05-07 06:00:00              NaN           61.9             NaN
Note
The index_col and parse_dates parameters of the read_csv function are used to define the first (0th) column as the index of the resulting DataFrame and to convert the dates in that column to datetime objects, respectively.
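As a minimal sketch of what these two parameters change, the snippet below reads the same small CSV text twice, once with and once without them. The CSV text is an inline stand-in built from the first three rows of the output shown above, and the names `csv_text`, `sample` and `plain` are illustrative only.

```python
import io

import pandas as pd

# Inline stand-in for the file: the first three rows of the output shown above.
csv_text = """datetime,station_antwerp,station_paris,station_london
2019-05-07 02:00:00,,,23.0
2019-05-07 03:00:00,50.5,25.0,19.0
2019-05-07 04:00:00,45.0,27.7,19.0
"""

# index_col=0 uses the first column as the row index;
# parse_dates=True asks pandas to parse that index as dates.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Without these arguments the timestamps remain an ordinary text column.
plain = pd.read_csv(io.StringIO(csv_text))

print(type(sample.index).__name__)
print(list(plain.columns))
```

Having a datetime index is what lets the plots in this tutorial place the measurements on a time axis without further configuration.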
I want a quick visual check of the data.
In [7]: air_quality.plot()
Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d4d843150>
With a DataFrame, pandas creates by default one line plot for each of the columns with numeric data.
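To illustrate the "numeric data" part of that statement, here is a small sketch on a hypothetical toy table (the `toy` frame, its column names and its values are made up for illustration). It counts the lines that end up on the plot when one of the columns holds text.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd

# Hypothetical toy table: two numeric columns and one text column.
toy = pd.DataFrame(
    {
        "station_a": [20.0, 25.0, 30.0],
        "station_b": [35.0, 30.0, 28.0],
        "comment": ["ok", "ok", "check"],
    }
)

ax = toy.plot()

# Only the numeric columns are drawn; the text column is left out.
print(len(ax.get_lines()))
```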
I want to plot only the columns of the data table with the data from Paris.
In [8]: air_quality["station_paris"].plot()
Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d0e8b8090>
To plot a specific column, combine the selection method from the subset data tutorial with the plot method. The plot method therefore works on both Series and DataFrame objects.
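The sketch below puts the two cases side by side, assuming a small stand-in table built from a few values in the output shown earlier (the name `toy` and the two-panel layout are illustrative choices, not part of the tutorial's data set).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in table using a few values from the output shown earlier.
toy = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4],
        "station_london": [19.0, 19.0, 16.0],
    }
)

fig, (left, right) = plt.subplots(ncols=2, figsize=(10, 3))

toy.plot(ax=left)                    # DataFrame: one line per column
toy["station_paris"].plot(ax=right)  # Series: a single line

print(type(toy["station_paris"]).__name__)
print(len(left.get_lines()), len(right.get_lines()))
```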
I want to visually compare the \(NO_2\) values measured in London versus Paris.
In [9]: air_quality.plot.scatter(x="station_london",
   ...:                          y="station_paris",
   ...:                          alpha=0.5)
   ...:
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d4d843890>
Apart from the default line plot when using the plot function, a number of alternatives are available to plot data. Let’s use some standard Python to get an overview of the available plot methods:
In [10]: [method_name for method_name in dir(air_quality.plot) if not method_name.startswith("_")]
Out[10]: ['area', 'bar', 'barh', 'box', 'density', 'hexbin', 'hist', 'kde', 'line', 'pie', 'scatter']
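Each of these method names can also be passed as the kind argument of plot itself. The following sketch, on a hypothetical stand-in table named `toy`, draws the same scatter plot both ways; which spelling to use is a matter of taste.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd

# Stand-in table; the values are illustrative, not the full data set.
toy = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4],
        "station_london": [19.0, 19.0, 16.0],
    }
)

# Accessor spelling, as used in this tutorial.
ax_accessor = toy.plot.scatter(x="station_london", y="station_paris", alpha=0.5)

# Keyword spelling: the plot type is selected through the kind argument.
ax_keyword = toy.plot(kind="scatter", x="station_london", y="station_paris", alpha=0.5)
```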
In many development environments, as well as in IPython and Jupyter Notebook, press the TAB key to get an overview of the available methods, for example air_quality.plot. + TAB.
One of the options is box, which refers to a boxplot. The box method also applies to the air quality example data:
In [11]: air_quality.plot.box()
Out[11]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d0e801a50>
For an introduction to the other plot methods, see Other plots.
I want each of the columns in a separate subplot.
In [12]: axs = air_quality.plot.area(figsize=(12, 4), subplots=True)
Separate subplots for each of the data columns are supported by the subplots argument of the plot functions. The built-in options available in each of the pandas plot functions are worth reviewing.
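A short sketch of what subplots=True hands back, using a hypothetical three-column stand-in table (`toy`, with made-up values). It also tries two of the other built-in options, layout and sharey, as examples of the kind of option worth exploring.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd

# Hypothetical stand-in table with three numeric columns.
toy = pd.DataFrame(
    {
        "station_antwerp": [50.5, 45.0, 40.0],
        "station_paris": [25.0, 27.7, 50.4],
        "station_london": [19.0, 19.0, 16.0],
    }
)

# With subplots=True the call returns an array holding one Axes per column.
axs = toy.plot.area(figsize=(12, 4), subplots=True)
print(axs.shape)

# layout arranges the subplots in a grid (here one row of three),
# and sharey gives them a common y-axis.
grid = toy.plot.area(figsize=(12, 4), subplots=True, layout=(1, 3), sharey=True)
print(grid.shape)
```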
Some more formatting options of the pandas plot functionalities are explained in Plot Formatting.
I want to further customize, extend or save the resulting plot.
In [13]: fig, axs = plt.subplots(figsize=(12, 4));

In [14]: air_quality.plot.area(ax=axs);

In [15]: axs.set_ylabel("NO$_2$ concentration");

In [16]: fig.savefig("no2_concentrations.png")
Each of the plot objects created by pandas is a matplotlib object. As matplotlib provides plenty of options to customize plots, making the link between pandas and matplotlib explicit brings all the power of matplotlib to the plot. This strategy is applied in the previous example:
fig, axs = plt.subplots(figsize=(12, 4))        # Create an empty matplotlib Figure and Axes
air_quality.plot.area(ax=axs)                   # Use pandas to put the area plot on the prepared Figure/Axes
axs.set_ylabel("NO$_2$ concentration")          # Do any matplotlib customization you like
fig.savefig("no2_concentrations.png")           # Save the Figure/Axes using the existing matplotlib method.
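The same pattern can be tried without the tutorial's CSV file. The sketch below uses a hypothetical stand-in table (`toy`) and writes to a temporary directory; the file name and the title text are illustrative. It also shows that the object pandas returns is the very Axes that was passed in, which is why the matplotlib calls afterwards affect the pandas plot.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in table; the values are illustrative.
toy = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4],
        "station_london": [19.0, 19.0, 16.0],
    }
)

fig, ax = plt.subplots(figsize=(12, 4))  # empty matplotlib Figure and Axes
returned = toy.plot.area(ax=ax)          # pandas draws on the Axes it was given

# Any matplotlib customization works on that same Axes.
ax.set_ylabel("NO$_2$ concentration")
ax.set_title("Stand-in data, not real measurements")

# Save through the Figure, here into a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "no2_toy.png")
fig.savefig(out_path)
plt.close(fig)

print(returned is ax)
```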
The .plot methods are applicable on both Series and DataFrames.
By default, each of the columns is plotted as a different element (line, boxplot, …).
Any plot created by pandas is a matplotlib object.
Further details about plotting are provided in Visualization.