I wish to display a bar chart over a time-series canvas, where the bars have widths that match their durations and where the top edges connect the first value with the last value. In other words, how could I make the tops of the bars slanted to match the data?
I know how to make bar charts using either the last value (example 1) or the first value (example 2), but what I'm looking for are polygons that would follow the black line shown.
Example 1
Example 2
Code:
import pandas as pd
from pandas import Timestamp
import datetime
import matplotlib.pyplot as plt
import numpy as np # np.nan
dd = {'Name': {0: 'A', 1: 'B', 2: 'C'}, 'Start': {0: Timestamp('1800-01-01 00:00:00'), 1: Timestamp('1850-01-01 00:00:00'), 2: Timestamp('1950-01-01 00:00:00')}, 'End': {0: Timestamp('1849-12-31 00:00:00'), 1: Timestamp('1949-12-31 00:00:00'), 2: Timestamp('1979-12-31 00:00:00')}, 'Team': {0: 'Red', 1: 'Blue', 2: 'Red'}, 'Duration': {0: 50*365-1, 1: 100*365-1, 2: 30*365-1}, 'First': {0: 5, 1: 10, 2: 8}, 'Last': {0: 10, 1: 8, 2: 12}}
d = pd.DataFrame.from_dict(dd)
d.dtypes
d
# set up colors for team
colors = {'Red': '#E81B23', 'Blue': '#00AEF3'}
# reshape data to get a single Date | is there a better way?
def reshape(data):
    d1 = data[['Start', 'Name', 'Team', 'Duration', 'First']].rename(columns={'Start': 'Date', 'First': 'value'})
    d2 = data[['End', 'Name', 'Team', 'Duration', 'Last']].rename(columns={'End': 'Date', 'Last': 'value'})
    return pd.concat([d1, d2]).sort_values(by='Date').reset_index(drop=True)
df = reshape(d)
df.dtypes
df
plt.plot(df['Date'], df['value'], color='black')
plt.bar(d['Start'], height=d['Last'], align='edge',
        width=list(+d['Duration']),
        edgecolor='white', linewidth=2,
        color=[colors[key] for key in d['Team']])
plt.show()
plt.plot(df['Date'], df['value'], color='black')
plt.bar(d['End'], height=d['First'], align='edge',
        width=list(-d['Duration']),
        edgecolor='white', linewidth=2,
        color=[colors[key] for key in d['Team']])
plt.show()
I would take the following approach:
- Make a regular bar plot that uses, for each bar, whichever of its two height values has the larger magnitude (so that the y-axis limits are set correctly automatically).
- Convert all bars of the bar plot, which are matplotlib.patches.Rectangle instances, to matplotlib.patches.Polygon instances, adjusting the top corners and copying over all other attributes (color, line width, etc.).
- In the plot, replace the rectangle bars with the polygon bars.
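For a single bar, the conversion step can be sketched as follows. The rectangle, its position, and the two edge heights are made-up values for illustration. The two bottom corners are kept, the top-left and top-right corners are moved to the two edge heights, and the styling is copied over with Patch.update_from().

```python
from matplotlib.patches import Polygon, Rectangle

# A single bar as ax.bar() would create it: lower-left corner (x, 0), width w, height h
# (made-up values for illustration)
rect = Rectangle((2.0, 0.0), width=3.0, height=10.0, facecolor='tab:red', linewidth=2)

y_left, y_right = 5.0, 10.0  # hypothetical heights at the bar's left and right edges
(x, y), w = rect.get_xy(), rect.get_width()

# Keep the bottom corners, move the top corners to the two edge heights
poly = Polygon([(x, y), (x + w, y), (x + w, y_right), (x, y_left)], closed=True)
poly.update_from(rect)  # copy facecolor, edgecolor, linewidth, transform, ...
```

Because the polygon is closed, poly.get_xy() returns the four corners followed by a repeat of the first one.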
The function polybar() in the following code achieves this (it also passes any **kwargs through to plt.bar()):
from matplotlib.patches import Polygon
import matplotlib.pyplot as plt
import numpy as np

def polybar(x, y_left, y_right, **kwargs):
    def poly_from(rect, yl, yr):
        (x, y), w = rect.get_xy(), rect.get_width()
        p = Polygon([(x, y), (x + w, y), (x + w, yr), (x, yl)], closed=True)
        p.update_from(rect)  # Copy over properties from rectangle
        return p

    ax = plt.gca()
    # Create regular bar plot with maximum y extent
    height = np.where(np.abs(y_left) > np.abs(y_right), y_left, y_right)
    bars = ax.bar(x, height, **kwargs)
    ylim = ax.get_ylim()
    # Convert rectangle bars to polygon bars
    polys = [poly_from(*blr) for blr in zip(bars, y_left, y_right)]
    # Replace rectangle bars with polygon bars
    for bar in bars:
        bar.remove()
    for poly in polys:
        ax.add_patch(poly)
    ax.set_ylim(ylim)
In your own code, you could use this as follows:
# TODO: Prepend imports and `polybar()` from above here
import pandas as pd
from pandas import Timestamp
dd = {'Name': {0: 'A', 1: 'B', 2: 'C'}, 'Start': {0: Timestamp('1800-01-01 00:00:00'), 1: Timestamp('1850-01-01 00:00:00'), 2: Timestamp('1950-01-01 00:00:00')}, 'End': {0: Timestamp('1849-12-31 00:00:00'), 1: Timestamp('1949-12-31 00:00:00'), 2: Timestamp('1979-12-31 00:00:00')}, 'Team': {0: 'Red', 1: 'Blue', 2: 'Red'}, 'Duration': {0: 50*365-1, 1: 100*365-1, 2: 30*365-1}, 'First': {0: 5, 1: 10, 2: 8}, 'Last': {0: 10, 1: 8, 2: 12}}
d = pd.DataFrame.from_dict(dd)
colors = {'Red': '#E81B23', 'Blue': '#00AEF3'}
def reshape(data):
    d1 = data[['Start', 'Name', 'Team', 'Duration', 'First']].rename(columns={'Start': 'Date', 'First': 'value'})
    d2 = data[['End', 'Name', 'Team', 'Duration', 'Last']].rename(columns={'End': 'Date', 'Last': 'value'})
    return pd.concat([d1, d2]).sort_values(by='Date').reset_index(drop=True)
df = reshape(d)
polybar(d['Start'], d['First'], d['Last'], align='edge',
        width=list(+d['Duration']),
        edgecolor='white', linewidth=2,
        color=[colors[key] for key in d['Team']])
plt.show()
The result looks as follows:
Some edge cases are currently not covered. These are the ones that I am aware of:
- The conversion assumes that the bottom values of the bar plot are all zeros. To handle other values, the rectangle-to-polygon conversion would need to be adjusted.
- The y limits are set incorrectly if, for one and the same bar, y_left is the overall largest negative value and y_right is the overall largest positive value (or vice versa).
- I did not test whether the code works with units.
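A sketch that addresses the first two edge cases might look like the following. It is not part of the answer above, and polybar_with_bottom and its ax parameter are hypothetical names. Each polygon's top corners are placed relative to that bar's own bottom, and the y limits are computed from every corner instead of being left to autoscaling. The 5 % margin is an arbitrary choice, and units have only been considered insofar as ax.bar() converts x and width itself.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon


def polybar_with_bottom(x, y_left, y_right, bottom=0, ax=None, **kwargs):
    """Hypothetical variant of polybar() supporting a nonzero `bottom`.

    y_left / y_right are bar heights measured from each bar's bottom.
    """
    ax = plt.gca() if ax is None else ax
    y_left = np.asarray(y_left, dtype=float)
    y_right = np.asarray(y_right, dtype=float)
    bottom = np.broadcast_to(np.asarray(bottom, dtype=float), y_left.shape).copy()

    # Let ax.bar() handle x units, `align` and `width`; heights are placeholders
    bars = ax.bar(x, y_left, bottom=bottom, **kwargs)

    polys = []
    for rect, yl, yr in zip(bars, y_left, y_right):
        (x0, y0), w = rect.get_xy(), rect.get_width()
        # y0 is this bar's bottom; the top corners sit at bottom + edge height
        p = Polygon([(x0, y0), (x0 + w, y0), (x0 + w, y0 + yr), (x0, y0 + yl)],
                    closed=True)
        p.update_from(rect)  # copy color, edgecolor, linewidth, transform, ...
        polys.append(p)

    # Replace the rectangles with the polygons
    for rect in bars:
        rect.remove()
    for p in polys:
        ax.add_patch(p)

    # y limits from all corners (bottoms and both top edges), plus a 5 % margin
    corners = np.concatenate([bottom, bottom + y_left, bottom + y_right])
    lo, hi = corners.min(), corners.max()
    pad = 0.05 * (hi - lo) if hi > lo else 1.0
    ax.set_ylim(lo - pad, hi + pad)
    return polys


fig, ax = plt.subplots()
polys = polybar_with_bottom([0, 1], [1, -2], [3, 1], bottom=[1, 0],
                            width=1, align='edge', ax=ax)
```

In the usage example, the second bar runs from -2 on its left edge to +1 on its right edge, which is the mixed-sign situation the second edge case describes.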
You can use Matplotlib's Axes.fill_between to generate these types of charts. Importantly, this accurately represents the gaps between your rows where they exist, whereas the bar approach makes those gaps appear wider than they truly are unless you set the linewidth of the bars to 0.
Additionally, the data transformation you are doing is exactly what pandas.lreshape does; it is similar to performing multiple melt operations at the same time.
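The following check uses a two-row excerpt of the question's data to show what lreshape does. The comparison is illustrative and not part of this answer's code. Each key of groups becomes one long column, built by stacking the listed columns in order, and the remaining columns (here only Name) are repeated. Two melt calls combined column by column give the same rows.

```python
import pandas as pd

# Two-row excerpt of the question's data
d = pd.DataFrame({
    'Name': ['A', 'B'],
    'Start': pd.to_datetime(['1800-01-01', '1850-01-01']),
    'End': pd.to_datetime(['1849-12-31', '1949-12-31']),
    'First': [5, 10],
    'Last': [10, 8],
})

# lreshape: stack Start/End into Date and First/Last into Value in one step
stacked = pd.lreshape(d, groups={'Date': ['Start', 'End'], 'Value': ['First', 'Last']})
stacked = stacked.sort_values('Date').reset_index(drop=True)

# The same result with two melts: one for the dates, one for the values.
# melt() emits all Start rows before all End rows (and First before Last),
# so the two outputs line up row by row.
dates = d.melt(id_vars='Name', value_vars=['Start', 'End'], value_name='Date')
values = d.melt(id_vars='Name', value_vars=['First', 'Last'], value_name='Value')
manual = dates[['Name', 'Date']].assign(Value=values['Value'].to_numpy())
manual = manual.sort_values('Date').reset_index(drop=True)
```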
import pandas as pd
from pandas import Timestamp
import matplotlib.pyplot as plt
dd = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Start': pd.to_datetime(['1800-01-01', '1850-01-01', '1950-01-01']),
    'End': pd.to_datetime(['1849-12-31', '1949-12-31', '1979-12-31']),
    'Team': ['Red', 'Blue', 'Red'],
    'Duration': [50*365-1, 100*365-1, 30*365-1],
    'First': [5, 10, 8],
    'Last': [10, 8, 12]
})
df = (
    pd.lreshape(dd, groups={'Date': ['Start', 'End'], 'Value': ['First', 'Last']})
    .sort_values('Date')
)
colors = {'Red': '#E81B23', 'Blue': '#00AEF3'}
fig, ax = plt.subplots()
for team in df['Team'].unique():
    ax.fill_between(
        df['Date'],
        df['Value'],
        where=(df['Team'] == team),
        color=colors[team],
        linewidth=0,
    )
ax.set_ylim(bottom=0)
plt.show()
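One caveat with where=: fill_between fills between x[i] and x[i+1] whenever both where[i] and where[i+1] are true. If two consecutive rows belong to the same team, the gap between them is therefore filled as well. This does not happen with the three rows in the question. A minimal alternative, sketched below, calls fill_between once per row so each call fills only that row's trapezoid. Row D is hypothetical and was added only to create two consecutive Red rows.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The question's rows plus a hypothetical row D, so that C and D are
# consecutive rows of the same team
wide = pd.DataFrame({
    'Start': pd.to_datetime(['1800-01-01', '1850-01-01', '1950-01-01', '1980-01-01']),
    'End': pd.to_datetime(['1849-12-31', '1949-12-31', '1979-12-31', '1999-12-31']),
    'Team': ['Red', 'Blue', 'Red', 'Red'],
    'First': [5, 10, 8, 12],
    'Last': [10, 8, 12, 6],
})
colors = {'Red': '#E81B23', 'Blue': '#00AEF3'}

fig, ax = plt.subplots()
# One fill_between call per row: each call fills only the area between that
# row's Start and End, so nothing is drawn across the gaps between rows
for row in wide.itertuples(index=False):
    ax.fill_between([row.Start, row.End], [row.First, row.Last],
                    color=colors[row.Team], linewidth=0)
ax.set_ylim(bottom=0)
```

This also skips the reshaping step, since each row already holds both endpoints.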