I have a use case where I regularly receive data in the same format, and I want to quickly compare the distribution of each variable as well as the pairwise (bivariate) distributions. I was originally using sns.pairplot out of the box, but I realized the scatter plots are not great for my application: the amount of data varies from batch to batch, and the bounds of the data are always different. So instead, I want to draw every plot as a histogram with consistent bin sizes, chosen from my knowledge of each variable.
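One detail worth checking first: when bins is an integer, np.histogram (which plt.hist uses under the hood) spreads that many bins between each sample's own minimum and maximum, so the bin widths still change whenever the data bounds change. A minimal sketch of truly consistent bins, assuming you can write down a plausible range and bin width for each variable, is to build explicit edges once and reuse them. The KNOWN_RANGES values and the fixed_edges helper are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical (low, high, bin width) per variable, chosen from domain
# knowledge; these numbers are illustrative assumptions, not from any dataset.
KNOWN_RANGES = {
    "sepal_length": (4.0, 8.0, 0.4),
    "sepal_width": (2.0, 4.5, 0.5),
    "petal_length": (1.0, 7.0, 0.2),
    "petal_width": (0.0, 2.6, 0.2),
}


def fixed_edges(var):
    """Return bin edges for `var` that do not depend on any particular sample."""
    lo, hi, width = KNOWN_RANGES[var]
    n_bins = int(round((hi - lo) / width))  # round() guards against float error
    return np.linspace(lo, hi, n_bins + 1)


# Two samples with different sizes and different min/max values.
rng = np.random.default_rng(0)
small = rng.uniform(4.5, 6.0, size=20)
large = rng.uniform(4.2, 7.8, size=500)

# Integer bins: edges follow each sample's own min/max, so widths differ.
_, auto_small = np.histogram(small, bins=10)
_, auto_large = np.histogram(large, bins=10)

# Explicit edges: both samples are counted into identical bins.
edges = fixed_edges("sepal_length")
counts_small, _ = np.histogram(small, bins=edges)
counts_large, _ = np.histogram(large, bins=edges)
```

The same edges array can be passed as bins= to plt.hist, and a pair of them as bins=[x_edges, y_edges] to plt.hist2d.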
I found a great answer that helped me use sns.PairGrid with a custom histogram function on the diagonal, using the bins I want (/a/56387759). Now I’d like to do the same for the lower joint distributions, but I’m not sure how, since the function passed to the grid receives only the data, with no indication of which variable it corresponds to. Here’s the example code, based on the answer linked above:
```python
import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris", cache=True)
col_list = ['petal_length', 'petal_width', 'sepal_length', 'sepal_width']
cols = iter(col_list)

# Number of bins I want for each variable
bins = {'sepal_length': 10, 'sepal_width': 5,
        'petal_length': 35, 'petal_width': 12}

def myhist(x, **kwargs):
    # Relies on map_diag visiting the diagonal in col_list order
    b = bins[next(cols)]
    plt.hist(x, bins=b, **kwargs)

def pairgrid_heatmap(x, y, **kws):
    # How do I retrieve the correct bins here, given only x and y?
    cmap = sns.light_palette(kws.pop("color"), as_cmap=True)
    plt.hist2d(x, y, cmap=cmap, cmin=1, **kws)

g = sns.PairGrid(iris, vars=col_list)
g = g.map_diag(myhist)
g = g.map_offdiag(pairgrid_heatmap)
plt.show()
```
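A minimal sketch of one possible approach, assuming (as in the seaborn versions whose PairGrid source I have read) that the grid hands each non-seaborn plotting function pandas Series taken straight from the DataFrame, so x.name and y.name hold the column names. Both functions can then look up their own bins instead of relying on the call order of an iterator. The synthetic columns a, b, c and the BIN_EDGES dict are hypothetical stand-ins for the iris columns; please verify the Series-name behaviour against your installed seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical fixed bin edges per column (assumption: chosen from domain knowledge).
BIN_EDGES = {
    "a": np.linspace(-4, 4, 17),
    "b": np.linspace(0, 10, 11),
    "c": np.linspace(-2, 2, 9),
}


def myhist(x, **kwargs):
    # Assumption: x is a pandas Series whose .name is its column name
    plt.hist(x, bins=BIN_EDGES[x.name], **kwargs)


def pairgrid_heatmap(x, y, **kws):
    # x.name / y.name identify which pair of variables this panel shows
    cmap = sns.light_palette(kws.pop("color"), as_cmap=True)
    plt.hist2d(x, y, bins=[BIN_EDGES[x.name], BIN_EDGES[y.name]],
               cmap=cmap, cmin=1, **kws)


# Synthetic stand-in data (hypothetical), so the sketch needs no download.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.uniform(0, 10, size=200),
    "c": rng.normal(scale=0.5, size=200),
})

g = sns.PairGrid(df, vars=list(BIN_EDGES))
g.map_diag(myhist)
g.map_offdiag(pairgrid_heatmap)
```

When running interactively, drop the matplotlib.use("Agg") line and call plt.show() at the end. Because the bins come from a fixed dict rather than from each sample's min/max, panels from different batches of data stay directly comparable.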