I’m writing code for exploratory data analysis, and as part of this I would like to plot some of the variables in the dataset. I would like a function that generates plot objects, which can then be called and displayed as needed in a Jupyter Notebook. In R I can do something like this:
# install.packages("tidyverse")
# install.packages("ggpubr")
supress_all <- function(e) {suppressPackageStartupMessages(suppressWarnings(e))}
supress_all(library(tidyverse))
supress_all(library(ggpubr))
# Adjust size of the plots in jupyter
options(repr.plot.width = 10, repr.plot.height = 4)
make_me_a_plot <- function(data, x_name, y_name) {
  res <- ggplot() +
    geom_point(aes(x = data[[x_name]], y = data[[y_name]])) +
    labs(title = paste0(x_name, " vs ", y_name), x = x_name, y = y_name)
  return(res)
}
p1 <- make_me_a_plot(mtcars, "mpg", "hp")
p2 <- make_me_a_plot(mtcars, "mpg", "wt")
p3 <- make_me_a_plot(mtcars, "mpg", "qsec")
Then, when I want to display my plots, I can do something like this:
# Plot just 2 plots and ignore the last one generated - p3
ggarrange(p1, p2, ncol = 2, nrow = 1)
This will look something like this in Jupyter.
Plots p1, p2, and p3 are still available and can be used multiple times or modified on the fly. How do I achieve the same thing in Python? Example code that is NOT working is below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
plt.ioff()
mtcars = sns.load_dataset('mpg')
x_name = 'mpg'
y_name = 'horsepower'
def make_me_a_plot(data, x_name, y_name):
    # Single brackets select a Series; double brackets would return a DataFrame
    res = plt.scatter(x=data[x_name], y=data[y_name])
    return res
p1 = make_me_a_plot(mtcars, 'mpg', 'horsepower')
p2 = make_me_a_plot(mtcars, 'mpg', 'weight')
p3 = make_me_a_plot(mtcars, 'mpg', 'acceleration')
# What next? plt.show() will just draw all of the plots on the same figure.
Calling plt.show() will just show ALL of the plots, which is not what I want.
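To make the goal concrete, here is a minimal sketch of the behaviour I am after. It is one possible direction, not necessarily the idiomatic one. The idea is that make_me_a_plot does not draw anything itself; it returns a reusable "recipe" (a callable) that draws onto whatever Axes it is given later. The helper names draw_scatter and arrange are hypothetical (arrange is only a rough stand-in for ggarrange), and the toy dictionary holds made-up values rather than the real mpg dataset, so that the snippet runs without downloading anything.

```python
from functools import partial

from matplotlib.figure import Figure


def draw_scatter(ax, data, x_name, y_name):
    """Draw one scatter plot onto an existing Axes."""
    ax.scatter(data[x_name], data[y_name])
    ax.set(title=f"{x_name} vs {y_name}", xlabel=x_name, ylabel=y_name)
    return ax


def make_me_a_plot(data, x_name, y_name):
    """Return a reusable recipe: a callable that draws when handed an Axes."""
    return partial(draw_scatter, data=data, x_name=x_name, y_name=y_name)


def arrange(*plots, ncols=2, figsize=(10, 4)):
    """Hypothetical stand-in for ggarrange: draw the chosen recipes on a new Figure."""
    nrows = -(-len(plots) // ncols)  # ceiling division
    # Figure is created directly (not via pyplot), so pyplot does not track it
    fig = Figure(figsize=figsize)
    axes = fig.subplots(nrows, ncols, squeeze=False).ravel()
    for ax, plot in zip(axes, plots):
        plot(ax)
    for ax in axes[len(plots):]:
        ax.set_visible(False)  # hide unused grid cells
    return fig


# Toy values for illustration only (assumption: any mapping or DataFrame
# that supports data[column_name] works here)
toy = {
    "mpg": [18.0, 15.0, 26.0, 31.0],
    "horsepower": [130, 165, 90, 65],
    "weight": [3504, 3693, 2300, 1800],
}

p1 = make_me_a_plot(toy, "mpg", "horsepower")
p2 = make_me_a_plot(toy, "mpg", "weight")

# Draw only the plots that are wanted; the recipes stay reusable afterwards
fig = arrange(p1, p2, ncols=2)
```

In a notebook, leaving fig as the last expression of a cell should render it, assuming the inline backend is active. Because the recipes are plain callables, p1 and p2 can be passed to arrange again in a different combination, which is the part of the R workflow I am trying to reproduce.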