Here’s my data structure:
import random

betterTrue = [random.randint(0, 1) for x in range(500)]
# betterFalse is the element-wise complement of betterTrue
betterFalse = [(x + 1) % 2 for x in betterTrue]
data = {
    "model": ["A" for x in range(500)] + ["B" for x in range(500)],
    "safety": [random.randint(0, 5) for x in range(1000)],
    "honesty": [random.randint(0, 5) for x in range(1000)],
    "quality": [random.randint(0, 5) for x in range(1000)],
    "better": betterTrue + betterFalse,
}
I’d like to generate count plots comparing each model’s performance in each of the safety, honesty, quality, and better columns. For the first three, the data comes in integer values from 0 to 5, and for better, the data is either 0 or 1.
But for the first three columns, I only care whether a data point is greater than or equal to 3 or less than 3. Is there a way to generate a count plot that puts the data into two bins, >= 3 and < 3?
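To make the binning concrete, here is a rough sketch of the transformation I have in mind, using only the standard library. The helper `to_bin`, the list name `safety_bin`, and the bin labels are placeholders I made up for illustration, and the fixed seed is there only so the example is reproducible:

```python
import random
from collections import Counter

random.seed(0)  # assumption: fixed seed, only for reproducibility

# Same shape as the "model" and "safety" columns in my data
models = ["A"] * 500 + ["B"] * 500
safety = [random.randint(0, 5) for _ in range(1000)]


def to_bin(score, threshold=3):
    """Hypothetical helper: map a 0-5 score to one of two bin labels."""
    return ">= 3" if score >= threshold else "< 3"


# One label per data point instead of the raw 0-5 value
safety_bin = [to_bin(s) for s in safety]

# The numbers I would want the plot to show: one count per (model, bin) pair
counts = Counter(zip(models, safety_bin))
for key in sorted(counts):
    print(key, counts[key])
```

Ideally the count plot would show these four counts as two groups of bars (one group per bin, one bar per model) rather than six groups.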
For reference, this is what it looks like when we don’t do that and instead bin discretely by each possible value:
import seaborn as sns

# countplot returns a matplotlib Axes, not a Figure
ax = sns.countplot(x="safety", hue="model", data=data, stat="count")