I encountered an unexpected discrepancy while working on a statistics problem using SQLite3. I have a dataset from a CSV file containing seasonal energy efficiency ratios (SEER). My objective was to find the number of occurrences of values greater than or equal to 21.
Here is the data from my .csv file:
SEER |
---|
24 |
40 |
26 |
32 |
13 |
26 |
26 |
28 |
29 |
17 |
25 |
10 |
23 |
22 |
13 |
31 |
31 |
36 |
25 |
27 |
35 |
34 |
17 |
39 |
18 |
21 |
23 |
22 |
26 |
31 |
26 |
25 |
22 |
21 |
34 |
26 |
29 |
24 |
27 |
27 |
24 |
30 |
38 |
31 |
32 |
9 |
26 |
19 |
24 |
29 |
28 |
26 |
19 |
29 |
39 |
26 |
30 |
32 |
28 |
30 |
22 |
25 |
27 |
29 |
24 |
30 |
29 |
20 |
30 |
25 |
26 |
29 |
27 |
25 |
20 |
18 |
15 |
25 |
23 |
27 |
23 |
27 |
24 |
28 |
27 |
40 |
27 |
12 |
30 |
24 |
34 |
33 |
29 |
22 |
40 |
32 |
27 |
35 |
23 |
39 |
I attempted to solve this using SQLite3 with the following queries:
sqlite3 heat_data.db
.mode csv
.import ex02-115heat.csv heat
.schema
SELECT COUNT(*) FROM heat WHERE SEER >= 21;
That last query returned 87. To verify this result was correct (and since I am very inexperienced with SQL), I used Excel’s ‘COUNTIF’ function, expecting it to return 87 as well. Surprisingly, Excel returned 86. Doubting the discrepancy, I manually counted the occurrences and indeed found 86.
I then consulted with ChatGPT, which also confirmed the count as 86.
I’m puzzled by this inconsistency. Are there any issues with my SQL queries or could there be another explanation for this discrepancy?
Samson Wheelock is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.