I am trying to run this query and it takes around 76 seconds without any index, now I added an index in it and now it takes around 80 seconds.
This table has 24k records as of now.
Here is the DDL :
CREATE TABLE personal_details (
pno VARCHAR(20) PRIMARY KEY, -- Personal number or unique ID
first_name VARCHAR(20), -- First name
middle_name VARCHAR(20), -- Middle name (nullable if not always used)
last_name VARCHAR(20), -- Last name
dob DATE, -- Date of birth (YYYY-MM-DD format)
gender CHAR(1), -- Gender (M/F/O)
email VARCHAR(50), -- Email address
phone_number VARCHAR(20), -- Phone number (variable format, max 20 chars)
address VARCHAR(255), -- Address (full address string)
city VARCHAR(50), -- City
state VARCHAR(50), -- State
zip_code VARCHAR(10), -- Postal or zip code
country VARCHAR(50), -- Country
marital_status VARCHAR(20), -- Marital status (Single/Married/etc.)
nationality VARCHAR(50), -- Nationality
occupation VARCHAR(50), -- Job title or occupation
salary DECIMAL(10, 2), -- Salary (up to 10 digits, 2 decimal places)
hire_date DATE, -- Hire date (YYYY-MM-DD format)
department VARCHAR(50), -- Department name
is_active BOOLEAN -- Status flag for active/inactive (True/False)
);
Here is the query that takes 76 seconds:
SELECT
pd1.pno AS pno1,
pd1.first_name AS first_name1,
pd1.last_name AS last_name1,
pd2.pno AS pno2,
pd2.first_name AS first_name2,
pd2.last_name AS last_name2,
pd3.pno AS pno3,
pd3.first_name AS first_name3,
pd3.last_name AS last_name3
FROM
personal_details pd1
JOIN
personal_details pd2 ON pd1.city = pd2.city
JOIN
personal_details pd3 ON pd2.state = pd3.state
WHERE
pd1.dob BETWEEN '1980-01-01' AND '1990-12-31'
AND pd2.gender = 'M'
AND pd3.salary > 50000
ORDER BY
pd1.pno, pd2.pno, pd3.pno;
Here are the indexes that I created (tried and tested with few )
CREATE INDEX idx_pno ON personal_details (pno);
CREATE INDEX idx_composite_query ON personal_details (city, state, dob, gender, salary);
CREATE INDEX idx_city ON personal_details (city);
CREATE INDEX idx_state ON personal_details (state);
CREATE INDEX idx_dob ON personal_details (dob);
CREATE INDEX idx_gender ON personal_details (gender);
CREATE INDEX idx_salary ON personal_details (salary);
Even after creating these indexes the query took a lot of time( around 80 seconds which is more time than the normal query)
I am trying to understand how indexes work, that why I created this query which self-joins itself, I was expecting the execution time to go down but it increased.
I tried creating single indexes, and composite indexes but all gave me almost the same results.
Tried reading a few articles and made changes according to it but to no avail, Kind of lost here.
I also tried to Analyze the query using EXPLAIN ANALYZE
EXPLAIN
"-> Sort: pd1.pno, pd2.pno, pd3.pno (actual time=19130..19461 rows=2.25e+6 loops=1)
-> Stream results (cost=346436 rows=105833) (actual time=129..1905 rows=2.25e+6 loops=1)
-> Inner hash join (pd3.state = pd2.state) (cost=346436 rows=105833) (actual time=129..634 rows=2.25e+6 loops=1)
-> Filter: (pd3.salary > 50000) (cost=50.4 rows=813) (actual time=0.0498..23.2 rows=22872 loops=1)
-> Table scan on pd3 (cost=50.4 rows=24394) (actual time=0.0433..19.1 rows=24000 loops=1)
-> Hash
-> Nested loop inner join (cost=7124 rows=3905) (actual time=0.105..62.5 rows=4916 loops=1)
-> Filter: ((pd2.gender = 'M') and (pd2.city is not null)) (cost=2633 rows=2439) (actual time=0.058..20.3 rows=7968 loops=1)
-> Table scan on pd2 (cost=2633 rows=24394) (actual time=0.0541..17.6 rows=24000 loops=1)
-> Index lookup on pd1 using idx_composite_query (city=pd2.city), with index condition: (pd1.dob between '1980-01-01' and '1990-12-31') (cost=0.4 rows=1.6) (actual time=0.00432..0.00515 rows=0.617 loops=7968)
"
This is what was shown to me. This query is just for educational purpose for my self learning
3