I am creating a database in PostgreSQL to store data coming from .txt files.
Here is the situation:
I have 19 different sources (I call them scenarios), and each scenario has two distinct states (I call them categories). That gives 38 scenario-category combinations, each containing several .txt files with data in the following format:
Date Id_1 Id_2 Id_3 Id_4 Id_5 Id_6 Id_7
"01/01/2006" 0.9769 0.7730 1.9524 0.5831 3.489 598.3 0.4691
"02/01/2006" 1.3631 0.9308 2.3897 0.5989 4.479 566.1 0.4727
"03/01/2006" 1.7644 1.0639 3.3321 0.7387 5.545 547.6 0.6055
"04/01/2006" 1.3637 0.8740 2.8223 0.4926 4.670 517.6 0.4298
"05/01/2006" 0.9011 0.6898 1.8587 0.3751 3.268 485.8 0.3292
"06/01/2006" 0.6890 0.5966 1.3966 0.3225 2.568 494.3 0.2813
"07/01/2006" 0.5710 0.5395 1.1457 0.2982 2.179 550.9 0.2575
"08/01/2006" 0.4972 0.5000 0.9920 0.2861 1.934 602.8 0.2447
"09/01/2006" 0.4474 0.4704 0.8893 0.2794 1.766 605.7 0.2371
"10/01/2006" 1.2444 0.8804 2.0466 0.5396 3.986 567.9 0.4376
"11/01/2006" 2.0569 1.3623 3.8678 0.9508 6.800 522.5 0.8141
As you can see, the first line is the header: the first column is the date and the remaining columns are IDs. This is just a small excerpt of a data file; the number of columns goes up to 1102 and the number of lines can reach tens of thousands (e.g. 40000). The format of the data files stays the same, but the physical meaning of the data differs from file to file: some files contain temperature, some contain outflow, etc.
This is how I created my tables:
-- Create the scenarios table
CREATE TABLE scenarios (
    scenario_id SERIAL PRIMARY KEY,
    scenario_name VARCHAR(100) NOT NULL,
    description VARCHAR(255)
);

-- Create the categories table
CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL
);

-- Create the data table
CREATE TABLE data (
    data_id SERIAL PRIMARY KEY,
    date DATE NOT NULL,
    scenario_id INTEGER NOT NULL REFERENCES scenarios(scenario_id),
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    id_name VARCHAR(50) NOT NULL
);

-- Create the temperature_values table
CREATE TABLE temperature_values (
    temperature_value_id SERIAL PRIMARY KEY,
    data_id INTEGER NOT NULL REFERENCES data(data_id),
    value FLOAT NOT NULL
);

-- Create the pressure_values table
CREATE TABLE pressure_values (
    pressure_value_id SERIAL PRIMARY KEY,
    data_id INTEGER NOT NULL REFERENCES data(data_id),
    value FLOAT NOT NULL
);

-- Create the etp_values table
CREATE TABLE etp_values (
    etp_value_id SERIAL PRIMARY KEY,
    data_id INTEGER NOT NULL REFERENCES data(data_id),
    value FLOAT NOT NULL
);

-- Create the outflow_values table
CREATE TABLE outflow_values (
    outflow_value_id SERIAL PRIMARY KEY,
    data_id INTEGER NOT NULL REFERENCES data(data_id),
    value FLOAT NOT NULL
);
Some comments on the table creation:
The IDs are the same across all the data: for Id_10, for instance, I can have different values in different files (temperature, outflow, etc.). The same applies to the dates: for a specific date I can have different types of data (temperature, outflow, etc.).
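To make the intended layout concrete, here is a minimal sketch of how I imagine a single value ending up in these tables. It assumes, purely for illustration, that the excerpt above is an outflow file. The names 'scenario_1' and 'category_1' are placeholders, not my real scenario or category names.

```sql
-- Placeholder lookup rows (hypothetical names, for illustration only)
INSERT INTO scenarios (scenario_name) VALUES ('scenario_1');
INSERT INTO categories (category_name) VALUES ('category_1');

-- Store the first value of the excerpt: date "01/01/2006", column Id_1, value 0.9769.
-- "01/01/2006" is 1 January 2006 whether the file uses DD/MM/YYYY or MM/DD/YYYY.
WITH new_row AS (
    INSERT INTO data (date, scenario_id, category_id, id_name)
    SELECT DATE '2006-01-01', s.scenario_id, c.category_id, 'Id_1'
    FROM scenarios s
    CROSS JOIN categories c
    WHERE s.scenario_name = 'scenario_1'
      AND c.category_name = 'category_1'
    RETURNING data_id
)
-- Attach the measured value to the data row that was just created
INSERT INTO outflow_values (data_id, value)
SELECT data_id, 0.9769
FROM new_row;
```

One such pair of rows would be needed for every date and every ID in a file, which is why I am looking for a bulk approach rather than row-by-row inserts.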
Now the question is: how can I load my data into the tables efficiently? At this point I'm only interested in loading a single .txt file, let's say an outflow file into the outflow_values table. How am I supposed to do that?
P.S.
I'm not using pgAdmin; I'm working on Linux (Debian-based) from the command line.