A lot of my work is done with .csv extracts (reports) from databases. As I have been programming in Clojure, I’ve received comments that relying on vector indexes creates dependencies. I understand why, and concur.
I am rewriting one of my programs to take advantage of the fact that each report’s first row contains the column headings, so I can look up each value I want by map key. The idea is to turn the headings into keywords and zipmap them with one row of data at a time. Here is an example:
(def bene-csv-inp (fetch-csv-data "benetrak_roster.csv"))
(def bene-csv-cols (first bene-csv-inp))
(def bene-csv-data (rest bene-csv-inp))
(def zm1 (zipmap
           (map keyword bene-csv-cols)
           (first bene-csv-data)))
(zm1 :EmploymentStartDate)
"21-Jun-82"
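The same zipmap can be applied to every data row rather than just the first. This is only a minimal sketch: it assumes fetch-csv-data returns a sequence of vectors of strings with the header row first, and the sample rows, the EmployeeName column, and the names sample-inp and rows->maps are made up for illustration.

```clojure
;; Hypothetical stand-in for what (fetch-csv-data "benetrak_roster.csv")
;; is assumed to return: a header row followed by data rows.
;; Apart from "21-Jun-82", the values are invented.
(def sample-inp
  [["EmployeeName" "EmploymentStartDate"]
   ["Smith" "21-Jun-82"]
   ["Jones" "03-Mar-95"]])

(defn rows->maps
  "Turns [header-row & data-rows] into a lazy sequence of maps keyed by
  keywordized column names."
  [[cols & data]]
  (let [ks (map keyword cols)]
    (map #(zipmap ks %) data)))

(def bene-maps (rows->maps sample-inp))

;; Keywords act as functions of maps, so one column can be pulled
;; out of every row without touching an index.
(map :EmploymentStartDate bene-maps)
;=> ("21-Jun-82" "03-Mar-95")
```

Building the keyword list once and reusing it for each row avoids re-keywordizing the header for every line of the extract.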
Does a higher level of abstraction exist, and if so, what is it that would allow my code not to have to hard-code :EmploymentStartDate? If my code has to know these keys, then how is that not also a dependency, just like an index?
Personally, I like going after the data with map keys, because it’s less confusing and more informative than indexes. However, I believe I still have a dependency.
Thanks.
Well, you’re no longer dependent on the order of the columns in the data, but you’re now dependent on the column names. If someone adds or moves a column in an extract, even in the middle, you’re better off, because your code won’t break. If someone deletes or renames a column you rely on, you’ll still have an issue. If it’s more likely that someone will rename a column than delete or move it, you’d be better off sticking with indexes. I do agree that it’s usually easier to understand code that refers to (hopefully) meaningful names than raw column numbers. It should make debugging easier too, since if a column is missing you can report “No such column named ‘EmploymentStartDate’” rather than “Missing column: 23”.
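One way to get that kind of error message is to check the header row once, before processing any data. The following is a rough sketch rather than a definitive implementation: required-cols and check-columns are hypothetical names, and it assumes the header row is a sequence of strings.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

;; Hypothetical: the columns this program reads, declared in one place
;; so a renamed or deleted column is caught up front.
(def required-cols #{:EmploymentStartDate})

(defn check-columns
  "Throws an ex-info naming any required column missing from header-row;
  otherwise returns the header as a vector of keywords."
  [header-row]
  (let [present (mapv keyword header-row)
        missing (set/difference required-cols (set present))]
    (when (seq missing)
      (throw (ex-info (str "No such column named: "
                           (str/join ", " (map name (sort missing))))
                      {:missing missing})))
    present))

;; Passes: the required column is present.
(check-columns ["EmployeeName" "EmploymentStartDate"])

;; Would throw, with the missing column's name in the message:
;; (check-columns ["EmployeeName"])
```

The code still depends on the column names, but the dependency is stated in a single set and fails with a named column instead of a bad lookup somewhere later in the program.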