I’m looking for some information on how to increase the performance of reading large text files for use in an ETL process.
We’re currently using a proprietary tool that is essentially a scripting language that gets converted into C++ and compiled into an executable. The only redeeming factor this program has is its ability to cycle through a 19 GB file and populate 250+ fields very quickly.
Where it bogs down is in the transformations, which, due to its scripting nature, tend to be inefficient, cumbersome, and difficult to maintain.
I’m able to parse the text file using .NET (VB or C#), but I can’t come anywhere close to the read performance of the C++ executable: literally hours versus minutes.
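For concreteness, here is roughly the kind of read loop I’ve been timing in C#. It’s a minimal sketch: the file path and the pipe delimiter are placeholders, and the per-row handling is stripped down to just splitting the line into its fields.

```csharp
using System;
using System.IO;

class ExtractReader
{
    static void Main()
    {
        // Placeholder path and delimiter -- the real extract is ~19 GB with 250+ fields.
        const string path = @"C:\extracts\daily_extract.txt";
        const char delimiter = '|';

        long rowCount = 0;

        // Large buffer to reduce I/O calls; SequentialScan hints to the OS
        // that the file will be read front to back.
        using (var stream = new FileStream(
                   path, FileMode.Open, FileAccess.Read, FileShare.Read,
                   1 << 20, FileOptions.SequentialScan))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Split allocates an array of 250+ strings per row, which is
                // where the .NET version seems to spend most of its time.
                string[] fields = line.Split(delimiter);
                rowCount++;
            }
        }

        Console.WriteLine("Rows read: " + rowCount);
    }
}
```

Even with the buffered stream and sequential-scan hint, this still runs far slower than the generated C++ reader, so I suspect the allocation per line (one string per field) is the real cost rather than the raw file I/O.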
I also considered using the data warehouse, which loads the same extract I’m reading, but for some reason the data warehouse team created one large table containing millions of rows and all 250+ fields. Since I need roughly 150 of those fields for the transformations, the queries are excessively slow.