I’m going to make an application that will be given data to put in a database. The data will for the most part be the same, but the way it is formatted will vary a lot (could be in anything from text files to .xls to .doc).
I’m not a very experienced developer, but I can see some potential issues and I want to minimize them.
First off, I have decided to use the DAO pattern, so that I can easily support new file formats or files that are suddenly formatted in different ways.
What I really wonder about, though, is how I should manage the data itself within my application. I’m thinking that the database DAO should have models representing each table of the database with the same relations between them, to make the uploading process easy. But should the filesystem DAOs have to use the same models? I can imagine that when the database changes, the change will propagate throughout the entire system, all DAOs and models alike. And that is obviously a bad thing.
I’m a little bit tired and out of time. I will update with answers to whatever questions you have.
Thanks!
I believe it may actually be better to decouple the organization of your code from the organization of your database. If your objects mimic the database design, every change to the data storage will ripple into your code, for example if you later need some additional information such as metadata. So I would keep your objects separate from the database and use JPA or Hibernate to persist them into the database.
Also, why use a huge single object? I think it might be better to have a base class where you concentrate your general-purpose code, and then represent each file format by a different subclass of that base class, for better code organization, flexibility and maintainability.
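To make the base-class idea a bit more concrete, here is a minimal Java sketch. All names in it (CustomerRecord, RecordImporter, SemicolonTextImporter) and the "name;email" line format are made up for illustration, since I don't know what your data looks like. The point is only that the importers produce plain domain objects that know nothing about database tables, and that each file format contributes nothing but its own parsing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical domain object: it models the data itself, not any database table.
final class CustomerRecord {
    final String name;
    final String email;

    CustomerRecord(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

// Base class holding the general-purpose code shared by every file format.
abstract class RecordImporter {
    // Template method: format-specific parsing first, then shared clean-up.
    final List<CustomerRecord> importRecords(byte[] fileContents) {
        List<CustomerRecord> cleaned = new ArrayList<>();
        for (CustomerRecord raw : parse(fileContents)) {
            cleaned.add(new CustomerRecord(
                    raw.name.trim(),
                    raw.email.trim().toLowerCase(Locale.ROOT)));
        }
        return cleaned;
    }

    // Each file format supplies only its own parsing logic.
    protected abstract List<CustomerRecord> parse(byte[] fileContents);
}

// One subclass per format; this one assumes one "name;email" pair per line.
final class SemicolonTextImporter extends RecordImporter {
    @Override
    protected List<CustomerRecord> parse(byte[] fileContents) {
        List<CustomerRecord> records = new ArrayList<>();
        String text = new String(fileContents, StandardCharsets.UTF_8);
        for (String line : text.split("\\R")) {
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            String[] fields = line.split(";", -1);
            if (fields.length != 2) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            records.add(new CustomerRecord(fields[0], fields[1]));
        }
        return records;
    }
}
```

Supporting .xls or .doc would then mean adding another subclass, while the code that maps CustomerRecord to the database (via JPA, Hibernate or a hand-written DAO) stays in one place and is the only part that has to change when the schema does.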
Are you intending to manipulate the data in the database? If the answer is not an immediate yes, then store files as a binary blob.
If the answer is still a yes, then you may still want to store the original files as a blob. But also try to break out an appropriate data structure for your analysis, and try to automate extraction of that from the original file. Keep track of when automation had trouble, and what trouble it had. Because you always have the original, when you figure out how to address that problem, you can reanalyze that file properly.
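As a rough illustration of "keep the original, record the trouble, reanalyze later", here is a small Java sketch. The class StoredUpload and its methods are hypothetical, and in a real application originalBytes would live in a BLOB column rather than in memory.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical record of one uploaded file: the untouched original
// plus the outcome of the most recent extraction attempt.
final class StoredUpload {
    final String fileName;
    private final byte[] originalBytes;               // never modified after upload
    private List<String> extractedRows = List.of();
    private String failureReason = "not analyzed yet"; // null once extraction succeeds

    StoredUpload(String fileName, byte[] originalBytes) {
        this.fileName = fileName;
        this.originalBytes = originalBytes.clone();
    }

    // Run (or re-run) an extractor against the preserved original.
    void analyze(Function<byte[], List<String>> extractor) {
        try {
            extractedRows = List.copyOf(extractor.apply(originalBytes.clone()));
            failureReason = null;
        } catch (RuntimeException e) {
            // Remember what went wrong so the file can be revisited later.
            extractedRows = List.of();
            failureReason = e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    boolean needsReanalysis() {
        return failureReason != null;
    }

    Optional<String> failureReason() {
        return Optional.ofNullable(failureReason);
    }

    List<String> extractedRows() {
        return extractedRows;
    }
}
```

Once an extractor has been fixed, you can loop over every upload where needsReanalysis() is true and call analyze again, without asking anyone to resend the file.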