Somebody explained to me that in agile development, policy and application logic should matter more than details such as the persistence method, so the persistence decision should be made at the end. It might therefore be a good idea to start with simpler persistence such as flat files until the weaknesses of this method become apparent, and only then change the persistence, e.g. to a relational database.
Is this true, or did I misunderstand the concept? Is this how agile teams usually build applications? What is the rationale for it, and when should I not take this approach?
1
The concept being conveyed is definitely part of agile and relevant: the idea of pushing decisions off to the last responsible moment.
However, the example relies on an assumption that is often completely false:
that it is easier or less work to implement a flat-file database than to use an RDBMS.
The example should be: The persistence layer is kept to the simplest possible implementation until a decision is made that it is inadequate.
For many development teams, standing up a database is a matter of an hour or two (or 15 minutes for a small database with an ORM). A flat-file database, by contrast, can be an enormous pain that they keep needing to meddle with, because they have to hand-write all the seek and data-table construction code. A database may be as simple as creating a table in a UI, adding a few columns, and then having an ORM generate everything else you need.
Furthermore, if you don’t yet know what your persistence layer will be, it is even more appropriate to start with a common RDBMS that your team knows well, because changing later from flat files to an RDBMS is a much larger job than changing from one RDBMS to another. Migrating between common RDBMSs is a well-travelled path with many tools and tips available. There are extremely few tools for converting flat files to a given RDBMS when the flat file uses a proprietary format for which no tooling exists aside from your own libraries.
In summary:
The concept is correct and accurate, but the example is based on a very large and often (almost always) inaccurate assumption.
2
Since you know you will be using a DB, there’s not much point in writing code to handle flat files. You might get through a few iterations using some read-only CSVs but you’ll quickly find yourself writing code that you know you will throw away.
One thing you can do to reduce the initial complexity is to use something like SQLite (a library that gives you a serverless SQL database stored in a single file). It has a flexible type system, so you don’t have to seriously commit to a schema; there is no server to configure or connect to; and rebuilding your DB is as simple as deleting a file. It also allows you to include the DB image along with the code in version control if needed.
4
It’s an example, used to demonstrate a concept, rather than a concept in and of itself.
The concept is that you don’t make an architectural decision until the last responsible moment, but no later. But, in reality, you often do have a decision on the database you’re going to use very early on. That might not be perfect, but it’s a fact.
Once that decision is made, you don’t actively avoid it. Storing something in an existing database is often as easy as storing it in a flat file.
But changing from MySQL on Linux to SQL Server on Windows might not be as simple as changing from a flat file anywhere to SQL Server on Windows. That’s the real point. So, while there’s doubt as to the final solution, take the simplest possible option, accounting for change. Once a decision is made, stick to it.
2
What are you persisting?
I am on an agile team and for one application, we persist almost everything to the database. Mind you, if we didn’t then there wouldn’t be much for this application to do – persisting things to a database is a large part of its raison d’être.
So: What are you persisting and what does your application do? If the application doesn’t actually care where its data is persisted, then you can write a small layer that makes the decision (that decision can be stored in a config file somewhere) of flat-file vs. database. I think you need to evaluate your application and your data and decide if it makes sense in your specific case to invest time in database persistence, or just read/write flat files (which will probably be faster in terms of development time).
4
A lot of people misconstrue that aspect of agile as meaning they shouldn’t plan ahead for future features. Nothing could be further from the truth. What you don’t do is allow planning for future features to delay delivering value to your customers now.
How that applies to something like persistence depends very much on your application. One of my current hobby projects is a calculator. Eventually, I would like to be able to store user defined constants and functions, and save the state when I close the program. That requires persistence, but I haven’t even started to think about what form that would take. My program will be very usable without persistence, and adding it now will add significant delay to my first release. I would rather have a working calculator with fewer features than none at all while I wait for it to be perfect.
Another hobby project of mine is for video and photograph color correction. This application will be completely unusable without being able to save my work in progress, and the code needed to do so is pervasive throughout the entire program. If I don’t get it right from the start, then changing it could be prohibitively difficult, so I’ve spent quite a bit of effort on persistence up front.
Most projects would fall somewhere between, but you should never feel bad about planning for future features if it adds little to no extra effort now. Databases may be complex, but most of that complexity is hidden behind a solid, well-tested interface. The work you will have to do in your application may very well be less for a database than a flat file, because of all the features a database gives you for free. There are “hybrid” options like SQLite if you don’t want to deal with the hassle of a database server yet.
1
I think you are putting focus on the wrong values. In agile, business value is in focus. You create a product in order to deliver business value to some end users.
Whether you create the persistence layer late or build it up along the way is part of your strategy for delivering business value to the customer. I don’t believe that the term “agile” itself dictates that you should do one or the other.
The point of view about deferring data storage strategy is advocated in this presentation by Robert C. Martin (one of the authors of the agile manifesto).
It is a very good presentation, I can recommend that you watch it.
But I disagree with it! At least to a degree.
I do not believe that you can call a user story “Done” if the user story involves data that should be persisted and you don’t actually have any type of persistence implemented.
If the product owner decides that now is the time to go live, you are unable to do that. And if you haven’t started implementing persistence until late in the project, you also have no information about how long it would take to implement the persistence layer, leaving it a major project risk.
The agile projects I have worked on have not deferred the data access strategy. But it has been decoupled, allowing us to change it along the way. And the entire database schema is not designed up front. Tables and columns are created along the way as they are required in order to implement the user stories that, in the end, deliver business value.
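One common way to grow a schema story by story is an ordered list of migration steps, with the database remembering how many it has already applied. The sketch below uses SQLite’s `PRAGMA user_version` for that bookkeeping; the `customer` and `invoice` tables and the `migrate` helper are hypothetical and only illustrate the idea, not how any particular team did it.

```python
import sqlite3

# Each entry is added when a user story first needs it; earlier entries
# are never edited once they have been applied somewhere.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
    "CREATE TABLE invoice ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id), "
    "total REAL)",
]


def migrate(conn):
    """Apply any migrations not yet applied; return the schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `migrate` on an up-to-date database does nothing, so it can be called at every application start.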
It takes good judgement and experience to see what questions need answering first when embarking on a new project.
If the end product is still unknown, then building and prototyping quickly will help you figure that out, and yes, iterating in an agile way will help. That of course introduces risk, such as finding out late in the process that the persistence implementation is going to take longer than you communicated to your stakeholders.
If the end product is well known, then understanding how persistence will work in your application could be more important to know early on. The risk there is that you find problems with your product specification later in the development process.
Flat files are not simple!
They allow storage and that is all. The structure of the data, the access paths, and so on are all up to you, and there are numerous ways to get this wrong.
There are reasons databases exist and one of them is to make things simpler for developers.
Most of my development is done for large systems within large companies. We always have a complete and well thought out data model before we proceed with any further design or development. A data model helps you understand your problem and allows for a clean implementation.
Forgotten data items, mismatched data structures and other nightmares can all be avoided by producing a data model up front.
You can leave your actual choice of database technology till after the data model. But most “persistence” technologies are closely bound to programming and even design styles. If you code for a relational database and later decide to switch to a cloudy key-value thingy, you will need to completely rewrite half your code.
If you start with flat files and switch to a relational DB you will probably end up throwing away half your code as your developers will have wasted their time implementing a piss poor database.
Should you try persistence in a flat file before database?
Yes, you should try it. Especially if you’ve never tried it before. No matter how it turns out, you’ll learn something.