I need a key-value store in the simplest form we can think of.
Keys should be fixed-length strings; values should be text.
This key-value store should have an HTTP-based API.
That’s basically it. As you can see, there is no big difference between such storage and some web application with some upload functionality.
The thing is, it'll take a few hours (including tests and coffee drinking) to write something like this.
“Something like this” will be fully under my control and can be tuned on demand.
Should I, in this specific case, avoid reinventing the bicycle? Is it better to use one of the existing NoSQL solutions? If yes, which one exactly?
If, say, I needed something SQL-like, I wouldn't ask and wouldn't try to write something myself. But with NoSQL I just don't know what is adequate and what is not.
Keep in mind that once you accumulate enough data, a simple home-grown approach will become very slow at retrieval unless you implement some sort of indexing system. So if that's a potential issue, I'd stick with a DBMS.
Besides a NoSQL server, you can also use an SQL database to store key-value pairs. There's nothing inherent in that type of data that prevents it. So if you already have MySQL running, that might be the simpler option.
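To make that concrete, here is a minimal sketch of a key-value table in an SQL database. It uses Python's built-in sqlite3 module as a stand-in for MySQL (my substitution, so the example runs without a server); the table name kv and the helpers open_store, put, and get are hypothetical names chosen for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database holding one key-value table."""
    conn = sqlite3.connect(path)
    # The PRIMARY KEY gives the key column an index, so lookups by key
    # do not degrade into a full scan as the table grows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def put(conn, key, value):
    """Insert the pair, overwriting any existing value for the key."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )


def get(conn, key):
    """Return the stored value, or None if the key is unknown."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```

With MySQL the idea would be the same; only the driver and the upsert syntax would differ.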
Yes, if you can't find a solution that solves your problem exactly, then I would consider writing your own. As others have mentioned: consider scalability. If you foresee that it will need to scale up very soon, it might be worth the investment to set up something that is designed to handle that.
One other thing I would mention: I would make it as OO-friendly as possible. That is, I would define a base key/value abstract class or interface and build your own key/value implementation on top of it. Later on you will find it much easier to (say) back your key/value store with memcached, or even swap it out entirely, because the interface stays the same. If you have global functions for get and set, that would seem equivalent, but you still have the issue of how quickly you can hot-swap the implementation without downtime.
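A rough sketch of what I mean, in Python. All class names here (KeyValueStore, InMemoryStore, CachedStore) are made up for illustration, and the in-memory dict merely stands in for whatever cache or backend you would really use, such as memcached.

```python
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """The interface every implementation has to honour."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the value for key, or None if it is not stored."""

    @abstractmethod
    def set(self, key: str, value: str) -> None:
        """Store value under key, replacing any previous value."""


class InMemoryStore(KeyValueStore):
    """Trivial implementation backed by a dict."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class CachedStore(KeyValueStore):
    """Puts a fast cache in front of a slower backend, same interface."""

    def __init__(self, cache: KeyValueStore, backend: KeyValueStore) -> None:
        self._cache = cache
        self._backend = backend

    def get(self, key: str) -> Optional[str]:
        value = self._cache.get(key)
        if value is None:
            # Cache miss: fall back to the backend and remember the result.
            value = self._backend.get(key)
            if value is not None:
                self._cache.set(key, value)
        return value

    def set(self, key: str, value: str) -> None:
        # Write through, so the cache never holds a value the backend lacks.
        self._backend.set(key, value)
        self._cache.set(key, value)
```

Calling code only ever sees KeyValueStore, so replacing InMemoryStore with a CachedStore (or with something else entirely) needs no changes on the caller's side.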
I would make a quick review of what is out there. If you find that one of the current NoSQL stores gets the job done, then I would use it. Not because you couldn't write something yourself fairly quickly, but because it allows you to offload the support burden. If all you need is key-value storage, one of Dynamo's children (Cassandra, Riak, …) or Redis is probably the right tool for the job.
The problem seems basic enough to be done yourself without too much pain. Of course, you need testing, but it remains reasonable compared to, say, authentication.
Even when the wheel is easy to build, don't reinvent it:
- If you already know a solution which fits your needs exactly. By "know", I mean that you have already tried the product, know that it is easy to install, know the possible caveats, etc.
- Or if you expect to add more features in the future; then an enterprise-scale product which provides more features than you need right now can be a better solution. Redis, by the way, comes to mind as an excellent key-value store.
If you don't know any existing solution and you're sure that you won't need to add features later, it would probably be faster to make your own key-value store than to find an existing one which works in your context, on your platform, etc.
I think you’re actually bikeshedding here. Whatever it is that you are doing, in all likelihood this question is way beside the point.
1. As you've said yourself, the storage will have an HTTP-based API. Define that first.
2. Use the simplest implementation you can (e.g. on top of Redis). If this takes longer than an hour, you're doing it wrong.
3. Write the actual application that consumes the data through said API.
4. Add reasonably realistic example data.
5. Measure performance under load. If the performance is satisfactory, you are done.
6. Use the information you've collected to make a better implementation of the API.
7. Go to 5.
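To give an idea of how small the first two steps can be, here is a throwaway sketch using only Python's standard library. The API shape (PUT /key stores the request body, GET /key returns it, unknown keys give 404) is my assumption, not something from the question, and the module-level dict STORE is a stand-in for Redis or whatever backend you pick. KVHandler and make_server are made-up names.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STORE = {}  # stand-in backend: request path -> raw bytes of the value


class KVHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        value = STORE.get(self.path)
        if value is None:
            self.send_error(404, "unknown key")
            return
        self._reply(200, value)

    def do_PUT(self):
        # Read exactly as many bytes as the client announced.
        length = int(self.headers.get("Content-Length", 0))
        STORE[self.path] = self.rfile.read(length)
        self._reply(204, b"")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for this sketch


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), KVHandler)


if __name__ == "__main__":
    make_server(8080).serve_forever()
```

Once the consuming application talks to this, swapping the dict for a real backend changes nothing on the client side, which is the whole point of defining the API first.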
As for “Isn’t XYZ overkill?”: Probably not.
Unless you actually have hard evidence to prove that it significantly impacts performance, why bother? It has more features than you need? You don’t need to use them. Certainly, you are using a programming language with more features than you actually put to use.
That’s not “overkill”. It’s “maintaining an option” 😉
Lastly, regarding your statement on making this work in just a few hours. There are really two options:
- Either it is reasonable to assume that the storage may become a bottleneck, in which case you are far better off using a well-maintained, optimized, scalable and reliable solution, because it will take you weeks to provide one and there is no telling how much data loss or data corruption you will cause in the meantime.
- Or, if the storage is not a bottleneck, why not just use something that gets the job done? Where is the value in having the full control that you are after?
If you can’t measure the benefits of dismissing all readily available 3rd party solutions in favor of a home made one, you’re probably suffering from Not Invented Here.
On Linux and POSIX systems I would recommend using the dbm API or the GNU gdbm library (which implements a superset of dbm).
That key-value storage API is pretty common and has existed since the previous century. It is in POSIX (using <ndbm.h>). The implementations are quite good and able to deal with both small and huge data.
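For a feel of how little code this takes, here is a small sketch using Python's standard dbm module, which picks one of the dbm-style backends available on the machine (gdbm or ndbm where installed, otherwise a fallback). The function name demo and the key user:0001 are just illustrative.

```python
import dbm
import os
import tempfile


def demo(path):
    """Write one pair, reopen the database, and read it back."""
    # "c" opens for reading and writing, creating the database if needed.
    with dbm.open(path, "c") as db:
        db[b"user:0001"] = b"some text value"

    # Reopen read-only to show that the pair was persisted to disk.
    with dbm.open(path, "r") as db:
        return db[b"user:0001"], db.get(b"missing")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    print(demo(os.path.join(workdir, "store")))
```

Keys and values are stored as bytes, so text has to be encoded on the way in and decoded on the way out.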
In general, you should know that NoSQL databases have some characteristics to consider:
1. It does not use SQL as its query language
This will add a layer of complexity in your solution if you are already using a SQL database in the same solution.
2. It may not give full ACID guarantees
That may be undesirable in critical applications.
3. It has a distributed, fault-tolerant architecture
This is very cool, but do you really need it?
4. Additional security model
If you are already using another RDBMS, you will have to build/use another security mechanism.
In my opinion, and as @MainMa already mentioned, for common database applications the task is almost trivial if you already have an RDBMS in your solution, so do the required work yourself. However, if your solution has no database yet, a NoSQL product may be worthwhile, because some vendors offer to host (and virtually install) the database for you. That saves you resources and possibly money.