So, let’s say I’m going to build a Stack Exchange clone and I decide to use something like CouchDB as my backend store. If I use their built-in authentication and database-level authorization, is there any reason not to allow the client-side JavaScript to write directly to the publicly available CouchDB server? Since this is basically a CRUD application and the business logic consists of “Only the author can edit their post”, I don’t see much of a need for a layer between the client-side stuff and the database.
I would simply use validation on the CouchDB side to make sure someone isn’t putting in garbage data, and make sure that permissions are set properly so that users can only read their own _user data. The rendering would be done client-side by something like AngularJS. In essence you could just have a CouchDB server and a bunch of “static” pages and you’re good to go. You wouldn’t need any kind of server-side processing, just something that could serve up the HTML pages.
Opening my database up to the world seems wrong, but in this scenario I can’t think of why as long as permissions are set properly. It goes against my instinct as a web developer, but I can’t think of a good reason. So, why is this a bad idea?
EDIT: Looks like there is a similar discussion here: Writing Web “server less” applications
EDIT: Awesome discussion so far, and I appreciate everyone’s feedback! I feel like I should add a few generic assumptions instead of calling out CouchDB and AngularJS specifically. So let’s assume that:
- The database can authenticate users directly from its hidden store
- All database communication would happen over SSL
- Data validation can (but maybe shouldn’t?) be handled by the database
- The only authorization we care about other than admin functions is someone only being allowed to edit their own post
- We’re perfectly fine with everyone being able to read all data (EXCEPT user records which may contain password hashes)
- Administrative functions would be restricted by database authorization
- No one can add themselves to an administrator role
- The database is relatively easy to scale
- There is little to no true business logic; this is a basic CRUD app
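For concreteness, here’s roughly what I imagine the “only the author can edit their post” rule looking like as a CouchDB validate_doc_update function in a design document. This is just a sketch – the type and author fields are an invented schema, not anything CouchDB prescribes:

// Sketch of a CouchDB design-document validation function.
// The "type" and "author" fields are an invented schema.
function (newDoc, oldDoc, userCtx, secObj) {
  // Admins may do anything.
  if (userCtx.roles.indexOf('_admin') !== -1) {
    return;
  }
  if (newDoc.type === 'post') {
    // New posts must be stamped with the current user's own name.
    if (!oldDoc && newDoc.author !== userCtx.name) {
      throw({ forbidden: 'You may only create posts as yourself.' });
    }
    // "Only the author can edit their post."
    if (oldDoc && oldDoc.author !== userCtx.name) {
      throw({ unauthorized: 'Only the author can edit this post.' });
    }
  }
}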
Doing as you suggest creates a tight(er) coupling between your client-side language and your database.
That can be okay – there’s less code to write and maintain, and in theory debugging could / should go a little quicker.
On the other hand, it makes other aspects more difficult. If / when you need to change either of those technologies, you’ll have a harder time because of the tight coupling between them.
Protecting yourself against attacks will be (quite) a bit more difficult. You’re assuming that the client will always present nicely formatted requests to the database. That presumes no one will ever hack the client-side code to insert malicious statements. In other words, they’ll “borrow” your authentication mechanisms and replace the normal client code with their own.
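For example, nothing forces an attacker to use your client at all. A few lines of their own JavaScript, run under their own legitimate login, is enough. (The URL and field names here are hypothetical.)

// Hypothetical forged write sent straight to the database,
// using the attacker's own authenticated session cookie.
// (Updating an existing document would also need its _rev.)
fetch('https://db.example.com/appdb/some-post-id', {
  method: 'PUT',
  credentials: 'include',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    type: 'post',
    author: 'someone-else',            // a field your client would never send
    body: '<script>/* ... */</script>' // content your client would never allow
  })
});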
I wouldn’t recommend it, and many would vehemently tell you not to. But it can be done.
It’s probably not a great idea. And the first and strongest reason I can give is that a database server isn’t designed to be a public web server. To the contrary, conventional wisdom says you should hide your database behind a firewall.
If you need supporting evidence, there are plenty of concerns — not all insurmountable, but plenty to think about. In no particular order, here are a few:
- There’s no middle layer to perform query sanitization, because clients send their queries to the database directly.
- Database permissions tend to work differently from web server and application permissions. Web servers and application frameworks start you with nothing: you explicitly create and expose individual resources, endpoints, and actions. Databases, by contrast, ask you to grant roles at a fairly coarse level (see the sketch below).
- It’s probably not well-optimized to sustain a web server’s workload; you can’t benefit from connection pooling.
- The more popular web servers have been attacked constantly and hardened by years of security patches. Your DBMS was basically designed to hide behind a firewall, so it has probably been tested against only a tiny fraction of the threats it will face on the public web.
- You must use the database’s query language to protect private data. Depending on your DBMS, that can be challenging.
- You must use the database’s query language to filter large data sets — something you might strive to do anyway; but something that can become burdensome for more complicated business rules.
- Limited or no support for third-party libraries.
- Very limited (potentially zero) community support for many of the problems you’ll encounter.
… And I’m sure there are other concerns, and I’m sure there’s a solution to most, if not all, of them. But there’s a list to get you started!
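To make the granularity point concrete: CouchDB, for instance, grants read access only at whole-database granularity, via the per-database _security object – not per document, and not per field. A sketch of such a document (the names and roles are invented for illustration):

{
  "admins":  { "names": ["alice"], "roles": ["moderator"] },
  "members": { "names": [],        "roles": ["registered_user"] }
}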
The best single reason I can imagine is: because this method is not directly supported or recommended by any involved party.
Browser vendors, ECMAScript standards, database system developers, networking equipment companies, hosting/infrastructure architects, and security specialists do not actively support (or perhaps even consider) your proposed use case. This is a problem, because your proposed method requires all of these entities – and more – to work appropriately for your application, even though none of the involved systems were designed to support this.
I’m not saying it’s not possible. I’m just saying this is less like “re-inventing the wheel” and more like re-inventing browser-based client-server interaction.
At best, you’ll be doing a ton of work to even get the systems to work at the most basic possible level. Modern popular databases are not RESTful or built to work over HTTP, so you’ll be building your own WebSocket-based (I presume) client drivers.
Even if you get everything to technically work, you will be giving up many of the most powerful features of modern architectures. You will have no defense in depth – everyone can easily connect directly to the primary target of most website hacking attempts. But the scenario you propose is much, much worse than that.
The proposed model is not just exposing the server – it’s exposing valid connection strings. Attackers can’t just ping the server – they can actively log in and feed it commands. Even if you can limit data access, I’m not aware of sufficient tooling in DBMS systems to protect against denial-of-service scenarios and their ilk. When working in enhanced versions of SQL, like T-SQL, it’s often trivially easy to produce bombs that run effectively forever (a few unrestricted joins to produce a Cartesian product and you’ll have a SELECT that runs and runs, doing heavy work). I’d imagine you’d need to disable most of the features of SQL, even eliminating basic SELECT queries with JOINs, and perhaps only allow calling stored procedures? I don’t even know if you can do that; I’ve never been asked to try. It doesn’t sound good.
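The same class of bomb exists outside SQL. As a hypothetical CouchDB-flavoured example: older releases (pre-2.0) let any authenticated reader POST an ad-hoc temporary view, whose map function re-runs over every document on every request, with no index to amortise the cost:

// Hypothetical resource-exhaustion request against an older CouchDB
// that still supports temporary views. Each request re-runs the
// expensive map function over the entire database.
fetch('https://db.example.com/appdb/_temp_view', {
  method: 'POST',
  credentials: 'include',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    map: "function (doc) { for (var i = 0; i < 1e8; i++) {} emit(doc._id, 1); }"
  })
});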
Database scalability also tends to be one of the hardest problems in working at large scales, while scaling out multiple HTTP servers – especially with static or cached pages – is one of the easiest parts. Your proposal makes the database do more work by being responsible for basically 100% of server-side activity. That’s a killer flaw all by itself. What you gain from moving work onto the client you lose by moving more work onto the database.
Finally, I’d just like to point out that the heart of what you propose is not new, but actually goes back decades. This is called the “fat database” model, which moved most server-side logic into the database just as you propose. There are many reasons that model has gone by the wayside on the mass internet, and it would probably be informative to look into that history yourself. Note also that even then there was little consideration of completely untrusted users being able to log in to the system and run commands, as access was still restricted to select internal (known) users who weren’t supposed to be attacking the system constantly.
The fact is that you’ll still need an HTTP server to serve files, as database systems just don’t do that. At the same time, everything you propose can be obtained by using a thin server model (such as with Node.js) to expose a RESTful interface to your database. This is popular for a reason – it works, keeps the database hidden away behind layers of protection, is extremely scalable, and yet allows you to build your database as thick or as thin as you care to.
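A minimal sketch of that thin layer, assuming the express and nano npm packages – the route shape, the posts database, and the req.user value (set by some authentication middleware not shown here) are all assumptions:

// Minimal "thin server" sketch in front of CouchDB.
const express = require('express');
const nano = require('nano')('http://localhost:5984');

const app = express();
const posts = nano.db.use('posts');
app.use(express.json());

// Anyone may read a post.
app.get('/posts/:id', async (req, res) => {
  try {
    res.json(await posts.get(req.params.id));
  } catch (err) {
    res.status(404).json({ error: 'not found' });
  }
});

// Only the author may edit - enforced here, while the database
// itself stays hidden behind the firewall.
app.put('/posts/:id', async (req, res) => {
  const existing = await posts.get(req.params.id);
  if (existing.author !== req.user.name) {
    return res.status(403).json({ error: 'not your post' });
  }
  const updated = { ...req.body, _id: existing._id, _rev: existing._rev, author: existing.author };
  res.json(await posts.insert(updated));
});

app.listen(3000);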
Since this is basically a CRUD application and the business logic consists of “Only the author can edit their post” I don’t see much of a need to have a layer between the client-side stuff and the database. I would simply use validation on the CouchDB side to make sure someone isn’t putting in garbage data and make sure that permissions are set properly so that users can only read their own _user data.
Well, placing your authorization (the security concerns) and validation logic away from the database provides separation of concerns in your software system. You can then test, maintain, scale and reuse your logical code blocks with less risk of breaking functionality elsewhere in the system.
Letting client input communicate directly with the database has great potential to corrupt your data.
Avoiding or removing that tight coupling also makes your software system more maintainable and closer to SOLID.
Letting the user interact with the database directly seems really dangerous to me.
Is the authentication mechanism of CouchDB really so sophisticated that you can isolate a user’s read and write access to only the data they are supposed to read and write (we are talking about per-document, maybe even per-field, access privileges here)? What about “communal” data which is shared by multiple users? Does that not exist at all in your application design?
Do you really want the user to be able to change their data in ANY way? What about XSS injections, for example? Wouldn’t it be better to have a server layer to filter those out before they get into the database?
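A server layer makes that kind of filtering a one-liner at the boundary. A sketch – the escapeHtml helper is hand-rolled here purely for illustration; in practice you’d likely reach for a vetted library:

// Sketch: neutralise markup in user-supplied fields before anything
// reaches the database. Hand-rolled for illustration only.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Applied in the server layer, before the document is stored.
// (The title/body schema is invented.)
function sanitizePost(post) {
  return { ...post, title: escapeHtml(post.title), body: escapeHtml(post.body) };
}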
You’ve gotten a number of reasons, but here’s one more: future-proofing. Sooner or later, as your application evolves, you’ll be presented with some requirement that can’t be readily or securely achieved in client-side JS or as a stored procedure in your database.
For instance, you’re told that all new registrations need CAPTCHA verification to be valid. This would be easy enough with pretty much any modern web application framework: just slap a reCAPTCHA on the registration form, pass reCAPTCHA’s response token back to the backend, and add a couple of lines of code to verify the token’s validity with Google’s API (or better yet, use a library someone else wrote to do that for you).
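Those “couple of lines” in a Node-style backend look roughly like this (a sketch; the secret is assumed to come from your environment):

// Sketch: server-side reCAPTCHA verification (Node 18+, built-in fetch).
// RECAPTCHA_SECRET is assumed to be set in the environment.
async function verifyCaptcha(token) {
  const res = await fetch('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body: new URLSearchParams({
      secret: process.env.RECAPTCHA_SECRET,
      response: token
    })
  });
  const result = await res.json();
  return result.success === true;
}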
If you’re using a two-tier system and relying on the database for all your server-side logic, how are you going to verify the token? Yes, I suppose it might be theoretically possible depending on the DBMS to write a stored procedure that somehow calls a shell and invokes curl with the proper arguments. That is also almost certainly a horrible idea: input filtering and protecting against security vulnerabilities would be terrible; you’d have a mess dealing with error handling and timeouts; and you’d have to parse the response yourself. Not to mention that a DBMS isn’t intended to do this, so there’s no reason to think performance, stability, thread-safety, etc… won’t be issues. See, for instance, this thread, which discusses some of these issues for Postgres.
And that’s just the issues around adding one simple CAPTCHA to a form. What are you going to do if you want to add SMS verification, or a background job that emails inactive users to remind them about your app, or add a file upload feature so people can set a profile picture? Maybe you decide your application should have some automated tests someday? Or that you’d like to track changes to your procedures in a version control system? There are numerous libraries and tools for most any useful language to handle most of these tasks for you, but few to none will be available for your DBMS, because it’s not meant to do this.
Eventually, you’ll want to do something that you can’t reasonably do directly in your DBMS, and then you’ll be stuck. Because you’ll have built your entire application in your DBMS, you won’t have any alternative but to get a web server and start rebuilding pieces in another language, just to add a simple feature.
And that would be a real shame, because we already have a name for the place where you put your application logic, and it’s called “your application’s source code” rather than “database stored procedures” for a reason.
If your security checks and business logic are contained in your client-side JavaScript, they can be overridden by a malicious user. As an alternative, you can leverage a JavaScript-based server-side technology (like Node.js) to handle validation, authorization, and the like.
Edit the page in Firebug and at some point insert a line similar to this:
ExecDbCommand("DROP TABLE Users")
Run it.
Edit:
The question was in fact about CouchDB, so there’s no SQL to run here. Yet the idea is the same. I would presume that any non-trivial application depends on its data respecting some consistency rules that are checked/enforced by the application code. A malicious user can modify the client code to save data in a form that violates your business rules, which might cause havoc in your application.
If your site considers all possible data states to be valid from a business perspective, then by all means go this route; but if that is not the case (likely), then you want the guarantee that any data that gets stored was generated by your code, according to your rules.
Old question, I know, but I wanted to chime in because my experience is quite different to the other responses.
I’ve spent many years writing real-time, collaborative applications. The general approach for these applications is to replicate data locally and synchronise changes with peers as fast as possible. All operations on data are local, so the data storage, data access, business logic and user interface layers are all local. The “offline first” movement (http://offlinefirst.org/) has adopted this approach for building offline web applications and may have some relevant resources. These sorts of use cases not only mandate that you open up your data access layer to clients, but also your data storage! I know, I know. Seems crazy, right?
The concerns for such offline-first apps are similar to what you’ve asked, just one level removed. It seems relevant to me. Given that you’re opening up direct data access to clients, the question becomes, how can you limit the effects of a malicious user? Well, there are a lot of strategies, but they’re not obvious if you come from a more traditional development background.
The first misconception is that exposing the database means exposing all the data. Take CouchDB, for example: databases in CouchDB are lightweight, so you wouldn’t think twice about creating hundreds of thousands of separate databases on a server. Users can only access databases that they have been granted permission to access as a reader or writer (not to mention CouchDB’s validation features and whatnot), so they can only reach a subset of the data.
The second misconception is that a user crapping on data is a problem! If users are given a replica of a database, they can crap on it all they like without affecting other users. But you should validate their changes before replicating their data back to the “central” store. Think about Git – users can do whatever they like in branches, forks and local repositories without affecting the master branch. Merging back to master involves a lot of ceremony and is not done blindly.
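In CouchDB terms, that merge ceremony can be a filtered replication from the user’s working copy back to the central store. A sketch – the database names, design document and status field are all invented:

// Design document placed in the staging database: only reviewed
// documents are eligible to flow back.
const publishDesignDoc = {
  _id: '_design/publish',
  filters: {
    approved: "function (doc, req) { return doc.status === 'approved'; }"
  }
};

// Triggered from a trusted admin tool once review is complete:
fetch('http://localhost:5984/_replicate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    source: 'staging_alice',
    target: 'main_repository',
    filter: 'publish/approved'
  })
});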
I’m building a system presently using CouchDB where users need to collaborate on data to build a dataset that is then “published” via a QA/QC workflow. The collaboration is carried out on a replica of the data (we call this a staging or working database), and once complete, a responsible person performs QA/QC on the data and only after that is it replicated back into the main repository.
Many benefits flow from this that are hard to achieve in other systems – version control, replication and collaboration (let alone offline working!) are super hard to retrofit onto traditional three-tier CRUD applications.
My advice – if your application is “traditional” then do it the traditional way. If any of the things I mentioned above (though there’s lots more…) apply to you then consider alternative architectures and be prepared to think laterally.
I think that, given all of your assumptions, it’s feasible to go directly from the client to the database. However, it’s reasonable to look at whether your assumptions are valid, and are likely to remain so in the future.
I would be worried that in the future it might not be OK for everyone to read all the data, and especially that it might develop more business logic in the future. Both of these are more likely if the project is successful.
As long as you leave yourself a way to deal with these problems in the future, when and if you actually need to deal with them, I think your design will work. I think you will need to be extra careful to separate concerns in the JavaScript code, and some of it might end up being rewritten on the server later.
But I could definitely see how the risk of maybe having to do that later might be worth the benefit of fewer moving parts today.
First of all, thanks for the out-of-the-box question… 🙂
What I would suggest is: always try to maintain a segregation between your three layers – presentation, business, and database (or DAO) – because that is best practice for the kinds of requirements and setups where there are lots of changes every day.
In simple words, your presentation layer should not know about the database layer. For example, the format of a date field might differ between the presentation layer and the database layer, so that the user is free to choose a date format that suits their needs.
The business logic must act as the coupling between the presentation layer and the database/DAO layer: casting of fields, business validations, and the like should be handled in the business layer rather than in the JavaScript, as in your question.
This segregation will give you great ease and comfort in complex scenarios, functionality, and even complex validations.
The best advantage: you can implement these layers in different technologies, and they can be changed as business needs or scope dictate.
Thanks
If you were to build SQL in JavaScript and send it to a database that verifies the rights and so on, it would be a disaster from a security standpoint. When you build an API and construct the queries yourself, you only have to analyse a limited number of queries from a security point of view. If the queries are built outside your system, there is a potentially unlimited number of tricks someone could pull.
But that’s not the case here, since you are using a key-value database (as far as I understand, CouchDB generally falls into that category). The database interface itself is a kind of middle layer, and it is security-tested by the Apache team. Thanks to its relatively simple JavaScript API, it is even easier to analyse for potential flaws than the complicated interfaces JSF applications have.
This can be a secure solution if you do thorough security testing. It can even be easier than using frameworks such as JSF, which often hide a barely readable API underneath. Security by obscurity is no solution.
As for your question: this wouldn’t really be direct access to the database anyway. Direct access would be building SQL queries in JavaScript (unfortunately, I’ve seen such solutions). In your case, CouchDB itself provides the isolation layer. You could of course wrap it in your own API to harden it, but as long as you can easily test what a particular user can do and whether the security constraints work for you, you will have a secure and robust solution without additional layers.
I see two problems:
1. Tight Coupling: Change your DB option? Well, now you gotta change all your client-side code too. Trust me. We don’t need more problems on the client-side.
2. TMI Security Issue: Reveals way too much about how stuff works. Auth might still be an obstacle, but finding an exploit is a hell of a lot easier when you know exactly what’s happening on the server side.
A very-very-thin middle-tier might be a better way to go.
Your client cannot use your web app if JavaScript is disabled (or not supported by their device’s browser) and JavaScript is the only database access layer.