“Never do in code what you can get the SQL server to do well for you” – Is this a recipe for a bad design?

It’s an idea I’ve heard repeated in a handful of places, some of them more or less acknowledging that once solving a problem purely in SQL exceeds a certain level of complexity, you should indeed be handling it in code.

The logic behind the idea is that for the large majority of cases, the database engine will do a better job of finding the most efficient way to complete your task than you could in code, especially when it comes to things like making the results conditional on operations performed on the data. Arguably, with modern engines effectively JIT’ing and caching the compiled version of your query, it makes sense on the surface.

The question is whether or not leveraging your database engine in this way is inherently bad design practice (and why). The lines become blurred further when all the logic exists inside the database and you’re just hitting it via an ORM.

In layman’s words:

These are things that SQL is made to do and, believe it or not, I’ve seen done in code:

joins – codewise it’d require complex array manipulation
filtering data (where) – codewise it’d require heavy inserting and deleting of items in lists
selecting columns – codewise it’d require heavy list or array manipulation
aggregate functions – codewise it’d require arrays to hold values and complex switch cases
foreign key integrity – codewise it’d require queries prior to insert and assumes nobody will use the data outside app
primary key integrity – codewise it’d require queries prior to insert and assumes nobody will use the data outside app

Doing these things in code instead of relying on SQL or the RDBMS leads to writing tons of code with no added value, meaning more code to debug and maintain. And it dangerously assumes the database will only be accessed via the application.
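To make the list above concrete, here is a minimal sketch of a join, a filter, a column selection and an aggregate done in a single statement. It uses Python's built-in sqlite3 module as a stand-in for whatever RDBMS you actually run, and the customers/orders tables and their data are invented purely for illustration.

```python
import sqlite3

# Hypothetical schema and data, made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 5.0);
""")

# Join, filter (WHERE), column selection and aggregation in one statement.
# Only the summarised rows travel back to the application.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS order_count, SUM(o.total) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.total >= 10
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

print(rows)
```

Doing the same in application code would mean fetching both tables in full and then matching, filtering and summing by hand.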

I would rephrase that to “Never do in code what SQL Server can do for you well”.

Things like string manipulation, regex work and such I would not do in SQL Server (barring SQL CLR).

The above tends to talk about things like – joins, set operations and queries. The intention behind it is to delegate much of the heavy lifting to SQL Server (at things it is good at) and reduce the amount of IO as much as possible (so let SQL do the joins and filter down with a WHERE clause, returning a much smaller data set than otherwise).

Never do in code what you can get the SQL server to do well for you (emphasis is mine)

The key to the answer is you need to look for SQL doing something well, as opposed to simply doing something, for you. SQL is an amazingly powerful language. Coupled with built-in functions, it can potentially do a lot of things. However, the fact that you can do something in SQL should not be an excuse for actually doing it in SQL.

My specific criterion for making the decision is to look at the amount of data that you get back and the number of round-trips: if you can cut the amount of data by shipping a task to the server, without increasing the number of round-trips, then the task belongs on the server; if the amount of data remains the same or increases without a simultaneous drop in the number of round-trips, the task belongs in your code.

Consider these examples:

You store a birth date, and you need to calculate the age for a group of users. You can have SQL Server do the subtraction, or you can do it in your code. The number of round-trips stays the same, and the amount of data sent back to you goes up. Therefore, a code-based solution wins.
You store a birth date, and you need to find users of ages between 20 and 30. You can load all users back on the client, do the subtraction to find the age, and then do the filtering, but shipping the logic to SQL Server would reduce the amount of data without requiring additional round-trips; therefore, a SQL-based solution wins.
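A rough sketch of both cases, using Python's built-in sqlite3 module as a stand-in engine. The users table, the sample rows, the fixed "today" and the helper names (age_on, years_before) are all assumptions made for the example. Birth dates are stored as ISO strings, so they compare correctly as text.

```python
import sqlite3
from datetime import date

def age_on(birth: date, today: date) -> int:
    # Subtract one if this year's birthday has not happened yet.
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))

def years_before(d: date, years: int) -> date:
    # Naive helper for the sketch; it does not handle 29 February.
    return d.replace(year=d.year - years)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, birth_date TEXT NOT NULL)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    ("Ann", "1990-03-15"),
    ("Bob", "2000-01-01"),
    ("Cid", "2010-07-04"),
    ("Dee", "1993-06-01"),
    ("Eve", "2004-06-01"),
])
today = date(2024, 6, 1)  # fixed so the example is repeatable

# Case 1: every row comes back anyway, so the subtraction happens in code.
ages = {
    name: age_on(date.fromisoformat(born), today)
    for name, born in conn.execute("SELECT name, birth_date FROM users")
}

# Case 2: filtering on the server cuts the data returned, in the same round-trip.
# Age 20..30 today means: born after (today - 31 years), on or before (today - 20 years).
lower = years_before(today, 31).isoformat()
upper = years_before(today, 20).isoformat()
names = [row[0] for row in conn.execute(
    "SELECT name FROM users WHERE birth_date > ? AND birth_date <= ? ORDER BY name",
    (lower, upper),
)]
```

Comparing the stored column against two precomputed bounds, rather than computing an age per row in the WHERE clause, also leaves the engine free to use an index on birth_date if one exists.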

In short, it would be correct to say that: “Never perform database specific operations in your code base” as they are better addressed in your database.

Look at the example of set-based operations. As you may know, RDBMSs are built to handle common data storage and manipulation operations.

In addition, the project’s choice of database plays an important role. Having an RDBMS (MS SQL, Oracle, etc.) is different from having a NoSQL database like RavenDB.

As a rule, your DB has more information to work with than your application, and can do common data operations more efficiently. Your database maintains indices, for example, while your application would have to index the search results on the fly. So all else being equal, your overall workload can be decreased by pushing the work to the database rather than the application.

But as your product scales, it typically becomes easier to scale your app than to scale your db. In large installations, it is not uncommon to see application servers outnumber database servers by a factor of 10 to 1 or more. Adding more application servers is often a simple matter of cloning an existing server onto new hardware. Adding new database servers, on the other hand, is dramatically more difficult in most cases.

So at this point, the mantra becomes “protect the database”. It turns out that by caching the database results in memcached, by queueing updates in an application-side log, or by fetching the data once and calculating your statistics in your app, you can dramatically reduce your database workload, saving you from having to resort to an even more complicated and fragile DB cluster configuration.
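As a small illustration of the caching idea, here is a sketch that uses an in-process functools.lru_cache in place of memcached and sqlite3 in place of the real database. The products table, the product_price function and the stats counter are hypothetical names for the example, and a real cache would also need an invalidation or expiry policy so that stale prices are not served forever.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL NOT NULL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

stats = {"queries": 0}  # counts how often the database is actually hit

@lru_cache(maxsize=1024)
def product_price(product_id: int) -> float:
    # Only reached on a cache miss; repeated calls are served from memory.
    stats["queries"] += 1
    row = conn.execute(
        "SELECT price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return row[0]

prices = [product_price(1) for _ in range(100)]
print(stats["queries"])  # the database saw one query for 100 lookups
```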

I think it would be poor design to not use the database for the things it is meant for. I have never seen any database where the rules were enforced outside the database that had good data. And I have looked at hundreds of databases.

So things that must be done in a database:

Auditing (application-only auditing will not track all changes to the database and thus is worthless).
Data integrity constraints, including default values, foreign key constraints and rules which must always be applied to all data. Not all data is changed or inserted through an application: there are one-time data fixes, especially of large data sets, that are not practical to do one record at a time (please update these 100,000 records that got mismarked as status 1 when they should be 2 due to an application code bug, or please update all records from client A to client B because company B bought company A), as well as data imports and other applications which might touch the same database.
JOINs and WHERE clause filtering (to reduce the number of records sent across the network).
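Here is a minimal sketch of the constraints item, with the rules declared in the schema so that they hold no matter which program writes the data. It uses Python's sqlite3 module as a stand-in engine. The clients/records tables and the rule that status may only be 1 or 2 are assumptions for the example. Note that SQLite, unlike most server databases, needs foreign key enforcement switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE records (
        id INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES clients(id),
        status INTEGER NOT NULL DEFAULT 1 CHECK (status IN (1, 2))
    );
    INSERT INTO clients VALUES (1, 'A');
""")

# A valid insert that omits status picks up the default from the schema.
conn.execute("INSERT INTO records (client_id) VALUES (1)")
default_status = conn.execute("SELECT status FROM records").fetchone()[0]

# Bad writes are rejected by the engine, whoever sends them.
rejected = []
for sql in (
    "INSERT INTO records (client_id) VALUES (99)",            # no such client
    "INSERT INTO records (client_id, status) VALUES (1, 7)",  # status not allowed
):
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))
```

A one-off fix script or an import job run against this schema is subject to exactly the same checks as the application.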

“Premature optimization is the root of all evil (most of it, anyway) in computer programming” – Donald Knuth

The database is exactly that; the data layer of your application. Its job is to provide your application with the data asked for, and store the data given to it. Your application is the place to put code that actually works with the data; displaying it, validating it, etc.

While the sentiment in the title line is admirable, and accurate to a point (the nitty-gritty of filtering, projecting, grouping etc. should in the overwhelming number of cases be left to the DB), a definition of “well” might be in order. The tasks that SQL Server can execute with a high level of performance are many, but the tasks that you can demonstrate that SQL Server does correctly in an isolated, repeatable manner are very few. SQL Server Management Studio is a great database IDE (especially given the other options I’ve worked with like TOAD), but it has its limitations, first among them being that pretty much anything you use it to do (or any procedural code you execute in the DB underneath) is by definition a “side effect” (altering state lying outside the domain of your process’s memory space). In addition, procedural code within SQL Server is only just now, with the latest IDEs and tools, able to be measured the way managed code can be, using coverage metrics and path analysis (so you can demonstrate that this particular if statement is encountered by tests X, Y, and Z, and that test X is designed to make the condition true and execute that half while Y and Z execute the “else”). That, in turn, assumes you have a test that can set the database up with a particular starting state, execute the database procedural code through some action, and assert the expected results.

All of this is much more difficult and involved than the solution provided by most data access layers; assume the data layer (and, for that matter, the DAL) know how to do their job when given the correct input, and then test that your code provides correct input. By keeping procedural code like SPs and triggers out of the DB and instead doing those types of things in application code, said application code is much easier to exercise.
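A tiny sketch of that testing style: the data access layer is replaced by a stand-in that only records what it was asked, and the test checks that the application code sends the right input. RecordingDal, find_users_by_age and young_adults are hypothetical names invented for the example, not part of any real library.

```python
class RecordingDal:
    """Hypothetical stand-in for a data access layer; it only records calls."""

    def __init__(self):
        self.calls = []

    def find_users_by_age(self, min_age, max_age):
        self.calls.append(("find_users_by_age", min_age, max_age))
        return []  # no database involved, so nothing to return


def young_adults(dal):
    # Application code under test: it decides what to ask the data layer for.
    return dal.find_users_by_age(20, 30)


dal = RecordingDal()
result = young_adults(dal)

# The test asserts on the input handed to the data layer, not on database state.
assert dal.calls == [("find_users_by_age", 20, 30)]
```

No database has to be put into a known starting state for this test to be repeatable.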

One of the things people don’t seem to realize is that doing all of your processing on the SQL server is not necessarily good, regardless of the effects on code quality.

For instance, say you need to grab some data, compute something from it, and then store the result in the database. There are two choices:

Grab the data into your application, compute within your application, and then send the data back to the database
Craft a stored procedure or similar to grab the data, compute across it, and then store it all from a single call to SQL server.

You may think that the second solution is always the fastest, but this is definitely not true. I’m even ignoring cases where SQL is a bad fit for the problem (i.e. regex and string manipulation). Let’s even pretend you have SQL CLR or something similar, so that you have a powerful language in the database. If it takes 1 second to make a round trip and get the data, 1 second to store it, and 10 seconds to do the computation across it, you’re doing it wrong if you’re doing it all in the database.

Sure, you shave off 2 seconds. However, would you rather waste 100% of (at least) one CPU core on your database server for 10 seconds, or would you rather waste that time on your web server?

Web servers are easy to scale up; databases, on the other hand, are extremely expensive to scale, especially SQL databases. Most of the time, web servers are “stateless” as well and can be added and removed at whim with no additional configuration to anything but the load balancer.

So, think not just about shaving 2 seconds off of an operation, but also about scalability. Why waste an expensive resource like database server capacity when you can use the much cheaper web server resources with a relatively small performance impact?

I like to look at it this way: SQL should only deal with the data itself. The business rules that decide what the query may look like can happen in code. The regex or validation of the information should be done in code. SQL should be left to just join your tables, query your data, insert clean data, etc.

What gets passed into SQL should be clean data and SQL should not really need to know anything more than it needs to store it, update it, delete it or retrieve something. I have seen way too many developers want to throw their business logic and coding in SQL because they think of the data as their business. Decouple your logic from your data and you will find your code gets cleaner and easier to manage.

Just my $0.02 though.

Generally I agree that the code should control the business logic and the DB should be a logic free hash. But here are some counter points:

Primary, foreign key, and required (not null) constraints could be enforced by code. Constraints are business logic. Should they be left out of the database since they duplicate what code can do?

Do other parties outside of your control touch the database? If so having constraints enforced close to the data is nice. Access could be restricted to a web-service which implements logic, but this assumes you were there “first” and have the power to enforce the use of the service on the other parties.

Does your ORM perform a separate insert/update for each object? If yes, then you will have severe performance problems when batch processing large data sets. Set operations is the way to go. An ORM will have trouble accurately modeling all the possible joined sets which you could perform operations on.

Do you consider a “layer” to be a physical split by servers, or a logical split? Running logic on any server could theoretically still fall under its logical layer. You might organize the split by compiling into different DLLs rather than splitting servers exclusively. This can dramatically improve response time (at the cost of throughput) while maintaining separation of concerns. A split DLL could later be moved to other servers without a new build to increase throughput (at the cost of response time).
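To illustrate the ORM counterpoint above, here is a rough sketch contrasting a row-at-a-time update loop (roughly what a naive per-object save does) with a single set-based statement. It uses Python's sqlite3 module as a stand-in engine, and the records table and the status values are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")
conn.executemany("INSERT INTO records (status) VALUES (?)", [(1,)] * 1000)

# Row-at-a-time: one statement per object, here for the first 500 rows.
ids = [row[0] for row in conn.execute("SELECT id FROM records WHERE id <= 500")]
for record_id in ids:
    conn.execute("UPDATE records SET status = 2 WHERE id = ?", (record_id,))
per_row_statements = len(ids)

# Set-based: one statement fixes every remaining row in a single call.
cur = conn.execute("UPDATE records SET status = 2 WHERE status = 1")
set_based_rows = cur.rowcount

print(per_row_statements, "statements versus 1 statement for", set_based_rows, "rows")
```

Against a networked database each of those per-row statements would also be a separate round trip, which is where the batch-processing cost comes from.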

The idiom is more to do with keeping the business rules that concern the data together with the relations (the data, structure and relationships). It’s not a one-stop shop for every problem, but it helps to avoid things like manually maintained record counters, manually maintained relationship integrity, etc., if these things are available at the database level. So if someone else comes along and extends the programs or writes another program that interacts with the database, they won’t have to figure out how to maintain database integrity from previous code.

The case of a manually maintained record counter is particularly pertinent when someone else wants to author a new program to interact with the same database. Even if the newly created program has exactly the right code for the counter, the original program and the new one running at approximately the same time are likely to corrupt it. There’s even code out there that retrieves records and checks conditions before writing a new or updated record (in code or as separate queries), when this can often be achieved right in the insert or update statement. Data corruption can again result.

The database engine guarantees atomicity; an update or insert query with conditions is guaranteed to affect only the records meeting the conditions, and no external query can change the data halfway through our update. There are many other circumstances where code is used when the database engine would serve better. It’s all about data integrity and not about performance.

So it’s actually a good design idiom or rule of thumb. No amount of performance is going to help in a system with corrupt data.
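A minimal sketch of putting the condition into the update statement itself, rather than reading a value, checking it in code and writing it back. It uses Python's sqlite3 module as a stand-in engine, and the inventory table and the reserve function are hypothetical names for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO inventory VALUES ('widget', 3)")

def reserve(item: str, amount: int) -> bool:
    # Check and write happen in one statement, so no other writer can
    # slip in between "is there enough?" and "take it".
    cur = conn.execute(
        "UPDATE inventory SET qty = qty - ? WHERE item = ? AND qty >= ?",
        (amount, item, amount),
    )
    return cur.rowcount == 1  # zero rows touched means the condition failed

first = reserve("widget", 2)   # enough stock, succeeds
second = reserve("widget", 2)  # only 1 left, refused and nothing changes
remaining = conn.execute(
    "SELECT qty FROM inventory WHERE item = 'widget'"
).fetchone()[0]
```

The same shape works for a counter: a single UPDATE that sets value = value + 1 avoids the lost updates that two programs doing read-then-write can cause.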

As mentioned before, the goal is to send as little as possible to the database and receive as little as possible from it, because round trips are very costly time-wise. Sending SQL statements over and over again is a waste of time, especially for more complex queries.

Using stored procedures in the database allows developers to interact with the database like an API, without worrying about the complex schema behind it. It also reduces the data sent to the server, since only the procedure name and a few parameters are sent. In this scenario, most of the business logic can still be in the code, but not in the form of SQL. The code would essentially prepare what is to be sent to or requested from the database.

There are a few things to remember:

A relational database should ensure referential integrity through foreign keys
Scaling one database can be difficult and expensive. Scaling a web server is a lot easier simply by adding more web servers. Have fun trying to add more SQL server power.
With C# and LINQ, you can do your “joins” and whatnot through code so you kind of get the best of both worlds in many cases

“Premature optimization is the root of all evil” – Donald Knuth

Use the tool most appropriate for the job. For data integrity, this is often the database. For advanced business rules, this is a rule-based system like JBoss Drools. For data visualisation, this would be a reporting framework. etc.

If you do run into performance issues, only then should you look at whether any data can be cached, or whether an implementation in the database would be quicker. In general, the cost of buying extra servers or extra cloud power will be far lower than the added maintenance cost and the impact of extra bugs.


Filed under: softwareengineering - @ 23:00

Tags: design-patterns, sql
