Should database queries be abstracted out of the page itself?

When writing page generation in PHP, I often find myself writing a set of files littered with database queries. For example, I might have a query to fetch some data about a post directly from the database to display on a page, like this:

$statement = $db->prepare('SELECT * FROM posts WHERE id=:id');
$statement->bindValue(':id', $id, PDO::PARAM_INT);
$statement->execute();
$post = $statement->fetch(PDO::FETCH_ASSOC);
$content = $post['content'];
// do something with the content

These quick, one-off queries are usually small, but I sometimes end up with large portions of database interaction code which begins to look pretty messy.

In some cases, I’ve solved this problem by creating a simple library of functions to handle my post-related db queries, shortening that block of code to a simple:

$content = post_get_content($id);
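
A minimal sketch of what such a helper might look like. The original call takes only $id; passing the PDO connection in as $db and returning null for a missing post are assumptions made for illustration:

```php
<?php

/**
 * Hypothetical helper: fetch the content of one post, or null if no such post exists.
 * It selects only the column the caller needs rather than using SELECT *.
 */
function post_get_content(PDO $db, int $id): ?string
{
    $statement = $db->prepare('SELECT content FROM posts WHERE id = :id');
    $statement->bindValue(':id', $id, PDO::PARAM_INT);
    $statement->execute();

    // fetchColumn() returns false when no row matched.
    $content = $statement->fetchColumn();

    return $content === false ? null : (string) $content;
}
```

The page script then needs a single call and never sees the SQL.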

And that’s great. Or at least it is until I need to do something else. Maybe I need to get the five most recent posts to display in a list. Well, I could always add another function:

$recent_posts = post_get_recent(5);
foreach ($recent_posts as $post) { ... }

But that ends up using a SELECT * query, which usually fetches more than I need, yet picking the exact columns for each use case is often too complicated to reasonably abstract. I eventually end up with either a massive library of database interaction functions for every single use case, or a series of messy queries inside every page’s code. And even once I’ve built these libraries, I’ll find myself needing to do one tiny join that I hadn’t used before, and I suddenly need to write another highly-specialized function to do the job.

Sure, I could use the functions for general use cases and queries for specific interactions, but as soon as I start writing raw queries I begin to slip back into direct access for everything. Either that, or I’ll get lazy and will start doing things in PHP loops that should really be done directly in the MySQL queries, anyway.

I’d like to ask those more experienced with writing internet applications: is the maintainability boost worth the extra lines of code and possible inefficiencies that the abstractions may introduce? Or is simply using direct query strings an acceptable method for handling database interactions?

When you have too many specialized query functions, you could try breaking them into composable bits. For instance:

$posts = posts()->joinWithComments()->orderBy("post.post_date")->first(5);
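
One way such composable bits could be built is a small builder object that collects query fragments and assembles the SQL at the end. This is only a sketch: the PostQuery class, the toSql() method, the comments.post_id column, and the choice to have first() set a LIMIT rather than execute the query are all assumptions, not part of the answer above.

```php
<?php

/** Hypothetical builder that assembles a SELECT statement piece by piece. */
final class PostQuery
{
    private array $joins = [];
    private array $orderBy = [];
    private ?int $limit = null;

    public function joinWithComments(): self
    {
        $copy = clone $this;
        $copy->joins[] = 'JOIN comments ON comments.post_id = post.id';
        return $copy;
    }

    public function orderBy(string $column): self
    {
        // Column names cannot be bound as parameters, so accept only plain identifiers.
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$/', $column)) {
            throw new InvalidArgumentException("Invalid column name: $column");
        }
        $copy = clone $this;
        $copy->orderBy[] = $column;
        return $copy;
    }

    public function first(int $count): self
    {
        $copy = clone $this;
        $copy->limit = $count;
        return $copy;
    }

    public function toSql(): string
    {
        $sql = 'SELECT * FROM posts AS post';
        if ($this->joins) {
            $sql .= ' ' . implode(' ', $this->joins);
        }
        if ($this->orderBy) {
            $sql .= ' ORDER BY ' . implode(', ', $this->orderBy);
        }
        if ($this->limit !== null) {
            $sql .= ' LIMIT ' . $this->limit;
        }
        return $sql;
    }
}

function posts(): PostQuery
{
    return new PostQuery();
}
```

Each method returns a modified copy, so a partially built query can be reused as the starting point for several variations without one caller affecting another.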

There is also a hierarchy of abstraction levels you might find useful to keep in mind. You have:

1. the mysql API
2. your mysql functions, such as select("select * from posts where foo = bar"); or maybe more composable as select("posts")->where("foo = bar")->first(5)
3. functions that are specific to your application domain, for instance posts()->joinWithComments()
4. functions that are specific to a particular page, such as commentsToBeReviewed($currentUser)

It pays off in terms of ease of maintenance to respect this order of abstractions. The page scripts should use only level 4 functions, level 4 functions should be written in terms of level 3 functions, and so on. It’s true that this takes a bit more time upfront, but it will help keep your maintenance costs constant over time (as opposed to “oh my gosh they want another change!!!”).
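
A rough sketch of how the levels might stack, with PDO playing the role of level 1. The comments table, its status and reviewer_id columns, the db_select() and comments_pending_for_reviewer() helpers, and the shape of $currentUser (an array with an 'id' key) are all hypothetical:

```php
<?php

// Level 2: generic database helper built on the level 1 API (PDO here).
function db_select(PDO $db, string $sql, array $params = []): array
{
    $statement = $db->prepare($sql);
    $statement->execute($params);
    return $statement->fetchAll(PDO::FETCH_ASSOC);
}

// Level 3: application-domain function. It knows about comments, not about pages.
function comments_pending_for_reviewer(PDO $db, int $reviewerId): array
{
    return db_select(
        $db,
        'SELECT id, body FROM comments
          WHERE status = :status AND reviewer_id = :reviewer
          ORDER BY id',
        [':status' => 'pending', ':reviewer' => $reviewerId]
    );
}

// Level 4: page-specific function. This is the only call the page script makes.
function commentsToBeReviewed(PDO $db, array $currentUser): array
{
    return comments_pending_for_reviewer($db, (int) $currentUser['id']);
}
```

The level 4 function is deliberately thin. Its value is that the page depends on one name, so a change to the review rules touches level 3 or level 4 and leaves the page script alone.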

Separation of Concerns is a principle worth reading about; see the Wikipedia article on it:

http://en.wikipedia.org/wiki/Separation_of_concerns

Another principle worth reading about is Coupling:

http://en.wikipedia.org/wiki/Coupling_(computer_science)

You have two distinct concerns: one is the marshaling of the data from the database, and the second is the rendering of that data. In a really simple application there’s probably little to worry about: you’ve tightly coupled your database access and management layer with your rendering layer, but for small apps this is no big deal. The problem is that web applications tend to evolve, and if you ever want to scale a web app up in any way, i.e. in performance or functionality, you run into some problems.

Let’s say that you’re generating a web page of user-generated comments. Along comes the pointy-haired boss and asks you to start supporting native apps, i.e. iPhone, Android, etc. You need some JSON output, so you now have to extract the rendering code that was generating HTML. When you’ve done this, you’ve got a data access library with two rendering engines and everything’s fine: you scaled functionally. You may even have managed to keep everything separate, i.e. business logic from rendering.
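
As a small illustration of that split, here is one data function feeding two renderers. The function names and the hard-coded comment list (standing in for a real database query) are assumptions made for the sketch:

```php
<?php

// Data access concern: returns plain arrays and knows nothing about output formats.
// A hard-coded list stands in for the real database query.
function load_comments(): array
{
    return [
        ['author' => 'alice', 'body' => 'Nice post <3'],
        ['author' => 'bob', 'body' => 'Thanks!'],
    ];
}

// Rendering concern 1: HTML for the web page, with output escaping.
function render_comments_html(array $comments): string
{
    $items = '';
    foreach ($comments as $comment) {
        $items .= sprintf(
            '<li><b>%s</b>: %s</li>',
            htmlspecialchars($comment['author'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($comment['body'], ENT_QUOTES, 'UTF-8')
        );
    }
    return "<ul>$items</ul>";
}

// Rendering concern 2: JSON for the native apps.
function render_comments_json(array $comments): string
{
    return json_encode($comments, JSON_THROW_ON_ERROR);
}
```

Adding a third format later means writing one more renderer; load_comments() does not change.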

Along comes the boss and tells you that he has a customer who wants to display the posts on their website; they need XML and about 5000 requests per second peak performance. Now you need to generate XML, JSON, and HTML. You can separate out your rendering again, same as before. However, you now need to add 100 servers to comfortably get the performance you need. Now your database is being hit from 100 servers with possibly dozens of connections per server, each of which is directly exposed to three different apps with different requirements, different queries, etc. Having the database access on each frontend machine is a security risk, and a growing one, but I won’t go there.

Now you need to scale for performance, and each app has different caching requirements, i.e. different concerns. You can try to manage this in one tightly coupled layer, i.e. your database access/business logic/rendering layer, but the concerns of each layer are now starting to get in the way of each other. The caching requirements of the data from the database could be very different from those of the rendering layer. The logic that you have in the business layer is likely to bleed into the SQL, i.e. move backwards, or it could bleed forward into the rendering layer. This is one of the biggest problems I’ve seen with having everything in one layer: it’s like pouring reinforced concrete into your application, and not in a good way.

There are standard ways to approach these types of issues, e.g. HTTP caching of the web services (squid/yts etc.) and application-level caching within the web services themselves with something like memcached/redis. You will also run into issues as you start to scale out your database, i.e. multiple read hosts and one master, or sharded data across hosts. You don’t want 100 hosts each managing connections to your database that differ depending on whether a request is a read or a write, or, in a sharded database, on whether a user “usera” hashes into “[table/database]foo” for all write requests.
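
To make the application-level caching idea concrete, here is a cache-aside sketch. The Cache interface, the ArrayCache class, and the cached() helper are hypothetical; the in-memory array only stands in for something like memcached or redis so that the example is self-contained:

```php
<?php

// Hypothetical minimal cache interface; a real deployment would back it with memcached or redis.
interface Cache
{
    /** Returns the cached value, or null on a miss. */
    public function get(string $key);

    public function set(string $key, $value, int $ttlSeconds): void;
}

// In-memory stand-in that lives only for the current request.
final class ArrayCache implements Cache
{
    private array $items = [];

    public function get(string $key)
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $item['expires'] < time()) {
            return null;
        }
        return $item['value'];
    }

    public function set(string $key, $value, int $ttlSeconds): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttlSeconds];
    }
}

// Cache-aside: try the cache first, fall back to the loader, then store the result.
function cached(Cache $cache, string $key, int $ttlSeconds, callable $load)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $load();
        $cache->set($key, $value, $ttlSeconds);
    }
    return $value;
}
```

Because the data access functions are already separate from rendering, a wrapper like this can sit around them without the page scripts noticing, and each app can choose its own key and lifetime.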

Separation of concerns is your friend; choosing when and where to do it is an architectural decision and a bit of an art. Avoid tight coupling of anything that will evolve to have very different requirements. There are a bunch of other reasons for keeping things separate, e.g. it simplifies testing, deployment of changes, security, reuse, flexibility, etc.

I assume that when you say “the page itself” you mean the PHP source file that dynamically generates HTML.

Do not query the database and generate HTML in the same source file.

The source file where you query the database is not a “page”, even though it’s a PHP source file.

In the PHP source file where you dynamically create the HTML you just make calls to the functions that are defined in the PHP source file where the database is accessed.

The pattern I use for most middle-scale projects is the following one:

All the SQL queries are kept apart from the server-side code, in a separate location.

Working with C#, it means using partial classes, i.e. putting the queries in a separate file, given that those queries will be accessible from a single class (see the code below).
Those SQL queries are constants. This is important, since it prevents the temptation of building SQL queries on the fly (thus increasing the risk of SQL injection and at the same time making it more difficult to review the queries later).

Current approach in C#

Example:

// Demo.cs
public partial class Demo : DataRepository
{
    public IEnumerable<Stuff> LoadStuff(int categoryId)
    {
        return this
            .Query(Queries.LoadStuff)
            .With(new { CategoryId = categoryId })
            .ReadRows<Stuff>();
    }

    // Other methods go here.
}

// Demo.Queries.cs (a separate file; this file name is illustrative)
public partial class Demo
{
    private static class Queries
    {
        public const string LoadStuff = @"
select top 100 [StuffId], [SomeText]
    from [Schema].[Table]
    where [CategoryId] = @CategoryId
    order by [CreationUtcTime]";

        // Other queries go here.
    }
}

One benefit of this approach is that the queries are in a separate file. This allows a DBA to review and modify/optimize the queries and commit the modified file to source control without conflicting with the commits made by the developers.

A related benefit is that source control may be configured to limit a DBA’s access to only those files that contain the queries, and to deny access to the rest of the code.

Is it possible to do in PHP?

PHP lacks both partial classes and inner classes, so the approach cannot be carried over to PHP as is.

You can create a separate file with a separate class containing only the constants (PHP has no static classes, so an ordinary class that holds only constants plays that role), given that the class will be accessible from everywhere. Moreover, in order to avoid polluting the global scope, you can put all query classes in a dedicated namespace. This will create a rather verbose syntax, but I doubt you can avoid the verbosity:

namespace Data {
    // DataRepository is assumed to be defined elsewhere in this namespace.
    class Demo extends DataRepository {
        public function LoadStuff($categoryId) {
            $query = \Queries\Demo::LoadStuff;
            // Do the stuff with the query.
        }

        // Other methods go here.
    }
}

namespace Queries {
    // PHP has no static classes; a final class holding only constants serves the same purpose.
    final class Demo {
        const LoadStuff = '...';

        // Other queries go here.
    }
}


Filed under: softwareengineering - @ 12:06

Tags: abstraction, database, php, sql
