Help Improve DB model/strategy, for use in refactoring SQL-ridden legacy code

Question:

Which design pattern do I use to help me refactor legacy SQL code into separate classes?

Goal

Bring in concepts of data separation and database separation and also latest modern techniques I may not yet be aware of (if any), to bring quality to spaghetti-coded test-less legacy PHP code base.

Details:

I am looking for a good model to help me refactor out DB code in a large legacy php project. I thought at first to use TableModule, but seeing how my SQL is complex, I think DomainModel may be better. But still I do not know how to implement it as description is too abstract. Maybe Service Layer is what I need… but to be honest, I am kind of lost.

Sample SQL that I deal with (one of many):

        SELECT
            db1.item.quantity,
            db1.sheet.serial,
            db2.work.id AS work,
            db3.part.id AS partid,
            db3.part.number,
            db3.part.description,
            db3.part.current_inv
        FROM db1.item
            LEFT JOIN db1.partial ON db1.partial.id = db1.item.db1_partial_id
            LEFT JOIN db1.sheet ON db1.sheet.id = db1.partial.db1_sheet_id
            LEFT JOIN db2.work ON db2.work.id = db1.sheet.db2_work_id
            LEFT JOIN db3.part ON db3.part.id = db1.item.db3_part_id
        WHERE
            db1.sheet.status != 'VOID' AND
            db1.item.id = "$itemid" AND
            db1.item.record = 0

It ties up 3 databases and multiple tables. Considering that I want to decouple SQL from my app, and decouple app from data handling, and having complex SQL queries like above, what do you suggest I do?

My current sample plan is

find next available stray SQL inside legacy code
determine what it is trying to do – i.e. save/update what data on what object or model.
create or add to the class that best encompasses SQL. For example, above SQL checks if inventory has been used up for an item. So I can place it into some class called CheckInventoryData, and be happy with that.. until maybe I learn something that makes me want to change the name of the class, for example. i.e. HandleItem, and put that class in some directory structure for database handling, i.e. /db_handling_code
have a function in that class saying isInventoryUsedUpForItem($item), or the like
repeat….

While I think this will work, I do not feel too confident about my plan. I am hoping to find solidification of it in a more modern/industry accepted practice. And finding that plan is my question — is there a better plan available. — is there a model I should aspire towards?

I’ll attempt if not answer your question, at least give some hints. As a side note, I’ve contributed to refactoring of several medium-sized projects (although not as big as yours), especially the ones which had data access directly mixed with business code, so the answer is partially based on my personal experience.

Identify the problem

The first thing I note in your question and comments is that it doesn’t seem clear whether you have an actual problem you want to solve:

Right now, I am looking to more so add modern concepts to my code. Things like looser coupling, data injection, database injection, and whatever else modern techniques relating to data codification are.

Adding modern concepts shouldn’t be your goal. Neither do lose coupling or data injection. The first thing to do is to identify the actual problem, i.e. something which blocks you right now and should be solved (eventually through lose coupling, data injection, or anything else).

That the actual problems might be:

Problem 1■ the difficulty to unit test the code. Whenever you want to test a simple function, you can’t do it in isolation, because it relies heavily on the database.
Problem 2■ the difficulty to find the relevant SQL query, because instead of following a clear architecture, queries are buried deep in business logic.

Thus, I’ll address those two problems below. Remember that it is imperative, before you search for the solution, to identify the problem. Otherwise, you may find yourself doing a lot of unneeded work or making things worse.

Study the environment
(Identify the scope of the problem)

Don’t overly focus on the location of SQL queries, but watch the overall architecture of the application. It might be that the problem is particularly visible when it comes to SQL queries, but applies to the overall code base.

Three years ago, I had to maintain a small application. It had SQL queries mixed with business logic, but the previous developer did a great work defining the architecture of the application. I don’t know why he refused to use n-tier, but the fact is that every time I had to find an SQL query, it was a matter of seconds, because the code was clear, classes were separated enough and naming conventions were excellent. While Cart class was dealing at the same time with data access, business logic and presentation, it was easy to guess that the SQL query which changes the quantity of a product in a cart would be in a function similar to $cart->changeQuantity($customer, $product, $delta).

If you have a good grasp on the scope of the problem, you may target better the parts of the code base which should be changed. In your case, if you’re concerned particularly about the location of the SQL queries (Problem 2■), but have found that the scope is larger (i.e. the lack of architecture concerns the whole code base), you would be better refactoring everything, not just the database-related code.

Don’t overly rely on a tool

When searching for a solution, you’ll find promising tools, as well as people recommending a specific tool, pattern or technique which will magically solve your problems.

Just use ORM! It will make data access much simpler!

Just use <some pattern here>, and your code will become easy to maintain!

While those tools, patterns or techniques are helpful, they won’t magically solve anything. At best, they will push the code base in a good direction with little impact. In worst case, they will require weeks or months of hard work and will have practically no positive impact.

Take an ORM. Yes, this will make your code simpler. In many cases, at least. But this doesn’t make the code easier to test (Problem 1■) and may make it even harder to find the location where a specific SQL query is generated (Problem 2■), since you won’t be able to grep the query itself any longer. It some cases, it might even have a negative impact of having two completely different ways of accessing data: through an ORM, for simple cases, and directly, for more difficult ones.

Solve the problem pragmatically

Once you identified the problem and its scope, the goal is now to solve the problem pragmatically, without overly relying on a single tool, pattern or technique (while using tools, patterns and techniques when needed).

The way I approached the problem in the past is the following in order to solve the two problems identified above.

Find next available stray SQL inside legacy code.
Determine what it is trying to do, i.e. save/update what data on what object or model.
_{Those two steps are exactly the same as the ones in your question.}
Create either functional or integration and system tests corresponding to this part of the code. Functional, integration and system tests usually run on a test database, specifically filled with test data.

This will serve as regression tests when you start making changes. Since those tests are at a high level (for example they may do an HTTP request to a web app instead of dealing with the underlying classes and interfaces), they should not be affected by the later changes in the code itself.
Check the coherence of the architecture/design of the class where the concerned SQL query was found. For example, if I find a query which changes a name of the product from Cart class, that’s not right. Refactoring, in this case, is mandatory. (Problem 2■)
Ensure the query is doing what it is supposed to do. Untested legacy code contains lots of surprises, and it is not unusual to find functions with misleading names, selection queries with side effects which always return an empty result, or update queries which don’t change anything. (Problem 2■)

This step also involves dealing with legacy horror such as dynamically generated queries (should be replaced by parametrized queries).
Ensure the query is as simple as it can be. Make it simpler, if possible.
Ensure you use all capabilities of your database engine you need to make things as simple as possible. For example, as I previously commented on your question, the query from your question might be replaced by a view.
Study the queries refactored in the past in order to identify potential code duplication. This may be as obvious as finding the exact same query somewhere else, or may be more difficult if only a part of a query is duplicated (for example same query but with an additional where criteria). (Problem 2■)

If duplication found, deduplicate.
Find a name for the query. (Problem 2■)
(Create a type which represents what the query returns or what it needs. The type should basically have one property per column in select. Since you’re using a dynamic language, you may not need to do this step.)
Extract the query into a function and locate a class which should contain this function.
If the class doesn’t exist:
- Find an explicit name. You may use a naming convention such as ...Repository (ProductsRepository, CustomersRepository).
- Create the class.
- Create the corresponding interface. (Problem 1■)
- Create the default mock which implements the interface. This mock returns zero rows in data retrieval functions and does nothing in data altering functions.
Create one or several mocks and/or stubs relevant in the context of the business code which uses the SQL query. Create the corresponding unit tests. They should be red at this stage, since your business code doesn’t use the stubs/mocks yet. (Problem 1■)
Change the business code to use the newly created function instead of a direct query. Unit tests should be green now.

I wouldn’t care about design patterns here (well, that would be a repository pattern, but again, I don’t care). Once you get your 2 200 SQL queries out of your business code, you can start playing with ORMs, design patterns, etc. The main benefit is that you’ll have your business logic completely separated from data access, and also covered by unit tests. With data access layer (DAL) and business layer (BL) communicating through interfaces, you have more room for experimentation.

A practical example is that I can now create a different implementation of DAL which uses an ORM while my team continues working with the previous implementation. Then, moving from one to another should consist of a tiny change of configuration (if dependency injection is used).

More importantly, experimentation becomes easier at this step. You can start implementing an ORM, find it to be a mistake, and abandon this particular implementation without having to revert code base to previous revision.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị

Filed under: softwareengineering - @ 21:08

Thẻ: domain-model, enterprise-architecture, legacy-code, php, refactoring

Thiết kế website giá rẻ

Danh mục

Help Improve DB model/strategy, for use in refactoring SQL-ridden legacy code

Identify the problem

Study the environment
(Identify the scope of the problem)

Don’t overly rely on a tool

Solve the problem pragmatically

Danh mục

Identify the problem

Study the environment(Identify the scope of the problem)

Don’t overly rely on a tool

Solve the problem pragmatically

Study the environment
(Identify the scope of the problem)