Predicting advantages of database denormalization

I was always taught to strive for the highest Normal Form of database normalization, and we were taught Bernstein’s Synthesis algorithm to achieve 3NF. This is all very well and it feels nice to normalize your database, knowing that fields can be modified while retaining consistency.

However, performance may suffer. That’s why I am wondering whether there is any way to predict the speedup/slowdown when denormalizing. That way, you can build your list of FDs satisfying 3NF and then denormalize as little as possible. I imagine that denormalizing too much would waste space and time, because e.g. giant blobs are duplicated or it becomes harder to maintain consistency because you have to update multiple fields using a transaction.

Summary: Given a 3NF FD set and a set of queries, how do I predict the speedup/slowdown of denormalization? Links to papers appreciated too.

You would have to know the dataflows between the tables to be able to see how the DB model performs. Once you have that, you can calculate the change in performance for a given denormalization (e.g. if you decide to duplicate data).

Some rough estimates can be deduced from how many new indexes you would need after the denormalization steps. Each new index must be updated and queried separately, which will incur a performance hit proportional to the number of new indexes.
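
As a hedged back-of-envelope illustration (the numbers here are invented, not from the answer): suppose a table has only its primary-key index, so a single-row INSERT costs one base-row write plus one index update. If a denormalization step requires three new secondary indexes on that table, the same INSERT now costs one base-row write plus four index updates, roughly 2.5 times the write work, while reads that previously joined to another table may now be answered from this table alone. Whether that trade is a net win depends entirely on your read/write mix.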

Big blobs of binary data should in any case be stored in a separate table and not copied around. They are (usually) not queried but returned as part of the final result set after a query against some other set of tables.
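
A minimal sketch of that layout (PostgreSQL-flavored SQL; the table and column names are hypothetical): the metadata lives in a small, frequently queried table, and the payload lives in a side table that is only touched when a payload is actually requested.

    -- Hypothetical schema: keep blob payloads out of the hot table.
    CREATE TABLE document (
        document_id INT PRIMARY KEY,
        title       VARCHAR(200) NOT NULL,
        owner_id    INT NOT NULL
    );

    CREATE TABLE document_blob (
        document_id INT PRIMARY KEY REFERENCES document (document_id),
        payload     BYTEA NOT NULL  -- BLOB / VARBINARY(MAX) on other DBMSs
    );

    -- Searches and filters touch only the small table...
    SELECT document_id, title FROM document WHERE owner_id = 42;

    -- ...and the payload is fetched in a separate, final lookup.
    SELECT payload FROM document_blob WHERE document_id = 17;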

I’m not sure there is any academic research on when denormalizing can help (IMHO there is quite a big difference between what is taught about DB normalization and how it works in practice).

However, there are several interesting articles and blog entries about this: Jeff Atwood writes about normalization on his blog, and there’s a “reply” to him at High Scalability.

When denormalizing, I suggest you pay attention to

  • the number and type of queries per unit of time; if you insert and/or update more often than you read, denormalizing wouldn’t be of much help.
  • how often the duplicated information will be updated
  • the characteristics of the DBMS you will use
  • how many times the information is duplicated; if you have the same information in 4-5 tables, it might be faster to keep it in a separate table rather than copy it so many times
  • the expected amount of data kept in the DB; what might work for small amounts of data can lead to disaster if the number of records increases. And vice versa (I mean the KISS principle and not fixing what ain’t broken).

I imagine that denormalizing too much would waste space and time

Space is not a worry in most medium-sized line-of-business OLTP applications, so leave space aside. Time, by which I assume you mean query performance, can usually be improved and does not cause a real issue unless you have a bad design, insufficient resources, an extremely large database, a very large number of transactions, or all of the above. Most applications using today’s databases would rarely have a performance problem only because the database is normalized.

giant blobs are duplicated or it becomes harder to maintain consistency because you have to update multiple fields using a transaction.

Normalizing your database assures you that your design will:

  1. Not have redundant data.

  2. Not cause a huge number of log entries to be created (e.g. with a table of 2 million customers: UPDATE Customer SET Country='USA' WHERE Country='US'; see the sketch after this list).

  3. Be fully supported by SQL queries. This point is very important.

  4. Drive clean application code.

  5. Force a high degree of data consistency via the database without burdening the application.

  6. Share business rules defined in the database across different applications without duplicating the same code in each application.
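
As a sketch of point 2 (the lookup table and column names are hypothetical, not from the original answer): in a normalized design the country name is stored once and referenced by key, so correcting “US” to “USA” is a single-row update producing a single log entry, instead of an update that rewrites two million customer rows.

    -- Hypothetical normalized design: the country name is stored once.
    CREATE TABLE country (
        country_id INT PRIMARY KEY,
        name       VARCHAR(50) NOT NULL
    );

    CREATE TABLE customer (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        country_id  INT NOT NULL REFERENCES country (country_id)
    );

    -- Fixing the label touches one row and produces one log entry:
    UPDATE country SET name = 'USA' WHERE name = 'US';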

Having said that, normalization produces the optimal structure for all columns and tables. You may not always need this in your particular application; given your understanding of your domain and your application, you could then decide to denormalize some of the tables/columns as a trade-off for speed. However, that would be a conscious decision rather than an oversight.

Given a 3NF FD set and a set of queries, how do I predict the speedup/slowdown of denormalization?

You can’t predict performance accurately without testing (which you can do before you write the application code). However, you can detect and eliminate by design the factors that would lead to bad performance. For example, you can identify what index strategy to use as follows (other techniques may exist; see the sketch after the list):

  1. Build a matrix of queries and columns affected by those queries.

  2. Find columns that are used the most.

  3. Consider building indexes on those columns.
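
A small sketch of those three steps, using an invented orders workload; the queries, column names, and usage counts are assumptions for illustration only.

    -- Step 1: matrix of queries vs. the columns their predicates touch.
    --   Q1: ... WHERE customer_id = ?                 -> customer_id
    --   Q2: ... WHERE customer_id = ? AND status = ?  -> customer_id, status
    --   Q3: ... WHERE placed_at >= ?                  -> placed_at
    -- Step 2: customer_id appears in two of three queries; the others in one each.
    -- Step 3: index the most-used column first.
    CREATE INDEX ix_orders_customer_id ON orders (customer_id);
    -- A composite index could serve Q2, at extra write cost per INSERT/UPDATE:
    -- CREATE INDEX ix_orders_customer_status ON orders (customer_id, status);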

This is mainly a job your DBA can assist you with. There is more to performance than normalization: data distribution over disk volumes, vertical table splitting, partitioning, index types, and index buffering, to name a few. All such techniques are covered in books and vendor documentation under the subjects “Database Design” and “Database Performance Tuning”. All of the above discussion assumes your application is an OLTP application.

One of the several major reasons to normalize is that it optimizes for general use cases, while denormalization tends to optimize performance for specialized use cases (with significant penalties for other use cases). This is one reason why OLTP workloads usually benefit primarily from normalization (there are exceptions, but they are rare).

To predict the advantages, what you really have to know is exactly what you are denormalizing and for which workflows. There are also questions about the size of your data set and the likely impact of caching. So the answer depends on a very large number of things, including database size, what portion is likely to still be in memory, the planning overhead of complex queries, and the like. This is a very complicated, implementation-specific matter, and it depends very much on both your database and your RDBMS. The advantages will be greatest in OLAP workloads, and typically the disadvantages will be greatest in OLTP workloads.

So I don’t see that there is a single answer here, other than to watch query plans and consider the possibility of materialized views for denormalized data. In my view, the best approach is to keep a relatively normalized OLTP database and denormalize for reporting purposes only as needed.
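
One concrete shape of that approach (PostgreSQL syntax; the tables and columns are hypothetical): keep the OLTP tables normalized, and precompute the denormalized, join-heavy shape as a materialized view that reporting queries read instead.

    -- Hypothetical reporting view: the join result is precomputed and stored.
    CREATE MATERIALIZED VIEW order_report AS
    SELECT o.order_id,
           o.placed_at,
           o.total,
           c.name AS customer_name
    FROM   orders o
    JOIN   customer c ON c.customer_id = o.customer_id;

    -- Refresh on a schedule or off-peak; the OLTP tables stay normalized.
    REFRESH MATERIALIZED VIEW order_report;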

Normally you denormalize your data model to optimize performance for a particular use case. This will usually have an adverse effect on the performance of other use cases; e.g. repeating data in several rows may speed up query processing by eliminating a join, but your update processing will slow down.
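
A sketch of that trade-off with hypothetical tables: duplicating the customer name into orders removes the join from the hot read path, but a customer rename must now touch every matching order row.

    -- Normalized: the read pays for a join, the rename touches one row.
    SELECT o.order_id, c.name
    FROM   orders o JOIN customer c ON c.customer_id = o.customer_id;

    UPDATE customer SET name = 'Acme Corp' WHERE customer_id = 42;

    -- Denormalized: the name is repeated in every order row, so the join disappears...
    SELECT order_id, customer_name FROM orders;

    -- ...but a rename must now update every matching order row, in one transaction.
    UPDATE orders SET customer_name = 'Acme Corp' WHERE customer_id = 42;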

In effect, 3NF gives optimum performance for any number of arbitrary accesses to your database, but for particular joins and selects there may be better models.

So treat denormalization like you would any other optimization: don’t do it unless you actually have a performance problem, and make sure your ‘fix’ does not cause more problems than it solves.
