There are a few questions around this subject:
- Is creating all indexes at the start (design) over engineering?
- In reality, I’ve added indexes after performance issues were found and we ‘explain plan’ and add indexes as needed (i.e. not for all relationships). Should I just skip creating indexes before it’s needed because performance issues will show up anyway and probably the indexes that’s already been created aren’t being used?
- Should indexes be treated as part of requirement/specification? Sometimes I see performance requirements, which is usually vague or random. If they’re part of specification, then should managing indexes be part of implementation?
I want to make it a bit clearer and more background. In development, usually the different phases involve different kinds of people (e.g. architects, seniors and juniors). It’d be unusual for juniors to be responsible for the design, for example. So, when some tasks overlap, it becomes unclear who’s responsible if, for example, a query takes 2 second.
Defining development phases and responsibilities are also important for non-programmers, e.g. project managers or product owners, because they are needed for planning and costing. A project manager might want to higher an architect to do all database related task only at the start and for a short time. If after product launch, it’s found that performance is poor, the PM would like to know who’s responsible. I hope you get the idea.
For sake of discussion, let’s be specific in what we mean (and presume you’re doing TDD):
- Design is the process of specifying the total shape of your system. You can (and sometimes should) design without directly or indirectly writing a single line of database or program code. You define your Acceptance Tests as the last stage of design.
- Implementation is the process of turning said design into a “working” piece of software. If you’re being dogmatic, you wouldn’t write a single database index; you’d just define the tables, write the interface object, and rely upon your lower-level platform to load the data. You write Unit Tests and Integration Tests during the implementation, adn the implementation is done when the Acceptance Tests all pass.
- Optimization is making the “working” software you did in Implementation and speeding it up where it runs unacceptably slow. The three layers of test allow you to make changes without breaking anything.
Of course, that’s how it works in theory. But as a practical matter, since time and money are both limited, you’ll often to the steps out of order. If you expect to need an index on a particular field, there’s no reason not to index it as you design it. If while on a binge of coding you realize how you can elegantly solve an alternate usage situation, you can go ahead and code a draft first and write the unit test second. (And when you realize your design is wrong, you’ll have go to back and revise your tests anyway.)
As to your particular questions:
Is creating all indexes at the start (design) over engineering?
No, if all are useful indexes and they don’t take a significant amount of time.
Yes, if you wind up defining indexes for tables that are days away from even the first rough prototype. (If you create indexes in your initial design, subject them to tests latter on. They may be more trouble than they’re worth.)
Should I just skip creating indexes before it’s needed because performance issues will show up anyway and probably the indexes that’s already been created aren’t being used?
No, if it’s clear that you’ll need to have some columns indexed. If you plan on a feature that searches frequently for stock bids over or under a certain price, you’ll probably want an index on the bid value in addition to the record’s unique key.
Yes, if it’s not clear that the index is going to grant the overall system a performance improvement. The one query a month for bids made by state probably doesn’t need a specific index.
Should indexes be treated as part of requirement/specification?
NO. Indexes are a component of the software design. They have no more place in requirements or specifications than usage of a particular framework method or transient variable name.
(It’s possible for all or part of the programming to come from your customer, but it’s foolish to make use of such a requirement. An index servers no purpose if the server is just going to send the whole table anyway.)
2
Creating indices is an implementation detail, I wouldn’t consider them to be part of a specification. Your database structure design usually only involves the table & column names and data types, not the DDL that will be used to create it – and that is where CREATE INDEX
would be included.
For most complicated table relationships, you won’t have indices on all possible combinations of keys as that doesn’t make sense w.r.t. the business logic. You should create them at start on all the commonly used entity keys & JOIN
s, and imho just use Explain Plan to catch the edge cases.
Yes, creating database indexes is part of design, implementation, and optimization.
- A large set of indexes are part of the design. Primary key, and foreign key indexes would fall in this category. Primary and Unique keys are expensive to enforce without an index. If you understand the design well enough you can omit some foreign key indexes. You can also substitute indexes with additional columns for the foreign key index. If access patterns are well understood additional indexes may be specified. The primary key and foreign key indexes are essential to join performance and query optimization. Design will specify keys and indexes which are created during the implementation.
- During implementation you will create the indexes resulting from the design. The logical model from the design may be modified. This may result in new indexes. Generally, implementation should result in few new indexes.
- Performance issues may uncover access paths for which an appropriate index is not available. A more complete design or implementation may have uncovered some of these cases. In other cases, actual use or data distribution may differ from what was expected during design and implementation. Scale and index value distribution can have a significant impact on the value of an index to a query, and the optimized query plan.
Adding all the query indexes which might be, or are expected to be, needed at design or implementation time would be overkill. Each index carries an overhead, and too many indexes can be costly.
Some reasons an index may not be useful include:
- Access path is rarely or never used;
- The query optimizer may choose a different access path; or
- A partial index provides a sufficiently small set of records.
5