I would like some pointers to information which would help me understand how to go about providing the ability to filter a list of entities by their attributes as well as by attributes of related entities.
As an example, imagine a web app which provides order management of some kind. Orders and related entities are stored in a relational database. And imagine that the app has an interface which lists the orders.
The problem is: how does one allow the list to be filtered by, for example:-
- order number (an attribute)
- line item name (an attribute of a n-n related entity)
- some text in an administrative note related to the order (text found in an attribute of a 1-1 related entity)
I’m trying to discover whether there is something like a standard, efficient way to construct the queries and the filtering form; or some possible strategies; or any theory on the topic; or some example code. My google foo fails me.
3
how to go about providing the ability to filter a list of entities by their attributes as well as by attributes of related entities.
Some of the guidelines to consider are:
1 – What does the end user want and what information the end user has when she uses the system? Answering this should provide you with the primary queries to implement.
2 – Which queries are must have and which ones are nice to have. This classification would help you design your indexes better.
3 – For performance reasons, draw the line between an on-line query and a batch query (see point B below). Ask how frequent a given query is run and how often this is required. For frequent queries on large databases, you must create appropriate indexes. Some times the user may ask for statistical information on periodical basis. It is not always feasible to provide such reports on-line.
4 – Distinguish between a “search criteria” and a “filter criteria”. The first could be used to generate 1 or more rows and the other is used to locate specific details within a specific result of executing a “search criteria”.
In your case:
A – Order Number is good only if it is known. Ask how else could you locate an order? May be the userid of the person who placed the order with a given date range and an amount range would be good?
B – A Line item name/id, may be useful in some businesses but I can’t see that it is a standard. Line item name/id is bad when you have to specify it in full, many items have the same name prefix, the item is common and appears on many orders, and so on. It may be useful as a filter argument for a specific large order (as those used in manufacturing for example). It also may be meaningful to consider searching line items appearing together with other line items. In some businesses, you could discover that certain items are always ordered together and the business could use this information to enhance their business. However, such a discovery process is best handled by off-line reports when the database is large.
C – Searching the Notes on an order is possible but ask why is there a note in the first place? May be the note is used to capture a ‘reason’ or a ‘category’ information that deserve to be placed on 1 or more separate columns. For example, the user may want to list orders that are partially delivered. Some systems use the note field to represent many unstructured information. Searching text may return inconsistent results. Some database allow you search text efficiently, but you still have to live with free form information and may cause the user to ask the system to return large amounts of data to be searched manually. To avoid this, you may want to analyze the notes contents as mentioned earlier or to use standard tags. This board uses both methods.
You should have a look at ORM systems like Hibernate and its Criterias.
Aside from RDBMS you may look at Graph DBs like Neo4J and the Tinkerpop stack especially the Gremlin part.
For a Java solution see these two SO threads on filtering collections:
- Java: Finding objects in collections
- Java: What is the best way to filter a Collection?