Problem:
I have a Cassandra table in DataStax Enterprise (DSE) that stores an id and a data field, where data contains a JSON object. I’ve enabled Solr indexing to improve search performance.
Example Table Structure:
id: UUID
data: JSON object (e.g., {"textField": "some text", "otherField": "other value"})
Scenario:
Assume the table has 10 million records. I want to search within the data field for records whose textField contains a given substring, e.g., the equivalent of a LIKE '%dd%' match.
Questions:
Is it possible to perform this type of search directly on the JSON field using Solr in DataStax Enterprise?
How should the schema be configured to enable this type of search?
Would this approach perform well on a table of this size?
Proposed Solution:
I am considering creating a new table where each field from the JSON object is a separate column at the top level. This way, I can perform searches directly on these fields.
Example:
Instead of a single data column holding {"textField": ..., "otherField": ...}, the new table would have id plus one column for textField and one for otherField.
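To make the proposal concrete, here is a rough sketch of what a schema.xml for such a flattened table might look like. The column names text_field and other_field are hypothetical, and the n-gram analyzer is only my assumption about how substring matching could be set up with standard Solr filters; I have not verified it on DSE.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="org.apache.solr.schema.UUIDField" name="UUIDField"/>
    <!-- Assumption: index-time n-grams so a query term can match inside a word -->
    <fieldType class="org.apache.solr.schema.TextField" name="NGramText">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="10"/>
      </analyzer>
      <!-- The query side is not n-grammed; the term is only lowercased -->
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field docValues="true" indexed="true" multiValued="false" name="id" type="UUIDField"/>
    <!-- Hypothetical columns replacing the JSON keys textField and otherField -->
    <field indexed="true" multiValued="false" name="text_field" type="NGramText"/>
    <field indexed="true" multiValued="false" name="other_field" type="StrField"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
```

If this works the way I expect, a query such as text_field:dd should match rows where some word in text_field contains "dd", as long as the search term is between 2 and 10 characters long. I assume the n-grams would also make the index considerably larger.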
Questions for the Proposed Solution:
Is this a better approach for performance optimization?
What are the best practices for restructuring data like this in DataStax Enterprise?
Any advice or insights on the best way to achieve efficient querying on JSON fields in this setup would be greatly appreciated. Thank you!
When I enabled Solr mode on DataStax to use Solr queries, I found the “solr_resources” table. I also uploaded my schema.xml file with the following content:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="org.apache.solr.schema.UUIDField" name="UUIDField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="data" type="StrField"/>
    <field docValues="true" indexed="true" multiValued="false" name="id" type="UUIDField"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
However, this configuration doesn’t seem to be working. I attempted to modify the table directly, but I’m unsure if Solr has accepted these changes. How can I ensure that it adopts the new configuration and allows me to execute the queries I need?