I want to build a simple indexed search for a string I have (I know I could use a database for this – I just want to do it for learning purposes)
The most efficient way to do it that comes to my mind right now:
1.) Split the whole string at all whitespaces.
2.) Take the substrings from the split and order them from longest to shortest substring and store them in an array
3.) When the search query is provided, also split the query at whitespaces and sort the substrings from long to short.
4.) Take the first substring from the search term
5.) Search in the array for the first substring from the search query by comparing the strings case-insensitive but skip those strings that are shorter than the current query substring
6.) Continue this for all the remaining substrings
Further optimizations I could thing of:
sort the index items that share the same length alphabetically and organize those in a binary tree?
How are database search functions usually implemented?
Just a draft so far – want to get the theory right before coding.