I am working on an application that needs to map pdf documents to corresponding json data files (see: Kotlin data structure for efficient lookup of nested data for more details). In trying to implement a solution, I encountered a more broad question about how to estimate the expected performance of a search function.
Let’s suppose I have a nested data structure (see below). For each of my incoming pdfs, I want to find the corresponding data object that matches certain parameters. There are two potential approaches here. The first is to simply use the data as-is, iterating down through each item until a match is found. The other option is to convert the data into an indexed data structure like a HashMap with the fields I want to match against as keys.
The first approach is going to take longer (on average) to find a match. The second approach is going to be more efficient per search but requires the initial overhead of iterating through the entire data structure once to build the map.
I know the exact answer is going to depend on the total size of the dataset, how deeply it is nested, and how many searches need to be done. However, I feel like it should be possible to get at least a rough estimate of which approach is better for a given data set. Is there a technique or rule of thumb for determining when it is worth the initial overhead to construct an indexed data structure?
For reference, here is an example of the class structure of the deserialized data.
internal data class Idoc(@SerializedName("FileId") val fileId: String, @SerializedName("Student") val students: List<Student>)
internal data class Student(@SerializedName("Id") val id: String,
@SerializedName("SchoolId") val schoolId: String,
@SerializedName("Name") val name: String,
@SerializedName("Documents") val documents: List<Document>)
//Although owners is structured as a list, there will only ever be one Owner per Document
internal data class Document(
@SerializedName("Year") val documentYear: String,
@SerializedName("Code") val imageCode: String,
@SerializedName("DocumentOwner") val owners: List<DocumentOwner>,
)
internal data class DocumentOwner(@SerializedName("FirstName") val firstName: String,
@SerializedName("LastName") val lastName: String,
@SerializedName("OwnerType") val type: String)
//Find a document that matches the student id, document image code, and owner type
internal fun lookupDocument(idocs:List<ServiceDocument>,id:String, imageCode:String, owner: IdocsDocumentOwner):ServiceDocument.Document
{
return idocs.asSequence().flatMap { doc -> doc.idoc.students }.first { student ->
student.cbfId == id
}.documents.asSequence().first { doc-> doc.imageCode == imageCode && doc.owners.first().type == owner.ownerType }
}