I’m trying to search data indexed by an on-prem Microsoft Graph connector. A number of files have been indexed and I can see them in certain searches. However, when I try to find data in specific folders using the “URL” property, I can’t get any results.
The query that I’m using is as follows:
search_query = {
    "requests": [
        {
            "entityTypes": [
                "microsoft.graph.externalItem"
            ],
            "contentSources": [
                f"/external/connections/{self.connection_id}"
            ],
            "query": {
                "queryString": value,
                # Raw string so the backslashes in the path are kept literally
                "queryTemplate": r'{searchTerms} url:"O:\_Some&Folder\ClientName\Projects\Area\ProjectName*"'
            },
            "fields": [
                "title", "fileName", "fileExtension", "url", "fileWebLink",
                "ParentLink", "path", "modifiedBy", "fileID", "documentId"
            ],
            "from": 0,
            "size": 25,
        }
    ]
}
The endpoint that I’m sending requests to is https://graph.microsoft.com/v1.0/search/query
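For context, here is a minimal sketch of how a body like the one above might be packaged as a POST to that endpoint using only the standard library. The helper name `prepare_search_request` and the `access_token` parameter are placeholders introduced for illustration, and acquiring the token is assumed to happen elsewhere.

```python
import json
import urllib.request

GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def prepare_search_request(search_query: dict, access_token: str) -> urllib.request.Request:
    """Wrap a search body in a POST request (hypothetical helper; nothing is sent here)."""
    return urllib.request.Request(
        GRAPH_SEARCH_URL,
        data=json.dumps(search_query).encode("utf-8"),
        headers={
            # access_token is assumed to be a valid bearer token obtained elsewhere
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send it; the sketch stops short of that so the request can be inspected first.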
The end goal is to limit the returned data to folders that have a specific path on the on-prem file system. I’ve tried various approaches to querying the data, and I suspect that the problem with filtering URLs by their full path is related to how the index stores them internally (tokenization), but I can’t find a good way around it.
Ideally, I’d be able to use KQL syntax like (URL:"path*" OR URL:"path*"), but from what I’ve discovered so far, the query only returns data when I specify just part of the path, on one side or the other of the Projects folder, so either:
O:\_Some&Folder\ClientName\
or
Area\ProjectName*
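To make the intended filter concrete, here is a minimal sketch of how the OR clause over several folder prefixes could be assembled into the query template. The helper names and the second project path are hypothetical, added only for illustration. The sketch only builds the string; whether the connector index actually matches a quoted full-path prefix is exactly the open question.

```python
def build_url_filter(path_prefixes):
    """Join folder prefixes into a KQL-style OR clause of quoted prefix matches."""
    if not path_prefixes:
        raise ValueError("at least one path prefix is required")
    clauses = [f'url:"{prefix}*"' for prefix in path_prefixes]
    return "(" + " OR ".join(clauses) + ")"


def build_query_template(path_prefixes):
    """Append the URL filter to the {searchTerms} placeholder."""
    return "{searchTerms} " + build_url_filter(path_prefixes)


# The second path is a made-up example of "another project".
paths = [
    r"O:\_Some&Folder\ClientName\Projects\Area\ProjectName",
    r"O:\_Some&Folder\ClientName\Projects\Area\OtherProject",
]
template = build_query_template(paths)
```

The resulting `template` would replace the hard-coded "queryTemplate" value in the request body.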
I’ve also noticed that when I use the URL property in the query string without enclosing the URL in quotation marks, the path is split into a bunch of smaller tokens and results do seem to appear. However, I need a way to specify a couple of different project paths.
Does anybody know how to overcome the tokenization problem and filter by full URLs, or whether that is even possible? I know that technically I could just restrict the search to a smaller portion of the full URL, but then, if two projects had the same name, I could get the wrong results.
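One possible fallback, sketched below, is to search on the smaller portion of the URL and then discard mismatches on the client by comparing the returned `url` field against the full path. This is a workaround rather than a fix for the tokenization itself. It assumes the response follows the usual `value` / `hitsContainers` / `hits` / `resource.properties` layout and that paths can be compared case-insensitively; the function name is hypothetical.

```python
def filter_hits_by_path(response, allowed_prefixes):
    """Keep only hits whose `url` property starts with one of the full folder prefixes."""
    # Lower-casing is an assumption: Windows paths are treated as case-insensitive here.
    prefixes = tuple(prefix.lower() for prefix in allowed_prefixes)
    kept = []
    for result in response.get("value", []):
        for container in result.get("hitsContainers", []):
            for hit in container.get("hits", []):
                properties = hit.get("resource", {}).get("properties", {})
                url = str(properties.get("url", ""))
                if url.lower().startswith(prefixes):
                    kept.append(hit)
    return kept
```

Because the filtering happens after the search, a page requested with "size": 25 can end up with fewer than 25 usable hits, so paging logic would need to account for that. Ending each prefix with a trailing backslash avoids matching sibling folders whose names merely start with the project name.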