I have 200 million documents in Elasticsearch. Each document is structured as follows:
{"url": "<url>",
 "username": "<username>"}
Whenever I try to search for a specific domain like this:
{
  "query": {
    "wildcard": {
      "url": {
        "value": "*test.com*"
      }
    }
  }
}
It successfully returns URLs like:
{"url": "http://sd.test.com/login"}
{"url": "http://test.com/register"}
{"url": "http://admin.test.com:8080/"}
However, it doesn’t return all subdomains. For example, the following URL is not included in the search results:
http://sds.test.com/User/Login
That URL is only returned when I change the query value to:
{
  "query": {
    "wildcard": {
      "url": {
        "value": "*test.com*/"
      }
    }
  }
}
With that query, however, the other URLs listed above are no longer returned 😀
How can I modify my Elasticsearch query to consistently return all subdomains/domains for a specific domain without hardcoding each one? Any advice or alternative approaches would be greatly appreciated.
I have tried using other query types such as match, query_string, and term, but none of them consistently return the desired subdomains.
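To make the expected behaviour concrete, here is a minimal Python sketch of the matching rule I am after. It is not an Elasticsearch query, only a client-side illustration: the helper name belongs_to_domain and the nottest.com URL are made up for this example, and I am assuming that "subdomain of test.com" means the host is either test.com itself or ends in ".test.com".

```python
from urllib.parse import urlsplit


def belongs_to_domain(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` or any subdomain of it.

    Hypothetical helper, used only to describe the desired result set.
    """
    # urlsplit(...).hostname drops the scheme, port and path, and lowercases the host.
    host = urlsplit(url).hostname or ""
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


urls = [
    "http://sd.test.com/login",
    "http://test.com/register",
    "http://admin.test.com:8080/",
    "http://sds.test.com/User/Login",
    "http://nottest.com/",  # made-up negative example: must NOT match
]

# The four test.com URLs should be kept; the nottest.com one should be dropped.
matches = [u for u in urls if belongs_to_domain(u, "test.com")]
print(matches)
```

In other words, I would like a query whose result set matches what this filter keeps, including http://sds.test.com/User/Login, regardless of the path or port that follows the host.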