I’m trying to process a file, “cached_txids_1week”, with around 18,000 entries in Splunk.
I created a cache file in Splunk in the format (release: list of txids), like a hashmap, covering the last 3 months. I wrote a query that reads this file for a specific release, retrieves all of its txids, processes them in chunks of 20, calculates the average transaction duration for each chunk, then takes the overall average across all chunks and saves the result to a new file that the Splunk dashboard can read instantly.
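For reference, the lookup looks roughly like this (the release names other than “Washington” and all txid values below are made up for illustration; the real txids column is one space-delimited string per release):

to_release,txids
Washington,"tx0001 tx0002 tx0003 ..."
Oregon,"tx0101 tx0102 ..."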
However, I ran the query over these 3 months of data (around 18k entries), and it has now been running for 15 hours without returning anything. Can someone please help me improve this query?
| inputlookup cached_txids.csv
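``` keep only the row(s) for the target release ```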
| search to_release="Washington"
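``` split the space-delimited txid list into a multivalue field, then fan out to one row per txid ```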
| makemv delim=" " txids
| mvexpand txids
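``` number the rows and derive a chunk id, 20 txids per chunk ```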
| streamstats count as record_num
| eval chunk_num_temp = floor((record_num - 1) / 20)
| streamstats dc(chunk_num_temp) as chunk_num
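``` regroup the txids by chunk, then expand again so each row carries one txid and its chunk_num ```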
| stats list(txids) as txids by chunk_num
| mvexpand txids
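``` run one subsearch per row, i.e. one search per txid, capped at maxsearches ```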
| map search="search txid=$txids$ | stats earliest(_time) as start_time latest(_time) as end_time by txid chunk_num" maxsearches=1000
| eval duration = end_time - start_time
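``` average the durations over all rows and render the result as a readable duration ```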
| eventstats avg(duration) as avg_duration
| eval avg_duration_readable = tostring(avg_duration, "duration")
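``` collapse to a single row and write it to a lookup for the dashboard ```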
| stats first(avg_duration_readable) as avg_duration_readable
| fields avg_duration_readable
| outputlookup intermediate_avg_durations_output.csv
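For what it’s worth, the chunk-numbering stages behave as I expect when tested in isolation. Here is a minimal sketch of that check; makeresults and the count of 45 are just there to generate placeholder rows, everything else is lifted from the query above:

| makeresults count=45
| streamstats count as record_num
| eval chunk_num_temp = floor((record_num - 1) / 20)
| streamstats dc(chunk_num_temp) as chunk_num

Rows 1–20 land in chunk 1, rows 21–40 in chunk 2, and rows 41–45 in chunk 3, so the slowdown seems to come from later in the pipeline.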