I’m using go-elasticsearch/v7, and the following works with a normal scroll:
import es "github.com/elastic/go-elasticsearch/v7"
[...]
type Query struct {
    Size  int          `json:"size"`
    Slice *QuerySlice  `json:"slice,omitempty"`
    Aggs  *Aggs        `json:"aggs,omitempty"`
    Query *QueryFilter `json:"query,omitempty"`
    Sort  []string     `json:"sort,omitempty"`
}
type QuerySlice struct {
    ID  int `json:"id"`
    Max int `json:"max"`
}
[...]
type QueryFilter struct {
    Bool QFBool `json:"bool"`
}

type QFBool struct {
    Filter Filter `json:"filter"`
}

type Filter []map[string]map[string]interface{}
func (q *Query) AsBody() (*bytes.Reader, error) {
    queryBytes, err := json.Marshal(q)
    if err != nil {
        return nil, err
    }
    return bytes.NewReader(queryBytes), nil
}
[...]
type Result struct {
    ScrollID     string `json:"_scroll_id"`
    Took         int
    TimedOut     bool   `json:"timed_out"`
    HitSet       HitSet `json:"hits"`
    Aggregations Aggregations
}

type HitSet struct {
    Total struct {
        Value int
    }
    Hits []Hit
}
[...]
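(parseResponse is part of the elided code; it essentially just decodes the response body into a Result, roughly like this:)
// Roughly what my parseResponse does: decode the esapi.Response body into a
// Result and close it (error/status handling trimmed for brevity).
func parseResponse(resp *esapi.Response) (*Result, error) {
    defer resp.Body.Close()

    var result Result
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return nil, err
    }
    return &result, nil
}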
func main() {
    query := &Query{
        Size:  maxSize,
        Sort:  []string{"_doc"},
        Query: &QueryFilter{Bool: QFBool{Filter: filter}},
    }

    qbody, err := query.AsBody()
    if err != nil {
        log.Fatal(err)
    }

    resp, err := client.Search(
        client.Search.WithIndex(index),
        client.Search.WithBody(qbody),
        client.Search.WithSize(maxSize),
        client.Search.WithScroll(scrollTime),
    )
    if err != nil {
        log.Fatal(err)
    }

    result, err := parseResponse(resp)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("total: %d\n", result.HitSet.Total.Value)
}
For an example query (one I can’t use aggregations for), I get a result.HitSet.Total.Value of 432734. I’m then able to use client.Scroll to page through all the hits, but it takes over a minute, so I’d like to try sliced scrolls and process each slice concurrently.
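(The follow-up scrolling is the usual loop, simplified here; scrollTime is the same duration passed to WithScroll above:)
// Simplified version of the follow-up scroll loop for the plain (unsliced)
// search: keep asking for the next page with the returned _scroll_id until a
// page comes back empty.
hits := result.HitSet.Hits
scrollID := result.ScrollID
for {
    resp, err := client.Scroll(
        client.Scroll.WithScrollID(scrollID),
        client.Scroll.WithScroll(scrollTime),
    )
    if err != nil {
        log.Fatal(err)
    }

    page, err := parseResponse(resp)
    if err != nil {
        log.Fatal(err)
    }
    if len(page.HitSet.Hits) == 0 {
        break
    }

    hits = append(hits, page.HitSet.Hits...)
    scrollID = page.ScrollID
}
// ... process hits ...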
So I tried:
total := 0
for i := 0; i < maxSlices; i++ {
    query := &Query{
        Size: maxSize,
        Sort: []string{"_doc"},
        Slice: &QuerySlice{
            ID:  i,
            Max: maxSlices,
        },
        Query: &QueryFilter{Bool: QFBool{Filter: filter}},
    }

    qbody, err := query.AsBody()
    if err != nil {
        return nil, err
    }

    resp, err := client.Search(
        client.Search.WithIndex(index),
        client.Search.WithBody(qbody),
        client.Search.WithSize(maxSize),
        client.Search.WithScroll(scrollTime),
    )
    if err != nil {
        return nil, err
    }

    result, err := parseResponse(resp)
    if err != nil {
        return nil, err
    }

    fmt.Printf("slice total: %d; hits: %d\n", result.HitSet.Total.Value, len(result.HitSet.Hits))
    total += result.HitSet.Total.Value
}

fmt.Printf("overall total: %d\n", total)
When maxSlices = 2, I get the expected output:
slice total: 194104; hits: 10000
slice total: 238630; hits: 10000
overall total: 432734
But with maxSlices of 3 I get:
slice total: 125374; hits: 10000
slice total: 80754; hits: 10000
slice total: 125374; hits: 10000
overall total: 331502
And for 6 I get:
slice total: 10117; hits: 10000
slice total: 11253; hits: 10000
slice total: 114486; hits: 10000
slice total: 0; hits: 0
slice total: 0; hits: 0
slice total: 0; hits: 0
overall total: 135856
And despite the fact that my example query should always return the same results (and does, day after day, with the normal scroll query), what I get back for a given maxSlices value varies over time. For example, yesterday maxSlices of 3 actually worked correctly.
Have I misunderstood sliced scrolls? Am I using them wrong?
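For completeness, the concurrent version I’m aiming for (once the per-slice totals make sense) would look roughly like this, with one goroutine per slice; it reuses the same client, filter, maxSize, maxSlices and scrollTime as above:
// Sketch of the intended concurrent version: one goroutine per slice, each
// issuing its own sliced search (and, eventually, its own scroll loop to
// drain that slice). Totals are accumulated under a mutex.
var (
    wg    sync.WaitGroup
    mu    sync.Mutex
    total int
)
for i := 0; i < maxSlices; i++ {
    wg.Add(1)
    go func(sliceID int) {
        defer wg.Done()

        query := &Query{
            Size:  maxSize,
            Sort:  []string{"_doc"},
            Slice: &QuerySlice{ID: sliceID, Max: maxSlices},
            Query: &QueryFilter{Bool: QFBool{Filter: filter}},
        }

        qbody, err := query.AsBody()
        if err != nil {
            log.Println(err)
            return
        }

        resp, err := client.Search(
            client.Search.WithIndex(index),
            client.Search.WithBody(qbody),
            client.Search.WithSize(maxSize),
            client.Search.WithScroll(scrollTime),
        )
        if err != nil {
            log.Println(err)
            return
        }

        result, err := parseResponse(resp)
        if err != nil {
            log.Println(err)
            return
        }

        mu.Lock()
        total += result.HitSet.Total.Value
        mu.Unlock()
    }(i)
}
wg.Wait()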