We are currently trying out the OpenTelemetry Collector with the tail sampling processor.
Our goal is to sample 10% of traces for each service.
I already tried sampling 10% of all traces, but then it can take a long time before a trace from a specific low-traffic service is sampled.
For example:
policies:
  [
    {
      name: sampling-policy,
      type: probabilistic,
      probabilistic: { sampling_percentage: 10 },
    }
  ]
Service_1 had 100,000 traces in the last 3h
Service_2 had 10 traces in the last 3h
Result: Only traces from Service_1 are sampled.
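To put rough numbers on this, here is a small sketch. It assumes each trace is kept independently with probability 0.10, which is only an approximation of what the processor does internally. The helper names are made up for illustration.

```python
# Assumption: every trace is kept independently with probability p.
# This is a simplification used only to estimate the effect.
p = 0.10

# Trace counts taken from the example above.
traces_last_3h = {"Service_1": 100_000, "Service_2": 10}


def expected_sampled(n, p):
    """Expected number of kept traces out of n."""
    return n * p


def prob_none_sampled(n, p):
    """Probability that not a single one of n traces is kept."""
    return (1 - p) ** n


for service, n in traces_last_3h.items():
    print(
        f"{service}: expected {expected_sampled(n, p):.0f} sampled, "
        f"P(no trace sampled) = {prob_none_sampled(n, p):.3f}"
    )
```

Under that assumption, Service_2 has roughly a one-in-three chance of contributing no sampled trace at all in a 3h window, while Service_1 contributes about 10,000.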
So we think we need something like this:
policies: [
    {
      # Rule 1: sample 10% of service_1
      name: service_1_sampling_policy,
      type: and,
      and:
        {
          and_sub_policy:
            [
              {
                # filter by service name
                name: service-name-policy,
                type: string_attribute,
                string_attribute:
                  {
                    key: service.name,
                    values: [service-1],
                  },
              },
              {
                # apply probabilistic sampling (value is a percentage: 10 = 10%)
                name: probabilistic-policy,
                type: probabilistic,
                probabilistic: { sampling_percentage: 10 },
              },
            ],
        },
    },
    {
      # Rule 2: sample 10% of service_2
      name: service_2_sampling_policy,
      type: and,
      and:
        {
          and_sub_policy:
            [
              {
                # filter by service name
                name: service-name-policy,
                type: string_attribute,
                string_attribute:
                  {
                    key: service.name,
                    values: [service-2],
                  },
              },
              {
                # apply probabilistic sampling (value is a percentage: 10 = 10%)
                name: probabilistic-policy,
                type: probabilistic,
                probabilistic: { sampling_percentage: 10 },
              },
            ],
        },
    },
  ]
The problem is that we have a lot of services, and service names can change, so maintaining this list manually is not feasible.
Is there a way to do something like this (pseudocode):
policies: [
    {
      for each service in service.name:
        {
          # Rule: sample 10% of each service
          name: service_sampling_policy,
          type: and,
          and:
            {
              and_sub_policy:
                [
                  {
                    # filter by service name
                    name: service-name-policy,
                    type: string_attribute,
                    string_attribute:
                      {
                        key: service.name,
                        values: [service],
                      },
                  },
                  {
                    # apply probabilistic sampling (value is a percentage: 10 = 10%)
                    name: probabilistic-policy,
                    type: probabilistic,
                    probabilistic: { sampling_percentage: 10 },
                  },
                ],
            },
        },
    }
  ]
That way, the services would be sampled independently of each other.
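As a stopgap, the per-service rules could be generated instead of written by hand. The sketch below is not a feature of the collector; it is a plain Python script that builds the same policy structure shown above from a list of service names. The function name `build_policies` and the example service list are hypothetical, and where the list of names comes from is left open. It prints JSON, which YAML parsers generally accept as flow-style YAML.

```python
import json


def build_policies(service_names, percentage=10):
    """Build one 'and' policy per service: name filter + probabilistic sampling.

    The field names mirror the hand-written config above; `percentage`
    is a percentage value (10 means 10%).
    """
    policies = []
    # Deduplicate and sort so the generated config is stable between runs.
    for service in sorted(set(service_names)):
        policies.append(
            {
                "name": f"{service}_sampling_policy",
                "type": "and",
                "and": {
                    "and_sub_policy": [
                        {
                            # filter by service name
                            "name": "service-name-policy",
                            "type": "string_attribute",
                            "string_attribute": {
                                "key": "service.name",
                                "values": [service],
                            },
                        },
                        {
                            # apply probabilistic sampling
                            "name": "probabilistic-policy",
                            "type": "probabilistic",
                            "probabilistic": {"sampling_percentage": percentage},
                        },
                    ]
                },
            }
        )
    return policies


if __name__ == "__main__":
    # Hypothetical service list; in practice this would come from elsewhere.
    services = ["service-1", "service-2"]
    print(json.dumps({"policies": build_policies(services)}, indent=2))
```

This still requires regenerating and reloading the config whenever a service is added or renamed, so it only reduces the manual work rather than removing it.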