I have multiline log files like this one:
2024-05-01 00:00:00.000 [INFO] TEST1
TEST2
TEST3
2024-05-01 00:00:00.005 [DEBUG] TEST4
TEST5
TEST6
2024-05-01 00:00:00.010 [WARN] TEST7
TEST8
TEST9
I extract each log message together with its metadata (timestamp and severity), and that part works fine.
Here is the config:
receivers:
  filelog:
    include:
      - /output/so.log
    start_at: beginning
    include_file_name: true
    multiline:
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}\s'
    operators:
      - type: regex_parser
        regex: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})\s*\[(?P<severity>.*?)\s*\]\s*(?P<body>.*)'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%d %H:%M:%S.%L'
          location: CET
        severity:
          parse_from: attributes.severity
      - type: remove
        field: attributes.severity
      - type: remove
        field: attributes.timestamp
exporters:
  debug:
    verbosity: detailed
service:
  telemetry:
    logs:
      level: info
    metrics:
      level: none
  pipelines:
    logs:
      receivers:
        - filelog
      exporters:
        - debug
The result looks like this:
Resource SchemaURL:
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-05-02 07:42:53.846409064 +0000 UTC
Timestamp: 2024-04-30 22:00:00.01 +0000 UTC
SeverityText: WARN
SeverityNumber: Warn(13)
Body: Str(2024-05-01 00:00:00.010 [WARN] TEST7
TEST8
TEST9)
Attributes:
-> log.file.name: Str(so.log)
-> body: Str(TEST7)
Trace ID:
Span ID:
Flags: 0
{"kind": "exporter", "data_type": "logs", "name": "debug"}
My goal is to strip the timestamp and severity from the body and keep only the log message, since that data is already captured separately.
The regex_parser operator should help with that, but it does not handle multiline entries: the body attribute contains only the first line (TEST7), while the record body still holds the full raw entry.
Can you suggest how to achieve this? Is there perhaps another operator or processor that can help?
The config for the OpenTelemetry Collector is shown above.
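
One direction I have considered but not verified is sketched below. It rests on two assumptions: that regex_parser uses Go (RE2) regular expression syntax, where the (?s) flag lets "." match newlines, and that the move operator is available in my collector build to overwrite the record body with the parsed attribute. Only the operators section changes, and the rest of the config stays as above.

```yaml
operators:
  - type: regex_parser
    # Assumption: (?s) makes "." match newlines, so the "body" group
    # captures the continuation lines (TEST8, TEST9) as well as TEST7.
    regex: '(?s)^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})\s*\[(?P<severity>.*?)\s*\]\s*(?P<body>.*)'
    timestamp:
      parse_from: attributes.timestamp
      layout: '%Y-%m-%d %H:%M:%S.%L'
      location: CET
    severity:
      parse_from: attributes.severity
  # Assumption: move replaces the raw record body with the parsed message
  # and removes the temporary "body" attribute in the same step.
  - type: move
    from: attributes.body
    to: body
  - type: remove
    field: attributes.severity
  - type: remove
    field: attributes.timestamp
```

If this behaves as assumed, the Body in the debug output would contain only the message lines, and the Attributes section would no longer list a body entry.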