Given two times in the form HH:MM, I want to filter log lines whose timestamps fall between the two given times.
So for example, given 09:30 and 15:30, a simple regex T(09:[345]|1[01234]|15:[012])
would suffice (the end time itself is excluded). The input is assumed to be in the correct format.
Creating a regex like T(09:30|09:31|09:32|.....|15:29)
is a trivial loop, but I wonder what transformations I could apply to optimize such a regex, and whether such transformations are worth it.
I am writing in Python and this is my current code. If it simplifies things, I am open to any Unix tools.
from dataclasses import dataclass
import datetime
import re


@dataclass
class TimeFilter:
    start: datetime.time
    stop: datetime.time

    def timergx(self) -> str:
        # Work in minutes since midnight; the stop minute itself is excluded.
        i = self.start.hour * 60 + self.start.minute
        stop = self.stop.hour * 60 + self.stop.minute
        return "T(" + "|".join(f"{x // 60:02d}:{x % 60:02d}" for x in range(i, stop)) + ")"


def HHMM2time(txt: str) -> datetime.time:
    # Parse "H:MM" or "HH:MM" into a datetime.time.
    return datetime.time(*[int(x) for x in txt.split(":")])


tf = TimeFilter(HHMM2time("9:30"), HHMM2time("15:30"))
assert re.match(tf.timergx(), "T10:30")
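The kind of transformation I have in mind is factoring the alternatives by hour and collapsing the minutes into character classes. Below is a minimal sketch of that idea, not a finished solution: `minute_class` and `compact_time_regex` are hypothetical helper names that are not part of my code above, times are passed as minutes since midnight, and the stop minute is excluded, as in `timergx`. It only uses non-capturing groups and character classes, which I assume any common engine accepts.

```python
import re


def minute_class(lo: int, hi: int) -> str:
    """Regex for the two-digit minutes lo..hi (inclusive, 0 <= lo <= hi <= 59)."""
    t_lo, o_lo = divmod(lo, 10)
    t_hi, o_hi = divmod(hi, 10)
    if t_lo == t_hi:
        return f"{t_lo}[{o_lo}-{o_hi}]"
    parts = []
    if o_lo != 0:  # partial leading decade, e.g. 37..39
        parts.append(f"{t_lo}[{o_lo}-9]")
        t_lo += 1
    if o_hi != 9:  # partial trailing decade, e.g. 10..12
        parts.append(f"{t_hi}[0-{o_hi}]")
        t_hi -= 1
    if t_lo <= t_hi:  # whole decades in between
        parts.append(f"[{t_lo}-{t_hi}][0-9]")
    return parts[0] if len(parts) == 1 else "(?:" + "|".join(parts) + ")"


def compact_time_regex(start: int, stop: int) -> str:
    """Regex for T-prefixed HH:MM with start <= minutes since midnight < stop."""
    if not 0 <= start < stop <= 24 * 60:
        raise ValueError("expected 0 <= start < stop <= 1440")
    last = stop - 1  # last minute that should still match
    alternatives = []
    full_hours = []
    for hour in range(start // 60, last // 60 + 1):
        lo = start % 60 if hour == start // 60 else 0
        hi = last % 60 if hour == last // 60 else 59
        if (lo, hi) == (0, 59):
            full_hours.append(f"{hour:02d}")  # whole hour, share one minute class
        else:
            alternatives.append(f"{hour:02d}:{minute_class(lo, hi)}")
    if full_hours:
        alternatives.append("(?:" + "|".join(full_hours) + "):[0-5][0-9]")
    return "T(?:" + "|".join(alternatives) + ")"


pattern = compact_time_regex(9 * 60 + 30, 15 * 60 + 30)
assert re.match(pattern, "T10:30")
assert not re.match(pattern, "T15:30")
```

For 09:30 to 15:30 this produces T(?:09:[3-5][0-9]|15:[0-2][0-9]|(?:10|11|12|13|14):[0-5][0-9]), a handful of alternatives instead of 360. The full hours could be merged further into classes such as 1[0-4], which the sketch does not attempt.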
How should I “generate” the regex so that it is “faster”? Or is the regex in the form T(09:30|09:31|09:32|.....|15:29)
actually “faster” to process than any optimized form? If relevant, the regex will be run by the Go regex engine.
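For reference, this is roughly how I would compare the two forms locally. It is only a sketch under assumptions: the log line layout is made up, the factored pattern is written by hand for the 09:30 to 15:30 example, and the timings come from Python's `re`, which is a backtracking engine. Go's `regexp` does not backtrack, so the numbers here can at best hint at a trend and would need to be repeated in Go.

```python
import re
import timeit

# Enumerated form: one alternative per minute from 09:30 up to (not including) 15:30.
enumerated = "T(?:" + "|".join(
    f"{m // 60:02d}:{m % 60:02d}" for m in range(9 * 60 + 30, 15 * 60 + 30)
) + ")"
# Hand-factored form covering the same minutes.
factored = r"T(?:09:[3-5][0-9]|1[0-4]:[0-5][0-9]|15:[0-2][0-9])"

# Synthetic log lines, one per minute of the day (hypothetical layout).
lines = [f"2024-01-01T{m // 60:02d}:{m % 60:02d}:00 some message" for m in range(1440)]


def count_matches(pattern) -> int:
    """Count the lines in which the compiled pattern is found anywhere."""
    return sum(1 for line in lines if pattern.search(line))


for name, source in [("enumerated", enumerated), ("factored", factored)]:
    compiled = re.compile(source)
    seconds = timeit.timeit(lambda: count_matches(compiled), number=20)
    print(f"{name:>10}: {count_matches(compiled)} matches, {seconds:.3f}s for 20 passes")
```

Both patterns should report the same number of matching lines; only the elapsed time is expected to differ.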