Here’s my .l file.
%{
# include <stdio.h>
%}
DIGIT [0-9]+
caps [A-Z]+
low [a-z]+
%%
"\"" printf("#");
"," printf("|");
{low} printf("_");
{caps} printf("A");
{DIGIT} printf("0");
Here’s my input:
" "
ajdljeohfgf
ODSHJOJEHEEW
894930248
"abdef"
"987987"
"EWGGEWR"
"wroegu2948t24t
Here’s what I say I should be getting:
# #
_
A
0
#_#
#0#
#A#
#_0_0_
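To spell out why I expect that output, here is a hand-written C sketch of what I believe the rules should compute. It is only an illustration of the longest-match logic, not what flex actually generates, and the names `token_for` and `expected_output` are made up for this post. Characters that no rule matches are copied through unchanged, which is what flex's default rule does.

```c
#include <stddef.h>

/* Hypothetical helper (not part of flex): map one byte to the character
   the corresponding rule prints, or 0 if no rule applies. Explicit ASCII
   ranges are used so the result does not depend on the locale. */
static char token_for(unsigned char c)
{
    if (c >= 'a' && c <= 'z') return '_';   /* {low}   */
    if (c >= 'A' && c <= 'Z') return 'A';   /* {caps}  */
    if (c >= '0' && c <= '9') return '0';   /* {DIGIT} */
    if (c == '"') return '#';
    if (c == ',') return '|';
    return 0;                               /* no rule: echo as-is */
}

/* Write the output I expect for `in` into `out`, which has room for
   `cap` bytes including the terminating NUL. Returns the length written. */
size_t expected_output(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0) return 0;
    while (*in != '\0' && n + 1 < cap) {
        char t = token_for((unsigned char)*in);
        if (t == '_' || t == 'A' || t == '0') {
            /* The patterns end in '+', so one match swallows the whole
               run of same-class characters and prints a single symbol. */
            while (*in != '\0' && token_for((unsigned char)*in) == t)
                in++;
            out[n++] = t;
        } else if (t != 0) {
            out[n++] = t;       /* quote or comma: one symbol per char */
            in++;
        } else {
            out[n++] = *in++;   /* unmatched character is echoed */
        }
    }
    out[n] = '\0';
    return n;
}
```

Calling `expected_output("\"abdef\"", buf, sizeof buf)` leaves `#_#` in `buf`, and the unterminated last input line comes out as `#_0_0_`.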
Here’s what I am getting:
# #
_
A
0
#abdef#
#987987#
#EWGGEWR#
#wroegu2948t24_
That is, the quotes appear to be turning character classes on and off in some weird fashion. As one example, "abdef" ought to become #_#, but it comes out as #abdef# instead. I’ve never seen such weirdness. My flex says it’s version 2.5.4. My locale says:
:~$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
and I admit that I’m accustomed to the C locale, but, heavens to Betsy! The first time it sees lower or upper, it matches like I expect, and then decides to stop matching at all?
What is going on?
Ubuntu 24.04, latest patches.
New info: it’s definitely the named character classes. If I stop using them and write the patterns inline, it does what I expect.
%%
"\"" printf("#");
"," printf("|");
[a-z]+ printf("_");
[A-Z]+ printf("A");
[0-9]+ printf("0");
THAT WORKS LIKE I EXPECT. But I cannot understand why using the named char classes fails.
It’s working today.
I cannot imagine what the problem could have been. I noticed late yesterday some strange things that I thought were related to my PATH variable, but even those are gone today. I suppose I must have completely mangled my environment, but I cannot come up with even a far-fetched hypothesis as to why, or how.
FWIW, I’m using the flex that gets installed with Ubuntu 24.04, as package “flex”. flex++ would have forced me to rewrite a bunch of existing .l files.
I noticed that my input file had some non-printing characters in it, but when I made another test file with the same characters added, the problem did not come back, so that didn’t explain it. (Which is what I would expect in a normal world.)
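For anyone who wants to check a .l file or an input file for stray bytes, here is a minimal C sketch. The function name `first_suspect_byte` is invented for this post, and the choice of what counts as "suspect" is my own assumption: anything other than printable ASCII, tab, or newline. That means a carriage return or any byte of a multi-byte UTF-8 character (a curly quote, say) gets flagged.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first byte in `buf` that
   is not printable ASCII (0x20..0x7E), tab, or newline. Returns -1 when
   every byte passes. Carriage returns and bytes >= 0x80 are reported. */
long first_suspect_byte(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        int ok = (c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\n';
        if (!ok)
            return (long)i;
    }
    return -1;
}
```

Read the file into a buffer with `fread`, pass the buffer and its length, and look at the byte at the returned offset if the result is not -1.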
I apologize for what amounts to a false alarm. But I fought with that thing all Sunday morning to no avail. It was making me crazy. If it happens again I will know not to blame flex. I’ll just reboot and see what happens.
Sorry.