Closed
Description
There is an assumption in the char class implementation that the input char set is known before regular expressions are processed, but this assumption is not enforced by the syntax.
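One way the syntax could enforce this assumption is to reject a char set directive once any macro or rule has started refining the char classes. The sketch below is hypothetical. CharSetGuard, onDirective and onRegexCharClass are illustrative names, not JFlex's API. It assumes, as the example further down suggests, that classes are refined as soon as a macro such as x = ab is processed, and it uses a 7-bit initial char set purely for illustration.

```java
/**
 * Hypothetical guard (not JFlex API): refuse a char set directive such as
 * %7bit / %8bit / %unicode once char classes have already been refined.
 */
class CharSetGuard {
    private int maxChar = 127;              // assumed initial input char set (7-bit)
    private boolean classesRefined = false; // set once a macro/rule touches the partition

    /** Called whenever a macro or rule contributes characters to the partition. */
    void onRegexCharClass() {
        classesRefined = true;
    }

    /** Called for a char set directive; newMaxChar is the highest code point of the set. */
    void onDirective(String name, int newMaxChar) {
        if (classesRefined) {
            throw new IllegalStateException(
                name + " must appear before any macro or rule that uses characters");
        }
        maxChar = newMaxChar;
    }

    int maxChar() {
        return maxChar;
    }

    public static void main(String[] args) {
        // Directive first: accepted, the whole Unicode range becomes the input set.
        CharSetGuard ok = new CharSetGuard();
        ok.onDirective("%unicode", 0x10FFFF);
        ok.onRegexCharClass();
        System.out.println("ok: maxChar = " + ok.maxChar());

        // Macro first (x = ab), then %unicode: rejected instead of silently resetting.
        CharSetGuard bad = new CharSetGuard();
        bad.onRegexCharClass();
        try {
            bad.onDirective("%unicode", 0x10FFFF);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the late directive is only one option. A lexer generator could instead collect all directives first and build the partition afterwards, which would make their position irrelevant.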
In particular, the char class directives (%7bit, %8bit, %unicode, etc.) assume they are called only once and before any character set partitions have been constructed. Each of them resets the partitions to a single partition covering the whole (new) input char set.
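To make the reset concrete, here is a small model of partition refinement. It is a sketch, not JFlex's actual CharClasses implementation. Each class is a list of code point intervals, makeClass(lo, hi) splits classes so that [lo, hi] becomes a union of whole classes, and reset(maxChar) mimics what the directives do. The 127 starting bound is an assumed initial char set.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of char class partitioning (illustrative, not JFlex's API). */
class CharClassResetDemo {
    // Each class is a list of closed code point intervals {lo, hi}.
    private final List<List<int[]>> classes = new ArrayList<>();

    CharClassResetDemo(int maxChar) {
        reset(maxChar);
    }

    /** What a char set directive effectively does: one class covering [0, maxChar]. */
    void reset(int maxChar) {
        classes.clear();
        List<int[]> all = new ArrayList<>();
        all.add(new int[] {0, maxChar});
        classes.add(all);
    }

    /** Split every class into the part inside [lo, hi] and the part outside it. */
    void makeClass(int lo, int hi) {
        List<List<int[]>> result = new ArrayList<>();
        for (List<int[]> cls : classes) {
            List<int[]> inside = new ArrayList<>();
            List<int[]> outside = new ArrayList<>();
            for (int[] iv : cls) {
                int a = iv[0], b = iv[1];
                if (b < lo || a > hi) { outside.add(iv); continue; }
                if (a < lo) outside.add(new int[] {a, lo - 1});
                inside.add(new int[] {Math.max(a, lo), Math.min(b, hi)});
                if (b > hi) outside.add(new int[] {hi + 1, b});
            }
            if (!inside.isEmpty()) result.add(inside);
            if (!outside.isEmpty()) result.add(outside);
        }
        classes.clear();
        classes.addAll(result);
    }

    /** Index of the class containing code point c, or -1 if c is outside the char set. */
    int classOf(int c) {
        for (int i = 0; i < classes.size(); i++)
            for (int[] iv : classes.get(i))
                if (iv[0] <= c && c <= iv[1]) return i;
        return -1;
    }

    int size() {
        return classes.size();
    }

    public static void main(String[] args) {
        CharClassResetDemo p = new CharClassResetDemo(127); // assumed set before %unicode
        p.makeClass('a', 'a');                               // from macro x = ab
        p.makeClass('b', 'b');
        System.out.println("before reset: " + p.size() + " classes");
        p.reset(0x10FFFF);                                   // %unicode seen after the macro
        System.out.println("after reset: " + p.size() + " class(es), 'a' and 'b' same class: "
            + (p.classOf('a') == p.classOf('b')));
    }
}
```

Before the reset the partition is {a}, {b} and everything else; afterwards only one class remains, which matches the single "class 0: { [0-1114111] }" in the output below.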
Example:
%%
x = ab
%unicode
%%
{x} {}
produces
CharClasses:
class 0:
{ [0-1114111] }
...
Miniminal DFA is
State 0:
with 0 in 1
State 1:
with 0 in 2
State [FINAL] 2:
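This is wrong: because every character falls into class 0, the DFA can no longer tell a and b apart from any other character, so it will match any 2-character sequence rather than just ab.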
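The consequence can be checked by transcribing the minimal DFA dump into a table and running it. This hand-written simulator is illustrative only, not generated JFlex code. It treats a missing transition as rejection, tests whole-string acceptance, and maps every code point to class 0 because that is the only class in the dump.

```java
import java.util.Arrays;

/** Hand transcription of the minimal DFA dump above (illustrative, not JFlex output). */
class BrokenDfaDemo {
    static final int NO_STATE = -1;

    // NEXT[state][charClass]; the broken partition has only class 0.
    static final int[][] NEXT = {
        {1},        // State 0: with 0 in 1
        {2},        // State 1: with 0 in 2
        {NO_STATE}, // State [FINAL] 2: no outgoing transitions
    };
    static final boolean[] FINAL = {false, false, true};

    /** The single class covers [0-1114111], so every code point maps to 0. */
    static int charClass(int codePoint) {
        return 0;
    }

    /** Whole-string acceptance: run the DFA and check that it ends in a final state. */
    static boolean matches(String input) {
        int state = 0;
        for (int cp : input.codePoints().toArray()) {
            state = NEXT[state][charClass(cp)];
            if (state == NO_STATE) return false;
        }
        return FINAL[state];
    }

    public static void main(String[] args) {
        for (String s : Arrays.asList("ab", "zz", "\u00e9!", "a", "abc")) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Every 2-character input is accepted, while "a" and "abc" are rejected. With the partition intact, a and b would get their own classes and only ab would reach the final state.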