Bug or feature?
Example:
^(?:[^a-zA-Z\s(]+)(?: · [^a-zA-Z\s(]+)*\K[a-zA-Z]+ (?=(?:(?: · [^a-zA-Z\s(]+[a-zA-Z]+ )* 【 [^ ]+(?: ·[^ ]+)* 】)?\r\n)
It matches the transliterated parts “shibashiba”, “dare” and “hakkiri” that follow the initial run of hiragana characters in these lines:
しばしばshibashiba 【 屡々 ·屡屡 ·屡 ·数数 ·数々 ·数 】
だれdare · たれtare · たta 【 誰 】
はっきりhakkiri
The search highlights those parts correctly.
When I try to replace a matched part with anything, even an empty string, no replacement happens and the search simply moves on to the next match.
All characters in the example are within the Unicode Basic Multilingual Plane.
Am I misunderstanding something?
I understand that I can’t use a lookbehind
(?<=^(?:[^a-zA-Z\s(]+)(?: · [^a-zA-Z\s(]+)*)[a-zA-Z]+ (?=(?:(?: · [^a-zA-Z\s(]+[a-zA-Z]+ )* 【 [^ ]+(?: ·[^ ]+)* 】)?\r\n)
because the part
^(?:[^a-zA-Z\s(]+)(?: · [^a-zA-Z\s(]+)*
has variable length.
Both the part discarded by \K and the lookahead after the match are needed to avoid matching other lines of text. The pattern matches all the variations shown above, and it also correctly avoids matching other parts of the text, e.g. “(adv) often; again and again; frequently →Related words: 度々”.
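To make the role of \K concrete, here is a minimal sketch in Python's standard `re` module, which is not the regex engine Notepad++ uses. Python's `re` supports neither \K nor variable-length lookbehind, so the sketch swaps in a different technique: the prefix goes into capture group 1, the transliteration into group 2, and a replacement writes group 1 back unchanged. It assumes Windows \r\n line endings and deliberately omits the lookahead, so it only reproduces the prefix part of the pattern; the names `PREFIX`, `PATTERN`, `transliterations` and `replace_transliterations` are hypothetical.

```python
import re

# The prefix that the original pattern consumes and then discards with \K:
# a run of characters that are not Latin letters, whitespace or "(",
# optionally followed by more such runs separated by " · ".
PREFIX = r"(?:[^a-zA-Z\s(]+)(?: · [^a-zA-Z\s(]+)*"

# Python's re has no \K and requires fixed-width lookbehind, so group 1
# holds the prefix and group 2 holds the transliteration instead.
# Assumption: the lookahead of the original pattern is omitted here.
PATTERN = re.compile(r"^(" + PREFIX + r")([a-zA-Z]+)", re.MULTILINE)


def transliterations(text: str) -> list[str]:
    # Group 2 corresponds to what a \K-based match reports as the match.
    return [m.group(2) for m in PATTERN.finditer(text)]


def replace_transliterations(text: str, replacement: str) -> str:
    # Replace only group 2; group 1 (the prefix) is written back as-is.
    # A callable replacement avoids backslash processing in `replacement`.
    return PATTERN.sub(lambda m: m.group(1) + replacement, text)


if __name__ == "__main__":
    sample = (
        "しばしばshibashiba 【 屡々 ·屡屡 ·屡 ·数数 ·数々 ·数 】\r\n"
        "だれdare · たれtare · たta 【 誰 】\r\n"
        "はっきりhakkiri\r\n"
    )
    print(transliterations(sample))
    print(replace_transliterations(sample, ""))
```

On the three sample lines, `transliterations` returns `['shibashiba', 'dare', 'hakkiri']`, and a line starting with “(adv)” is not matched because its first character is `(`.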
Notepad++ v8.6.5 (32-bit)