I’m using the re module in Python. Let’s say I want to find all Arabic letters in a string. Essentially, I want to combine \w with [\u0600-\u06FF].
Is there a way to do this, i.e. to specify both a character range and a class so that a character only matches if it belongs to both?
If this isn’t possible with Python’s native re module, is it possible with the regex module from pip?
The only answer I could come up with, inspired by this answer for matching any letter, is this:
arabic_letters = re.findall(r'[^\W\d\u0000-\u05FF\u0700-\U0010FFFF]', my_string)
This works fine; my only problem with it is that it feels wrong. Instead of specifying the range 0600-06FF, I have to specify everything before and everything after this range. It got even more convoluted when I wanted to add other ranges of Arabic characters:
arabic_letters = re.findall(r'[^\W\d\u0000-\u05FF\u0700-\uFB4F\uFE00-\uFE6F\uFF00-\U0010FFFF]', my_string)
(there may be even more ranges I need to add; I haven’t checked yet)
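For comparison, here is a minimal sketch of a positive formulation using only the standard re module (assuming Python 3.6+ for f-strings). It intersects a class with a range through a lookahead: (?=[^\W\d]) checks that the next character is a non-digit word character, and the class after it consumes that same character only if it also falls in one of the listed ranges. The names ARABIC_RANGES and build_letter_pattern are hypothetical helpers for illustration, not part of re, and the range list is simply the complement of what the second pattern above excludes, not a verified list of all Arabic code points.

```python
import re

# Intersection via lookahead: (?=[^\W\d]) asserts that the next character is a
# word character but not a digit, and the class that follows then consumes
# that same character only if it is inside U+0600-U+06FF.
ARABIC_BLOCK = re.compile(r"(?=[^\W\d])[\u0600-\u06FF]")

# Hypothetical range list: the Arabic block plus Arabic Presentation Forms-A
# and -B, i.e. exactly what the multi-range negated pattern leaves un-excluded.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
]


def build_letter_pattern(ranges):
    """Compile a pattern matching non-digit word characters inside any of the ranges."""
    # Each (lo, hi) pair becomes an re range such as \U0000FB50-\U0000FDFF;
    # the 8-digit \U escape covers any code point, including ones above U+FFFF.
    char_class = "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in ranges)
    return re.compile(rf"(?=[^\W\d])[{char_class}]")


ARABIC_LETTERS = build_letter_pattern(ARABIC_RANGES)

# An Arabic word, the lam-alef ligature U+FEFB, ASCII and Arabic-Indic digits,
# and U+FEFF (inside Presentation Forms-B, but not a word character).
sample = "Hello \u0645\u0631\u062d\u0628\u0627 \ufefb 123 \u0663\u0664\u0665 \ufeff"

print(ARABIC_BLOCK.findall(sample))
print(ARABIC_LETTERS.findall(sample))
```

Because the ranges are listed positively, covering another Arabic block means appending one (start, end) pair rather than re-splitting the excluded ranges. The third-party regex module also documents set operations, including intersection with &&, when its VERSION1 behaviour is enabled; that variant is not shown here.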