I want to run a regex replace on only the text content (innerText) of an HTML document, and in the end keep all the HTML elements (or restore them as they were).
The regex must not look at the HTML elements themselves, only at the text content inside them (innerText, textContent, etc.).
Made-up example: a "dialogue highlighter".
Input string:
<html>
<body>
<h1>Hello, World!</h1>
<p id="aaaa";>"Omae wa moshindeiru."</p>
<p id="aaaa";>"Naani!"</p>
</body>
</html>
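JavaScript:
const element = document.querySelector('body');
// Wrap quoted text in placeholder markers (assigning innerText drops the child elements)
element.innerText = element.innerText
  .replace(/["“”]([^"”“\n]+)["”“]/g, '"€€$1××"');
// Swap the markers for real tags
element.innerHTML = element.innerHTML
  .replace(/€€/g, '<span style="color: red;">')
  .replace(/××/g, '</span>');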
Expected output:
<html>
<body>
<h1>Hello, World!</h1>
<p id="aaaa";>"<span style="color: red;">Omae wa moshindeiru.</span>"</p>
<p id="aaaa";>"<span style="color: red;">Naani!</span>"</p>
</body>
</html>
Actual output:
<html>
<body>
Hello, World!<br><br>"<span style="color: red;">Omae wa moshindeiru.</span>"<br><br>"<span style="color: red;">Naani!</span>"
</body>
</html>
Yes, I know I could adapt the regex, but I don't want to.
I just want to act on the text content and then restore the lost HTML elements.
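To make the goal concrete, here is a rough sketch of the kind of thing I am after, working on the HTML string instead of innerText. The helper name `replaceTextOnly` is made up for illustration. It assumes simple markup: no `>` inside attribute values, no `<script>` or `<style>` content, and no quote pair that spans several elements.

```javascript
// Hypothetical helper (not an existing API): applies a regex replacement
// only to the text between tags and leaves the tags themselves untouched.
function replaceTextOnly(html, pattern, replacement) {
  return html
    // The capture group keeps the tags in the result, so the array
    // alternates: text, tag, text, tag, ... (tags sit at odd indexes).
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(pattern, replacement)))
    .join('');
}

const quoted = /["“”]([^"“”\n]+)["“”]/g;
const html = '<p id="aaaa">"Naani!"</p>';
console.log(replaceTextOnly(html, quoted, '"<span style="color: red;">$1</span>"'));
// → <p id="aaaa">"<span style="color: red;">Naani!</span>"</p>
```

Note that the `id="aaaa"` attribute is not touched even though it also contains a quoted string. In a browser, walking the text nodes with `document.createTreeWalker(root, NodeFilter.SHOW_TEXT)` would probably be more robust than splitting the string, since the parser has already separated text from markup.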