I am trying to parse HTML input using jsoup (v1.18.1), extract the elements, and replace characters in each attribute value as follows:

- `>` with `&gt;`
- `<` with `&lt;`

The method I'm feeding the resulting HTML into cannot handle these symbols inside attribute values.
Below is the code I'm using:

```java
import org.apache.commons.lang3.StringUtils;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Elements elements = htmlDocument.getAllElements();
// Process each element's attributes
for (Element element : elements) {
    // Iterate over all attributes of the element
    for (Attribute attribute : element.attributes()) {
        String originalValue = attribute.getValue();
        // Escape only '>' and '<' characters
        String escapedValue = escapeSpecificHtmlChars(originalValue);
        // Update the attribute with the escaped value
        element.attr(attribute.getKey(), escapedValue);
    }
}

private String escapeSpecificHtmlChars(final String input) {
    if (StringUtils.isBlank(input)) {
        return input;
    }
    // Replace only '>' with '&gt;' and '<' with '&lt;'
    return input.replace(">", "&gt;")
            .replace("<", "&lt;");
}
```
Let's say the element is:

```html
<span role="text" aria-label=">test value>">Test value!</span>
```

The `aria-label` attribute has the value `>test value>`, so `escapedValue` would be `&gt;test value&gt;`. But after I set it with `element.attr(attribute.getKey(), escapedValue);`, the attribute is written to the output HTML as `&amp;gt;test value&amp;gt;`.

I want `escapedValue` to stay as-is when I set it as the attribute value.
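A hedged illustration of what seems to be happening: jsoup keeps attribute values unescaped in memory and escapes them when the document is serialized, so the `&` of a pre-escaped `&gt;` is itself escaped to `&amp;`. The JDK-only sketch below models this with a hypothetical `serializeAttributeValue` (a simplified stand-in, not jsoup's actual serializer, which has more rules such as escape modes and charset handling) and contrasts it with a hypothetical `escapeAttributeValueOnce` that escapes the raw value in a single pass at output time.

```java
// Hypothetical, simplified model of attribute serialization; NOT jsoup's source code.
class AttributeEscapingDemo {

    // Models a serializer that stores values unescaped and escapes '&' and '"' on output
    // (assumption: simplified rules for illustration only).
    static String serializeAttributeValue(String storedValue) {
        return storedValue.replace("&", "&amp;").replace("\"", "&quot;");
    }

    // Escape-once alternative: escape '&', '"', '<' and '>' from the raw value in one pass,
    // so entities produced here are never escaped a second time.
    static String escapeAttributeValueOnce(String rawValue) {
        StringBuilder out = new StringBuilder(rawValue.length());
        for (int i = 0; i < rawValue.length(); i++) {
            char c = rawValue.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = ">test value>";
        // Pre-escaping, then letting the serializer escape again: '&' turns into "&amp;".
        String preEscaped = raw.replace(">", "&gt;").replace("<", "&lt;");
        System.out.println(serializeAttributeValue(preEscaped)); // &amp;gt;test value&amp;gt;
        // Escaping the raw value exactly once, at the point where it is written out.
        System.out.println(escapeAttributeValueOnce(raw));       // &gt;test value&gt;
    }
}
```

The point of the sketch is that escaping should happen exactly once, where the string is written; how to hook such custom escaping into jsoup's own output is not shown here.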
Any help would be appreciated!