I’m using JTidy, the Java port of HTML Tidy: https://github.com/jtidy/jtidy
When I set the makeClean flag to true, bgcolor="#ffffff" is converted to class="c1", and I don’t understand why.
I found this explanation: https://api.html-tidy.org/tidy/quickref_5.0.0.html#clean
This option specifies if Tidy should strip out surplus presentational
tags and attributes replacing them by style rules and structural
markup as appropriate. It works well on the HTML saved by Microsoft
Office products.
What I still don’t understand is where the class c1 comes from. A minimal example:
import org.w3c.tidy.Tidy;
import java.io.*;

public class TidyBug {
    public static void main(String[] args) throws IOException {
        String string = """
                <table>
                <tr>
                <td bgcolor="#ffffff">test</td>
                </tr>
                </table>
                """;
        String result;
        try (StringReader in = new StringReader(string); StringWriter out = new StringWriter();
             PrintWriter ignored = new PrintWriter(new ByteArrayOutputStream())) {
            Tidy tidy = new Tidy();
            tidy.setPrintBodyOnly(true);
            tidy.setMakeClean(true);
            tidy.parse(in, out);
            result = out.toString();
        }
        System.out.println(result);
        assert result.equals("""
                <table>
                <tr>
                <td class="c1">test</td>
                </tr>
                </table>
                """);
    }
}
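
My best guess at what "replacing them by style rules" means, as a toy sketch in plain Java. It is not JTidy's code: CleanSketch, classForBgcolor and styleRules are names I made up for illustration. The assumption is that each distinct presentational value gets a generated class name (c1, c2, ...) plus a matching CSS rule that is emitted separately from the element, presumably in a style element in the head, which setPrintBodyOnly(true) would leave out of the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: this is NOT how JTidy is implemented.
public class CleanSketch {
    // CSS declaration -> generated class name, kept in first-seen order.
    private final Map<String, String> classByDeclaration = new LinkedHashMap<>();

    // Returns the class standing in for a bgcolor value; generates c1, c2, ... on first use.
    public String classForBgcolor(String color) {
        String declaration = "background-color: " + color;
        String name = classByDeclaration.get(declaration);
        if (name == null) {
            name = "c" + (classByDeclaration.size() + 1);
            classByDeclaration.put(declaration, name);
        }
        return name;
    }

    // Renders the collected rules for one tag, one rule per line.
    public String styleRules(String tag) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : classByDeclaration.entrySet()) {
            sb.append(tag).append('.').append(e.getValue())
              .append(" {").append(e.getKey()).append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CleanSketch sketch = new CleanSketch();
        String cls = sketch.classForBgcolor("#ffffff");
        System.out.println("<td class=\"" + cls + "\">test</td>"); // <td class="c1">test</td>
        System.out.print(sketch.styleRules("td"));                 // td.c1 {background-color: #ffffff}
    }
}
```

If that guess is right, running the example without setPrintBodyOnly(true) should show whether a rule for c1 appears in the head of the full document.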