Describe the bug
When overriding a Markdown renderer handler by setting the UNWRAPPED_TAGS option, the override is sometimes applied and sometimes not.
To Reproduce
In the following test, we convert HTML containing a definition list to Markdown. Because the definition list contains a `div` tag, the default renderer handler in `FlexmarkHtmlConverter` does not convert it correctly. I override the handler using the `UNWRAPPED_TAGS` option so that the `dl`, `dt`, and `dd` tags are processed in a generic way.
The test runs the conversion 10,000 times and prints how many conversions were correct and how many were incorrect.
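```java
import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;
import com.vladsch.flexmark.util.data.DataHolder;
import com.vladsch.flexmark.util.data.MutableDataSet;
import org.junit.Test;

import static com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter.UNWRAPPED_TAGS;
import static org.junit.Assert.assertEquals;

public class DefinitionListConversionTest { // class name is illustrative

    @Test
    public void testMarkdownDefinitionList() {
        int correct = 0;
        int incorrect = 0;

        // Keep the other unwrapped tags and add dl, dt and dd so they are processed generically
        DataHolder flexmarkOptions = new MutableDataSet()
                .set(UNWRAPPED_TAGS, new String[] { "article", "address", "frameset", "section", "small", "iframe",
                        "dl", "dt", "dd", })
                .toImmutable();
        FlexmarkHtmlConverter converter = FlexmarkHtmlConverter.builder(flexmarkOptions).build();

        // The input never changes, so it is built once outside the loop
        String html = "<dl id=\"definition-list\">\n" +
                "<div>\n" +
                "<dt></dt>\n" +
                "<dd>Data 1</dd>\n" +
                "<span>\n" +
                "<dd>Data 2</dd>\n" +
                "</span>\n" +
                "</div>\n" +
                "</dl>";

        for (int i = 0; i < 10000; i++) {
            String markdown = converter.convert(html);
            // The conversion counts as correct only if the nested second definition survives
            if (markdown.contains("Data 2")) {
                correct++;
            } else {
                incorrect++;
            }
        }
        System.out.println("correct: " + correct + ", incorrect: " + incorrect);
        assertEquals(0, incorrect);
    }
}
```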
Expected behavior
The test should pass: the override should be applied on every conversion, so `incorrect` stays at 0.
Resulting Output
The test fails and shows a similar number of correct and incorrect conversions.
Additional context
I haven't had time to confirm this, but the issue may be caused by storing the Markdown renderer handlers in a `Set` instead of a `List` (HtmlConverterCoreNodeRenderer.java:66), combined with the following code in FlexmarkHtmlConverter.java:
```java
Set<HtmlNodeRendererHandler<?>> formattingHandlers = htmlNodeRenderer.getHtmlNodeRendererHandlers();
if (formattingHandlers == null) continue;
for (HtmlNodeRendererHandler<?> nodeType : formattingHandlers) {
    // Overwrite existing renderer
    renderers.put(nodeType.getTagName(), nodeType);
}
```
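This loop would visit the elements of `formattingHandlers` in an unspecified order. If the set contains both the default handler and the override for the same tag name, whichever is visited last is the one kept in `renderers`, so the override we wanted is sometimes lost.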
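A minimal, self-contained sketch of the suspected failure mode, assuming (as the quoted loop suggests) that two handlers for the same tag end up in one `Set` and the last one put into the map wins. `Handler`, `register` and `countOverrideWins` are hypothetical stand-ins, not flexmark classes. The sketch also assumes that handler objects are recreated for each conversion and do not override `hashCode`, so `HashSet` iteration order follows their identity hash codes; neither assumption has been checked against the flexmark source.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

public class HandlerOrderDemo {
    // Hypothetical stand-in for a renderer handler: a tag name plus a label for the code path.
    // It does not override equals/hashCode, so a HashSet places instances by identity hash code.
    static final class Handler {
        final String tagName;
        final String kind;

        Handler(String tagName, String kind) {
            this.tagName = tagName;
            this.kind = kind;
        }
    }

    // Mimics the quoted loop: a later handler with the same tag name overwrites an earlier one.
    static Map<String, Handler> register(Collection<Handler> handlers) {
        Map<String, Handler> renderers = new HashMap<>();
        for (Handler h : handlers) {
            renderers.put(h.tagName, h);
        }
        return renderers;
    }

    // Creates fresh handlers on every trial (as if a new renderer were built per conversion),
    // adding the default "dl" handler first and the "unwrapped" override second.
    // Returns how many trials ended with the override registered for "dl".
    static int countOverrideWins(Supplier<Set<Handler>> setFactory, int trials) {
        int wins = 0;
        for (int i = 0; i < trials; i++) {
            Set<Handler> handlers = setFactory.get();
            handlers.add(new Handler("dl", "default"));
            handlers.add(new Handler("dl", "unwrapped"));
            if (register(handlers).get("dl").kind.equals("unwrapped")) {
                wins++;
            }
        }
        return wins;
    }

    public static void main(String[] args) {
        int trials = 10_000;
        int hashSetWins = countOverrideWins(HashSet::new, trials);
        int linkedWins = countOverrideWins(LinkedHashSet::new, trials);
        System.out.println("HashSet: override applied " + hashSetWins + " / " + trials);
        System.out.println("LinkedHashSet: override applied " + linkedWins + " / " + trials);
    }
}
```

With `HashSet` the count depends on the identity hash codes of the freshly created objects, so on a typical JVM it comes out as a mix rather than all-or-nothing. `LinkedHashSet` keeps insertion order, so the override (added last) is applied on every trial. If this is really the cause, an insertion-ordered collection for the handlers would be one way to make overrides deterministic.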