Tags: python, html-parsing, langchain, retrieval-augmented-generation, unstructured-data

“unstructured” and LangChain’s “HTMLHeaderTextSplitter” ignore “pre” and/or “code” HTML tags

I want to read a webpage and split it into chunks to feed a vector database in a RAG pipeline. The page contains Python code examples, but the splitters drop that code text, so none of my chunks contain it. I tried both the unstructured Python package and the HTMLHeaderTextSplitter class (from the langchain_text_splitters package), with the same result.
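The question does not establish why the code text is dropped, so the following is only one possible workaround, assuming you can preprocess the HTML yourself before it reaches the vector store. It is a minimal sketch built on Python's standard-library `html.parser`, not on unstructured or LangChain: it starts a new chunk at each h1, h2 or h3 header, records the current header path as metadata, and keeps the text of `<pre>` and `<code>` elements (verbatim inside `<pre>`, so indentation survives). The class name `CodeAwareHeaderSplitter`, the `HEADER_TAGS` mapping and the "Header 1"-style metadata keys are hypothetical, chosen for illustration; they are not part of either library's API.

```python
from html.parser import HTMLParser

# Hypothetical mapping (illustration only): header tag -> metadata key.
HEADER_TAGS = {"h1": "Header 1", "h2": "Header 2", "h3": "Header 3"}
SKIP_TAGS = {"script", "style"}        # never useful as chunk text
BLOCK_TAGS = {"p", "div", "li", "br"}  # start a new line in the output


class CodeAwareHeaderSplitter(HTMLParser):
    """Hypothetical helper, not part of langchain or unstructured.

    Splits HTML into (metadata, text) chunks at h1-h3 headers and keeps the
    text of <pre> and <code> elements instead of dropping it.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []         # list of (header metadata dict, text) pairs
        self._headers = {}       # current header path, e.g. {"Header 1": "Intro"}
        self._parts = []         # text fragments of the chunk being built
        self._in_header = None   # header tag currently open, if any
        self._header_text = []
        self._pre_depth = 0      # > 0 while inside <pre>: keep whitespace
        self._skip_depth = 0     # > 0 while inside <script>/<style>

    def _flush(self):
        text = "".join(self._parts).strip()
        if text:
            self.chunks.append((dict(self._headers), text))
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in HEADER_TAGS:
            self._flush()  # a new section begins: close the previous chunk
            self._in_header = tag
            self._header_text = []
        elif tag == "pre":
            self._pre_depth += 1
            self._parts.append("\n")
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._in_header:
            level = int(tag[1])
            # Forget this header level and any deeper ones, then record it.
            for header_tag, key in HEADER_TAGS.items():
                if int(header_tag[1]) >= level:
                    self._headers.pop(key, None)
            title = " ".join("".join(self._header_text).split())
            self._headers[HEADER_TAGS[tag]] = title
            self._in_header = None
        elif tag == "pre":
            self._pre_depth = max(0, self._pre_depth - 1)
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_header:
            self._header_text.append(data)
        elif self._pre_depth:
            self._parts.append(data)  # code block: keep text and indentation as-is
        else:
            collapsed = " ".join(data.split())  # inline <code> text is kept too
            if collapsed:
                self._parts.append(collapsed + " ")

    def close(self):
        super().close()
        self._flush()  # emit the last section


html = """
<h1>Intro</h1>
<p>Install the package first.</p>
<h2>Example</h2>
<p>Run this snippet:</p>
<pre><code>def add(a, b):
    return a + b
</code></pre>
<script>console.log("ignored");</script>
"""

splitter = CodeAwareHeaderSplitter()
splitter.feed(html)
splitter.close()
for metadata, text in splitter.chunks:
    print(metadata)
    print(text)
    print("---")
```

The resulting (metadata, text) pairs could then be embedded directly or wrapped in whatever document type the pipeline expects. Collapsing whitespace outside `<pre>` is a deliberate simplification; real pages may need extra handling for tables, navigation menus or other layout elements.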
