Tag Archive for large-language-model, chunking, huggingface-datasets

Chunking a Tokenized dataset

I am experimenting with the databricks-dolly-15k dataset to make it suitable for fine-tuning a Llama 2 model, following this article by Phil Schmid. The initial part of building the dataset is quite clear.
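For context, the initial dataset-building step could look roughly like the sketch below. It assumes each record carries the `instruction`, `context`, and `response` fields of databricks-dolly-15k; the `format_record` helper, the section headings in the prompt, and the sample record are my own illustration, and the template used in the article may differ.

```python
def format_record(sample: dict) -> str:
    """Join the non-empty fields of one record into a single prompt string.

    The heading layout here is an assumption, not necessarily the
    template used in the article.
    """
    parts = [f"### Instruction\n{sample['instruction']}"]
    # 'context' is empty for many records, so include it only when present.
    if sample.get("context"):
        parts.append(f"### Context\n{sample['context']}")
    parts.append(f"### Answer\n{sample['response']}")
    return "\n\n".join(parts)


# Hypothetical record, shaped like a databricks-dolly-15k row.
example = {
    "instruction": "What is tokenization?",
    "context": "",
    "response": "Splitting text into units a model can process.",
}

if __name__ == "__main__":
    print(format_record(example))
```

With the real dataset, the same function could be applied to every row before tokenizing, for example through the `map` method of a Hugging Face `Dataset`.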