I’ve read countless articles and the official documentation on fine-tuning PaddleOCR, but I keep running into problems, and I find the documentation vague. The default model works well for my use case, but some characters in my images don’t look like conventional characters (they are in the OCR-A font).
E.g. 1 is recognised as L, and 6 is recognised as b.
I’m trying to fine-tune the recognition model on Colab, and training fails with this error:
<code>...
[2024/12/09 03:43:11] ppocr WARNING: The pretrained params head.sar_head.decoder.rnn_decoder.1.cell.weight_ih not in model
[2024/12/09 03:43:11] ppocr WARNING: The pretrained params head.sar_head.decoder.rnn_decoder.1.cell.weight_hh not in model
[2024/12/09 03:43:11] ppocr WARNING: The pretrained params head.sar_head.decoder.rnn_decoder.1.cell.bias_ih not in model
[2024/12/09 03:43:11] ppocr WARNING: The pretrained params head.sar_head.decoder.rnn_decoder.1.cell.bias_hh not in model
[2024/12/09 03:43:11] ppocr WARNING: The pretrained params head.sar_head.decoder.embedding.weight not in model
[2024/12/09 03:43:11] ppocr WARNING: The pretrained params head.sar_head.decoder.prediction.weight not in model
[2024/12/09 03:43:11] ppocr WARNING: The pretrained params head.sar_head.decoder.prediction.bias not in model
[2024/12/09 03:43:11] ppocr INFO: load pretrain successful from /content/pretrain_models_rec/en_PP-OCRv3_rec_train/best_accuracy
[2024/12/09 03:43:11] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 100 iterations
Exception in thread Thread-1 (_thread_loop):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 603, in _thread_loop
batch = self._get_data()
File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 752, in _get_data
batch.reraise()
File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/worker.py", line 187, in reraise
raise self.exc_type(msg)
RecursionError: DataLoader worker(0) caught RecursionError with message:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/worker.py", line 372, in _worker_loop
batch = fetcher.fetch(indices)
File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/fetcher.py", line 77, in fetch
data.append(self.dataset[idx])
File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 163, in __getitem__
return self.__getitem__(rnd_idx)
File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 163, in __getitem__
return self.__getitem__(rnd_idx)
File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 163, in __getitem__
return self.__getitem__(rnd_idx)
[Previous line repeated 7 more times]
File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 161, in __getitem__
raise RecursionError("Maximum recursion depth exceeded in __getitem__")
RecursionError: Maximum recursion depth exceeded in __getitem__
Traceback (most recent call last):
File "/content/PaddleOCR/tools/train.py", line 269, in <module>
main(config, device, logger, vdl_writer, seed)
File "/content/PaddleOCR/tools/train.py", line 222, in main
program.train(
File "/content/PaddleOCR/tools/program.py", line 312, in train
for idx, batch in enumerate(train_dataloader):
File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 826, in __next__
self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:175)
</code>
As for the config.yml file, I copied this one and edited it according to my dataset.
I used PPOCRLabel to annotate the images into Label.txt. I’ve trained Tesseract before, but apart from that I don’t have much experience working with OCR models.
I’ve tried fixing my image paths, text file paths and so on.
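The repeated `__getitem__` frames in the traceback suggest that the dataset keeps retrying with another index after a sample fails to load, so it may be that no line in the label file can be read at all. Below is a minimal sketch of a sanity check for the label file. It assumes the recognition label format is one `image_path<TAB>text` pair per line, with image paths relative to the `data_dir` set in the config. It also assumes that PPOCRLabel’s Label.txt holds detection annotations (a JSON list per image) rather than plain text labels, which I have not verified against my own file. The function name `check_rec_label_file` is hypothetical and not part of PaddleOCR.

```python
import os


def check_rec_label_file(label_path, data_dir, delimiter="\t"):
    """Return a list of (line_number, problem) tuples for a recognition label file.

    Assumed format (not confirmed for every PaddleOCR version):
    one "relative/image/path<TAB>text" pair per line.
    """
    problems = []
    with open(label_path, "r", encoding="utf-8") as f:
        for line_no, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                problems.append((line_no, "empty line"))
                continue

            # A space instead of a tab leaves the whole line in one field.
            parts = line.split(delimiter)
            if len(parts) != 2:
                problems.append((line_no, f"expected 2 fields, got {len(parts)}"))
                continue

            rel_path, label = parts
            # The image path is resolved against data_dir, as assumed above.
            if not os.path.isfile(os.path.join(data_dir, rel_path)):
                problems.append((line_no, f"image not found: {rel_path}"))

            if not label:
                problems.append((line_no, "empty label"))
            elif label.lstrip().startswith("[{"):
                # Heuristic: a JSON list of boxes is a detection annotation.
                problems.append(
                    (line_no, "label looks like detection JSON, not plain text")
                )
    return problems
```

Calling it with the label file and image root from the config, for example `check_rec_label_file("/content/train_data/rec_gt.txt", "/content/train_data")` (both paths are placeholders), returns an empty list when every line passes these checks. If every line is reported, the label file or `data_dir` is the likely cause of the error rather than the model or the pretrained weights.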