I am fine-tuning a Tesseract "best" model on some handwritten images. I am trying to run the following command in PowerShell:
& "C:\Program Files\Tesseract-OCR\lstmtraining.exe" `
  --continue_from "C:\Users\Dell7420\Desktop\KerasOCR\KerasOCR\tesstrain\data\eng.lstm" `
  --model_output "C:\Users\Dell7420\Desktop\KerasOCR\KerasOCR\tesstrain\data\fine_tune" `
  --traineddata "C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata" `
  --train_listfile "C:\Users\Dell7420\Desktop\KerasOCR\KerasOCR\tesstrain\data\list.train" `
  --max_iterations 1000
This is the output I get:
Loaded file eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from eng.lstm
Deserialize header failed: C:\Users\Dell7420\Desktop\KerasOCR\KerasOCR\tesstrain\data\AW.lstmf
Load of page 0 failed!
Load of images failed!!
I've tried converting the .box file from CRLF to LF line endings, with a tab at the end of each line, but no luck.
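For reference, the line-ending conversion can be sketched in Python as below. This is a minimal sketch, not necessarily how the conversion was actually done here. The function names `to_lf` and `convert_file` are made up for illustration. It works on raw bytes so that no other characters in the file are touched.

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Return data with CRLF (and any stray CR) line endings turned into LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(to_lf(p.read_bytes()))
```

Reading the file back with `Path(path).read_bytes()` and checking that `b"\r"` no longer occurs is a quick way to confirm the conversion took effect.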
An example of my .box file:
A 380 3766 461 3902 0
A 522 3760 623 3920 0
A 692 3752 774 3915 0
A 836 3790 892 3889 0
A 966 3790 1037 3920 0
A 1131 3784 1204 3915 0
A 1273 3803 1345 3915 0
A 1390 3779 1461 3920 0
A 1484 3797 1562 3907 0
A 1629 3803 1732 3920 0
A 1777 3784 1857 3939 0
A 1894 3803 1976 3920 0
A 2042 3803 2098 3896 0
A 2152 3784 2208 3889 0
A 2243 3790 2324 3915 0
a 377 3667 436 3723 0
a 513 3659 567 3734 0
a 621 3678 675 3723 0
a 757 3678 821 3741 0
a 912 3678 961 3734 0
a 1049 3696 1114 3728 0
a 1179 3691 1237 3747 0
a 1282 3685 1344 3752 0
a 1446 3678 1512 3741 0
a 1607 3685 1678 3752 0
a 1778 3691 1838 3760 0
a 1913 3710 1971 3760 0
a 2025 3696 2083 3771 0
a 2172 3696 2243 3760 0 ...
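Each line in a Tesseract box file has the form `<symbol> <left> <bottom> <right> <top> <page>`, with pixel coordinates measured from the bottom-left corner of the image. A small sketch for sanity-checking lines like the ones above follows. The names `BoxEntry` and `parse_box_line` are hypothetical, and the geometry check (left < right, bottom < top) is an assumption about what a well-formed box looks like, not a rule taken from Tesseract itself.

```python
from typing import NamedTuple


class BoxEntry(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> BoxEntry:
    """Parse one box-file line; raise ValueError if it is malformed."""
    # Split from the right so a symbol that is itself whitespace (e.g. a tab)
    # survives; only the line terminator is stripped beforehand.
    parts = line.rstrip("\r\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    symbol = parts[0]
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    if left >= right or bottom >= top:
        raise ValueError(f"degenerate box: {line!r}")
    return BoxEntry(symbol, left, bottom, right, top, page)


entry = parse_box_line("A 380 3766 461 3902 0")
width, height = entry.right - entry.left, entry.top - entry.bottom
```

Running `parse_box_line` over every line of the file would flag any row with a missing field or non-numeric coordinate.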
I built the .lstmf file with the following command:
tesseract "C:\Users\Dell7420\Desktop\KerasOCR\KerasOCR\data\train\AW.JPG" "C:\Users\Dell7420\Desktop\KerasOCR\KerasOCR\tesstrain\data\AW" lstm.train
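Since the failure is reported while loading the .lstmf file named in the training list, one thing that can be checked independently is the list file and the files it points to. The sketch below is only a sanity check and makes no claim about the actual cause of the error. It assumes the list file holds one .lstmf path per line, and the names `check_train_list` and `file_size` are hypothetical. It reports a byte order mark, UTF-16 encoding, CRLF line endings, missing files, and empty files.

```python
import os
from typing import Callable, List, Optional, Tuple


def file_size(path: str) -> Optional[int]:
    """Size of the file in bytes, or None if it does not exist."""
    return os.path.getsize(path) if os.path.isfile(path) else None


def check_train_list(
    raw: bytes, size_of: Callable[[str], Optional[int]] = file_size
) -> List[Tuple[str, str]]:
    """Return (entry, problem) pairs found in the raw bytes of a list file."""
    problems: List[Tuple[str, str]] = []
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        # UTF-16 text cannot be read as plain one-path-per-line bytes.
        return [("<list file>", "looks like UTF-16, not UTF-8/ASCII")]
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append(("<list file>", "starts with a UTF-8 byte order mark"))
        raw = raw[3:]
    for line in raw.decode("utf-8").split("\n"):
        if line.endswith("\r"):
            problems.append((line[:-1], "line ends with CR (CRLF line ending)"))
            line = line[:-1]
        if not line:
            continue
        size = size_of(line)
        if size is None:
            problems.append((line, "file not found"))
        elif size == 0:
            problems.append((line, "file is empty"))
    return problems
```

Calling `check_train_list(open("list.train", "rb").read())` from the directory containing the list file returns an empty list when none of these conditions are found.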
I am quite stuck, so any advice is appreciated.
P.S. Please ignore the directory names; I thought I was going to be doing something else when I started.