While learning how to fine-tune the BERT-base-uncased model without using any Hugging Face tools, I encountered an error when attempting to import tokenization_test in tmp.py, which is in my parent directory.
My project directory is structured as follows:
.
├── .gitignore
├── .log
├── __init__.py
├── _bert
├── _ds_GLUE_MRCP
│ ├── msr_paraphrase_test.txt
│ └── msr_paraphrase_train.txt
├── mdl_bert
│ ├── bert_config.json
│ ├── bert_model.ckpt.data-00000-of-00001
│ ├── bert_model.ckpt.index
│ ├── bert_model.ckpt.meta
│ └── vocab.txt
└── tmp.py
mdl_bert contains the pre-trained model's checkpoint/weights, and _bert is the submodule cloned from the official git repo.
I can import tokenization.py inside tmp.py:
from _bert import tokenization
However, if I import tokenization_test.py inside tmp.py, Python responds with:
Traceback (most recent call last):
File "/parent_dir/./tmp.py", line 14, in <module>
from _bert import tokenization_test
File "/parent_dir/_bert/tokenization_test.py", line 21, in <module>
import tokenization
ModuleNotFoundError: No module named 'tokenization'
tokenization_test.py is importing tokenization as:
import tokenization
How can I fix this issue?
Python has great documentation on its import system:
https://docs.python.org/3/reference/import.html#package-relative-imports
Try a relative import inside tokenization_test.py:
from . import tokenization
The reason might also be your hierarchy: since you run tmp.py from the parent directory, the parent directory (not _bert) is on sys.path. That might be why from _bert import tokenization works in tmp.py, while the bare import tokenization inside tokenization_test.py fails. There's more documentation on sys.path in the link above.
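If you would rather not edit the submodule, another option is to put the submodule directory itself on sys.path before importing, so the bare import tokenization can resolve. A minimal self-contained sketch of that mechanism, using a throwaway temporary directory as a stand-in for _bert (the module name tokenization matches the question; the VOCAB_SIZE value is purely hypothetical):

```python
import os
import sys
import tempfile

# Stand-in for the _bert submodule: a directory holding a module that
# other code imports by its bare name ("import tokenization").
pkg_dir = tempfile.mkdtemp()
with open(os.path.join(pkg_dir, "tokenization.py"), "w") as f:
    f.write("VOCAB_SIZE = 30522\n")  # hypothetical content

# Prepending the directory to sys.path lets the bare import resolve,
# which is exactly what tokenization_test.py needs when you run tmp.py
# from the parent directory.
sys.path.insert(0, pkg_dir)
import tokenization

print(tokenization.VOCAB_SIZE)  # prints 30522
```

In your project, the equivalent line in tmp.py would insert the path to the _bert directory (e.g. built with os.path.join relative to tmp.py) before importing tokenization_test.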