Context:
I have a very simple project structure for a data science project, shown below. It contains a few data files (.csv, .xlsx, etc.), a dir for storing Python scripts in src, Jupyter notebooks in notebooks, and unit tests in tests. Nothing particularly exotic or groundbreaking going on.
The Problem:
My problem is that different sections of my project are not ‘seeing’ each other. I’ve been able to make the code in src visible to my unit tests in tests. I even have some unit tests that check that data is in fact an existing dir, using the built-in pathlib.Path(...).is_dir(). So far so good.
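For illustration, a directory check of this kind might look like the sketch below. The helper name check_data_dirs and the throwaway temporary layout are hypothetical and are not taken from the actual tests/test_data_mgmt.py; only the data, data/one and data/two directory names come from the project structure.

```python
import tempfile
from pathlib import Path


def check_data_dirs(project_root: Path) -> dict:
    """Map each expected data directory (relative, POSIX style) to whether it exists."""
    expected = [
        project_root / "data",
        project_root / "data" / "one",
        project_root / "data" / "two",
    ]
    return {p.relative_to(project_root).as_posix(): p.is_dir() for p in expected}


# Demo against a temporary copy of the layout (hypothetical stand-in for the project root).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "one").mkdir(parents=True)
    (root / "data" / "two").mkdir()
    result = check_data_dirs(root)
```

Because every path here is built from an explicit root, the check does not depend on which directory the test runner happens to be started from.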
I’ve been using pathlib when defining all paths, to keep my Windows paths consistent.
The problem starts when I try to use the scripts in src in my notebooks\test_bed.ipynb file. Initially, I got ModuleNotFoundErrors, which I solved using sys.path.append("C:\my_project"), as outlined in this solution. However, now I get FileNotFoundError: [WinError 3] The system cannot find the path specified: 'data\one' errors, despite adding my project to the system %PATH%, as mentioned.
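A minimal reproduction of this kind of error is sketched below. It assumes (this is not confirmed by the post) that the code in src refers to the data with a relative path such as data/one, and that the notebook kernel's working directory is the notebooks folder. Relative paths are resolved against the current working directory; sys.path only affects imports. The temporary directory is a hypothetical stand-in for the project root.

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the project root, with the same sub-directory names.
    root = Path(tmp).resolve()
    (root / "data" / "one").mkdir(parents=True)
    (root / "notebooks").mkdir()

    try:
        # Assumed situation in the notebook: cwd is the notebooks folder.
        os.chdir(root / "notebooks")
        seen_from_notebooks = Path("data/one").is_dir()

        # Same relative path, but with cwd set to the project root.
        os.chdir(root)
        seen_from_root = Path("data/one").is_dir()
    finally:
        # Restore cwd before the temporary directory is cleaned up.
        os.chdir(original_cwd)

    # A path anchored to the root does not depend on cwd at all.
    anchored = (root / "data" / "one").is_dir()

print(seen_from_notebooks, seen_from_root, anchored)
```

If the assumption holds, the relative path only resolves when the working directory is the project root, which would be consistent with the tests passing while the notebook fails.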
Attempted Solutions:
- I have tried manually adding my project root path to the Windows %PYTHONPATH% environment variable, which appeared to have no effect.
- As mentioned above, I used this solution of adding my project root dir using sys.path.append(...). However, whilst it is a quick fix for the ModuleNotFoundError, it does not solve the FileNotFoundErrors, and the fix does not persist when restarting Jupyter Notebook.
- I have tried changing the default Jupyter Notebook working directory, as outlined in this post, as well.
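One way the per-notebook workaround could be made less brittle is sketched below: locate the project root by walking upwards from the working directory, then use that root both for imports and for data paths. The helpers find_project_root and add_to_sys_path are hypothetical names, and using requirements.txt as the root marker is an assumption based on the project structure, not something prescribed by Jupyter or Python.

```python
import sys
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = "requirements.txt") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found at or above {start}")


def add_to_sys_path(root: Path) -> None:
    """Put the project root on sys.path once, so that `import src...` works."""
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)


# Demo on a temporary stand-in for the project layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp).resolve()
    (demo_root / "requirements.txt").touch()
    (demo_root / "notebooks").mkdir()
    found_root = find_project_root(demo_root / "notebooks")
    found_matches = found_root == demo_root
```

In a notebook's first cell this might be used as PROJECT_ROOT = find_project_root(Path.cwd()), followed by add_to_sys_path(PROJECT_ROOT) and data paths such as PROJECT_ROOT / "data" / "one". It still has to run in each new kernel session, so it does not make the fix persistent by itself.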
I suspect there is an issue with my %PYTHONPATH% variable here, not allowing my project to find itself, but I can’t seem to solve it.
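To check this suspicion, a small diagnostic along the following lines could be run from both the notebook and the test runner and the results compared. The function name path_report is hypothetical, and the root passed in is whatever the project root is on the machine in question.

```python
import os
import sys
from pathlib import Path


def path_report(project_root: str) -> dict:
    """Collect the settings that affect imports (sys.path) and relative file paths (cwd)."""
    root = Path(project_root)
    return {
        # Relative file paths are resolved against this directory.
        "cwd": str(Path.cwd()),
        # Value of the environment variable as seen by this process (None if unset).
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        # Whether imports can see the project root.
        "root_on_sys_path": any(Path(p) == root for p in sys.path if p),
    }


# Example call with a hypothetical root; substitute the real project root.
report = path_report(str(Path.cwd()))
print(report)
```

If root_on_sys_path is True while cwd points somewhere other than the project root, imports would succeed and relative data paths would still fail, which would point at the working directory rather than at %PYTHONPATH%.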
System Info:
- OS: Windows 10
- Python version: python-3.9.5
- Using a virtual environment?: Yes.
- %PYTHONPATH%: C:\my_project
Project Structure:
C:\my_project
│   .gitignore
│   main.py
│   requirements.txt
│
├───data
│   ├───one
│   │       data_file_1.xlsx
│   │       data_file_2.csv
│   │       data_file_etc.csv
│   │
│   └───two
│           data_file_1.xlsx
│           data_file_2.csv
│           data_file_etc.csv
│
├───notebooks
│       test_bed.ipynb
│
├───saves
│
├───src
│       data_mgmt.py
│       __init__.py
│
└───tests
        test_data_mgmt.py
        __init__.py
Thanks in advance for your help/suggestions/guidance!