Background
I have developed a Python program that fetches data from three different REST APIs, processes it, and inserts it into a database. The program also queries the database to identify which values to fetch from the APIs, so there is constant communication between the database and the APIs.
The data processing involves significant preprocessing, cleaning, and normalisation. My codebase is quite complex, utilising object-oriented programming with multiple modules, including database management classes, data preprocessing classes, OCR for reading PDFs, XML parsers, etc. I also use Poetry for dependency management.
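The flow described above can be sketched roughly as follows. This is an illustration only, not the actual codebase: every class and function name is hypothetical, `sqlite3` stands in for the real PostgreSQL/MySQL managers, and a stub replaces the real REST calls so the example runs on its own.

```python
import sqlite3
from abc import ABC, abstractmethod


class ApiBase(ABC):
    """Hypothetical stand-in for the shared API base class."""

    @abstractmethod
    def fetch(self, key: str) -> dict:
        """Return the raw record for one key."""


class StubApi(ApiBase):
    """Fake client: returns canned data instead of calling a REST API."""

    def fetch(self, key: str) -> dict:
        return {"key": key, "value": f"  Raw {key.upper()}  "}


def normalise(record: dict) -> dict:
    """Toy cleaning step: trim whitespace and lower-case the value."""
    return {"key": record["key"], "value": record["value"].strip().lower()}


def run_pipeline(conn: sqlite3.Connection, api: ApiBase) -> int:
    """Ask the database which keys still need data, then fetch, clean, store."""
    # Materialise the pending keys first so the cursor is not reused mid-update.
    pending = [row[0] for row in conn.execute(
        "SELECT key FROM items WHERE value IS NULL ORDER BY key")]
    for key in pending:
        record = normalise(api.fetch(key))
        conn.execute("UPDATE items SET value = ? WHERE key = ?",
                     (record["value"], record["key"]))
    conn.commit()
    return len(pending)


def demo() -> list:
    """Build an in-memory database with two empty rows and run the pipeline."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO items (key) VALUES (?)", [("a",), ("b",)])
    run_pipeline(conn, StubApi())
    return conn.execute("SELECT key, value FROM items ORDER BY key").fetchall()


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the shape of the workload: a long-running loop that alternates between database reads, outbound HTTP calls, and CPU-bound cleaning, organised as importable modules rather than notebook cells.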
Problem
I need to migrate my program to the cloud. My organisation uses Azure, and I am new to cloud computing. I have been advised to use Azure Databricks, but it seems more like a Jupyter Notebook environment, which might not be suitable for my program’s architecture (see example structure below).
Questions
Is Azure Databricks suitable for this kind of task?
Are there better Azure products for deploying a complex Python program with heavy preprocessing and database interactions?
I don’t mind refactoring my code or even changing its structure to make it fit with Azure Databricks.
Any guidance on the best approach for migrating my application to Azure would be greatly appreciated.
Example program structure
project_directory/
│
├── .github/
│   └── .keep
│
├── docs/
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── db_schema.md
│   └── README.md
│
├── img/
│   └── db_mockup.jpg
│
├── src/
│   ├── api
│   │   ├── api_base_class.py
│   │   ├── api_subclass_1.py
│   │   ├── api_subclass_2.py
│   │   ├── api_subclass_3.py
│   │   └── api_helper.py
│   │
│   ├── db
│   │   ├── db_manager_base.py
│   │   ├── db_sub_postgress.py
│   │   └── db_sub_mysql.py
│   │
│   ├── preprocessing
│   │   ├── preprocessor_base.py
│   │   ├── preprocessor_sub_1.py
│   │   ├── preprocessor_sub_2.py
│   │   └── normaliser.py
│   │
│   ├── config.py
│   ├── build.py
│   ├── run.py
│   └── main.py
│
├── .gitattributes
├── .gitignore
├── poetry.lock
├── pyproject.toml
└── README.md
- I haven’t tried any Azure products or used cloud services.
- I have seen what Azure Databricks looks like but don’t know its full functionality and potential.
- No one in my organisation can advise me, as they don’t use Python.
- I don’t want to go through the hassle of setting everything up just to find out that it isn’t suitable for my requirements.