I am implementing a function in my PySide6 project that converts a document to a PDF file. It works, but it runs sequentially instead of in parallel, which means I could speed the conversion up. For this I am using `mpire` with `dill`, because `pickle` can't serialize something in the library I am using, but `dill` can.
The problem is that when I run `WorkerPool`, my program is executed from the beginning (starting from `main.py`) instead of just the `process_object_of_project_pages_objects()` function I passed to it. I don't want this, because the code then raises errors caused by the fact that no project has been selected yet, so the database has not been created/selected (`sqlite3.OperationalError: no such table: Project_pages_data`).
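For context, here is a minimal sketch of the behaviour I mean (this is my own illustration, not code from the project, and it assumes the default "spawn" start method used on Windows): with spawn, each worker process re-imports the `__main__` module, so any code at module level runs again in every worker. Only code behind the `if __name__ == "__main__":` guard is skipped.

```python
import multiprocessing as mp

def square(x):
    # Pure worker function: safe to run in a child process.
    return x * x

def main():
    # Side effects (opening a project, creating a database) belong here,
    # behind the guard, so re-importing children never execute them.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == "__main__":
    print(main())  # [1, 4, 9]
```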
**converter.py** (ignore the fact that `n_jobs=1`):

```python
from mpire import WorkerPool

...

class ElementPool:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

...

class Converter:
    ...

    def get_list_of_created_pdf_pages(self, project_pages_objects) -> list:
        """
        Iterate over project_pages_objects, converting each page to docx and then to pdf
        """
        log.obj_l.debug_logger(
            f"IN get_list_of_created_pdf_pages(project_pages_objects): project_pages_objects = {project_pages_objects}"
        )
        list_of_pdf_pages = list()
        project_pages_objects_for_pool = list()
        for object in project_pages_objects:
            project_pages_objects_for_pool.append(ElementPool(object))
        with WorkerPool(n_jobs=1, use_dill=True) as pool:
            results = pool.map(
                self.process_object_of_project_pages_objects,
                project_pages_objects_for_pool,
            )
            list_of_pdf_pages = [result for result in results if result]
        return list_of_pdf_pages

    def process_object_of_project_pages_objects(self, object_for_pool) -> dict:
        log.obj_l.debug_logger(
            f"IN process_object_of_project_pages_objects(object_for_pool): object_for_pool = {object_for_pool}"
        )
        object = object_for_pool.get_value()
        object_type = object.get("type")
        number_page = object.get("number_page")
        if object_type == "page":
            pdf_path = self.create_page_pdf(object.get("page"), True)
            return {"number_page": number_page, "pdf_path": pdf_path}
        return dict()
```
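One thing I suspect may matter (an assumption on my part): passing the bound method `self.process_object_of_project_pages_objects` forces the pool to serialize the whole `Converter` instance along with everything it references. A module-level function that receives only plain dicts would keep the serialized payload small. A hypothetical sketch, where `convert_page` stands in for the real `create_page_pdf` call:

```python
def convert_page(page, as_pdf):
    # Placeholder for the real Converter.create_page_pdf(page, True) call.
    return f"/tmp/{page}.pdf"

def process_page_dict(obj: dict) -> dict:
    # Receives a plain dict, so only the dict is serialized,
    # not the Converter instance and its object graph.
    if obj.get("type") == "page":
        return {
            "number_page": obj.get("number_page"),
            "pdf_path": convert_page(obj.get("page"), True),
        }
    return {}
```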
**main.py**:

```python
import package.app as app
import os
import sys

def main():
    current_directory = os.path.dirname(os.path.abspath(sys.argv[0]))
    app.App(current_directory)

if __name__ == "__main__":
    main()
```
The expected behaviour is that at any time while a project is open, the user can click the "Export to PDF" button, which runs the methods in `converter.py`.
So, how can I fix this? Or are there alternatives/other ways to do the conversion in parallel to improve performance?
I've read "Why does pool run the entire file multiple times?" and "Multiprocessing Pool is running the whole code for every process created instead of just the function passed to it", but I haven't figured out how to apply them in my case.
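As one possible alternative (my own suggestion, not from the project): if each per-page conversion mostly waits on an external tool, threads may parallelize it just as well while avoiding the re-import problem entirely, since threads share the interpreter and nothing is pickled. A sketch using only the standard library, where `process_page` is a placeholder for the real conversion:

```python
from concurrent.futures import ThreadPoolExecutor

def process_page(obj: dict) -> dict:
    # Placeholder for the real per-page docx -> pdf conversion.
    if obj.get("type") == "page":
        return {"number_page": obj["number_page"], "pdf_path": f"{obj['number_page']}.pdf"}
    return {}

def convert_all(objects: list) -> list:
    # Threads share the process: main.py is never re-imported,
    # and no serialization of arguments or bound methods is needed.
    with ThreadPoolExecutor(max_workers=4) as ex:
        return [r for r in ex.map(process_page, objects) if r]
```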