I am trying to find an up-to-date way to convert docx and pptx files to PDF in Lambda that will stay maintainable long term. I am building a data ingestion pipeline, and Textract cannot work with docx or pptx.
I can do this with LibreOffice in Python 3.8 using the libreoffice-brotl Lambda layer (https://github.com/kuharan/Lambda-Layers), but I cannot find a way to upgrade this to Python 3.9, and AWS will stop supporting 3.8 in October. I don’t know if the layer will be updated at all.
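For context, the conversion step itself is just a headless LibreOffice call made from the handler. Below is a minimal sketch of that step, not the layer's actual code: the `SOFFICE_PATH` value, the helper names, and setting `HOME` to `/tmp` are all assumptions for illustration and depend on where the layer or image unpacks the binary.

```python
import os
import subprocess
from pathlib import Path

# Assumption: location of the LibreOffice binary after the layer/image is unpacked.
SOFFICE_PATH = "/opt/instdir/program/soffice.bin"


def build_convert_command(input_path, out_dir="/tmp", soffice=SOFFICE_PATH):
    """Build the headless LibreOffice command that converts one file to PDF."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", out_dir,     # directory the PDF is written to
        input_path,
    ]


def expected_pdf_path(input_path, out_dir="/tmp"):
    """LibreOffice names the output after the input file, with a .pdf extension."""
    return str(Path(out_dir) / (Path(input_path).stem + ".pdf"))


def convert_to_pdf(input_path, out_dir="/tmp", soffice=SOFFICE_PATH, timeout=120):
    """Run the conversion and return the path of the resulting PDF."""
    # Assumption: /tmp is the only writable location, so LibreOffice's
    # user profile is pointed there via HOME.
    env = {**os.environ, "HOME": "/tmp"}
    subprocess.run(
        build_convert_command(input_path, out_dir, soffice),
        check=True,        # raise CalledProcessError if the conversion fails
        timeout=timeout,   # avoid hanging until the Lambda timeout
        env=env,
    )
    return expected_pdf_path(input_path, out_dir)
```

In a handler this would be called as `convert_to_pdf("/tmp/report.docx")` after downloading the file from S3 to `/tmp`. Nothing in the sketch depends on the Python version; the sticking point is only how the LibreOffice binary gets packaged for a newer runtime.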
The other option I have seen is the aws-lambda-libreoffice Docker image, but it has the same problem: the repo does not seem to be maintained, and it only supports runtimes up to Node.js 18 (https://github.com/shelfio/aws-lambda-libreoffice?tab=readme-ov-file).
Does anyone have any workarounds?