Current Situation
I’m developing a WordPress plugin that processes PDFs uploaded through the admin panel to create a flipbook. The process involves:
- PDF upload (via the admin panel)
- Thumbnail generation (using Imagick)
- Text and coordinate extraction (using Smalot PdfParser)
- Storing the processed data in the `wp-content/uploads` folder
- Displaying the PDF as a flipbook on the frontend (using PDF.js)
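For reference, the thumbnail step currently looks roughly like the sketch below (function and variable names are placeholders, and it assumes Imagick's PDF delegate is available on the host):

```php
<?php
// Simplified sketch of the per-page thumbnail generation, assuming $pdf_path points
// to the uploaded file in wp-content/uploads. Names like myplugin_generate_thumbnails
// are placeholders, not the actual plugin code.
function myplugin_generate_thumbnails( $pdf_path, $output_dir, $page_count ) {
    for ( $i = 0; $i < $page_count; $i++ ) {
        $im = new Imagick();
        // Resolution must be set before readImage(); rasterising at a high DPI
        // is one of the things that makes this step slow.
        $im->setResolution( 96, 96 );
        // The "[n]" suffix asks Imagick to rasterise a single page of the PDF.
        $im->readImage( $pdf_path . '[' . $i . ']' );
        $im->setImageFormat( 'jpeg' );
        $im->setImageCompressionQuality( 75 );
        $im->writeImage( trailingslashit( $output_dir ) . 'page-' . ( $i + 1 ) . '.jpg' );
        $im->clear();
    }
}
```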
Issues
- Thumbnail generation is taking too long.
- Smalot PdfParser is not returning accurate text coordinates and dimensions (see the extraction sketch after this list).
- After processing, the PDF takes 20-30 seconds to load on the frontend.
- The entire workflow (upload, processing, and initial display) is slow, especially for larger PDFs.
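To show what I mean by coordinates, this is roughly how I'm reading positions out of Smalot PdfParser (a simplified sketch; it assumes a Smalot version that exposes `Page::getDataTm()`, and Composer autoloading of `smalot/pdfparser`):

```php
<?php
use Smalot\PdfParser\Parser;

// Sketch of the extraction step. getDataTm() returns each text fragment together
// with its text matrix; the last two matrix entries are the x/y position on the page.
$parser = new Parser();
$pdf    = $parser->parseFile( $pdf_path );

$pages_data = array();
foreach ( $pdf->getPages() as $index => $page ) {
    foreach ( $page->getDataTm() as $fragment ) {
        list( $tm, $text )      = $fragment;
        $pages_data[ $index ][] = array(
            'text' => $text,
            'x'    => $tm[4],
            'y'    => $tm[5],
        );
    }
}
// $pages_data is then written to wp-content/uploads for the flipbook text overlay.
```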
Technical Details
- WordPress version: [Your WordPress version]
- PHP version: [Your PHP version]
- Maximum PDF size: 75 MB
- Maximum page count: 140
- Libraries used:
  - Imagick (for thumbnails)
  - Smalot PdfParser (for text extraction)
  - PDF.js (for frontend rendering)
- Hosting limitations:
  - No Node.js support
  - exec() is disabled for security reasons (so Ghostscript can't be called directly)
Questions
- How can I optimize this processing workflow, especially the backend operations (thumbnail generation and text extraction)?
- Can you recommend an alternative to Smalot PdfParser for text and coordinate extraction that provides more accurate results and works within typical WordPress hosting constraints?
- Should I consider processing these files on a separate server? If so, what would be the best approach, given that I can't use Node.js on the main server?
- Are there any caching strategies or asynchronous processing techniques I could implement to improve performance within a WordPress environment? (A rough sketch of the kind of hand-off I have in mind follows this list.)
- How can I get performance closer to what I saw with the Node.js implementation while working within WordPress hosting limitations?
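To make the asynchronous-processing question concrete, this is the kind of hand-off I'm considering, using WP-Cron (hook and function names are hypothetical, not existing plugin code):

```php
<?php
// Hypothetical sketch: the upload handler only schedules the work, and the heavy
// Imagick/Smalot processing runs in a separate WP-Cron request so the admin
// screen returns immediately.
add_action( 'myplugin_process_pdf', 'myplugin_process_pdf_async' );

function myplugin_schedule_processing( $attachment_id ) {
    wp_schedule_single_event( time(), 'myplugin_process_pdf', array( $attachment_id ) );
}

function myplugin_process_pdf_async( $attachment_id ) {
    $pdf_path = get_attached_file( $attachment_id );

    // ... generate thumbnails and extract text here, ideally page by page ...

    // Record completion so the frontend can poll for status before loading the flipbook.
    update_post_meta( $attachment_id, '_myplugin_flipbook_status', 'done' );
}
```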
Goal
Process PDFs quickly and accurately, ideally reducing the total processing time to under 10 seconds for a 75MB, 140-page PDF, while working within typical WordPress hosting constraints.
Any insights or suggestions on improving the overall performance and accuracy of this system, while keeping costs down, would be greatly appreciated. Thank you!
What I’ve Tried
- Developed a working system using Node.js locally.
  - Result: Fast and accurate processing, but it can't be used in production due to hosting limitations.
- Increased the PHP memory limit and execution time.
  - Result: Can handle larger files, but this didn't significantly improve speed.
- Implemented basic caching for processed PDFs (roughly along the lines of the sketch after this list).
  - Result: Slightly faster for repeat views, but initial processing is still slow.
- Tested other PHP-based PDF processing libraries (e.g., FPDI, TCPDF).
  - Result: Similar performance issues or lack of needed features.
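The "basic caching" mentioned above is roughly the following (names are placeholders; `myplugin_process_pdf()` stands in for the slow Imagick/Smalot work):

```php
<?php
// Rough sketch of the transient-based caching I tried: processed page data is
// cached per attachment, so repeat views skip re-parsing, but the first view
// still pays the full processing cost.
function myplugin_get_flipbook_data( $attachment_id ) {
    $cache_key = 'myplugin_flipbook_' . $attachment_id;
    $data      = get_transient( $cache_key );

    if ( false === $data ) {
        $data = myplugin_process_pdf( get_attached_file( $attachment_id ) ); // slow path
        set_transient( $cache_key, $data, WEEK_IN_SECONDS );
    }

    return $data;
}
```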