I am working on code that converts a PDF file to an image file without compromising image quality. The pdf2image module requires poppler to work, but I am running the code on an online server provided by my company, which runs Jupyter Notebook in my browser. Because of that, copying the path to a locally installed poppler bin folder or adding that path to my environment variables is not possible. My code successfully downloads and unzips the poppler archive from a GitHub repository.
Please tell me how I should modify the code to get the path to the poppler bin folder, so that pdf2image can use it and the code runs.
#%%
import fitz
from PIL import Image
import wget
from pdf2image import convert_from_path
#%%
# this downloads the poppler zip archive
zip_url = "https://github.com/oschwartz10612/poppler-windows/releases/download/v24.02.0-0/Release-24.02.0-0.zip"
zip_filename = wget.download(zip_url)
# unzips poppler file
!unzip -o {zip_filename}
#%%
# Below is the code to extract lines from a pdf file
doc = fitz.open("table.pdf")
page = doc[0]
paths = page.get_drawings() # extract existing drawings
# this is a list of "paths", which can directly be drawn again using Shape
# -------------------------------------------------------------------------
#%%
# define some output page with the same dimensions
outpdf = fitz.open()
outpage = outpdf.new_page(width=page.rect.width, height=page.rect.height)
shape = outpage.new_shape() # make a drawing canvas for the output page
#%%
# --------------------------------------
# loop through the paths and draw them
# --------------------------------------
for path in paths:
    # ------------------------------------
    # draw each entry of the 'items' list
    # ------------------------------------
    for item in path["items"]:  # these are the draw commands
        if item[0] == "l":  # line
            shape.draw_line(item[1], item[2])
        elif item[0] == "re":  # rectangle
            shape.draw_rect(item[1])
        elif item[0] == "qu":  # quad
            shape.draw_quad(item[1])
        elif item[0] == "c":  # curve
            shape.draw_bezier(item[1], item[2], item[3], item[4])
        else:
            raise ValueError("unhandled drawing", item)
    # ------------------------------------------------------
    # all items are drawn, now apply the common properties
    # to finish the path
    # ------------------------------------------------------
    shape.finish()
#%%
shape.commit()
outpdf.save("test_file_ext2.pdf")
#%%
pdf_path = 'test_file_ext.pdf'
doc = fitz.open(pdf_path)
#%%
# poppler-24.02.0 -> Library -> bin -> copy the path and paste it below
poppler_path = r'C:\Users\KIIT\OneDrive\Desktop\Project_Folder\poppler-24.02.0\Library\bin'
images = convert_from_path(pdf_path, poppler_path=poppler_path)
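# Save each page as its own JPEG file (a single name would be overwritten on every page)
for page_number, image in enumerate(images, start=1):
    image.save(f'extracted_image{page_number}.jpg', 'JPEG')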
Since I can't install files from the internet on my local system, I used wget to download poppler onto the online server. I can see the folder and all the files that come with it, but I have no idea how to get the path to the bin folder so that I can use it in my code. The code runs perfectly on my personal computer, where I have admin access, can edit my environment variables, and can copy the bin path after extracting the folder on my laptop. On my corporate laptop I cannot do the same: the poppler folder is stored on the online server, and the file browser only shows me “Project_Folder / poppler-24.02.0 / bin”. When I copy this into my code, it doesn't work.
Please help me get the path to the poppler folder and use it in the code. I would also like suggestions on how to determine the bin path automatically, so that other people using my code do not have to change it manually for their system.
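To make the second request concrete, this is roughly the kind of thing I am hoping for. It is only a minimal sketch: the helper name `find_poppler_bin` is hypothetical (it is not part of pdf2image), and it assumes the archive was unzipped somewhere below the notebook's working directory. It searches for the `pdftoppm` executable, which pdf2image calls, and returns the folder that contains it.

```python
from pathlib import Path


def find_poppler_bin(search_root="."):
    """Return the folder under search_root that holds pdftoppm, or None.

    Hypothetical helper: it only locates the folder and does not check
    that the binaries can actually run on this machine.
    """
    root = Path(search_root).resolve()
    # The Windows build ships pdftoppm.exe; other builds ship plain pdftoppm.
    for exe_name in ("pdftoppm.exe", "pdftoppm"):
        for candidate in sorted(root.rglob(exe_name)):
            if candidate.is_file():
                return candidate.parent
    return None


if __name__ == "__main__":
    # Search below the current working directory of the notebook.
    poppler_path = find_poppler_bin(Path.cwd())
    print(poppler_path)
```

If a folder is found, the idea would be to pass it along as `convert_from_path(pdf_path, poppler_path=str(poppler_path))` instead of a hard-coded string. This only solves finding the folder; it assumes the binaries inside it are built for the operating system the server runs on.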