Python PyPDF2 – Extract images from PDF file

In this post we will create a Python script that extracts every image from an existing PDF file using the PyPDF2 library. PyPDF2 is a Python library for working with PDF documents. It provides functions for reading, editing, and manipulating PDF files, such as merging and splitting PDFs, extracting text and images, and encrypting and decrypting content.

The sample Python code below uses PyPDF2 to extract images from a PDF file. An outer for loop iterates through each page, and an inner for loop iterates over the images on that page. Each image found is saved to the current working directory under a unique name, formed by prefixing the image's embedded name with the current 'count' value.

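from PyPDF2 import PdfReader

reader = PdfReader("file.pdf")
count = 0  # running index that keeps the output filenames unique

# Walk every page, then every image embedded on that page
for page in reader.pages:
    for image_file_object in page.images:
        # Prefix the image's embedded name with the counter and write its raw bytes
        with open(str(count) + image_file_object.name, "wb") as fp:
            fp.write(image_file_object.data)
        count += 1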
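
If you would rather keep the extracted images out of the working directory, the same loop can be wrapped in a small helper. The version below is only a sketch: the function name save_page_images, the out_dir parameter and the "page1_img1_" filename pattern are our own choices for illustration, not part of PyPDF2. It assumes each page object exposes an images collection whose items have name and data attributes, as in the script above.

```python
from pathlib import Path


def save_page_images(pages, out_dir):
    """Write every image found on the given pages into out_dir.

    `pages` is any iterable of page-like objects with an `images`
    collection; each image must provide `name` and `data`.
    Returns the list of file paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    written = []
    for page_num, page in enumerate(pages, start=1):
        for image_num, image in enumerate(page.images, start=1):
            # Keep only the final path component of the embedded name
            safe_name = Path(image.name).name
            # Hypothetical naming scheme: page and image index, then the name
            target = out_path / f"page{page_num}_img{image_num}_{safe_name}"
            target.write_bytes(image.data)
            written.append(target)
    return written
```

With PyPDF2 installed, it could be called as save_page_images(PdfReader("file.pdf").pages, "extracted"), where "extracted" is a placeholder folder name. Putting the page and image numbers in the filename makes it easy to trace each file back to where it appeared in the PDF.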
