Python gTTS – PDF text to speech

Today we will use gTTS, a Python library that interfaces with Google Translate’s text-to-speech API. The script extracts the text from a PDF file, converts it into speech, and saves the audio as an MP3 file.

We begin by retrieving the text from our PDF file. To do this, we use the PyPDF2 library.

from PyPDF2 import PdfReader

reader = PdfReader(r"File.pdf")
# Collect the text of every page, adding a newline after each page.
Text = ""
for page in reader.pages:
    Text += page.extract_text() + "\n"
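
Some pages, such as scanned images, may yield no extractable text at all. As a minimal sketch, the loop above could be wrapped in a small helper that skips such pages. The name collect_page_text is our own and is not part of PyPDF2; the helper only assumes that each page object has an extract_text() method.

```python
def collect_page_text(pages):
    """Join the text of each page, skipping pages that yield no text.

    `pages` is any iterable of objects with an extract_text() method,
    for example reader.pages from a PyPDF2 PdfReader.
    """
    parts = []
    for page in pages:
        # Treat a missing (None) or empty result as "no text on this page".
        page_text = page.extract_text() or ""
        if page_text.strip():
            parts.append(page_text)
    return "\n".join(parts)
```

With the reader from above, this would be called as Text = collect_page_text(reader.pages).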

We then pass the text to the gTTS module for conversion. gTTS sends the text to Google's service, so an internet connection is required.

from gtts import gTTS

# Convert the text to English speech and save it as an MP3 file.
tts = gTTS(Text, lang='en')
tts.save('output.mp3')
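
Text extracted from a PDF often contains stray line breaks, and it may be empty if the file has no text layer. The sketch below shows one possible way to tidy the text and stop early when there is nothing to speak. Both helper names (prepare_text_for_speech and save_speech) are hypothetical, and collapsing all whitespace into single spaces is an assumption about what reads well, not something gTTS requires.

```python
import re


def prepare_text_for_speech(raw_text):
    """Collapse runs of whitespace and reject input with no visible text."""
    cleaned = re.sub(r"\s+", " ", raw_text).strip()
    if not cleaned:
        raise ValueError("No text could be extracted from the PDF")
    return cleaned


def save_speech(raw_text, out_path="output.mp3", lang="en"):
    """Clean the text, then hand it to gTTS and save the MP3."""
    # Imported here so the cleaning helper can be used without gTTS installed.
    from gtts import gTTS

    tts = gTTS(prepare_text_for_speech(raw_text), lang=lang)
    tts.save(out_path)
```

In the script, the last two lines could then be replaced with save_speech(Text).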

The full source code can be found below.

from PyPDF2 import PdfReader
from gtts import gTTS

reader = PdfReader(r"File.pdf")
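# Collect the text of every page, adding a newline after each page.
Text = ""
for page in reader.pages:
    Text += page.extract_text() + "\n"

# Convert the text to English speech and save it as an MP3 file.
tts = gTTS(Text, lang='en')
tts.save('output.mp3')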
