Python docx – Retrieving text from a Word document

In today's post we will retrieve the contents of a Word document with a Python script. To do this we will use python-docx (imported as docx), a Python library for creating and updating Microsoft Word (.docx) files.

The script asks the user for the location of a file and passes it to the ‘getText’ function, which reads the text of each paragraph in the document and returns it as a single string with one paragraph per line.

The full source code for this project can be found below.

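import docx  # provided by the python-docx package (pip install python-docx)

def getText(filename):
    # Open the .docx file and collect the text of each body paragraph
    doc = docx.Document(filename)
    allText = []
    for paragraph in doc.paragraphs:
        allText.append(paragraph.text)
    # Join the paragraphs into a single newline-separated string
    return '\n'.join(allText)

# Ask the user for the path to the document, then print its text
fileLocation = input("Please Enter File Location: ")
documentText = getText(fileLocation)
print(documentText)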
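
One thing to be aware of is that doc.paragraphs only covers paragraphs in the main body of the document, so text inside tables is not returned by the script above. The sketch below is one possible way to include table text as well. The names collect_text and get_all_text are hypothetical helpers made up for this example, not part of python-docx, and the table text is simply appended after the body paragraphs rather than kept in document order.

```python
def collect_text(doc):
    """Return body paragraph text followed by table cell text.

    `doc` is assumed to be an opened python-docx Document (or any object
    exposing the same .paragraphs and .tables attributes).
    """
    lines = [paragraph.text for paragraph in doc.paragraphs]

    # Walk every table row by row and append the text of each cell.
    # Note: merged cells may be reported more than once by row.cells.
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                lines.append(cell.text)

    return "\n".join(lines)


def get_all_text(filename):
    """Hypothetical wrapper: open a .docx file and collect its text."""
    import docx  # python-docx; imported here so collect_text has no hard dependency

    return collect_text(docx.Document(filename))
```

Calling get_all_text(fileLocation) in place of getText(fileLocation) should then print the table contents after the paragraphs. Headers and footers are still not included in this sketch.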
