In today's post we will retrieve the contents of a Word document with a Python script. To do this we will use python-docx, a Python library for creating and updating Microsoft Word (.docx) files. It is installed with pip install python-docx and imported as docx.
The script takes the location of a file as user input and passes it to the getText function, which reads the document and returns the text of its paragraphs, one paragraph per line.
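Because the location is typed in by the user, it can arrive with stray spaces or wrapped in quotes (for example when a path is pasted from a file manager). The sketch below is one possible way to tidy and check the input before it is handed to getText. The helper names clean_location and check_location are hypothetical and are not part of python-docx or of the script in this post.

```python
import os


def clean_location(raw):
    """Strip whitespace and one pair of surrounding quotes from a typed path.

    Hypothetical helper for illustration; not part of python-docx.
    """
    cleaned = raw.strip()
    if len(cleaned) >= 2 and cleaned[0] == cleaned[-1] and cleaned[0] in "\"'":
        cleaned = cleaned[1:-1]
    return cleaned


def check_location(raw):
    """Return a usable path, or raise ValueError with a readable message."""
    path = clean_location(raw)
    # Assumption: we only want to accept files with a .docx extension.
    if not path.lower().endswith(".docx"):
        raise ValueError(f"Not a .docx file: {path}")
    if not os.path.isfile(path):
        raise ValueError(f"File not found: {path}")
    return path
```

In the script this could be used as getText(check_location(input("Please Enter File Location: "))), so that a mistyped path produces a short message instead of a traceback from inside the library.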
The full source code for this project can be found below.
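import docx

def getText(filename):
    # Open the Word document at the given path
    doc = docx.Document(filename)
    AllText = []
    # Collect the text of every paragraph in the document body
    for each in doc.paragraphs:
        AllText.append(each.text)
    # Join the paragraphs into a single string, one per line
    return '\n'.join(AllText)

fileLocation = input("Please Enter File Location: ")
DocumentText = getText(fileLocation)
print(DocumentText)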
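One thing to be aware of is that doc.paragraphs only covers paragraphs in the document body, so text that sits inside tables is not returned by getText. The sketch below shows one way this could be extended, using the tables, rows, cells and text attributes that python-docx provides. The function name collect_text is hypothetical, and the example runs against a small stand-in object rather than a real file so that it works without a .docx to hand.

```python
from types import SimpleNamespace


def collect_text(doc):
    """Return paragraph text followed by table text for a Document-like object."""
    lines = [paragraph.text for paragraph in doc.paragraphs]
    for table in doc.tables:
        for row in table.rows:
            # Put the cells of one table row on a single tab-separated line
            lines.append("\t".join(cell.text for cell in row.cells))
    return "\n".join(lines)


# Stand-in for a real document (assumption: it only mimics the attributes
# used above, it is not a python-docx object)
fake_doc = SimpleNamespace(
    paragraphs=[SimpleNamespace(text="Intro"), SimpleNamespace(text="Body")],
    tables=[
        SimpleNamespace(
            rows=[
                SimpleNamespace(
                    cells=[SimpleNamespace(text="A"), SimpleNamespace(text="B")]
                )
            ]
        )
    ],
)

print(collect_text(fake_doc))
```

With python-docx installed, the same function could be called as collect_text(docx.Document(fileLocation)). Note that this version lists all table text after the paragraphs, so it does not preserve the order in which tables appear in the document.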