Python NLTK – Tokenize sentences

In this post we will create a Python script that tokenizes the sentences within some text. To do this we will use NLTK, short for the Natural Language Toolkit, a suite of libraries dedicated to natural language processing (NLP).

Tokenization is the process of breaking text down into smaller units called tokens. Tokens can be sentences, words, symbols, phrases or other elements of a given piece of text.
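To make the idea of tokens concrete, here is a minimal sketch that uses only Python's built-in re module. The helper names naive_sentence_tokens and naive_word_tokens are hypothetical, made up for this illustration. The regular expressions are a deliberately simple stand-in and are not how NLTK works internally, so they will mis-split text containing abbreviations such as "Dr." or "e.g.".

```python
import re


def naive_sentence_tokens(text):
    # Split wherever '.', '!' or '?' is followed by whitespace.
    # Illustration only: abbreviations like "Dr." will be split incorrectly.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def naive_word_tokens(text):
    # A token is either a run of word characters or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


sample = "Hello how are you? Welcome to scriptopia!"
print(naive_sentence_tokens(sample))
# ['Hello how are you?', 'Welcome to scriptopia!']
print(naive_word_tokens(sample))
# ['Hello', 'how', 'are', 'you', '?', 'Welcome', 'to', 'scriptopia', '!']
```

The same text yields two sentence tokens or nine word tokens depending on the unit chosen, which is why a tokenizer is always described by the kind of token it produces.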

For the purpose of this example we will be creating a Python script that will tokenize the sentences in our text. Text will be provided via user input.

See the sample of code below for how this is achieved.

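import nltk

# sent_tokenize relies on NLTK's Punkt tokenizer data. If NLTK reports a
# missing resource, download it once with nltk.download("punkt")
# (newer NLTK releases may ask for "punkt_tab" instead).

while True:
    text = input("Enter your Text: ")
    sentences = nltk.sent_tokenize(text)  # split the text into a list of sentences
    print(sentences)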

An example run of the script can be seen below. Here the user input was ‘Hello how are you? Welcome to scriptopia! We hope you enjoy this.’

['Hello how are you?', 'Welcome to scriptopia!', 'We hope you enjoy this.']

