In this post we will create a script that downloads the captions of a YouTube video. We will use pytube, a lightweight, Pythonic, dependency-free library for downloading YouTube videos. The library is easy to use and quite intuitive.
We start by taking the URL of the YouTube video we are interested in as input, and we use this URL to create a YouTube object. For this project we only want captions that are available in English, and this excludes captions that are in English but auto-generated. I've found that auto-generated captions don't always make sense. The captions are received as XML, and I then use Beautiful Soup to strip the markup and keep only the caption text.
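To make the XML-to-text step concrete, here is a minimal sketch that runs without a network connection. It is not the script from this post: it swaps Beautiful Soup for the standard library's xml.etree.ElementTree so that it needs no extra installs, and both the sample XML and the helper name captions_to_text are hypothetical, made up for illustration. The caption XML that YouTube actually returns may use different tags.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample; real caption XML from YouTube may use different tags.
SAMPLE_XML = (
    '<transcript>'
    '<text start="0.0" dur="1.5">Hello and welcome</text>'
    '<text start="1.5" dur="2.0">to the &amp;quot;demo&amp;quot; video</text>'
    '</transcript>'
)


def captions_to_text(caption_xml: str) -> str:
    """Return the caption text with the XML tags removed, one cue per line."""
    root = ET.fromstring(caption_xml)
    lines = []
    for element in root.iter():
        # Keep only elements that directly hold some text.
        if element.text and element.text.strip():
            # Decode entities that are still escaped after XML parsing.
            lines.append(html.unescape(element.text.strip()))
    return "\n".join(lines)


print(captions_to_text(SAMPLE_XML))
```

Beautiful Soup's get_text() does a similar job in one call, which is why the script itself relies on it.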
The full source code for this project can be found below:
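    import sys

    from bs4 import BeautifulSoup
    from pytube import YouTube

    while True:
        url = input("Please Enter YouTube URL: ")
        yt = YouTube(url)
        try:
            # Ask only for the English track; auto-generated captions are not matched by 'en'.
            video_captions = yt.captions.get_by_language_code('en')
            caption_xml = video_captions.xml_captions
        except Exception:
            # Reached when the video has no English captions or they cannot be fetched.
            print("Captions not available")
            sys.exit()
        # Parse the XML and keep only the caption text.
        soup = BeautifulSoup(caption_xml, features="lxml")
        content = soup.get_text()
        print(content)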