Introduction

This website hosts an online textbook, Stylometric Analysis on Churchill’s Political Speeches. It describes the procedure our team followed to scrape political speeches from an online archive, and the stylometric analysis we applied to study Churchill’s way of speaking and how speaking style varies across the parties in the British Parliament.

Why did we choose this topic?

First and foremost, every member of the group was interested in politics, so it was a great opportunity for everyone to delve into a new area: British politics. Furthermore, the large amount of textual data we had to work with was a challenge for everyone and allowed us to apply a method designed specifically for texts. Another factor was the massive number of political speeches available in English. Since our team is international, English was the best language for us to work in. In addition, we thought that working with speeches in languages not understood by every team member could lead to a loss of context.

Why this archive?

As with a lot of datasets, we found this archive by chance, but it met our prerequisites. First of all, it offers a clear overview of British political speeches from 1895 until 2018. Moreover, its metadata is very rich: it includes information about the speaker, the political party they belong to, the location of the speech, etc. Lastly, it is free and available not only for research but for educational purposes as well. The only constraint we struggled with was the copyright on Churchill’s speeches, which were omitted from the British Political Speech Archive. Therefore, we had to find speeches that we could use under the appropriate copyright exceptions. We manually retrieved Churchill’s speeches from the International Churchill Society, which allowed us to include them in our dataset (for more information about the copyright issues regarding Churchill’s speeches, visit The Churchill Copyright). This constraint also ignited our curiosity, and we decided to apply a stylometric analysis to Churchill’s speeches from the first years of World War II and from after the war. The time span for speeches from the first years of World War II is 1939-1940 (Ante War Churchill) and for speeches after World War II is 1946-1966 (Post War Churchill). For an overview of the dataset visit: British Political Speech Archive.

Description of the project

This project aims to determine which political party’s speeches are stylistically closest to Winston Churchill’s speeches from 1939 to 1940. Once the data were defined by focusing on two archives, the British Political Speech Archive and the International Churchill Society, we used scraping to extract the data. Then, three levels of stylometric analysis were used to answer our research question: where exactly in the political spectrum of the British Parliament Churchill’s speeches fell during the relevant period.
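To give a feel for what "closest" can mean in stylometric terms, here is a minimal sketch of one common measure, a chi-squared comparison of word frequencies. It is an illustration only and not necessarily one of the three levels of analysis used in this project. The function names (`tokenize`, `chi_squared`, `closest_party`), the `top_n` parameter, and the toy texts are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def chi_squared(tokens_a, tokens_b, top_n=50):
    """Chi-squared distance between two token lists.

    Only the top_n most frequent words of the joint sample are compared.
    Lower values mean the two samples use those words in more similar
    proportions.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both samples must contain at least one token")
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    joint = count_a + count_b
    # Share of the joint sample that comes from sample A.
    share_a = len(tokens_a) / (len(tokens_a) + len(tokens_b))
    score = 0.0
    for word, joint_count in joint.most_common(top_n):
        # Counts we would expect if both samples shared one style.
        expected_a = joint_count * share_a
        expected_b = joint_count * (1 - share_a)
        score += (count_a[word] - expected_a) ** 2 / expected_a
        score += (count_b[word] - expected_b) ** 2 / expected_b
    return score


def closest_party(speech, party_corpora, top_n=50):
    """Return the label of the corpus with the smallest distance, plus all distances."""
    speech_tokens = tokenize(speech)
    distances = {
        party: chi_squared(speech_tokens, tokenize(text), top_n)
        for party, text in party_corpora.items()
    }
    return min(distances, key=distances.get), distances


# Toy placeholder texts, not real archive data.
toy_corpora = {
    "Party A": "we shall fight and we shall win",
    "Party B": "the budget and the taxes and the economy",
}
best, all_distances = closest_party(
    "we shall fight we shall never surrender", toy_corpora
)
```

In this toy run, `best` is `"Party A"`, because the sample speech shares its most frequent words with that text. With real data, each party corpus would be the concatenation of many speeches, and results depend heavily on corpus size and on the choice of `top_n`.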

Acknowledgments

This analysis was inspired by an excellent introduction to stylometric methods and analyses by François Dominic Laramée (Introduction to Stylometry with Python) and by the very useful and insightful scraping methods provided by Dr. Federico Pianzola and his assistant Lampros Ntoumas.