For this assignment, you are supplied with three speeches to analyze, listed below. To get each one, click on the file link and then look for the download link at the bottom of the page. It is better to download the file than to copy the text directly from the screen.
Prime Minister Harper on Nov. 9, 2009 in Ottawa, marking the 20th anniversary of the fall of the Berlin Wall. (Download)
President Obama, Inaugural Address, 20th January 2009. (Download)
Senator Obama’s evening speech in front of the Tiergarten’s Victory Column in Berlin, Germany on July 24, 2008 during his first European trip as a U.S. presidential candidate. (Download)
The program you will complete only hints at the kinds of analyses that are possible, but it would be a great starting point for a more complete version. When you are finished, you will have the frequency (or “number of occurrences”) of each unique word in the speech. If we were to follow the analysis described by Prof. Skillicorn, we would be most interested in the frequency of just his 86 “deception” words. It would be fun to continue with the analysis, but this assignment is already long enough!
Unlike assignment 3, this assignment will not tell you what functions to write. You will need to decide how to break down your program. Your speech analysis must supply the following information:
The total number of characters, sentences, words and unique words (also as a percent) in the speech.
The longest word or words (if more than one word has the same longest length) in the speech.
The frequency of occurrence of each unique word in the speech, saved to a text file.
Optional: Ten words of over 5 letters in length that occur most often in the speech.
For example, if PM Harper’s speech is analyzed this way, here are the results you would see displayed on the console:
Harper's Speech:
7792 characters.
86 sentences.
1361 words.
527 unique words.
38.7% of the words are unique.
Longest word is: responsibilities
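One way to produce counts like these is sketched below. It is only a sketch under stated assumptions, not the expected solution: the function name `speech_stats`, the rule that a word is a run of letters (with optional inner apostrophes), and the rule that a sentence ends at `.`, `!` or `?` are all choices made for illustration. Different rules give different totals, so this sketch will not necessarily reproduce the numbers shown above.

```python
import re

def speech_stats(text):
    """Return summary counts for a speech supplied as one string."""
    # Assumption: a word is a run of letters, optionally joined by
    # straight or curly apostrophes; case is ignored.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    unique = set(words)
    # Assumption: each run of . ! ? ends one sentence. Abbreviations
    # such as "Nov." are therefore counted as sentence ends too.
    sentences = re.findall(r"[.!?]+", text)
    # Keep every word that shares the longest length.
    longest_len = max((len(w) for w in unique), default=0)
    longest = sorted(w for w in unique if len(w) == longest_len)
    percent = round(100 * len(unique) / len(words), 1) if words else 0.0
    return {
        "characters": len(text),
        "sentences": len(sentences),
        "words": len(words),
        "unique_words": len(unique),
        "percent_unique": percent,
        "longest_words": longest,
    }

sample = "The wall fell. The people cheered! Was it freedom?"
stats = speech_stats(sample)
for name, value in stats.items():
    print(name, value)
```

Returning a dictionary keeps the counting separate from the printing, so the same function can be reused for each of the three speeches.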
The frequency listing for each unique word is shown in this file. You will note that the words are in alphabetical order.
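A minimal sketch of building and saving such a listing follows. The function names, the `word: count` line format and the output file name `frequencies.txt` are assumptions made for illustration; the format of the sample file is not shown in this handout.

```python
import re
from collections import Counter

def count_words(text):
    """Count each unique word, ignoring case and punctuation."""
    # Assumption: same word rule as above, letters with inner apostrophes.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(words)

def frequency_lines(freq):
    """Format the counts as 'word: count' lines in alphabetical order."""
    return [f"{word}: {freq[word]}" for word in sorted(freq)]

def save_frequencies(freq, path):
    """Write the alphabetical listing to a text file, one word per line."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(frequency_lines(freq)) + "\n")

freq = count_words("Freedom is not free. Freedom is earned.")
save_frequencies(freq, "frequencies.txt")  # hypothetical output file name
print(frequency_lines(freq))
```

Sorting the dictionary keys just before writing is what puts the words in alphabetical order; the counting itself does not depend on order.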
And the result of the optional analysis is:
Most used words over 5 letters are:
canada: 4 times
democracy: 5 times
government: 5 times
freedom: 7 times
canadian: 4 times
people: 11 times
system: 4 times
berlin: 4 times
fechter: 7 times
germany: 4 times
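The optional analysis could be sketched as follows. This is one possible approach, assuming that “over 5 letters” means six or more characters and using the hypothetical function name `most_used_long_words`. Note that `Counter.most_common` returns the words sorted from most to least frequent, which differs from the order of the sample output above.

```python
import re
from collections import Counter

def most_used_long_words(text, min_letters=6, top_n=10):
    """Return the top_n most frequent words with at least min_letters characters."""
    # Assumption: same word rule as above, letters with inner apostrophes.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    long_words = [w for w in words if len(w) >= min_letters]
    # most_common sorts by count, highest first.
    return Counter(long_words).most_common(top_n)

sample = "People want freedom; people need freedom, and people govern."
top = most_used_long_words(sample, top_n=2)
print("Most used words over 5 letters are:")
for word, count in top:
    print(f"{word}: {count} times")
```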
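import sys
fnames = sys.argv[1:]  # assumption: speech file names are given on the command line
for fname in fnames:
    num_words = 0
    num_chars = 0
    dict_unique = {}  # assumed: maps each unique word to its count
    with open(fname, 'r') as f:
        for line in f:
            words = line.strip().split()
            for word in words:
                if word not in dict_unique:
                    dict_unique[word] = 0
                dict_unique[word] += 1
            num_words += len(words)
            num_chars += len(line)
    # Assumed formulas; max(..., 1) guards against dividing by zero on an empty file
    uni_word_per = round(100 * len(dict_unique) / max(num_words, 1), 1)
    word_I_per = round(100 * dict_unique.get("I", 0) / max(num_words, 1), 1)
    print(str(uni_word_per) + "% of the words are unique.")
    print(str(word_I_per) + '% of the words are "I".')
    # Collect every word that shares the longest length
    lst = []
    for key in dict_unique.keys():
        if not lst or len(key) > len(lst[0]):
            lst = [key]
        elif len(key) == len(lst[0]):
            lst.append(key)
    print("Longest word is:", *lst)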
NOTE: Download all 4 files to your local system. Put all 4 txt files in the same folder as the Python code file, and then run it.