Using MATLAB

In linguistics, stemming is the process of reducing inflected words to their word stem, base, or root form. In this assignment, you are to write a simple word stemmer for English. The input is a string of text that may contain punctuation or other non-alphabetical characters. Your program should stem the words in the text and return them as a cell array.

Here are the steps your program should perform to derive and filter the word stems:

Convert any upper case letter to lower case.

Replace each character that is neither a letter nor a space with a space character. e.g., “My 1st NLP program!!!” should become: “my st nlp program ”

Extract the words from the string. e.g., “my st nlp program ” will result in the list: “my”, “st”, “nlp”, and “program”.

Strip the following suffixes from the words that have them: -ly, -ed, -ing, -es, -s. Each suffix should be considered once and in that order (first strip -ly, then strip -ed, then strip -ing, etc.). e.g., the word “excitedly” turns into “excit” (-ly is stripped first, then -ed); the word “feeding” turns into “feed”.

Remove any word from the list that has 2 characters or fewer.

Remove the following common words from the list: the, and, that, have, for, not
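The first three steps can be tried on the example sentence from the assignment. This is only a minimal sketch: the variable names and the use of regexp to pick out the words are choices made here for illustration, not requirements of the assignment.

```matlab
% Steps 1-3 applied to the example sentence from the assignment.
text = 'My 1st NLP program!!!';

% Step 1: lower case.
text = lower(text);

% Step 2: every character outside a-z (digits, punctuation, other
% whitespace) becomes a space; existing spaces stay spaces.
text(text < 'a' | text > 'z') = ' ';

% Step 3: each run of letters is one word (assumption: regexp is used
% here instead of strsplit so that no empty entries are produced).
words = regexp(text, '[a-z]+', 'match');

disp(words)   % 'my'  'st'  'nlp'  'program'
```

Extracting runs of letters avoids the empty strings that splitting on spaces can leave behind when the text begins or ends with a space.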

Note that the stemming strategies used in this program are over-simplistic and may not give sensible results.
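The suffix rule and the note above can be illustrated with a short sketch. It assumes the suffixes are stripped one after another in a loop, each anchored to the end of the word. The words “classes”, “early”, and “this” are not from the assignment; they are added here only to show how the rules can over-strip.

```matlab
% Strip each suffix at most once, in the order given by the assignment.
suffixes = {'ly', 'ed', 'ing', 'es', 's'};

% The first two words are the assignment's examples; the rest are
% illustrative additions.
words = {'excitedly', 'feeding', 'classes', 'early', 'this'};

for k = 1:numel(suffixes)
    % '$' anchors the pattern to the end of each word.
    words = regexprep(words, [suffixes{k} '$'], '');
end

disp(words)   % 'excit'  'feed'  'clas'  'ear'  'thi'
```

A single pattern such as (s|ing|es|ly|ed)$ applied once would strip only one suffix per word, so “excitedly” would stop at “excited”. Looping over the suffixes keeps the required order.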

>> simplestemmer( 'Learning never exhausts the mind.' )
ans = 
  { 'learn' 'never' 'exhaust' 'mind' }

>> simplestemmer( 'Simplicity is the ultimate sophistication.' )
ans =
  { 'simplicity' 'ultimate' 'sophistication' }

Expert Answer

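% simplestemmer.m

function words = simplestemmer(text)
% SIMPLESTEMMER  Return the stemmed words of TEXT as a cell array.

    % Step 1: convert upper case letters to lower case.
    text = lower(text);

    % Step 2: replace every character that is not a letter with a space.
    text(text < 'a' | text > 'z') = ' ';

    % Step 3: extract the words (runs of letters) into a cell array.
    words = regexp(text, '[a-z]+', 'match');

    % Step 4: strip each suffix at most once, in the required order.
    suffixes = {'ly', 'ed', 'ing', 'es', 's'};
    for k = 1:numel(suffixes)
        words = regexprep(words, [suffixes{k} '$'], '');
    end

    % Step 5: remove words that have 2 characters or fewer.
    words = words(cellfun(@numel, words) > 2);

    % Step 6: remove the common words (whole-word comparison only).
    stopwords = {'the', 'and', 'that', 'have', 'for', 'not'};
    words = words(~ismember(words, stopwords));
end

% main.m

simplestemmer('Simplicity is the ultimate sophistication.')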
