Question

In the following two strings, the words 'rabbit' and 'tree' are matching:

str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under tree');

Suppose cmp is a variable declared to compare both. I want the result as:

cmp = 2

or something that shows that two words are matching. How do I do this?

Solution 2

"Crazy" bsxfun approach, which might be similar to intersect, but not tested -

Function -

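function out = cell2_matchind(split1,split2)
% Returns a logical column vector with one entry per word in split2,
% true where that word also appears in split1.

% Convert to space-padded char matrices, shifted so that '0' maps to 0
c1 = char(split1)-'0';
c2 = char(split2)-'0';
% Pad the narrower matrix with -16 (a space minus '0') so the widths agree
if size(c1,2)<size(c2,2)
    c1 = [c1 -16.*ones(size(c1,1),size(c2,2)-size(c1,2))];
else
    c2 = [c2 -16.*ones(size(c2,1),size(c1,2)-size(c2,2))];
end
% Compare every word in c2 with every word in c1; two words match when all
% character positions agree. permute (rather than squeeze) keeps the result
% numel(split2)-by-numel(split1) even when split2 holds a single word.
out = any(permute(sum(bsxfun(@eq,permute(c1,[3 2 1]),c2),2),[1 3 2])==size(c2,2),2);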

Main MATLAB script -

% Source of stopwords- http://norm.al/2009/04/14/list-of-english-stop-words/
stopwords_cellstring={'a', 'about', 'above', 'across', 'after', ...
    'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', ...
    'already', 'also','although','always','am','among', 'amongst', 'amoungst', ...
    'amount',  'an', 'and', 'another', 'any','anyhow','anyone','anything','anyway', ...
    'anywhere', 'are', 'around', 'as',  'at', 'back','be','became', 'because','become',...
    'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below',...
    'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom','but', 'by',...
    'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de',...
    'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight',...
    'either', 'eleven','else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', ...
    'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fifty',...
    'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found',...
    'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt',...
    'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', ...
    'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if',...
    'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last',...
    'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile',...
    'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must',...
    'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine',...
    'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off',...
    'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise',...
    'our', 'ours', 'ourselves', 'out', 'over', 'own','part', 'per', 'perhaps', 'please',...
    'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious',...
    'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so',...
    'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', ...
    'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them',...
    'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', ...
    'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those',...
    'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too',...
    'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up',...
    'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when',...
    'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein',...
    'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever',...
    'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet',...
    'you', 'your', 'yours', 'yourself', 'yourselves'};

str1 = ('rabbit is eating grass near a tree and will be sleeping inside the tree-hole');
str2 = ('rabbit is sleeping under tree and after waking up will be eating the nuts nearby');

split1 = unique(regexp(str1,'\s','Split'),'stable');
split2 = unique(regexp(str2,'\s','Split'),'stable');

cw_split2 = split2(cell2_matchind(split1,split2))
cw_split2_nostopwd = cw_split2(~cell2_matchind(stopwords_cellstring,cw_split2))
cmp = numel(cw_split2_nostopwd)

Output -

cw_split2 = 
    'rabbit'    'is'    'sleeping'    'tree'    'and'    'will'    'be'    'eating'    'the'

cw_split2_nostopwd = 
    'rabbit'    'sleeping'    'tree'    'eating'

cmp =
     4
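
For comparison, the same filtering can be sketched with built-in set functions instead of bsxfun. This is only an illustration: the short stopwords list below is a hypothetical stand-in for the full list above, and it assumes a MATLAB release that provides strsplit and the 'stable' option of intersect.

```matlab
str1 = 'rabbit is eating grass near a tree and will be sleeping inside the tree-hole';
str2 = 'rabbit is sleeping under tree and after waking up will be eating the nuts nearby';

% Hypothetical short stop-word list, standing in for the full list above
stopwords = {'a', 'and', 'be', 'is', 'the', 'will'};

% Words present in both sentences, in order of first appearance in str1
common = intersect(strsplit(str1), strsplit(str2), 'stable');

% Drop the stop words and count what is left
common_nostop = common(~ismember(common, stopwords));
cmp = numel(common_nostop)   % 4: rabbit, eating, tree, sleeping
```

The order of the surviving words follows str1 here, whereas the bsxfun version above reports them in the order of str2; the count is the same.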

OTHER TIPS

As in the other answer, split each string into a cell array of words.

str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');

% split each string into a cell array of words (intersect removes duplicates later)
split1 = regexp(str1,'\s','Split');
split2 = regexp(str2,'\s','Split');

Alternatively, newer versions of MATLAB (R2013a onwards, IIRC) include a strsplit() function, so the split can be reduced to

split1 = strsplit(str1);
split2 = strsplit(str2);

Then use the intersect() function to get the elements common to the two cell arrays, and wrap it in length() to return the integer count.

cmp = length(intersect(split1,split2));
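
Putting the steps together on the sentences from the question gives the short sketch below. It assumes strsplit is available, and the variable name common is only illustrative. Note that the count comes out as 3 rather than 2, because 'is' is shared as well.

```matlab
str1 = 'rabbit is eating grass near a tree';
str2 = 'rabbit is sleeping under tree';

% intersect returns the sorted, duplicate-free set of shared words
common = intersect(strsplit(str1), strsplit(str2))   % 'is'  'rabbit'  'tree'
cmp = length(common)                                 % 3
```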

I am assuming there is no restriction on the location or order in which the words match. First split each sentence into individual words, remove any duplicates, and then check whether any word in the second sentence matches one in the first.

If ordering does matter, it is not quite as straightforward, but your question gave no indication of such a constraint.

str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');
split1 = unique(regexp(str1,'\s','Split'));
split2 = unique(regexp(str2,'\s','Split'));

% Storing all words in the first sentence into a map for quick search/access
dict = containers.Map();
for ii = 1:numel(split1)
   dict(split1{ii}) = true; 
end

% create temp holding cell array, then loop through, looking to see if 
% any word in the second sentence is stored in the dictionary made from
% the first sentence. 
matches = {};
for jj = 1:numel(split2)
    if dict.isKey(split2{jj})
        matches = [matches,split2{jj}]; % growing in a loop is not ideal, but the final length is unknown up front
    end
end

numMatches = numel(matches) % return the number of matches

The variable matches will contain all of the words that match between the two sentences.

With ismember you just need one line.

str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under a tree');

result = sum( ismember( strsplit(str1), strsplit(str2) ) )

result =

    4               % str2 here also contains the article "a"

Be aware that for the following sentences the result is the same:

str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');

result = sum( ismember( strsplit(str1), strsplit(str2) ) )

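Removing duplicates in advance, as suggested by MZimmerman6, makes no difference here, but only because the first "tree," carries a comma and so does not match. In general, ismember flags every occurrence in its first argument, so a word repeated in str1 is counted once per occurrence; apply unique() to the first argument if each word should count only once.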
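
A quick check of how repeated words are counted, using made-up inputs (a and b are hypothetical, not taken from the question):

```matlab
a = strsplit('tree oak tree');   % 'tree' appears twice
b = strsplit('a tree');

n_all    = sum(ismember(a, b))           % 2, both occurrences of 'tree' count
n_unique = sum(ismember(unique(a), b))   % 1, each distinct word counts once
```

Whether duplicates need removing therefore depends on whether repeated words in the first sentence should add to the count.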


If you want to filter the result for unwanted strings, you can introduce another cell array of strings with all exceptions:

str3 = {'is','a'}
unwanted = sum( ismember( intersect( strsplit(str1), strsplit(str2) ), str3 ) )

unwanted =

     2

Altogether it could look like:

str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');
str3 = {'is','a'}

[x,y,z] = deal( strsplit(str1), strsplit(str2), str3 )
result = sum(ismember(x,y)) - sum(ismember(intersect(x,y),z))
%      =       4            -            2           =        2

Use this for a case-insensitive comparison:

CMP = strcmpi(str1,str2)

Use this for a case-sensitive comparison:

CMP = strcmp(str1,str2)

CMP is 1 if the strings are the same and 0 if they are not. Note that both functions compare the strings as a whole, not word by word.

If you do not want leading or trailing whitespace to affect the comparison, trim the strings first and then compare.

For trimming:

newString = strtrim(str)
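
Trimming and case-folding can also be combined with the word-level comparison asked about in the question. The following is a rough sketch, assuming that punctuation should be ignored, case should not matter, and words are separated by spaces; the helper name normalize_words is made up for illustration.

```matlab
% Hypothetical helper: lower-case, strip everything except letters, digits
% and spaces, trim, then split on whitespace
normalize_words = @(s) strsplit(strtrim(regexprep(lower(s), '[^a-z0-9 ]', '')));

str1 = '  Rabbit is eating grass near a tree, an oak tree ';
str2 = 'rabbit is sleeping under a Tree.';

% Shared words after normalization: 'a', 'is', 'rabbit', 'tree'
common = intersect(normalize_words(str1), normalize_words(str2));
cmp = numel(common)   % 4
```

Stripping punctuation this way also removes hyphens, so a word such as "tree-hole" would become "treehole"; adjust the pattern if that is not wanted.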
Licensed under: CC-BY-SA with attribution