Question

After extracting the keywords from the URL, how do I check whether each keyword exists in the content of the page (like the content below), returning 1 if it does and 0 otherwise? I tried strfind, but I have no idea why it does not work.

str = 'http://en.wikipedia.org/wiki/hostname'
Paragraph = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In    computer networking, a hostname (archaically nodename .....'
SplitStrings = regexp(str,'[/.]','split')

for it = SplitStrings
c( it{1} ) = strfind(Paragraph, it{1} )
end

SplitStrings = {};

feature11=(cellfun(@(n) isempty(n), strfind(Paragraph, SplitStrings{1})))


I can check whether 'https' exists with the code below. But how do I modify it so that SplitStrings is used in place of B6?

str = 'https://en.wikipedia.org/wiki/hostname'

A6 = regexp(str,'\w*://','match','once')
B6 = {'https'};

feature6=(cellfun(@(n) isempty(n), strfind(A6, B6{1})))

Solution

It is absolutely not clear to me what you want to do here...

I suspect it is this:

str      = 'http://en.wikipedia.org/wiki/hostname';

haystack = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In    computer networking, a hostname (archaically nodename .....';
needles  = regexp(str,'[:/.]*','split') %// note the different search string

%// What I think you want to do
~cellfun('isempty', regexpi(haystack, needles, 'once'))

Results:

needles = 
    'http'    'en'    'wikipedia'    'org'    'wiki'    'hostname'
ans =
     0     1     1     0     1     1

But if this is not the case, please edit your question and include your desired outputs for some example inputs.
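The question asks for a 1 or a 0. The expression above gives a logical vector with one entry per URL keyword; if a single overall flag is wanted instead, `any` or `all` can collapse it. A minimal sketch, assuming the same `str` and `haystack` as above (redefined so the block runs on its own); `perKeyword`, `anyFound` and `allFound` are illustrative names, and the split uses `+` instead of `*` so only non-empty runs of separators are matched.

```matlab
% Same inputs as in the answer above
str      = 'http://en.wikipedia.org/wiki/hostname';
haystack = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In    computer networking, a hostname (archaically nodename .....';

% Keywords from the URL, split on runs of ':', '/' and '.'
needles = regexp(str, '[:/.]+', 'split');

% Per-keyword result as doubles: 1 = found (case-insensitive, partial match), 0 = not found
perKeyword = double(cellfun(@(n) ~isempty(regexpi(haystack, n, 'once')), needles));

% Collapse to a single 1/0 flag if that is what is needed
anyFound = double(any(perKeyword));   % 1 if at least one keyword occurs
allFound = double(all(perKeyword));   % 1 only if every keyword occurs
```

With this paragraph, `perKeyword` matches the answer's result (0 1 1 0 1 1), so `anyFound` is 1 and `allFound` is 0.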

EDIT

OK, so if I understand you correctly now, you want whole words and not partial matches. You must tell this to regexp, in the following way:

%// NOTE: these metacharacters indicate that the match must occur
%//       at the beginning AND end of a word (so whole words only)
needles  = strcat('\<', regexpi(str,'[:/.]*','split'), '\>')

%// Search for these words in the paragraph
~cellfun('isempty', regexpi(haystack, needles, 'once'))
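One caveat: the URL pieces are used as regular expressions, so a keyword containing metacharacters (for example `?` or `+` from a query string) would not be matched literally. A minimal sketch, assuming literal whole-word matches are wanted, escapes each keyword with `regexptranslate` before adding the `\<`/`\>` anchors; `words`, `escaped`, `patterns` and `isWholeWord` are illustrative names.

```matlab
str      = 'http://en.wikipedia.org/wiki/hostname';
haystack = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In    computer networking, a hostname (archaically nodename .....';

% Keywords from the URL, split on runs of ':', '/' and '.'
words = regexp(str, '[:/.]+', 'split');

% Escape regex metacharacters so each keyword is matched literally
escaped = cellfun(@(w) regexptranslate('escape', w), words, 'UniformOutput', false);

% Anchor each keyword at the beginning and end of a word
patterns = strcat('\<', escaped, '\>');

% 1 if the keyword occurs as a whole word (case-insensitive), else 0
isWholeWord = double(cellfun(@(p) ~isempty(regexpi(haystack, p, 'once')), patterns));
```

Here `en` and `wiki` no longer count, because they only appear inside longer words ("encyclopedia", "wikipedia").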

OTHER TIPS

You can try this:

f=@(str) ~isempty(strfind(Paragraph,str))  % 1 if str occurs in Paragraph, 0 otherwise
cellfun(f,SplitStrings)
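Note that `strfind` is case-sensitive, whereas `regexpi` is not. A minimal sketch of the `strfind` route that ignores case and returns 1 or 0 per keyword, assuming `Paragraph` as in the question and a URL split on runs of `[:/.]` (so no empty or `http:` pieces appear); `lowerPara` and `found` are illustrative names.

```matlab
str          = 'http://en.wikipedia.org/wiki/hostname';
Paragraph    = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In    computer networking, a hostname (archaically nodename .....';
SplitStrings = regexp(str, '[:/.]+', 'split');

% Lower-case both sides so the literal substring search ignores case
lowerPara = lower(Paragraph);

% 1 if the keyword occurs anywhere in the paragraph, else 0
found = double(cellfun(@(s) ~isempty(strfind(lowerPara, lower(s))), SplitStrings));
```

Because `strfind` treats its pattern literally, no escaping of metacharacters is needed here.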

This should get whole words. The key is parsing the variable Paragraph into words first:

SplitParagraph=regexp(Paragraph,'[ ,:.()]','split');  % words of the paragraph
I=ismember(SplitStrings,SplitParagraph);               % 1 where a keyword is a whole word
SplitStrings(I)                                         % keywords that were found
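`I` itself is already a logical vector of 1s and 0s, one per keyword. A minimal sketch that also ignores case and splits on runs of delimiters (so consecutive separators such as ", " do not produce extra empty tokens), assuming the same inputs as in the question; `tokens` and `isWord` are illustrative names.

```matlab
str          = 'http://en.wikipedia.org/wiki/hostname';
Paragraph    = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In    computer networking, a hostname (archaically nodename .....';
SplitStrings = regexp(str, '[:/.]+', 'split');

% Split the paragraph into words on runs of spaces and punctuation
tokens = regexp(Paragraph, '[ ,:.()]+', 'split');

% 1 where the keyword equals some word of the paragraph (case-insensitive), else 0
isWord = double(ismember(lower(SplitStrings), lower(tokens)));
```

This gives the same whole-word result as the `\<`/`\>` regular expression approach, without treating the keywords as patterns.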
Licensed under: CC-BY-SA with attribution