Question

Given a reasonably long text, I need to find how many times a certain word appears in it. For example, with the Sherlock Holmes novels, if I type in Sherlock it should tell me 200 times, or something similar.

So far I know how to read a list with the predicate I implemented, posted below. I appreciate any help; I don't know what to do next or how.

% read_list(-L): read terms from current input until end_of_file.
read_list(L) :-
    read(N),
    (   N \= end_of_file
    ->  L = [N|Ns],
        read_list(Ns)
    ;   L = []
    ).
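Since read_list/1 already collects the input into a list, one possible next step is to count the matching elements of that list. A minimal sketch, assuming SWI-Prolog with library(apply); count_word/3 is a hypothetical name:

```prolog
:- use_module(library(apply)).   % include/3

% count_word(+W, +Words, -N): N is the number of elements of Words
% that are identical (==) to W.
count_word(W, Words, N) :-
    include(==(W), Words, Matches),   % keep only elements equal to W
    length(Matches, N).
```

Usage would be something like ?- read_list(L), count_word(sherlock, L, N). The words must still be entered as period-terminated terms for read/1.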

Thank you.

¿Fue útil?

Solution

read/1 fetches a term terminated by a period (.), but for the sake of discussion let's ignore this fact.

If you are only interested in word frequency, why build a list? Just count the words and the matches, and compute the frequency at end of file:

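% word_freq(+W, -Freq): Freq is the fraction of words read from
% current input that are identical to W. Fails on empty input.
word_freq(W, Freq) :-
  word_count(W, 0, Total, 0, Match),
  (  Total > 0
  -> Freq is Match / Total
  ).

% word_count(+W, +TotSoFar, -Tot, +MatchSoFar, -Match):
% read until end_of_file, counting all words (Tot) and the
% words identical to W (Match, the plain occurrence count).
word_count(W, TotSoFar, Tot, MatchSoFar, Match) :-
  (  read(N),
     N \= end_of_file
  -> T1 is TotSoFar + 1,
     (  N == W
     -> M1 is MatchSoFar + 1
     ;  M1 is MatchSoFar
     ),
     word_count(W, T1, Tot, M1, Match)
  ;  TotSoFar = Tot,
     MatchSoFar = Match
  ).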

Test:

?- word_freq(a,F).
|: a.
|: b.
|: c.
|: a.
|: F = 0.5.
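The question asks for a plain count rather than a frequency, which is just the Match value of word_count/5. A sketch of the same accumulator pattern that returns only the count and reads from an explicit stream, so a file opened with open/3 can be passed in; word_occurrences/3 is a hypothetical name, and the read/1 caveat still applies (terms must end with a period, and a capitalized word such as Sherlock is read as a variable, not an atom):

```prolog
% word_occurrences(+Stream, +W, -Count): Count is the number of terms
% read from Stream that are identical (==) to W.
word_occurrences(Stream, W, Count) :-
    word_occurrences(Stream, W, 0, Count).

word_occurrences(Stream, W, SoFar, Count) :-
    read(Stream, T),
    (   T == end_of_file
    ->  Count = SoFar                  % input exhausted
    ;   (   T == W
        ->  S1 is SoFar + 1            % one more match
        ;   S1 = SoFar
        ),
        word_occurrences(Stream, W, S1, Count)
    ).
```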

Edit: instead of read/1, let's define read_word(W), where a word is simply a sequence of alphanumeric characters:

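% read_word(-W): W is the next word on current input as a list of
% character codes, or end_of_file when input is exhausted.
read_word(W) :-
    read_word([], W).

read_word(SoFar, W) :-
    get_code(C),
    (   C == -1                     % end of input
    ->  ( SoFar == [] -> W = end_of_file ; reverse(SoFar, W) )
    ;   code_type(C, alnum)         % inside a word: accumulate
    ->  read_word([C|SoFar], W)
    ;   SoFar == []                 % separator before a word: skip it
    ->  read_word([], W)
    ;   reverse(SoFar, W)           % separator after a word: done
    ).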

Equipped with this (admittedly ugly) code, and with read/1 replaced by read_word/1 in word_count/5, we get:

?- word_freq("ab",F).
|: a ab abc
|: F = 0.3333333333333333.

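Note that now I'm passing a string, not an atom: read_word/1 returns a list of character codes, so W must be a code list too. That is what "ab" means when the double_quotes flag is codes (the traditional behaviour). In SWI-Prolog 7 and later, double quotes denote a string object by default, so either run set_prolog_flag(double_quotes, codes) first or write the word in backquotes, as in `ab`.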
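To get the raw count the question asks for from a file such as a Sherlock Holmes novel, the read_word idea can be adapted to read from an explicit stream and to compare atoms, so the word can be given as an atom regardless of how double-quoted text is interpreted. A sketch assuming SWI-Prolog (get_code/2, code_type/2, atom_codes/2); count_word_in_stream/3 and stream_word/2 are hypothetical names:

```prolog
% count_word_in_stream(+Stream, +Word, -Count): Count is how many
% alphanumeric words on Stream equal the atom Word (case-sensitive).
% Capitalized words must be quoted, e.g. 'Sherlock'.
count_word_in_stream(Stream, Word, Count) :-
    count_word_in_stream(Stream, Word, 0, Count).

count_word_in_stream(Stream, Word, SoFar, Count) :-
    stream_word(Stream, Codes),
    (   Codes == end_of_file
    ->  Count = SoFar
    ;   atom_codes(A, Codes),          % compare as atoms
        ( A == Word -> S1 is SoFar + 1 ; S1 = SoFar ),
        count_word_in_stream(Stream, Word, S1, Count)
    ).

% stream_word(+Stream, -W): next run of alphanumerics as codes,
% skipping separators, or end_of_file when the stream is exhausted.
stream_word(Stream, W) :-
    stream_word(Stream, [], W).

stream_word(Stream, SoFar, W) :-
    get_code(Stream, C),
    (   C == -1
    ->  ( SoFar == [] -> W = end_of_file ; reverse(SoFar, W) )
    ;   code_type(C, alnum)
    ->  stream_word(Stream, [C|SoFar], W)
    ;   SoFar == []
    ->  stream_word(Stream, [], W)
    ;   reverse(SoFar, W)
    ).
```

With a file (the name here is hypothetical), something like ?- setup_call_cleanup(open('hound.txt', read, S), count_word_in_stream(S, 'Sherlock', N), close(S)). closes the stream even if counting fails.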

Licensed under: CC-BY-SA with attribution
No afiliado a StackOverflow
scroll top