Question

I would like to cross-reference each word in a string against a file.

So, if I were given the string: Jumping jacks wake me up in the morning.

  1. I use a regex to strip out the period and make the entire string lowercase.
  2. I then separate the words into an array using PHP's nifty explode() function.
  3. What I'm left with is an array of the words used in the string.
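
The three steps above could be sketched roughly as follows. This is only one way to do it: the helper name sentence_to_words() is hypothetical, it assumes that everything except letters, digits, apostrophes and whitespace should be stripped, and it uses preg_split() instead of explode() so that repeated spaces do not produce empty entries.

```php
<?php
// Hypothetical helper (not from the original post): normalize a sentence into words.
function sentence_to_words(string $sentence): array
{
    // Lowercase first so the later lookups are case-insensitive.
    $clean = strtolower($sentence);
    // Drop punctuation such as the trailing period
    // (assumption: keep only letters, digits, apostrophes and whitespace).
    $clean = preg_replace("/[^a-z0-9'\s]/", '', $clean);
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY avoids empty entries.
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

$sentence_array = sentence_to_words('Jumping jacks wake me up in the morning.');
print_r($sentence_array);
```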

From there I need to look up each word in the array, get its value, and add it to a running sum. A for() loop it is. Okay, this is where I get stuck...

The list ($wordlist) is structured like so:

wake#4 waking#3 0.125

morning#2 -0.125

There are tabs (\t) between the words and the number. There can be more than one word per value.
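
To make the format concrete, here is a minimal sketch of pulling the words and the number out of a single line. It assumes the number is always the field after the last tab and that each word carries a "#digit" suffix, as in the two sample lines; parse_wordlist_line() is a hypothetical name.

```php
<?php
// Hypothetical helper: split one wordlist line into its words and its value.
function parse_wordlist_line(string $line): ?array
{
    // Assumption: the value follows the last tab on the line.
    $tab = strrpos($line, "\t");
    if ($tab === false) {
        return null; // no tab, so not a line in the expected format
    }
    $value = (float) trim(substr($line, $tab + 1));
    // Collect every word that is followed by "#<digits>" before the tab.
    preg_match_all('/(\w+)#\d+/', substr($line, 0, $tab), $m);
    return ['words' => $m[1], 'value' => $value];
}

$parsed = parse_wordlist_line("wake#4 waking#3\t0.125");
print_r($parsed);
```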

What I need the PHP to do now is look up the number for each word in the array, then pull that number back and add it to a running sum. What's the best way for me to go about this?

The answer should be easy enough: find the location of the word in the wordlist, then find the tab, and from there read the number... I just need some guidance.

Thanks in advance.

EDIT: to clarify -- I don't want the sum of all the values in the wordlist. Rather, I'd like to take the individual words in the sentence, look each one up in the list, and add just those values; not all of them.


Solution

Edited answer based on your comment and question edit. The running sums are stored in an array called $sum, keyed by word: for example, $sum['wake'] holds the running sum for the word wake, and so on.

$sum = array();
$word_values = array(); // lookup table: word => numeric value
foreach ($wordlist as $line) // Loop through each line of the wordlist
{
    // Each line holds one or more "word#digit" entries followed by a tab and the value,
    // e.g. "wake#4 waking#3\t0.125". The value is the last whitespace-separated field.
    $fields = preg_split('/\s+/', trim($line));
    if (count($fields) < 2) continue; // skip blank or malformed lines
    $value = (float) array_pop($fields);

    // Every remaining field is a word entry; (\w+) captures the word before '#'.
    foreach ($fields as $entry)
    {
        if (preg_match('/^(\w+)#\d+$/', $entry, $match))
        {
            $word_values[$match[1]] = $value;
        }
    }
}

// Assuming $sentence_array holds the words from your string, e.g. array("jumping", "jacks", "wake", "me", "up", "in", "the", "morning")

foreach ($sentence_array as $word)
{
    if (!isset($word_values[$word])) continue; // word is not in the wordlist, nothing to add
    if (!array_key_exists($word, $sum)) $sum[$word] = 0;
    $sum[$word] += $word_values[$word];
}

I'm assuming you will take care of case sensitivity, since you mentioned that you make the entire string lowercase, so I'm not including that here.
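
If a single total is wanted rather than a per-word sum, a sketch like the following may be closer to what the question asks for. The contents of $word_values here are filled in by hand from the two sample lines in the question, and words not found in the list are assumed to contribute 0.

```php
<?php
// Sample lookup table built by hand from the two example lines in the question.
$word_values = ['wake' => 0.125, 'waking' => 0.125, 'morning' => -0.125];
$sentence_array = ['jumping', 'jacks', 'wake', 'me', 'up', 'in', 'the', 'morning'];

$total = 0.0;
foreach ($sentence_array as $word) {
    // The ?? operator (PHP 7+) falls back to 0 for words missing from the list.
    $total += $word_values[$word] ?? 0;
}

echo $total, "\n";
```

With this sample data only "wake" and "morning" are found in the list, so only their two values enter the total.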

Other tips

$sentence = 'Jumping jacks wake me up in the morning';

$words = array();

foreach (explode(' ', $sentence) as $w) {
    if (array_key_exists($w, $words)) {
        $words[$w]++;   // word seen before: increment its count
    } else {
        $words[$w] = 1; // first occurrence
    }
}

Explode by space and check whether the word is already a key in the $words array; if so, increment its count (the value); if not, set its value to 1. Repeat this for each of your sentences without redeclaring $words = array().
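
For plain word counting of a single sentence, PHP's built-in array_count_values() does the same bookkeeping in one call. A minimal sketch, assuming the sentence is split on single spaces; the sample sentence is made up for illustration:

```php
<?php
// Hypothetical sample sentence with a repeated word.
$sentence = 'the cat saw the dog';

// array_count_values() returns each distinct value as a key with its count as the value.
$words = array_count_values(explode(' ', $sentence));
print_r($words);
```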

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow