I am trying to use a script to search a text file and return words that meet certain criteria:

* The word is listed only once.
* It is not in an ignore list.
* It is in the top 10% of the longest words.
* It does not contain repeating letters.
* The final list would be a random ten words that meet the above criteria.
* If no word meets all of the above criteria, the reported words would be null.

I've put together the following, but the script dies at arsort(), saying it expects an array. Can anyone suggest a change to make arsort() work? Or suggest an alternative (simpler) script to find metadata? (I realize this second question may be better suited to another Stack Exchange site.)

<?php
  $fn="../story_link";
  $str=readfile($fn);
    function top_words($str, $limit=10, $ignore=""){
        if(!$ignore) $ignore = "the of to and a in for is The that on said with be was by"; 
        $ignore_arr = explode(" ", $ignore);
        $str = trim($str);
        $str = preg_replace("#[&].{2,7}[;]#sim", " ", $str);
        $str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
        $str = preg_replace("#\s+#sim", " ", $str);
        $arraw = explode(" ", $str);
        foreach($arraw as $v){
            $v = trim($v);
            if(strlen($v)<3 || in_array($v, $ignore_arr)) continue;
            $arr[$v]++;
        }
        arsort($arr);   
        return array_keys( array_slice($arr, 0, $limit) );
    }
    $meta_keywords = implode(", ", top_words( strip_tags( $html_content ) ) );
?>

Solution

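The problem occurs when your loop never executes $arr[$v]++ (for example, when every word is shorter than three characters or is in the ignore list), so $arr is never defined. arsort() is then given null as its argument instead of an array, which is the error you see. In your script this is exactly what happens: $html_content is never defined, so the input is empty, and readfile() sends the file straight to the output and returns the number of bytes read, not the file's contents (file_get_contents() returns the contents as a string).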
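To give top_words() real input, the file has to be read into a string first. A minimal sketch, assuming the file is plain text or HTML on local disk; load_text() is a hypothetical helper, not part of the original script:

```php
<?php
// Hypothetical helper: read a whole file into a string.
// readfile() would instead echo the file and return a byte count (an int).
function load_text(string $path): string
{
    $contents = file_get_contents($path); // returns false on failure
    if ($contents === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $contents;
}

// Usage with the question's path (not executed here):
// $str = load_text("../story_link");
// $meta_keywords = implode(", ", top_words(strip_tags($str)));
```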

The fix is to initialize $arr as an empty array before the loop, so that it is still an array in the cases where $arr[$v]++ never runs.
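Stripped of the text clean-up, the fix comes down to the pattern below. This is a minimal sketch, not the answer's code verbatim: count_candidates() is a hypothetical name, and it takes an already-split word list. It also uses the null coalescing operator (PHP 7+) so that a word's first occurrence does not raise an undefined-index notice.

```php
<?php
// Hypothetical helper: count words that pass the length and ignore-list filters.
function count_candidates(array $words, array $ignore, int $minLen = 3): array
{
    $arr = array(); // initialized up front, so it is an array even if nothing passes
    foreach ($words as $v) {
        $v = trim($v);
        if (strlen($v) < $minLen || in_array($v, $ignore)) {
            continue;
        }
        // ?? 0 supplies a starting count the first time a word is seen
        $arr[$v] = ($arr[$v] ?? 0) + 1;
    }
    arsort($arr); // safe: $arr is always an array here
    return $arr;
}

// count_candidates(array("a", "of", "the"), array("the")) returns array()
```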

function top_words($str, $limit=10, $ignore=""){
    if(!$ignore) $ignore = "the of to and a in for is The that on said with be was by"; 
    $ignore_arr = explode(" ", $ignore);
    $str = trim($str);
    $str = preg_replace("#[&].{2,7}[;]#sim", " ", $str);
    $str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
    $str = preg_replace("#\s+#sim", " ", $str);
    $arraw = explode(" ", $str);
    $arr = array(); // Defined $arr here.
    foreach($arraw as $v){
        $v = trim($v);
        if(strlen($v)<3 || in_array($v, $ignore_arr)) continue;
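        $arr[$v] = isset($arr[$v]) ? $arr[$v] + 1 : 1; // avoids an undefined-index notice the first time a word is seen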
    }
    arsort($arr);   
    return array_keys( array_slice($arr, 0, $limit) );
}
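The function above fixes the error, but it ranks words by frequency rather than applying the criteria listed in the question. A minimal sketch of those criteria follows, under assumptions the question leaves open (each is labelled in the comments). pick_words() and has_repeated_letter() are hypothetical names, and an empty array stands in for "null".

```php
<?php
// Assumptions (not from the question):
// - words are runs of ASCII letters, compared case-insensitively;
// - "not repeating letters" means no letter occurs twice in the word;
// - "top 10%" is taken over the words that pass the other filters;
// - an empty array is returned when nothing qualifies.
function has_repeated_letter(string $word): bool
{
    // count_chars() mode 1: byte value => count, for bytes that occur
    foreach (count_chars($word, 1) as $count) {
        if ($count > 1) {
            return true;
        }
    }
    return false;
}

function pick_words(string $text, array $ignore, int $limit = 10): array
{
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $counts = array_count_values($words); // word => occurrences
    $ignore = array_map('strtolower', $ignore);

    // Keep words that occur once, are not ignored and have no repeated letter.
    $candidates = array();
    foreach ($counts as $word => $n) {
        if ($n === 1 && !in_array($word, $ignore, true) && !has_repeated_letter($word)) {
            $candidates[] = $word;
        }
    }
    if (count($candidates) === 0) {
        return array();
    }

    // Longest first, then keep ceil(10%) of them using integer arithmetic.
    usort($candidates, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $keep = intdiv(count($candidates) + 9, 10);
    $top = array_slice($candidates, 0, $keep);

    // Random selection of up to $limit words.
    shuffle($top);
    return array_slice($top, 0, $limit);
}
```

Because of shuffle(), repeated calls can return the words in a different order, or a different selection when more than ten qualify.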

Other tips

I came across some excellent code that works well for this:

    <?php
    function extract_keywords($str, $minWordLen = 3, $minWordOccurrences = 2, $asArray = false, $maxWords = 5, $restrict = true)
    {
        $str = str_replace(array("?","!",";","(",")",":","[","]"), " ", $str);
        $str = str_replace(array("\n","\r","  "), " ", $str);
        $str = strtolower($str); // strtolower() returns a new string; it does not change $str in place

        $str = preg_replace('/[^\p{L}0-9 ]/', ' ', $str);
        $str = trim(preg_replace('/\s+/', ' ', $str));

        $words = explode(' ', $str);

        // If we don't restrict tag usage, we'll remove common words from the array
        if ($restrict == false) {
            // Abbreviated stop-word list; get a full list here: http://www.wordfrequency.info
            $commonWords = array('a','able','about','above', /* ... */ 'you\'ve','z','zero');
            $words = array_udiff($words, $commonWords, 'strcasecmp');
        }

        // Restrict keywords to the values in the $allowedWords array
        // Use this if you want to limit the available tags
        if ($restrict == true) {
            $allowedWords = array('engine','boeing','electrical','pneumatic','ice','pressurisation');
            $words = array_uintersect($words, $allowedWords, 'strcasecmp');
        }

        $keywords = array();

        while (($c_word = array_shift($words)) !== null)
        {
            if (strlen($c_word) < $minWordLen) continue;

            $c_word = strtolower($c_word);
            if (array_key_exists($c_word, $keywords)) $keywords[$c_word][1]++;
            else $keywords[$c_word] = array($c_word, 1);
        }

        // Sort by occurrence count, highest first. A closure replaces the named
        // function that was declared inside extract_keywords(): declaring it there
        // causes a "Cannot redeclare" fatal error on the second call.
        usort($keywords, function ($first, $sec) {
            return $sec[1] - $first[1];
        });

        $final_keywords = array();
        foreach ($keywords as $keyword_det)
        {
            // $keywords is sorted by count, so every later entry is rarer too
            if ($keyword_det[1] < $minWordOccurrences) break;
            array_push($final_keywords, $keyword_det[0]);
        }
        $final_keywords = array_slice($final_keywords, 0, $maxWords);
        return $asArray ? $final_keywords : implode(', ', $final_keywords);
    }


    $text = "Many systems that traditionally had a reliance on the pneumatic system have been transitioned to the electrical architecture. They include engine start, API start, wing ice protection, hydraulic pumps and cabin pressurisation. The only remaining bleed system on the 787 is the anti-ice system for the engine inlets. In fact, Boeing claims that the move to electrical systems has reduced the load on engines (from pneumatic hungry systems) by up to 35 percent (not unlike today’s electrically power flight simulators that use 20% of the electricity consumed by the older hydraulically actuated flight sims).";

    echo extract_keywords($text);

    // Advanced usage
    // $exampletext = "The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.";
    // echo extract_keywords($exampletext, 3, 1, false, 5, false);
    ?>
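If you want something shorter, the same idea can lean on PHP's built-in array_count_values() and arsort() instead of the manual counting loop and usort(). This is a sketch, not the code above: top_keywords() is a hypothetical name, $stopWords plays the role of $commonWords, the allowed-words restriction is left out, and the /u pattern assumes UTF-8 input.

```php
<?php
// Hypothetical shorter variant: split, filter, count, sort, slice.
function top_keywords(string $text, array $stopWords = array(), int $minLen = 3, int $minCount = 2, int $max = 5): array
{
    // Lower-case, then split on any run of characters that is not a letter or digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $stopWords = array_map('strtolower', $stopWords);

    // Drop short words and stop words (strlen() counts bytes, not characters).
    $words = array_filter($words, function ($w) use ($stopWords, $minLen) {
        return strlen($w) >= $minLen && !in_array($w, $stopWords, true);
    });

    // word => occurrences, most frequent first
    $counts = array_count_values($words);
    arsort($counts);

    // Keep only words seen at least $minCount times.
    $counts = array_filter($counts, function ($n) use ($minCount) {
        return $n >= $minCount;
    });
    return array_slice(array_keys($counts), 0, $max);
}
```

The order of words with equal counts is only guaranteed to follow their first appearance on PHP 8.0 and later, where sorting is stable.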
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow