Question

<?php

$filename = "largefile.txt";

/* get content of $filename in $content */
$content = strtolower(file_get_contents($filename));

/* split $content into array of substrings of $content i.e wordwise */
$wordArray = preg_split('/[^a-z]/', $content, -1, PREG_SPLIT_NO_EMPTY);

/* "stop words", filter them */
$filteredArray = array_filter($wordArray, function($x){
    return !preg_match("/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/",$x);
});

/* get associative array of values from $filteredArray as keys and their frequency count as value */
$wordFrequencyArray = array_count_values($filteredArray);

/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);

This is the code I have implemented to find the frequency of distinct words in a file. It works.

Now suppose there are 10 text files. I want to count the frequency of a word across all 10 files; for example, how many times the word "stack" appears in all the files combined. I would then do the same for every distinct word.

I have done it for a single file but cannot think of how to extend it to multiple files. Thanks for your help, and sorry for my bad English.


Solution

Put what you've got into a function and call it for each filename in an array using a foreach loop. Pass the shared totals array by reference so the counts accumulate across files (a named function cannot import variables with `use`; only closures can):

<?php

$wordFrequencyArray = array();

/* $wordFrequencyArray is taken by reference, so every call adds to the same totals */
function countWords($filename, array &$wordFrequencyArray) {
    /* get content of $filename in $content */
    $content = strtolower(file_get_contents($filename));

    /* split $content into array of substrings of $content i.e wordwise */
    $wordArray = preg_split('/[^a-z]/', $content, -1, PREG_SPLIT_NO_EMPTY);

    /* "stop words", filter them (the "." also drops any single-letter word) */
    $filteredArray = array_filter($wordArray, function($x){
        return !preg_match("/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/",$x);
    });

    /* add this file's per-word counts to the running totals */
    foreach (array_count_values($filteredArray) as $word => $count) {
        if (!isset($wordFrequencyArray[$word])) $wordFrequencyArray[$word] = 0;
        $wordFrequencyArray[$word] += $count;
    }
}
$filenames = array('file1.txt', 'file2.txt', 'file3.txt', 'file4.txt' /* ... */);
foreach ($filenames as $file) {
    countWords($file, $wordFrequencyArray);
}

print_r($wordFrequencyArray);
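
To get the count for one particular word such as "stack" across every file, the combined array can simply be indexed by that word once all files have been processed. The following is a minimal, self-contained sketch of that idea, not the only way to do it. The helper name `countWordsInto`, the temporary file names, and the sample sentences are assumptions made up for illustration; it also assumes PHP 7 or later for the `??` operator and the type declarations. Instead of modifying a shared array, this variant returns the updated totals.

```php
<?php

/**
 * Count the non-stop-words of one file and add them to a running total.
 * countWordsInto() is a hypothetical helper name used only in this sketch.
 */
function countWordsInto(string $filename, array $totals): array
{
    $content = strtolower(file_get_contents($filename));

    // Split on anything that is not a-z, dropping empty pieces.
    $words = preg_split('/[^a-z]/', $content, -1, PREG_SPLIT_NO_EMPTY);

    // Same stop-word filter as in the question; "." also drops single letters.
    $words = array_filter($words, function ($w) {
        return !preg_match('/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/', $w);
    });

    // Merge this file's counts into the totals.
    foreach (array_count_values($words) as $word => $count) {
        $totals[$word] = ($totals[$word] ?? 0) + $count;
    }

    return $totals;
}

// Two throwaway sample files (assumed contents, for illustration only).
$dir = sys_get_temp_dir();
$filenames = array($dir . '/wc_demo_1.txt', $dir . '/wc_demo_2.txt');
file_put_contents($filenames[0], "Stack overflow is a stack of questions.");
file_put_contents($filenames[1], "The stack grows; the heap does not.");

$totals = array();
foreach ($filenames as $file) {
    $totals = countWordsInto($file, $totals);
}

// Sort from most to least frequent, keeping the words as keys.
arsort($totals);

// Frequency of one word across all files (0 if it never appears).
$stackCount = $totals['stack'] ?? 0;
echo "stack: $stackCount\n"; // prints "stack: 3"

// Remove the sample files again.
array_map('unlink', $filenames);
```

With real data, replace the two sample files with the actual list of file names. Because the whole content of each file is read with `file_get_contents`, very large files may need to be processed line by line instead.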
Licensed under: CC-BY-SA with attribution