Here is a neat little PHP function that pulls the most common words out of a string and ranks them by how often they appear. A preset list of words in an external .txt file provides the “skip word” reference: any word on that list is not counted. This keeps words like “the”, “a”, and “and” from showing up. Words of three letters or fewer are skipped as well, and by default the five most frequent words are returned.
To call the function:
    $variable = extractWords($your_txt_string, $stop_words);
The function:
    function extractWords($string, $stop_words, $max_count = 5) { // $max_count sets how many words are returned
        $string = strip_tags($string);                       // remove HTML tags first, while < and > are still present
        $string = strtolower($string);                       // make it lowercase
        $string = preg_replace('/[^a-z\s-]/', ' ', $string); // keep letters, whitespace and dashes; everything else becomes a space
        $string = preg_replace('/\s+/', ' ', $string);       // collapse runs of whitespace into a single space
        $string = trim($string);                             // trim the string

        // match runs of letters, allowing dashes inside a word
        preg_match_all('/[a-z]+(?:-[a-z]+)*/', $string, $matchWords);
        $matchWords = $matchWords[0];

        foreach ( $matchWords as $key => $item ) {
            // drop stop words and words of three letters or fewer
            if ( in_array($item, $stop_words) || strlen($item) <= 3 ) {
                unset($matchWords[$key]);
            }
        }

        // count how many times each remaining word occurs
        $wordCountArr = array();
        foreach ( $matchWords as $val ) {
            if ( isset($wordCountArr[$val]) ) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }

        arsort($wordCountArr);                               // most frequent first
        $wordCountArr = array_slice($wordCountArr, 0, $max_count);
        return $wordCountArr;                                // array of word => count
    }
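For comparison, the same idea can be written more compactly with PHP's built-in `array_count_values()`. This is only a sketch under a few assumptions: the name `extractWordsCompact()` is made up for illustration, the stop list is flipped into array keys so each lookup is an `isset()` check instead of an `in_array()` scan, and the sample text is invented.

```php
<?php
// Hypothetical compact variant; extractWordsCompact() is not part of the original post.
function extractWordsCompact($string, array $stop_words, $max_count = 5) {
    $string = strtolower(strip_tags($string));

    // Pull out runs of letters, allowing dashes inside a word
    preg_match_all('/[a-z]+(?:-[a-z]+)*/', $string, $m);

    // Flip the stop list so each lookup is a key check rather than a linear scan
    $skip = array_flip(array_map('strtolower', $stop_words));

    // Keep words longer than three letters that are not stop words
    $words = array_filter($m[0], function ($w) use ($skip) {
        return strlen($w) > 3 && !isset($skip[$w]);
    });

    $counts = array_count_values($words); // word => number of occurrences
    arsort($counts);                      // most frequent first
    return array_slice($counts, 0, $max_count, true);
}

// Made-up sample text, for illustration only
$text = 'The quick fox jumps. The quick fox sleeps, and the lazy quick fox dreams about jumps.';
print_r(extractWordsCompact($text, array('about', 'the'), 2));
```

With this sample text the two entries returned are `quick` (3 occurrences) and `jumps` (2 occurrences). The key lookup mostly pays off when the stop list is long; for a short list the difference is unlikely to matter.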
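stop_words.txt is just a plain text list of words to skip, one per line. Entries are compared against single words, so multi-word phrases in the file will never match. To load the file, put this line of PHP somewhere in your page before you call the function: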
    $stop_words = file('inc/stop_words.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
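If the file might be missing, or was saved with stray spaces, mixed case, or Windows line endings, a small wrapper around `file()` can tidy the list before it is used. The helper name `loadStopWords()` is an assumption for illustration; it is not part of the original snippet.

```php
<?php
// Hypothetical helper; loadStopWords() is not part of the original post.
function loadStopWords($path) {
    if (!is_readable($path)) {
        return array(); // no stop list available, so nothing is skipped
    }

    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array();
    }

    // Normalise each entry so it compares equal to the lowercased words in the text
    $words = array_map(function ($line) {
        return strtolower(trim($line));
    }, $lines);

    // Drop whitespace-only entries and duplicates, then reindex from 0
    return array_values(array_unique(array_filter($words, 'strlen')));
}

$stop_words = loadStopWords('inc/stop_words.txt');
```

Returning an empty array on failure means the word extraction still runs, just without any words being skipped.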
stop_words.txt
    a
    able
    about
    above
    abst
    accordance
    according
    accordingly
    across
    act
    actually
    added
    adj
    adopted
    affected
    affecting
    affects
    ...
    ....
    (add as many as needed to suit your use)
To show the results as a comma-separated list, you could use something like this:
    echo implode(',', array_keys($variable));
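Since the function returns an array of word => count pairs, the counts can be shown next to each word as well. A minimal sketch, where the values in `$variable` are made up to stand in for a real result:

```php
<?php
// Sample data in the shape extractWords() returns: word => count (made-up values)
$variable = array('quick' => 3, 'jumps' => 2, 'lazy' => 1);

$parts = array();
foreach ($variable as $word => $count) {
    // Escape the word in case the output ends up in an HTML page
    $parts[] = htmlspecialchars($word) . ' (' . $count . ')';
}

$list = implode(', ', $parts);
echo $list; // prints: quick (3), jumps (2), lazy (1)
```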