Making an index for a search in PHP
Question
How can I build an index of only the unique words in each question with PHP, so that I can learn the basics of making a search?
I have had a few problems making a multidimensional array of the questions.
My first unsuccessful attempt is the following.
#1
$result = pg_query_params(
    $dbconn,
    "SELECT question_id, body
     FROM questions",
    array()
);
while ($row = pg_fetch_array($result)) {
    $question_body[$row['question_id']]['body'] = $row['body'];
    $question_index = explode(" ", $question_body[$row['question_id']]['body']);
    $question_index = array_unique($question_index);
}
var_dump($question_index);
The problem with this code is that it does not keep a separate list of words for each question: $question_index is overwritten on every pass through the loop.
It seems that I cannot use explode, since it returns only a one-dimensional array.
I also ran the following code, trying unsuccessfully to key the word lists by question_id.
#2
while ($row = pg_fetch_array($result)) {
    $question_body[$row['question_id']]['body'] = $row['body'];
    $question_index[$row['question_id']] = explode(" ", $question_body[$row['question_id']]['body']);
    $question_index[$row['question_id']] = array_unique($question_index);
}
var_dump($question_index);
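Both attempts are close. The missing pieces are to store each question's words under its own key, as attempt #2 does, and to pass only that question's word list to array_unique(), not the whole $question_index array. A minimal sketch, assuming a hypothetical $rows array in place of the rows that pg_fetch_array() would return:

```php
<?php
// Hypothetical sample rows standing in for the results of
// pg_fetch_array() on "SELECT question_id, body FROM questions".
$rows = [
    ['question_id' => 1, 'body' => 'how to search words in php php'],
    ['question_id' => 2, 'body' => 'index words with postgres'],
];

$question_index = [];
foreach ($rows as $row) {
    $id = $row['question_id'];
    // explode() gives a flat list of the words in this one question.
    $words = explode(' ', $row['body']);
    // Deduplicate this question's words only, then renumber the keys
    // (array_unique() keeps the original keys, leaving gaps).
    $question_index[$id] = array_values(array_unique($words));
}

var_dump($question_index);
```

With a real connection, the foreach line becomes while ($row = pg_fetch_array($result)) and the loop body stays the same.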
Solution
Do yourself a favor and take a look at Zend_Search_Lucene.
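Libraries in the Lucene family are built around an inverted index, which maps each word to the documents that contain it. The following is a minimal plain-PHP sketch of that idea only, not Zend_Search_Lucene's API; the function names buildInvertedIndex() and searchAll() are hypothetical, and the input is assumed to be per-question word lists like the ones the question tries to build.

```php
<?php
// Hypothetical helper: turn [question_id => [word, ...]] into
// [word => [question_id, ...]], i.e. an inverted index.
function buildInvertedIndex(array $questionWords): array
{
    $inverted = [];
    foreach ($questionWords as $id => $words) {
        foreach (array_unique($words) as $word) {
            $inverted[$word][] = $id;
        }
    }
    return $inverted;
}

// Hypothetical helper: ids of questions containing ALL query words (an AND
// search), found by intersecting the id lists of each word.
function searchAll(array $inverted, array $queryWords): array
{
    $result = null;
    foreach (array_unique($queryWords) as $word) {
        $ids = $inverted[$word] ?? [];
        $result = ($result === null) ? $ids : array_intersect($result, $ids);
    }
    return array_values($result ?? []);
}

$questionWords = [
    1 => ['search', 'words', 'php'],
    2 => ['index', 'words', 'postgres'],
    3 => ['php', 'index', 'words'],
];

$inverted = buildInvertedIndex($questionWords);
print_r(searchAll($inverted, ['words', 'php'])); // ids 1 and 3
```

The index is built once, so a query only looks up a few lists instead of rescanning every question body.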
OTHER TIPS
Note that str_replace can accept an array of values as the needle. For example,
$body = str_replace(array(',', '.'), '', $body);
replaces any instance of an element in the array with an empty string.
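Applied before splitting the body into words, this keeps "php," and "php" from being indexed as different words. A minimal sketch under stated assumptions: the helper name normalizeWords() and the list of punctuation characters are hypothetical, and it uses preg_split() on runs of whitespace instead of explode(" ") so that repeated spaces do not produce empty words.

```php
<?php
// Hypothetical helper: normalize a question body into a list of unique words.
function normalizeWords(string $body): array
{
    // str_replace() accepts an array of needles; each one is replaced by ''.
    $body = str_replace(array(',', '.', '?', '!', ';', ':'), '', $body);
    // Lowercase so "PHP" and "php" count as one word. strtolower() does not
    // handle multibyte UTF-8 letters; mb_strtolower() would be needed for those.
    $body = strtolower($body);
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops the empty strings
    // that leading or trailing whitespace would otherwise produce.
    $words = preg_split('/\s+/', $body, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

print_r(normalizeWords("How do I search, in PHP?  php search."));
```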
Alternatively, I'd suggest looking at some pre-built search and indexing libraries. It's a very hard area to get right from scratch and you're much more likely to get reliable results using an already-polished indexing and searching algorithm.
It would be a good idea to first extract all the textual (non-markup) content from the pages by using a DOM parser. See this:
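A minimal sketch using PHP's built-in DOMDocument class from the dom extension, which is enabled in most builds. The helper name extractText() is hypothetical, and dropping <script> and <style> elements is an assumption about what counts as page text. Note that loadHTML() treats input as ISO-8859-1 unless the HTML declares a charset, so UTF-8 pages may need a charset declaration.

```php
<?php
// Hypothetical helper: return the text content of an HTML page without markup.
function extractText(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: <script> and <style> contents are not page text, so drop them.
    foreach (['script', 'style'] as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        // Walk backwards so removing a node does not skip the next one.
        for ($i = $nodes->length - 1; $i >= 0; $i--) {
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : trim($body->textContent);
}

echo extractText('<html><body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>'), "\n";
```

The returned text can then go through the same word splitting as a question body.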
I finally decided to use Postgres operators such as LIKE for the search.
It has been much easier for me to do the data manipulation in the database than in PHP.
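A minimal sketch of such a LIKE search from PHP, assuming the questions table and the $dbconn connection from the question. Because LIKE treats % and _ in the pattern as wildcards, user input is escaped before it is wrapped in % signs. The helper names escapeLike() and searchQuestions() are hypothetical; ILIKE is PostgreSQL's case-insensitive form of LIKE.

```php
<?php
// Hypothetical helper: escape LIKE wildcards so user input matches literally.
// Backslash is PostgreSQL's default LIKE escape character, so it is escaped first.
function escapeLike(string $term): string
{
    return str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $term);
}

// Hypothetical helper: questions whose body contains $term, ignoring case.
// Assumes $dbconn is an open connection and the questions table from the question.
function searchQuestions($dbconn, string $term): array
{
    // $1 is bound by pg_query_params(), so the term is never spliced into the SQL.
    $result = pg_query_params(
        $dbconn,
        'SELECT question_id, body FROM questions WHERE body ILIKE $1',
        array('%' . escapeLike($term) . '%')
    );
    if ($result === false) {
        return [];
    }
    $rows = pg_fetch_all($result);
    // Older PHP versions return false instead of an empty array when no rows match.
    return is_array($rows) ? $rows : [];
}

var_dump(escapeLike('50%_off'));
```

For example, searchQuestions($dbconn, 'explode') would return the rows whose body contains "explode" in any letter case.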