Full-text search on mobile devices?

https://stackoverflow.com/questions/276489

07-07-2019

Problem

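We will soon be starting development of a new mobile application. This particular app will be used heavily to search text-based fields. What database engine would the group suggest as best suited to this type of search on a mobile platform?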

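Specifics: the target is Windows Mobile 6, and we will be using the .NET CF. Some of the text-based fields will be between 35 and 500 characters long. The device will operate in two different modes: batch and WiFi. In WiFi mode we can, of course, just submit requests to a full-blown DB engine and get the results back. This question centers on the "batch" version, which will use a database loaded with the information on the device's flash or removable storage card.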

Anyway, I know SQL CE has some basic indexing, but you don't get real "full-text" style indexes until you move up to the full-blown version, which isn't available on the mobile platform.

Here is an example of what the data looks like:

"apron carpenter adjustable leather container pocket waist hardware belt", etc.

I haven't evaluated any other specific options yet. I figured I'd first tap this group's experience to point me down a particular path.

Any suggestions or tips?

Solution

I had the same problem recently. Here is what I did.

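I created a class that holds only an ID and the text for each object (in my case, a SKU (item number) and a description). Because it is used only for searching, it is a small object that uses little memory. Once I find the matches, I still fetch the full-blown objects from the database.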

public class SmallItem
{
    private int _sku;
    public int Sku
    {
        get { return _sku; }
        set { _sku = value; }
    }

    // Size of max description size + 1 for null terminator.
    private char[] _description = new char[36];
    public char[] Description
    {
        get { return _description; }
        set { _description = value; }
    }

    public SmallItem()
    {
    }
}

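Once this class exists, you can create an array of these objects (I actually used a List) and use it for searching throughout the application. Initializing this list takes a while, but you only have to deal with that at startup. Basically, you run a query against the database and pull the data needed to build the list.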
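A minimal sketch of building that list, assuming the query results arrive as (SKU, description) pairs. The `SmallItemLoader` name and the pared-down `SmallItem` stand-in are hypothetical, not from the original. Each description is upper-cased, truncated to 35 characters, and left with a trailing '\0', because the contains function below upper-cases its search terms and stops scanning at the null terminator.

```csharp
using System;
using System.Collections.Generic;

// Pared-down stand-in for the SmallItem class above (hypothetical sketch).
public class SmallItem
{
    public int Sku { get; set; }
    public char[] Description { get; set; }
}

// Hypothetical helper: converts (SKU, description) rows into search-ready SmallItems.
public static class SmallItemLoader
{
    // Max characters kept per description; slot 36 holds the '\0' terminator.
    public const int MaxDescriptionLength = 35;

    public static SmallItem ToSmallItem(int sku, string description)
    {
        // Upper-case to match the upper-cased search terms.
        string text = (description ?? string.Empty).ToUpper();
        if (text.Length > MaxDescriptionLength)
            text = text.Substring(0, MaxDescriptionLength);

        // Unused slots default to '\0', so the description is always null-terminated.
        char[] buffer = new char[MaxDescriptionLength + 1];
        text.CopyTo(0, buffer, 0, text.Length);
        return new SmallItem { Sku = sku, Description = buffer };
    }

    // Builds the in-memory search list once (e.g. at startup) from the query results.
    public static List<SmallItem> LoadItems(IEnumerable<KeyValuePair<int, string>> rows)
    {
        var items = new List<SmallItem>();
        foreach (KeyValuePair<int, string> row in rows)
            items.Add(ToSmallItem(row.Key, row.Value));
        return items;
    }
}
```

Doing the upper-casing and truncation once at load time keeps that work out of the per-search loop.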

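Once you have the list, you can search through it quickly for whatever words you want. Because it is a "contains" search, it has to find words within words (e.g. "drill" returns drill, drillbit, drills, etc.). To do this, we wrote a home-grown contains function in C# that uses unsafe (pointer-based) code. It takes an array of strings, so you can search for more than one word; we use that for "AND" searches, meaning the description must contain every word passed in, and the example below works that way. As it searches the list, it builds a list of IDs, which is passed back to the calling function. Once you have the IDs, you can easily run a quick query against the database to return the full-blown objects by their fast, indexed ID numbers. I should mention that we also limit the maximum number of results returned. You could take this out, but it is handy when someone types something like "e" as a search term, which would return a huge number of results.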
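For reference, the same search semantics (substring match, every term required, result cap) can be written in plain managed C# with `String.IndexOf`. This is a hypothetical sketch (`ManagedSearch.ContainsAll` is not from the original) that works on (SKU, description) pairs rather than `SmallItem`. The author reports further down that an IndexOf-based version was far slower on the device, so treat it as a description of the behavior, not a faster alternative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical managed reference for the AND-style "contains" search.
public static class ManagedSearch
{
    // Returns IDs whose description contains every term as a substring
    // ("DRILL" also matches "DRILLBIT"), stopping after maxResults hits.
    public static int[] ContainsAll(IList<KeyValuePair<int, string>> items, string[] terms, int maxResults)
    {
        // Upper-case the terms once so the comparison is effectively case-insensitive.
        string[] upperTerms = new string[terms.Length];
        for (int t = 0; t < terms.Length; t++)
            upperTerms[t] = terms[t].ToUpper();

        var matches = new List<int>();
        foreach (KeyValuePair<int, string> item in items)
        {
            // Cap reached: a term like "e" would otherwise match almost everything.
            if (matches.Count >= maxResults)
                break;

            string description = item.Value.ToUpper();
            bool allTermsFound = true;
            foreach (string term in upperTerms)
            {
                // Ordinal: plain character-by-character comparison, no culture rules.
                if (description.IndexOf(term, StringComparison.Ordinal) < 0)
                {
                    allTermsFound = false;
                    break; // one missing term rules the item out
                }
            }
            if (allTermsFound)
                matches.Add(item.Key);
        }
        return matches.ToArray();
    }
}
```

`StringComparison.Ordinal` is used because culture-aware comparison is unnecessary for upper-cased catalog text and is generally slower. Whether that narrows the gap on the .NET CF is untested here.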

Here is an example of the contains function:

// Needed at the top of the file:
using System;
using System.Collections.Generic;

// Requires compiling with unsafe code enabled (/unsafe). Item descriptions must be stored
// upper-case and null-terminated, because the search terms are upper-cased below and the
// scan stops at the first '\0'.
public static int[] Contains(string[] descriptionTerms, int maxResults, List<SmallItem> itemList)
{
    // Don't allow more than the maximum allowable results constant.
    int[] matchingSkus = new int[maxResults];

    // Indexes and counters.
    int matchNumber = 0;
    int totalWords = descriptionTerms.Length - 1;  // - 1 because it is used with 0-based array indexes

    /* Array of character arrays. Each array is a word we want to match.
     * We need the + 1 because totalWords had - 1 (we are setting a length here,
     * so it is not 0-based).
     * */
    char[][] allWordsToMatch = new char[totalWords + 1][];

    // Convert each search term into an upper-case, null-terminated character array.
    for (int currentWord = 0; currentWord <= totalWords; currentWord++)
    {
        string term = descriptionTerms[currentWord].ToUpper();
        char[] word = new char[term.Length + 1];   // the last element stays '\0' as the terminator
        Array.Copy(term.ToCharArray(), word, term.Length);
        allWordsToMatch[currentWord] = word;
    }

    // Loop through the list of items trying to find matching words.
    foreach (SmallItem item in itemList)
    {
        // If we have reached our maximum allowable matches, stop searching and return the results.
        if (matchNumber == maxResults)
            break;

        // Loop through the "words to match" filter list. Every word must match (AND search).
        for (int currentWord = 0; currentWord <= totalWords; currentWord++)
        {
            bool matchedWord = false;
            char[] wordToMatch = allWordsToMatch[currentWord];

            // Pointer arithmetic for speed (unsafe C#).
            unsafe
            {
                // Pointers to the item's description and to the current word (both starting at the first char).
                fixed (char* pdesc = &item.Description[0], pfilter = &wordToMatch[0])
                {
                    // Try every start position in the description until we hit its null terminator.
                    for (int matchStart = 0; *(pdesc + matchStart) != '\0'; matchStart++)
                    {
                        int filterOffset = 0;
                        int descriptionOffset = matchStart;

                        /* Keep moving the offsets together while we have consecutive character matches.
                         * A '\0' in the description never equals a non-null filter character, so the
                         * scan cannot run past the end of the description.
                         * */
                        while (*(pfilter + filterOffset) != '\0' && *(pfilter + filterOffset) == *(pdesc + descriptionOffset))
                        {
                            ++filterOffset;
                            ++descriptionOffset;
                        }

                        // We matched all the way to the filter's null terminator: the entire word matched.
                        if (*(pfilter + filterOffset) == '\0')
                        {
                            matchedWord = true;
                            break;
                        }

                        // Partial match only: retry from the next start position, not from where the match failed.
                    }
                }
            }

            // The current word had no match, so there is no sense looking for the rest. Move to the next item.
            if (!matchedWord)
                break;

            // If the word we just matched is the last one on the list, all words matched: record the SKU.
            if (currentWord == totalWords)
            {
                matchingSkus[matchNumber] = item.Sku;
                matchNumber++;
            }
        }
    }

    // We have our list of matching SKUs. Resize the array and pass it back.
    Array.Resize(ref matchingSkus, matchNumber);
    return matchingSkus;
}

Once you have the list of matching SKUs, you can loop through the array and build a query command that returns only those SKUs.
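One way to build that command, sketched under assumptions: the `SkuQuery` helper, the `Items` table and the `Sku` column are hypothetical names. The SKUs are inlined as integer literals because they come from the search, not from user-typed text, and the result count is already capped by `maxResults`, which keeps the `IN` list short.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper; "Items" and "Sku" are placeholder schema names.
public static class SkuQuery
{
    // Builds "SELECT * FROM Items WHERE Sku IN (a, b, ...)" for the matching SKUs.
    public static string BuildSelectBySkus(int[] skus)
    {
        if (skus == null || skus.Length == 0)
            return null; // nothing matched: no need to hit the database

        var sql = new StringBuilder("SELECT * FROM Items WHERE Sku IN (");
        for (int n = 0; n < skus.Length; n++)
        {
            if (n > 0)
                sql.Append(", ");
            // Integers from the search are safe to inline: no user text reaches the SQL.
            sql.Append(skus[n].ToString(CultureInfo.InvariantCulture));
        }
        return sql.Append(')').ToString();
    }
}
```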

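To give you an idea of performance, here is what we found for the following steps:

Search ~171,000 items
Build a list of all matching items
Query the database, returning only the matching items
Build the full-blown items (similar to the SmallItem class, but with many more fields)
Populate a datagrid with the full-blown item objects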

On the mobile device, the whole process takes 2 to 4 seconds: about 2 seconds if we hit the match limit before searching all items, and about 4 seconds if every item has to be scanned.

I also tried this without unsafe code, using String.IndexOf (I tried String.Contains too; it performed the same as IndexOf). That was much slower: about 25 seconds.

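I also tried a StreamReader over a file containing lines of the form [SKU number]|[Description]. The code was similar to the unsafe-code example above. This approach took about 15 seconds for a full scan: not bad, but not great. The file-and-StreamReader method has one advantage over the approach I showed: the file can be generated ahead of time. The approach I showed needs the memory for the list plus the initial time to load it when the application starts. For 171,000 items that takes about 2 minutes every time the app starts (though it can, of course, be done on a separate thread). If you can afford to wait for that initial load, this is the fastest way to search (at least that I have found).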
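A hedged sketch of doing that initial load on a separate thread, using `System.Threading.Thread` and a `ManualResetEvent` so a search can wait until the list is ready. `BackgroundItemList<T>` and its loader delegate (which stands in for the database query) are hypothetical. The sketch uses modern C# syntax (lambdas, `Func<>`) that may need adjusting for the .NET CF toolchain.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: builds the search list on a background thread.
public class BackgroundItemList<T>
{
    private readonly ManualResetEvent _loaded = new ManualResetEvent(false);
    private List<T> _items;
    private Exception _error;

    // The loader stands in for the startup database query (an assumption).
    public BackgroundItemList(Func<List<T>> loader)
    {
        Thread worker = new Thread(() =>
        {
            try { _items = loader(); }
            catch (Exception ex) { _error = ex; }
            finally { _loaded.Set(); } // release waiters even if loading failed
        });
        worker.IsBackground = true; // don't keep the process alive on exit
        worker.Start();
    }

    // True once loading has finished (successfully or not); does not block.
    public bool IsLoaded { get { return _loaded.WaitOne(0); } }

    // Blocks until loading finishes, then returns the list or reports the load error.
    public List<T> WaitForItems()
    {
        _loaded.WaitOne();
        if (_error != null)
            throw new InvalidOperationException("Loading the search list failed.", _error);
        return _items;
    }
}
```

The UI can check `IsLoaded` to show a "still loading" message, while search code simply calls `WaitForItems()`.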

Hope that helps.

PS: Thanks to Dolch for helping with the unsafe code.

Other tips

You could try Lucene.Net. I'm not sure how well suited it is to mobile devices, but it is billed as a "high-performance, full-featured text search engine library".

http://incubator.apache.org/lucene.net/
http://lucene.apache.org/java/docs/

License: CC-BY-SA with attribution
