Pythonの二分探索（二分法）

https://stackoverflow.com/questions/212358

03-07-2019
|

質問

リスト/タプルに対して二分検索を実行し、見つかった場合は項目の位置を返し、見つからなかった場合は「False」(-1、Noneなど)を返すライブラリ関数はありますか?

関数 bisect_left/right を見つけました。二等分モジュール, 、ただし、項目がリストにない場合でも位置を返します。意図した用途にはまったく問題ありませんが、項目がリストにあるかどうかを知りたいだけです (何も挿入したくありません)。

使おうと思った bisect_left 次に、その位置にある項目が検索しているものと等しいかどうかをチェックしますが、それは面倒に思えます（また、数値がリスト内の最大の数値より大きくなる可能性があるかどうかの境界チェックも行う必要があります）。もっと良い方法があれば知りたいです。

編集これが何のために必要なのかを明確にするには、次のようにします。これには辞書が非常に適していることは承知していますが、メモリ消費をできるだけ低く抑えるように努めています。私の意図した使用法は、一種の双方向ルックアップテーブルです。テーブルに値のリストがあり、インデックスに基づいて値にアクセスできる必要があります。また、特定の値のインデックス、または値がリストにない場合は None を見つけられるようにしたいと考えています。

これには辞書を使用するのが最も早い方法ですが、メモリ要件が (約) 2 倍になります。

Python ライブラリで何かを見落としているのではないかと思い、この質問をしました。Moe さんが提案したように、自分でコードを書く必要があるようです。

解決

from bisect import bisect_left

def binary_search(a, x, lo=0, hi=None):  # can't use a to specify default for hi
    hi = hi if hi is not None else len(a)  # hi defaults to len(a)   
    pos = bisect_left(a, x, lo, hi)  # find insertion position
    return (pos if pos != hi and a[pos] == x else -1)  # don't walk off the end

他のヒント

bisect_left / rightのコードを見て、目的に合わせて調整しないのはなぜですか。

このように：

def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x: 
            hi = mid
        else:
            return mid
    return -1

これは少し話題から外れていますが（Moeの答えはOPの質問に対して完全なようです）、手順全体の複雑さを最後から最後まで見る価値があります。ソートされたリスト（バイナリ検索が役立つ場所）に物を保存し、存在を確認するだけの場合（指定しない限り最悪の場合）：

並べ替え済みリスト

O（n log n）最初にリストを作成します（ソートされていないデータの場合。O（n）、ソートされている場合）
O（log n）ルックアップ（これはバイナリ検索部分です）
O（n）挿入/削除（パターンに応じてO（1）またはO（log n）の平均ケースになる場合があります）

set（） では、あなたが被っています

作成するO（n）
O（1）ルックアップ
O（1）挿入/削除

ソートされたリストで本当に得られるのは、「次へ」、「前へ」、「範囲」です。（範囲の挿入または削除を含む）、開始インデックスが指定されたO（1）またはO（| range |）です。これらの種類の操作を頻繁に使用しない場合は、セットとして保存し、表示用に並べ替えることをお勧めします。 set（） は、Pythonで追加のオーバーヘッドをほとんど発生させません。。

bisectのドキュメントが検索の例を提供していることに言及する価値があるかもしれません： http://docs.python.org/library/bisect.html#searching -sorted-lists

（-1またはNoneを返す代わりにValueErrorを上げることは、より多くのpythonic＆＃8211; list.index（）がそれを行います。しかし、もちろんあなたはあなたのニーズにサンプルを適応させることができます。）

最も簡単なのは、 bisect を使用し、1つの位置をチェックして、アイテムがそこ：

def binary_search(a,x,lo=0,hi=-1):
    i = bisect(a,x,lo,hi)
    if i == 0:
        return -1
    elif a[i-1] == x:
        return i-1
    else:
        return -1

これはマニュアルからの抜粋です：

http://docs.python.org/2/library/bisect.html

8.5.1。ソート済みリストの検索

上記のbisect（）関数は挿入ポイントを見つけるのに役立ちますが、一般的な検索タスクに使用するには扱いにくいか扱いにくい場合があります。次の5つの関数は、それらをソート済みリストの標準ルックアップに変換する方法を示しています。

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

そのため、コードをわずかに変更する必要があります：

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    return -1

bisectモジュールを使用する @DaveAbrahamsの回答が正しいアプローチであることに同意します。彼は彼の答えで重要な詳細に言及しなかった。

ドキュメント bisectから.bisect_left（a、x、lo = 0、hi = len（a））

バイセクションモジュールでは、事前に検索配列を事前に計算する必要はありません。デフォルトの 0 および len（a）を使用する代わりに、 bisect.bisect_left にエンドポイントを提示することができます。

さらに重要なのは、特定の関数のエラーが最小化されるような値Xを探すことです。そのためには、代わりにbisect_leftのアルゴリズムが計算を呼び出す方法が必要でした。これは本当に簡単です。

__ getitem __ を a

として定義するオブジェクトを提供するだけです

たとえば、bisectアルゴリズムを使用して、任意の精度の平方根を見つけることができます！

import bisect

class sqrt_array(object):
    def __init__(self, digits):
        self.precision = float(10**(digits))
    def __getitem__(self, key):
        return (key/self.precision)**2.0

sa = sqrt_array(4)

# "search" in the range of 0 to 10 with a "precision" of 0.0001
index = bisect.bisect_left(sa, 7, 0, 10*10**4)
print 7**0.5
print index/(10**4.0)

存在するかどうかだけを見たい場合は、リストを辞書に変えてみてください：

# Generate a list
l = [n*n for n in range(1000)]

# Convert to dict - doesn't matter what you map values to
d = dict((x, 1) for x in l)

count = 0
for n in range(1000000):
    # Compare with "if n in l"
    if n in d:
        count += 1

私のマシンでは、「if n in l」 37秒かかりましたが、「if n in d」は0.4秒かかりました。

これは：

非再帰的（ほとんどの再帰的アプローチよりもメモリ効率がよい）
実際には作業中
不要な if および条件なしで実行されるため、高速
数学的アサーションに基づいて （low + high）/ 2 のフロアは常に high low は下限で、高は上限です。
テスト済み：D

def binsearch(t, key, low = 0, high = len(t) - 1):
    # bisecting the range
    while low < high:
        mid = (low + high)//2
        if t[mid] < key:
            low = mid + 1
        else:
            high = mid
    # at this point 'low' should point at the place
    # where the value of 'key' is possibly stored.
    return low if t[low] == key else -1

Dave Abrahamsのソリューションは良いです。私はそれを最小限にしたかもしれませんが：

def binary_search(L, x):
    i = bisect.bisect_left(L, x)
    if i == len(L) or L[i] != x:
        return -1
    return i

Pythonには明示的なバイナリ検索アルゴリズムはありませんが、バイナリ検索を使用してソートされたリスト内の要素の挿入ポイントを見つけるように設計されたモジュール bisect があります。これは「トリック」できます。バイナリ検索を実行します。これの最大の利点は、ほとんどのライブラリコードと同じ利点です-高性能で、十分にテストされ、動作するだけです（特にバイナリ検索は実装が非常に難しい-特にエッジケースが慎重に検討されていない場合）。

基本タイプ

文字列やintなどの基本型の場合、非常に簡単です。必要なのは、 bisect モジュールと並べ替えられたリストです：

>>> import bisect
>>> names = ['bender', 'fry', 'leela', 'nibbler', 'zoidberg']
>>> bisect.bisect_left(names, 'fry')
1
>>> keyword = 'fry'
>>> x = bisect.bisect_left(names, keyword)
>>> names[x] == keyword
True
>>> keyword = 'arnie'
>>> x = bisect.bisect_left(names, keyword)
>>> names[x] == keyword
False

これを使用して重複を見つけることもできます：

...
>>> names = ['bender', 'fry', 'fry', 'fry', 'leela', 'nibbler', 'zoidberg']
>>> keyword = 'fry'
>>> leftIndex = bisect.bisect_left(names, keyword)
>>> rightIndex = bisect.bisect_right(names, keyword)
>>> names[leftIndex:rightIndex]
['fry', 'fry', 'fry']

明らかに、必要に応じてそのインデックスの値ではなく、インデックスを返すことができます。

オブジェクト

カスタムタイプまたはオブジェクトの場合、少し注意が必要です：正確に比較するためにbisectを取得するには、豊富な比較メソッドを実装する必要があります。

>>> import bisect
>>> class Tag(object):  # a simple wrapper around strings
...     def __init__(self, tag):
...         self.tag = tag
...     def __lt__(self, other):
...         return self.tag < other.tag
...     def __gt__(self, other):
...         return self.tag > other.tag
...
>>> tags = [Tag('bender'), Tag('fry'), Tag('leela'), Tag('nibbler'), Tag('zoidbe
rg')]
>>> key = Tag('fry')
>>> leftIndex = bisect.bisect_left(tags, key)
>>> rightIndex = bisect.bisect_right(tags, key)
>>> print([tag.tag for tag in tags[leftIndex:rightIndex]])
['fry']

これは少なくともPython 2.7で動作するはずです-＆gt; 3.3

値は実際のオブジェクトへのポインタにすぎないため、格納しているオブジェクトが本当に小さい場合を除き、dictを使用してもメモリ使用量が2倍になることはありません。

>>> a = 'foo'
>>> b = [a]
>>> c = [a]
>>> b[0] is c[0]
True

この例では、「foo」は1回だけ保存されます。それはあなたに違いをもたらしますか？そして、とにかく正確にいくつのアイテムについて話しているのですか？

このコードは、整数リストを再帰的に使用します。最も単純なケースシナリオを探します。リストの長さが2未満です。これは、回答が既に存在し、正しい回答を確認するためのテストが実行されることを意味します。そうでない場合は、中間値が設定され、正しいとテストされます。そうでない場合は、関数を再度呼び出して二分しますが、中間値を左または右にシフトして上限または下限として設定します

< p>

def binary_search(intList, intValue, lowValue, highValue):
    if(highValue - lowValue) < 2:
        return intList[lowValue] == intValue or intList[highValue] == intValue
    middleValue = lowValue + ((highValue - lowValue)/2)
    if intList[middleValue] == intValue:
        return True
    if intList[middleValue] > intValue:
        return binary_search(intList, intValue, lowValue, middleValue - 1)
   return binary_search(intList, intValue, middleValue + 1, highValue)

ウィキペディアの例をご覧ください http://en.wikipedia.org/wiki/Binary_search_algorithm

def binary_search(a, key, imin=0, imax=None):
    if imax is None:
        # if max amount not set, get the total
        imax = len(a) - 1

    while imin <= imax:
        # calculate the midpoint
        mid = (imin + imax)//2
        midval = a[mid]

        # determine which subarray to search
        if midval < key:
            # change min index to search upper subarray
            imin = mid + 1
        elif midval > key:
            # change max index to search lower subarray
            imax = mid - 1
        else:
            # return index number 
            return mid
    raise ValueError

'''
Only used if set your position as global
'''
position #set global 

def bst(array,taget): # just pass the array and target
        global position
        low = 0
        high = len(array)
    while low <= high:
        mid = (lo+hi)//2
        if a[mid] == target:
            position = mid
            return -1
        elif a[mid] < target: 
            high = mid+1
        else:
            low = mid-1
    return -1

これははるかに優れていて効果的だと思います。私を修正してください:)。ありがとう

s はリストです。
binary（s、0、len（s）-1、find）は最初の呼び出しです。

関数は、クエリされたアイテムのインデックスを返します。そのようなアイテムがない場合、 -1 を返します。

def binary(s,p,q,find):
    if find==s[(p+q)/2]:
        return (p+q)/2
    elif p==q-1 or p==q:
        if find==s[q]:
            return q
        else:
            return -1
    elif find < s[(p+q)/2]:
        return binary(s,p,(p+q)/2,find)
    elif find > s[(p+q)/2]:
        return binary(s,(p+q)/2+1,q,find)

def binary_search_length_of_a_list(single_method_list):
    index = 0
    first = 0
    last = 1

    while True:
        mid = ((first + last) // 2)
        if not single_method_list.get(index):
            break
        index = mid + 1
        first = index
        last = index + 1
    return mid

バイナリ検索：

// List - values inside list
// searchItem - Item to search
// size - Size of list
// upperBound - higher index of list
// lowerBound - lower index of list
def binarySearch(list, searchItem, size, upperBound, lowerBound):
        print(list)
        print(upperBound)
        print(lowerBound)
        mid = ((upperBound + lowerBound)) // 2
        print(mid)
        if int(list[int(mid)]) == value:
               return "value exist"
        elif int(list[int(mid)]) < value:
             return searchItem(list, value, size, upperBound, mid + 1)
        elif int(list[int(mid)]) > value:
               return searchItem(list, value, size, mid - 1, lowerBound)

//上記の関数を呼び出すには：

list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
searchItem = 1        
print(searchItem(list[0], item, len(list[0]) -1, len(list[0]) - 1, 0))

Pythonのバイナリ検索とDjangoモデルのジェネリック検索が必要でした。 Djangoモデルでは、1つのモデルが別のモデルへの外部キーを持つことができるため、取得したモデルオブジェクトで検索を実行したかったのです。これを使用できる次の関数を書きました。

def binary_search(values, key, lo=0, hi=None, length=None, cmp=None):
    """
    This is a binary search function which search for given key in values.
    This is very generic since values and key can be of different type.
    If they are of different type then caller must specify `cmp` function to
    perform a comparison between key and values' item.
    :param values:  List of items in which key has to be search
    :param key: search key
    :param lo: start index to begin search
    :param hi: end index where search will be performed
    :param length: length of values
    :param cmp: a comparator function which can be used to compare key and values
    :return: -1 if key is not found else index
    """
    assert type(values[0]) == type(key) or cmp, "can't be compared"
    assert not (hi and length), "`hi`, `length` both can't be specified at the same time"

    lo = lo
    if not lo:
        lo = 0
    if hi:
        hi = hi
    elif length:
        hi = length - 1
    else:
        hi = len(values) - 1

    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if not cmp:
            if values[mid] == key:
                return mid
            if values[mid] < key:
                lo = mid + 1
            else:
                hi = mid - 1
        else:
            val = cmp(values[mid], key)
            # 0 -> a == b
            # > 0 -> a > b
            # < 0 -> a < b
            if val == 0:
                return mid
            if val < 0:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1

上記には多くの優れた解決策がありますが、二分探索を行うための単純な (KISS はシンプルにしておきます (なぜなら私は) Python の組み込み/汎用 bisect 関数を使用する愚かな使用方法をまだ見たことがありません。bisect 関数に関する少しのコードを使用して、名前の小さな文字列配列のすべてのケースをテストした例を以下に示します。上記の解決策の一部はこれをほのめかしたり言ったりしていますが、以下の簡単なコードが私と同じように混乱している人を助けることを願っています。

Python bisect は、ソートされたリストのどこに新しい値/検索項目を挿入するかを示すために使用されます。以下のコードは、リスト/配列内の検索項目が見つかった場合にヒットのインデックスを返す bisect_left を使用しています (bisect と bisect_right は、挿入ポイントとしてヒットまたは一致した後の要素のインデックスを返します) 見つからない場合は、、 bisect_left は、== 検索値ではない、ソートされたリスト内の次の項目へのインデックスを返します。他の唯一のケースは、検索項目がリストの末尾にあり、返されるインデックスがリスト/配列の末尾を超える場合であり、以下のコードでは、「and」ロジックハンドルを使用して Python による早期終了が行われます。(最初の条件が False の場合、Python は後続の条件をチェックしません)

#Code
from bisect import bisect_left
names=["Adam","Donny","Jalan","Zach","Zayed"]
search=""
lenNames = len(names)
while search !="none":
    search =input("Enter name to search for or 'none' to terminate program:")
    if search == "none":
        break
    i = bisect_left(names,search)
    print(i) # show index returned by Python bisect_left
    if i < (lenNames) and names[i] == search:
        print(names[i],"found") #return True - if function
    else:
        print(search,"not found") #return False – if function
##Exhaustive test cases:
##Enter name to search for or 'none' to terminate program:Zayed
##4
##Zayed found
##Enter name to search for or 'none' to terminate program:Zach
##3
##Zach found
##Enter name to search for or 'none' to terminate program:Jalan
##2
##Jalan found
##Enter name to search for or 'none' to terminate program:Donny
##1
##Donny found
##Enter name to search for or 'none' to terminate program:Adam
##0
##Adam found
##Enter name to search for or 'none' to terminate program:Abie
##0
##Abie not found
##Enter name to search for or 'none' to terminate program:Carla
##1
##Carla not found
##Enter name to search for or 'none' to terminate program:Ed
##2
##Ed not found
##Enter name to search for or 'none' to terminate program:Roger
##3
##Roger not found
##Enter name to search for or 'none' to terminate program:Zap
##4
##Zap not found
##Enter name to search for or 'none' to terminate program:Zyss
##5
##Zyss not found

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow