Linq و Binary Search - تحسين هذا البطيء حيث البيان؟

https://stackoverflow.com/questions/1329650

19-09-2019
|

سؤال

لدي مجموعتين، كل منها يحتوي على حوالي 40،000 عنصر.

ترتبط العناصر الموجودة في القائمة 2 بعناصر القائمة عبر مفتاح أجنبي.

لكل عنصر من عنصر واحد، أريد العثور على العنصر المقابل في قائمة اثنين.

شيء من هذا القبيل:

foreach(var item in list1)
{
  var match = list2.Where(child => child.ID == item.ChildID).FirstOrDefault();
  item.Child = match;
}

هذا يعمل ولكنه بطيء مثل الجحيم.

الآن، يتم فرز كل من القائمة 1 وقائمة 2 بواسطة هذه المفاتيح من قاعدة البيانات. لذلك يتم ترتيب List1 بواسطة ChildID و List2 مرتبة حسب المعرف (نفس القيمة).

أعتقد أن البحث الثنائي سوف يسرع بشكل كبير في هذا الأمر، لكنني قرأت في مكان ما أن LINQ ستختار الاستراتيجية الأنسب للقائمة في جملة WHERE. ربما أحتاج إلى أن يلقي صراحة إلى قائمة فرزها؟ أو ربما أحتاج إلى تنفيذ خوارزمية بحث ثنائية مخصصة ث / مقارنة؟

أي رؤى موضع تقدير.

شكرا.

المحلول

لماذا لا تستخدم انضمام؟

var query = 
   from a in list1
   join b in list2 on a.ChildID equals b.ID
   select new {Item1 = a, Item2 = b};

foreach(var item in query)
{
   item.Item1.Child = item.Item2;
}

نصائح أخرى

ماذا عن هذا:

        var joined = list1.Join(list2, x => x.ChildID, x => x.ID, (x, y) => new { x, y });

        foreach (var j in joined)
        {
            j.x.Child = j.y;
        }

كان لدي هذه المشكلة من قبل، البحث المستند إلى LINQ بطيئة للغاية مقارنة ب DB مقرها لأنه لا يستخدم أي فهرس.

هل فكرت في استخدام قاموس بدلا من القائمة؟

يمكنك تطبيق قاموس ثم بدلا من استخدام المكان الذي يمكنك استخدامه واقية وإذا كان موجودا، احصل على القيمة باستخدام فهرس الملحق.

عينة من الرموز:

Dictionary<int, Child> list2 = ...;

...

foreach(var item in list1)
{
  if (list2.ContainsKey(item.ChildID))
    item.Child = list2[item.ChildID];
}

سيكون الوصول باستخدام الفهرس أسرع بكثير من البحث في قائمة، بتكلفة الذاكرة الإضافية المطلوبة للفهرس.

نظرا لأن كلا القوائم يتم فرزها بنفس القيمة، يمكنك فقط حلقة من خلالها بالتوازي:

int index1 = 0, index2 = 0;
while (index1 < list1.Count && index2 < list2.Count) {
   while (index1 < list1.Count && list1[index1].ChildId < list2[index2].Id) index1++;
   if (index1 < list1.Count) {
      while (index2 < list2.Count && list2[index2].Id < list1[index1].ChildId) index2++;
      if (index2 < list2.Count && list1[index1].ChildId == list2[index2].Id) {
         list1[index].Child = list2[index2];
         index1++;
         index2++;
      }
   }
}

أو:

int index1 = 0, index2 = 0;
while (index1 < list1.Count && index2 < list2.Count) {
   if (list1[index1].ChildId == list2[index2].Id) {
      list1[index].Child = list2[index2];
      index1++;
      index2++;
   } else {
      if (list1[index1].ChildId < list2[index2].Id) {
         index1++;
      } else {
         index2++;
      }
   }
}

بديل فعال آخر، ولكنه لا يستفيد من ترتيب القوائم، هو إنشاء فهرس عن طريق وضع أحد القوائم في القاموس:

Dictionary<int, TypeOfChild> index = new Dictionary<int, TypeOfChild>();
foreach (TypeOfChild child in list2) {
   index.Add(child.Id, child);
}
foreach (TypeOfParent parent in list1) {
   TypeOfChild child;
   if (index.TryGetValue(parent.ChildId, out child) {
      parent.Child = child;
   }
}

لم أستطع مقاومة الإجابة على هذا :-)

السبب الرئيسي الذي يحصل عليه الشفرة بطيئا، هو أن البنود الخاصة بك سيتم قراءها عدة مرات. فن السرعة هو: قراءة الذاكرة فقط إذا كنت بحاجة إلى وإذا كنت بحاجة إلى قراءتها، اقرأها بأقل قدر ممكن.

هنا مثال:

الشفرة

public class Item
{    
   private int _id;
   private List<ItemDetails> _detailItems = new List<ItemDetails>();

   public Item(int id)<br>
   {
       _id = id;
   }

   public void AddItemDetail(ItemDetails itemDetail)
   {
       _detailItems.Add(itemDetail);
   }

   public int Id
   {
       get { return _id; }
   }
   public ReadOnlyCollection<ItemDetails> DetailItems
   {
       get { return _detailItems.AsReadOnly(); }
   }
}


public class ItemDetails
{
    private int _parentId;

    public ItemDetails(int parentId)
    {
        _parentId = parentId;
    }

    public int ParentId
    {
        get { return _parentId; }
    }
}

رمز المثال:

الهدف الرئيسي هو مسح القوائم ومقارنة العنصر والبدخ على الفهارس الحالية. عندما يكون الأبوةيد مساويا لمعرف الآباء. أضفها إلى القائمة وتذهب إلى التفاصيل التالية. إذا كان الأمر مختلفا استمر إلى الوالد التالي.

// for performance tests..
DateTime startDateTime;

// create 2 lists  (master/child list)
List<Item> itemList = new List<Item>();
List<ItemDetails> itemDetailList = new List<ItemDetails>();

Debug.WriteLine("# Adding items");
startDateTime = DateTime.Now;

// add items (sorted)
for (int i = 0; i < 400000; i++)
    itemList.Add(new Item(i));

// show how long it took
Debug.WriteLine("Total milliseconds: " + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms" );

// adding some random details (also sorted)
Debug.WriteLine("# Adding itemdetails");
Random rnd = new Random(DateTime.Now.Millisecond);

startDateTime = DateTime.Now;

int index = 0;
for (int i = 0; i < 800000; i++)
{
    // when the random number is bigger than 2, index will be increased by 1
    index += rnd.Next(5) > 2 ? 1 : 0;
    itemDetailList.Add(new ItemDetails(index));
}
Debug.WriteLine("Total milliseconds: " + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms");

// show how many items the lists contains
Debug.WriteLine("ItemList Count: " + itemList.Count());
Debug.WriteLine("ItemDetailList Count: " + itemDetailList.Count());

// matching items
Debug.WriteLine("# Matching items");
startDateTime = DateTime.Now;

int itemIndex = 0;
int itemDetailIndex = 0;

int itemMaxIndex = itemList.Count;
int itemDetailMaxIndex = itemDetailList.Count;

// while we didn't reach any end of the lists, continue...
while ((itemIndex < itemMaxIndex) && (itemDetailIndex < itemDetailMaxIndex))
{
    // if the detail.parentid matches the item.id. add it to the list.
    if (itemList[itemIndex].Id == itemDetailList[itemDetailIndex].ParentId)
    {
        itemList[itemIndex].AddItemDetail(itemDetailList[itemDetailIndex]);
        // increase the detail index.
        itemDetailIndex++;
    }
    else
        // the detail.parentid didn't matches the item.id so check the next 1
        itemIndex++;
}

Debug.WriteLine("Total milliseconds: " + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms");

النتائج

أخذت 10 مرات المزيد من العناصر لرؤية نتيجة أفضل:

إضافة عناصر:
مجموع مللي ثانية: 140ms
إضافة ItemDetails:
مجموع المللي ثانية: 203ms
عدد البند: 400000
عدد ItemDeteAlist: 800000
البنود المطابقة:
مجموع مللي ثانية: 265ms

تمت كتابته بسرعة ويمكن أن يكون نظافة. لذلك آمل أن تتمكن من قراءتها. العب به.

تكره، جيروين.

لست متأكدا مما إذا كان هذا من شأنه أن يسرعه في الواقع، ولكن يمكنك نقل المسند إلى جملة Firstordefault () والتخلص من أينما.

item.Child = list2.FirstOrDefault(child => child.ID == item.ChildID)

قد لا يساعد في الواقع لأن هذا قد يدعو ضمنا إلى أين ().

فيما يلي سؤال رغم أن الطريقة بطيئة بالفعل أم أنها بطيئة فقط في المرة الأولى التي يتم تشغيلها بعد تجميعها؟ تحقق من المناقشة في هذا المنصب.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow