Question

I wrote this piece of LINQ to perform a cross join between multiple lists, just as a database would.

But for some reason it becomes extremely slow once any of the lists grows beyond about 3000 rows; I end up waiting around 30 seconds. These lists could grow much larger.

The query runs in a loop, once per relationship, with the other list's data coming from ColumnDataIndex.

Any advice?

UPDATE: The data is inserted into ordinary lists that are built beforehand from the configured sources. It is all in memory at the moment.

RunningResult[parameter.Uid] = (from source_row in RunningResult[parameter.Uid]
                            from target_row in ColumnDataIndex[dest_key]
                            where GetColumnFromUID(source_row, rel.SourceColumn) == GetColumnFromUID(target_row, rel.TargetColumn)
                            select new Row()
                            {
                                Columns = MergeColumns(source_row.Columns, target_row.Columns)

                            }).ToList();

The 2 extra functions:

MergeColumns: Takes the Columns from the 2 items and merges them into a single array.

public static Column[] MergeColumns(Column[] source_columns, Column[] target_columns)
{
    // Allocate one array large enough for both inputs, then copy them in order.
    Column[] new_column = new Column[source_columns.Length + target_columns.Length];
    source_columns.CopyTo(new_column, 0);
    target_columns.CopyTo(new_column, source_columns.Length);
    return new_column;
}

GetColumnFromUID: Returns the Value of the Column in the Item matching the column uid given.

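private static String GetColumnFromUID(Row row, String column_uid)
{
    if (row != null)
    {
        // Linear scan over the row's columns for the requested uid.
        var dest_col = row.Columns.FirstOrDefault(col => col.ColumnUid == column_uid);
        // Fall back to the row id when the column is missing; otherwise compare case-insensitively.
        return dest_col == null ? "" + row.RowId : dest_col.Value.ToString().ToLower();
    }
    else return String.Empty;
}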

Update:

I ended up moving the data and the query to a database. This reduced the run time to a matter of milliseconds. I could have written an optimized loop, but this was the fastest way out for me.


Solution

You don't actually need a cross join. Cross joins are inherently expensive, and you shouldn't use one unless you really need every pairing. What you need here is an inner join. As written, the query pairs every source row with every target row (for two lists of 3000 rows, that is 9,000,000 pairs), evaluates the predicate on each pair, and then throws away the vast majority of them. An inner join only produces the pairs whose keys match, so you never create all the rows that would just be discarded.

LINQ has its own inner join operation, Join, so you don't even need to write your own:

RunningResult[parameter.Uid] = (from source_row in RunningResult[parameter.Uid]
                                join target_row in ColumnDataIndex[dest_key]
                                on GetColumnFromUID(source_row, rel.SourceColumn) equals
                                    GetColumnFromUID(target_row, rel.TargetColumn)
                                select new Row()
                                {
                                    Columns = MergeColumns(source_row.Columns, target_row.Columns)

                                }).ToList();
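To make the difference concrete, here is a minimal sketch that counts how often the comparison work runs in each form. It uses plain integers as hypothetical stand-ins for the rows and keys (the class and method names are illustrative, not from the question). With LINQ to Objects, the from/from/where form evaluates the predicate once per pair, while Join evaluates each key selector once per row and matches the keys through an internal hash lookup.

```csharp
using System;
using System.Linq;

public static class JoinCostDemo
{
    // Counts predicate evaluations for the cross-join-then-filter form.
    public static long CountCrossJoinPredicateCalls(int n)
    {
        int[] source = Enumerable.Range(0, n).ToArray();
        int[] target = Enumerable.Range(0, n).ToArray();

        long calls = 0;
        var matches = (from s in source
                       from t in target
                       where ++calls > 0 && s == t   // runs once per (s, t) pair
                       select s).Count();
        return calls;
    }

    // Counts key selector evaluations for the Join form.
    public static long CountJoinKeyCalls(int n)
    {
        int[] source = Enumerable.Range(0, n).ToArray();
        int[] target = Enumerable.Range(0, n).ToArray();

        long calls = 0;
        var matches = source.Join(
            target,
            s => { calls++; return s; },   // once per source row
            t => { calls++; return t; },   // once per target row
            (s, t) => s).Count();
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountCrossJoinPredicateCalls(3000)); // 9000000
        Console.WriteLine(CountJoinKeyCalls(3000));            // 6000
    }
}
```

In the question's code each of those evaluations also calls GetColumnFromUID twice, which itself scans the row's columns, so the gap in practice is larger than the raw counts suggest.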

Other suggestions

You're not doing a cross join but an inner join with an ON clause; in your case the ON clause just happens to be written as the where predicate.

An inner join is typically done with two hash sets/tables, so you can quickly find the matching rows in set X based on the key value of a row in Y.

So weston's answer is OK, but you need to use dictionaries/hashtables to make it really fast. Be aware that there may be more than one row per key. You can use a multi-value hashtable/dictionary like this one for that: https://github.com/SolutionsDesign/Algorithmia/blob/master/SD.Tools.Algorithmia/GeneralDataStructures/MultiValueDictionary.cs
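As a rough sketch of that idea, the built-in Enumerable.ToLookup can serve as the multi-value dictionary, since it keeps every row that shares a key. The example below is an assumption-laden illustration rather than the question's actual types: rows are plain string arrays, and the key selectors passed in are hypothetical stand-ins for GetColumnFromUID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashJoinSketch
{
    // Inner-joins two row lists on a string key.
    // Rows are string[] here; the real Row/Column types are not assumed.
    public static List<string[]> HashJoin(
        IEnumerable<string[]> source,
        IEnumerable<string[]> target,
        Func<string[], string> sourceKey,
        Func<string[], string> targetKey)
    {
        // Build phase: index the target rows by key, keeping all rows per key.
        ILookup<string, string[]> index = target.ToLookup(targetKey);

        var result = new List<string[]>();
        foreach (string[] s in source)
        {
            // Probe phase: a key with no matches yields an empty sequence.
            foreach (string[] t in index[sourceKey(s)])
            {
                // Merge the two rows, source columns first.
                result.Add(s.Concat(t).ToArray());
            }
        }
        return result;
    }

    public static void Main()
    {
        var source = new List<string[]> { new[] { "1", "a" }, new[] { "2", "b" } };
        var target = new List<string[]>
        {
            new[] { "1", "x" }, new[] { "1", "y" }, new[] { "3", "z" }
        };

        // Join on the first column of each row.
        foreach (var row in HashJoin(source, target, r => r[0], r => r[0]))
        {
            Console.WriteLine(string.Join(",", row));
        }
        // Prints:
        // 1,a,1,x
        // 1,a,1,y
    }
}
```

Each key is computed once per row, and the target list is indexed once rather than rescanned for every source row.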

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow