Question

I'm trying to LINQ two tables based on a dynamic key. User can change key via a combo box. Key may be money, string, double, int, etc. Currently I'm getting the data just fine, but without filtering out the doubles. I can filter the double in VB, but it's slooooow. I'd like to do it in the LINQ query right out of the gate.

Here's the data:

First Table:

 -------------------------------------------------------------
| AppleIndex  | AppleCost  | AppleColor  | AppleDescription   |
 ------------------------------------------------------------
|     1       |     3      | Red         | This is an apple   |
|     2       |     5      | Green       | This is an apple   |
|     3       |     4      | Pink        | This is an apple   |
|     4       |     2      | Yellow      | This is an apple   |
|     5       |     2      | Orange      | This is an apple   |
|     1       |     3      | Red         | This is a duplicate|
|     2       |     5      | Green       | This is a duplicate|
|     3       |     4      | Pink        | This is a duplicate|
|     4       |     2      | Yellow      | This is a duplicate|
|     5       |     2      | Orange      | This is a duplicate|
 -------------------------------------------------------------

Second Table:

 ------------------------------------------------------------
| OrangeIndex | OrangeCost | OrangeColor | OrangeDescription |
 ------------------------------------------------------------
|     1       |     1      | Orange      | This is an Orange |
|     2       |     3      | Orange      |                   |
|     3       |     2      | Orange      | This is an Orange |
|     4       |     3      | Orange      |                   |
|     5       |     2      | Orange      | This is an Orange |
 ------------------------------------------------------------

Currently, I'm using the following code to get too much data:

Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
              On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows Distinct

Outcome:

 -------------------------------------------------------------------------
| 1  | 3 | Red    | This is an apple     | 1 | Orange | This is an Orange |
| 1  | 3 | Red    | This is an duplicate | 1 | Orange | This is an Orange |
| 2  | 5 | Green  | This is an apple     | 3 | Orange |                   |
| 2  | 5 | Green  | This is an duplicate | 3 | Orange |                   |
| 3  | 4 | Pink   | This is an apple     | 2 | Orange | This is an Orange |
| 3  | 4 | Pink   | This is an duplicate | 2 | Orange | This is an Orange |
| 4  | 2 | Yellow | This is an apple     | 3 | Orange |                   |
| 4  | 2 | Yellow | This is an duplicate | 3 | Orange |                   |
| 5  | 2 | Orange | This is an apple     | 2 | Orange | This is an Orange |
| 5  | 2 | Orange | This is an duplicate | 2 | Orange | This is an Orange |
 -------------------------------------------------------------------------

Desired Outcome:

 ------------------------------------------------------------------------
| 1 | 3 | Red    | This is an apple | 1 | 1 | Orange | This is an Orange |
| 2 | 5 | Green  | This is an apple | 2 | 3 | Orange |                   |
| 3 | 4 | Pink   | This is an apple | 3 | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 4 | 3 | Orange |                   |
| 5 | 2 | Orange | This is an apple | 5 | 2 | Orange | This is an Orange |
 ------------------------------------------------------------------------

I have tried the following:

'Get the original Column Names into an Array List
'MasterTableColumns = GetColumns(qMasterDS, TheMasterTable) '(external code)

'Plug the Existing DataSet into a DataView:
Dim View As DataView = New DataView(qMasterTable)

'Sort by the Primary Key:
View.Sort = ThePrimaryKey

'Build a new table listing only one column:
Dim newListTable As DataTable = _
View.ToTable("UniqueData", True, ThePrimaryKey)

This returns a unique list, but no associated data:

 -------------
| AppleIndex  |
 -------------
|     1       | 
|     2       | 
|     3       |
|     4       |
|     5       |
 -------------

So I tried this instead:

'Build a new table with ALL the columns:
Dim newFullTable As DataTable = _
View.ToTable("UniqueData", True, _
     MasterTableColumns(0), _
     MasterTableColumns(1), _
     MasterTableColumns(2), _
     MasterTableColumns(3))

Unfortunately, it yields the following... with duplicates:

 -------------------------------------------------------------
| AppleIndex  | AppleCost  | AppleColor  | AppleDescription   |
 ------------------------------------------------------------
|     1       |     3      | Red         | This is an apple   |
|     2       |     5      | Green       | This is an apple   |
|     3       |     4      | Pink        | This is an apple   |
|     4       |     2      | Yellow      | This is an apple   |
|     5       |     2      | Orange      | This is an apple   |
|     1       |     3      | Red         | This is a duplicate|
|     2       |     5      | Green       | This is a duplicate|
|     3       |     4      | Pink        | This is a duplicate|
|     4       |     2      | Yellow      | This is a duplicate|
|     5       |     2      | Orange      | This is a duplicate|
 -------------------------------------------------------------

Any ideas?

~~~~~~~~~~~~ Update: ~~~~~~~~~~~~

Jeff M suggested the following code. (Thanks Jeff) However, it gives me a error. Does anyone know the syntax for making this work in VB? I've monkeyed with it a bit and can't seem to get it right.

Dim matches = _
    From mRows In (From row In LinqMasterTable _
        Group row By row(ThePrimaryKey) Into g() _
        Select g.First()) _
    Join sRows In LinqSecondTable _
    On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
    Order By mRows(ThePrimaryKey) _
    Select mRows, sRows

Error in Third row at "row(ThePrimaryKey)":

"Range variable name can be inferred only from a simple or qualified name with no arguments."

Was it helpful?

Solution 3

Declarations and Such:

Private Sub LinqTwoTableInnerJoin(ByRef qMasterDS As DataSet, _
                                  ByRef qMasterTable As DataTable, _
                                  ByRef qSecondDS As DataSet, _
                                  ByRef qSecondTable As DataTable, _
                                  ByRef qPrimaryKey As String, _
                                  ByRef qForignKey As String, _
                                  ByVal qResultsName As String)

Dim TheMasterTable As String = qMasterTable.TableName
Dim TheSecondTable As String = qSecondTable.TableName
Dim ThePrimaryKey As String = qPrimaryKey
Dim TheForignKey As String = qForignKey
Dim TheNewForignKey As String = ""

MasterTableColumns = GetColumns(qMasterDS, TheMasterTable)
SecondTableColumns = GetColumns(qSecondDS, TheSecondTable)

Dim mColumnCount As Integer = MasterTableColumns.Count
Dim sColumnCount As Integer = SecondTableColumns.Count

Dim ColumnCount As Integer = mColumnCount + sColumnCount

Dim LinqMasterTable = qMasterDS.Tables(TheMasterTable).AsEnumerable
Dim LinqSecondTable = qSecondDS.Tables(TheSecondTable).AsEnumerable

Get the Data and order it by the Selected Key:

Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
             On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
             Order By mRows(ThePrimaryKey) _
             Select mRows, sRows

Put the Results into a Dataset Table:

' Make sure the dataset is available and/or cleared:
If dsResults.Tables(qResultsName) Is Nothing Then dsResults.Tables.Add(qResultsName)
dsResults.Tables(qResultsName).Clear() : dsResults.Tables(qResultsName).Columns.Clear()

'Adds Master Table Column Names
For x = 0 To MasterTableColumns.Count - 1
    dsResults.Tables(qResultsName).Columns.Add(MasterTableColumns(x))
Next

'Rename Second Table Names if Needed:
For x = 0 To SecondTableColumns.Count - 1
    With dsResults.Tables(qResultsName)
        For y = 0 To .Columns.Count - 1
            If SecondTableColumns(x) = .Columns(y).ColumnName Then
                SecondTableColumns(x) = SecondTableColumns(x) & "_2"
            End If
        Next
    End With
Next

'Make sure that the Forign Key is a Unique Value
If ForignKey1 = PrimaryKey Then
    TheNewForignKey = ForignKey1 & "_2"
Else
    TheNewForignKey = ForignKey1
End If

'Adds Second Table Column Names
For x = 0 To SecondTableColumns.Count - 1 
    dsResults.Tables(qResultsName).Columns.Add(SecondTableColumns(x))
Next

'Copy Results into the Dataset:
For Each Match In Matches

    'Build an array for each row:
    Dim NewRow(ColumnCount - 1) As Object

    'Add the mRow Items:
    For x = 0 To MasterTableColumns.Count - 1
        NewRow(x) = Match.mRows.Item(x)
    Next

    'Add the srow Items:
    For x = 0 To SecondTableColumns.Count - 1
        Dim y As Integer = x + (MasterTableColumns.Count)
        NewRow(y) = Match.sRows.Item(x)
    Next

    'Add the array to dsResults as a Row:
    dsResults.Tables(qResultsName).Rows.Add(NewRow)

Next

Give the user an option to clean doubles or not:

If chkUnique.Checked = True Then
    ReMoveDuplicates(dsResults.Tables(qResultsName), ThePrimaryKey)
End If

Remove the Duplicates if they so desire:

Private Sub ReMoveDuplicates(ByRef SkipTable As DataTable, _
                         ByRef TableKey As String)

    'Make sure that there's data to work with:
    If SkipTable Is Nothing Then Exit Sub
    If TableKey Is Nothing Then Exit Sub

    'Create an ArrayList of rows to delete:
    Dim DeleteRows As New ArrayList()

    'Fill the Array with Row Number of the items equal 
    'to the item above them:
    For x = 1 To SkipTable.Rows.Count - 1
        Dim RowOne As DataRow = SkipTable.Rows(x - 1)
        Dim RowTwo As DataRow = SkipTable.Rows(x)
        If RowTwo.Item(TableKey) = RowOne.Item(TableKey) Then
            DeleteRows.Add(x)
        End If
    Next

    'If there are no hits, exit this sub:
    If DeleteRows.Count < 1 Or DeleteRows Is Nothing Then
        Exit Sub
    End If

    'Otherwise, remove the rows based on the row count value:
    For x = 0 To DeleteRows.Count - 1

        'Start at the END and count backwards so the duplicate 
        'item's row count value doesn't change with each deleted row
        Dim KillRow As Integer = DeleteRows((DeleteRows.Count - 1) - x)

        'Delete the row:
        SkipTable.Rows(KillRow).Delete()

    Next
End Sub

Then clean up any leftovers:

If Not chkRetainKeys.Checked = True Then 'Removes Forign Key
    dsResults.Tables(qResultsName).Columns.Remove(TheNewForignKey)
End If

'Clear Arrays
MasterTableColumns.Clear()
SecondTableColumns.Clear()

Final Analysis: Ran this against 2 Files with 4 columns, 65,535 rows, and with some doubles. Process time, roughly 1 second. In fact it took longer to load the fields into memory than it did to parse the data.

OTHER TIPS

Well, the basic problem isn't the LINQ. It's the fact the your First Table contains "duplicates", which aren't really duplicates, since in your example, every row is distinctive.

So, our question to you is "How do we identify the duplicates in the original table?". Once that is answered, the rest should be trivial.

For example (In C# since I'm not sure of the VB syntax)

var Matches = from mRows in LinqMasterTable
                             .Where(r=>r.AppleDescription=="This is an Apple")
              join sRows in LinqSecondTable 
                   on mRows(ThePrimaryKey) equals sRows(TheForignKey)  
              orderby mRows(ThePrimaryKey) 
              select new { mRows, sRows};

Edit:
Here's how I would write the C# LINQ query. Here's an alternate version rather than using Distinct(), uses a nested query with grouping which should have similar semantics. It should be easily convertible to VB.

var matches = from mRows in (from row in LinqMasterTable
                             group row by row[ThePrimaryKey] into g
                             select g.First())
              join sRows in LinqSecondTable
                  on mRows[ThePrimaryKey] Equals sRows[TheForignKey]
              orderby mRows[ThePrimaryKey]
              select new { mRows, sRows }

and my attempt at a VB version of the above:

Edit:
As for the most recent error, I know exactly how to deal with it. When I was playing with VB LINQ, I found that the compiler doesn't like complex grouping expressions. To get around that, assign row(ThePrimaryKey) to a temporary variable and group by that variable. It should work then.

Dim matches = From mRows In (From row In LinqMasterTable _
                             Let grouping = row(ThePrimaryKey)
                             Group row By grouping Into g() _
                             Select g.First()) _
              Join sRows In LinqSecondTable _
                  On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows

Actually upon second inspection, it turns out that what is being grouped by needs a name. The following will work.

Dim matches = From mRows In (From row In LinqMasterTable _
                             Group row By Grouping = row(ThePrimaryKey) Into g() _
                             Select g.First()) _
              Join sRows In LinqSecondTable _
                  On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top