سؤال

I have been searcing for LINQ equivalent of WITH TIES in sql server lately, I came across a couple things, which couldn't proove to be useful.

I know this question was asked before and has an accepted answer, but it doesn't work the way with ties does. The solution using GroupBy() doesn't result as expected for TOP(3) WITH TIES considering a data set consisting of {3 2 2 1 1 0} the result set will be {3 2 2 1 1} where it should be {3 2 2}

Using the following sample data (taken from this question):

CREATE TABLE Person
(
    Id int primary key,
    Name nvarchar(50),
    Score float
)    

INSERT INTO Person VALUES (1, 'Tom',8.9)
INSERT INTO Person VALUES (2, 'Jerry',8.9)
INSERT INTO Person VALUES (3, 'Sharti',7)
INSERT INTO Person VALUES (4, 'Mamuzi',9)
INSERT INTO Person VALUES (5, 'Kamala',9)

Traditional OrderByDescending(p => p.Score).Take(3) will result with: Mamuzi, Kamala and one of Tom (or Jerry) where it should include BOTH

I know there is no built-in equivalent of it and i've found a way to implement it. I don't know if it is the best way to do it and open for alternative solutions.

هل كانت مفيدة؟

المحلول

var query = (from q in list.OrderByDescending(s => s.Score).Take(3).Select(s => s.Score).Distinct()
             from i in list
             where q == i.Score
             select i).ToList();

Edit:

@Zefnus

I wasn't sure in which order you wanted it but to change the order you can put a OrderBy(s => s.Score) between select i and ToList()

I don't have the possibility to check what sql statement my linq clause would produce. But your answer is much better i think. And your question was also really good. I never thought about top with ties in linq. ;)

Basically it only takes top 3 scores from the first list and compares them with the whole list and i takes only those scores which are equal to the scores of the first list.

نصائح أخرى

Do not use IEnumerable<T> with anything touching a database!

A solution aimed at LinqToSql and LinqToEntities should not be using IEnumerable<T>. Your current self answer will result in every single person being selected from the database and then being queried in memory using LinqToObjects.

To make a solution that is translated to SQL and executed by the database you have to use IQueryable<T> and Expressions instead.

public static class QueryableExtensions
{
    public static IQueryable<T> TopWithTies<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (topBy == null) throw new ArgumentNullException("topBy");
        if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));

        var topValues = source.OrderBy(topBy)
                              .Select(topBy)
                              .Take(topCount);

        var queryableMaxMethod = typeof(Queryable).GetMethods()
                                                  .Single(mi => mi.Name == "Max" &&
                                                                mi.GetParameters().Length == 1 &&
                                                                mi.IsGenericMethod)
                                                  .MakeGenericMethod(typeof(TComparand));

        var lessThanOrEqualToMaxTopValue = Expression.Lambda<Func<T, bool>>(
            Expression.LessThanOrEqual(
                topBy.Body,
                Expression.Call(
                    queryableMaxMethod,
                    topValues.Expression)),
            new[] { topBy.Parameters.Single() });

        var topNRowsWithTies = source.Where(lessThanOrEqualToMaxTopValue)
                                     .OrderBy(topBy);
        return topNRowsWithTies;
    }

    public static IQueryable<T> TopWithTiesDescending<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (topBy == null) throw new ArgumentNullException("topBy");
        if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));

        var topValues = source.OrderByDescending(topBy)
                              .Select(topBy)
                              .Take(topCount);

        var queryableMinMethod = typeof(Queryable).GetMethods()
                                                  .Single(mi => mi.Name == "Min" &&
                                                                mi.GetParameters().Length == 1 &&
                                                                mi.IsGenericMethod)
                                                  .MakeGenericMethod(typeof(TComparand));

        var greaterThanOrEqualToMinTopValue = Expression.Lambda<Func<T, bool>>(
            Expression.GreaterThanOrEqual(
                topBy.Body,
                Expression.Call(queryableMinMethod,
                                topValues.Expression)),
            new[] { topBy.Parameters.Single() });

        var topNRowsWithTies = source.Where(greaterThanOrEqualToMinTopValue)
                                     .OrderByDescending(topBy);
        return topNRowsWithTies;
    }
}

This creates queries of the following form:

SELECT [t0].[Id], [t0].[Name], [t0].[Score]
FROM [Person] AS [t0]
WHERE [t0].[Score] >= ((
    SELECT MIN([t2].[Score])
    FROM (
        SELECT TOP (3) [t1].[Score]
        FROM [Person] AS [t1]
        ORDER BY [t1].[Score] DESC
        ) AS [t2]
    ))
ORDER BY [t0].[Score] DESC

That query is only about 50% worse than the baseline query:

SELECT TOP (3) WITH TIES
    [t0].[Id], 
    [t0].[Name], 
    [t0].[Score]
FROM 
    [Person] AS [t0]
ORDER BY [t0].[Score] desc

With a data set consisting of your original 5 records and an additional 10000 records all with scores less than the original both of these are more or less instant (less than 20 milliseconds).

The IEnumerable<T> approach took a whole 2 minutes!


If the expression building and reflection seems scary the same thing can be achieved with a join:

public static IQueryable<T> TopWithTiesDescendingJoin<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
{
    if (source == null) throw new ArgumentNullException("source");
    if (topBy == null) throw new ArgumentNullException("topBy");
    if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));

    var orderedByValue = source.OrderByDescending(topBy);
    var topNValues = orderedByValue.Select(topBy).Take(topCount).Distinct();
    var topNRowsWithTies = topNValues.Join(source, value => value, topBy, (x, row) => row);
    return topNRowsWithTies.OrderByDescending(topBy);
}

With the following query as the result (with about the same performance):

SELECT [t3].[Id], [t3].[Name], [t3].[Score]
FROM (
    SELECT DISTINCT [t1].[Score]
    FROM (
        SELECT TOP (3) [t0].[Score]
        FROM [Person] AS [t0]
        ORDER BY [t0].[Score] DESC
        ) AS [t1]
    ) AS [t2]
INNER JOIN [Person] AS [t3] ON [t2].[Score] = [t3].[Score]
ORDER BY [t3].[Score] DESC

Another solution - which probably is not as efficient as the other solution - is to get TOP(3) Scores and get the rows with Score values contained in the TOP(3).

We can use Contains() as follows;

orderedPerson = datamodel.People.OrderByDescending(p => p.Score);

topPeopleList =
(
    from p in orderedPerson 
    let topNPersonScores = orderedPerson.Take(n).Select(p => p.Score).Distinct()
    where topNPersonScores.Contains(p.Score)
    select p
).ToList();

What's good about this implementation is that it's extension method TopWithTies() can be implemented easly as;

public static IEnumerable<T> TopWithTies<T, TResult>(this IEnumerable<T> enumerable, Func<T, TResult> selector, int n)
{
    IEnumerable<T> orderedEnumerable = enumerable.OrderByDescending(selector);

    return
    (
        from p in orderedEnumerable
        let topNValues = orderedEnumerable.Take(n).Select(selector).Distinct()
        where topNValues.Contains(selector(p))
        select p
    );
}

I think that maybe you can do something like:

OrderByDescending(p => p.Score).Skip(2).Take(1)

Count the number of occurrences of this element, and then:

OrderByDescending(p => p.Score).Take(2 + "The select with the number of occurrences for the third element")

I think that maybe this works ;) It´s only an idea!

I've found a solution taking the Score field value of the Nth row (3rd row in this case) using .Skip(n-1).Take(1) and selecting all rows with score value greater or equal to that as follows:

qryPeopleOrderedByScore = datamodel.People.OrderByDescending(p => p.Score);

topPeopleList =
(
    from p in qryPeopleOrderedByScore
    let lastPersonInList = qryPeopleOrderedByScore.Skip(2).Take(1).FirstOrDefault()
    where lastPersonInList == null || p.Score >= lastPersonInList.Score
    select p
).ToList();
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top