Question

I was able to solve this problem using Python but I would need to implement the solution in Excel itself as well so that I can use graphics to represent the results easily.

Given this table:

b   a   c
c   a   b 
a   c   b
a   c
a   c   d
b   c   a
d   c   a

I would like to obtain a list sorted by the number of times that a row is the repeated (in no particular order) in the table.

  • So this would count as repeated rows: "a b c", "c b a", "a c b"
  • But this wouldn't: "a b c", "b c", "b", "a b", "a c"

So, the output I'm looking for would be something like:

1st place: "b+a+c" found 4 times
2nd place: "a+c+d" found 2 twice
3rd place: "a+c" found once

The output has to say "b+a+c" even if it is also counting "a+b+c", "c+b+a" and so on... because "b+a+c" was the first one of all the other subsequent repetitions.

Would anyone be able to show me the correct way to approach the problem?

Was it helpful?

Solution

I would use a Class module and a collection object. The class module would consist of two arrays and a counter. The first array is the row in its original order; the second array is the row in sorted order. The sorted order would be used as the Key for the collection object. If you try to add a collection object where the Key already exists, it will cause an error. Trap the error and add one to the counter.

Then for the results, you would retrieve the original entries from the "original" array; and the counter. Sort on the counter and you have your results.

Here is an example of VBA code to accomplish the above.

First, insert a Class module and rename it RowEntries

Option Explicit
Private pOriginal() As Variant
Private pSorted() As Variant
Private pCount As Long

Public Property Get Original() As Variant
    Original = pOriginal
End Property
Public Property Let Original(Value As Variant)
    pOriginal = Value
End Property

Public Property Get Sorted() As Variant
    Sorted = pSorted
End Property
Public Property Let Sorted(Value As Variant)
    pSorted = Value
End Property

Public Property Get Count() As Long
    Count = pCount
End Property
Public Property Let Count(Value As Long)
    pCount = Value
End Property

Then insert a regular module. This code assumes your source data is the CurrentRegion around A1; and the results will go several columns to the right. These algorithms are easily changed.

Option Explicit
Option Compare Text  'To make comparison case insensitive, if you want
Sub RankRows()
    Dim V As Variant, VtoSort As Variant
    Dim vRes() As Variant
    Dim cRowEntries As RowEntries
    Dim colRowEntries As Collection
    Dim sKey As String, S As String
    Dim I As Long
    Dim rSrc As Range, rRes As Range  'Location for Results

Set rSrc = Range("A1").CurrentRegion
Set rRes = rSrc.Offset(columnoffset:=rSrc.Columns.Count + 3).Resize(1, 2)

V = rSrc

Set colRowEntries = New Collection
On Error Resume Next
For I = 1 To UBound(V)
    Set cRowEntries = New RowEntries
    With cRowEntries
        .Original = WorksheetFunction.Index(V, I, 0)
        VtoSort = .Original
        Quick_Sort VtoSort, LBound(VtoSort), UBound(VtoSort)
        .Sorted = VtoSort
        .Count = 1
        sKey = CStr(Join(.Sorted, ", "))
        colRowEntries.Add cRowEntries, sKey
        If Err.Number <> 0 Then
            Err.Clear
            With colRowEntries(sKey)
                .Count = .Count + 1
            End With
        End If
    End With
Next I
On Error GoTo 0

'populate results array
ReDim vRes(1 To colRowEntries.Count, 1 To 2)
For I = 1 To colRowEntries.Count
    With colRowEntries(I)
        vRes(I, 1) = Join(.Original, "+")

            'remove trailing delimiters
            Do While Right(vRes(I, 1), 1) = "+"
                vRes(I, 1) = Left(vRes(I, 1), Len(vRes(I, 1)) - 1)
            Loop

        vRes(I, 2) = .Count
    End With
Next I

Set rRes = rRes.Resize(rowsize:=UBound(vRes), columnsize:=UBound(vRes, 2))
With rRes
    .EntireColumn.Clear
    .Value = vRes
    .Sort key1:=rRes.Columns(2), order1:=xlDescending, Header:=xlNo
End With

V = rRes
ReDim vRes(1 To UBound(V), 1 To 1)

For I = 1 To UBound(V)
    Select Case V(I, 2)
        Case 1
            S = "once"
        Case 2
            S = "twice"
        Case Else
            S = V(I, 2) & " times"
    End Select
    vRes(I, 1) = OrdinalNum(I) & " place: """ & V(I, 1) & """ found " & S

Next I

rRes.EntireColumn.Clear
rRes.Resize(columnsize:=1) = vRes
rRes.EntireColumn.AutoFit

End Sub


Sub Quick_Sort(ByRef SortArray As Variant, ByVal first As Long, ByVal last As Long)
Dim Low As Long, High As Long
Dim Temp As Variant, List_Separator As Variant
Low = first
High = last
List_Separator = SortArray((first + last) / 2)
Do
    Do While (SortArray(Low) < List_Separator)
        Low = Low + 1
    Loop
    Do While (SortArray(High) > List_Separator)
        High = High - 1
    Loop
    If (Low <= High) Then
        Temp = SortArray(Low)
        SortArray(Low) = SortArray(High)
        SortArray(High) = Temp
        Low = Low + 1
        High = High - 1
    End If
Loop While (Low <= High)
If (first < High) Then Quick_Sort SortArray, first, High
If (Low < last) Then Quick_Sort SortArray, Low, last
End Sub

Function OrdinalNum(num) As String
Dim Suffix As String

OrdinalNum = num
If Not IsNumeric(num) Then Exit Function
If num <> Int(num) Then Exit Function

Select Case num Mod 10
    Case Is = 1
        Suffix = "st"
    Case Is = 2
        Suffix = "nd"
    Case Is = 3
        Suffix = "rd"
    Case Else
        Suffix = "th"
End Select

Select Case num Mod 100
    Case 11 To 19
        Suffix = "th"
End Select

OrdinalNum = Format(num, "#,##0") & Suffix
End Function

The output will be just as you show in your request above. But could be easily modified:

enter image description here

OTHER TIPS

I suggest you an alternate way to solve this problem.

You can transfer a b c d to 1 2 4 8 (in binary is 01 10 100 1000).

a+b+c = a+c+b =... = 7 (111)
a+c = c+a = 5 (101)

So you can use the sum value to group by in excel.

The function to transfer single char to digital is very simple:

A B C POWER(2,CODE(A2) - 97)  POWER(2,CODE(A2) - 97)  POWER(2,CODE(A2) - 97)  SUM(D2:F2)
-+-+-+-----------------------+-----------------------+-----------------------+----------
b|a|c|2                      |1                      |4                      |7
c|a|b|4                      |1                      |2                      |7
a|c|b|1                      |4                      |2                      |7
a|c| |1                      |4                      |0                      |5
a|c|d|1                      |4                      |8                      |13
b|c|a|2                      |4                      |1                      |7
d|c|a|8                      |4                      |1                      |13

Wish this method can help you to find your own way to solve your problem.

This question is so interesting. It's a good sample for showing how to use mathematics to provide a simpler solution.

I had to add another answer because I realized to find repeat combinations of three words is the same as to calculate the distance in three-space from the zero point - it is only necessary to give each word a different number. And this answer can handle the a+a problem that Pnuts mentioned before.

Different from my last answer, if you have 200 phrases and combinations within three members, the biggest number calculated is 120000 (POWER(200,2)*3), my last answer is 1.60694E+60 (POWER(2,200)). My last answer may solve the problem logically, but cannot be implemented in Excel or many programming languages. It uses a permutations solution to solve a combinations problem.

Here is the solution using distance in three-space, it's simple and easily extendable.

enter image description here

  1. Map each word to a different number. (VLOOKUP is one method, you may have alternative ways.) The resulting numbers don't need to be continuous, only different from one another, and the maximum number should be less than SQRT(POWER(2,32)/3)).
  2. Calculate the distance using the formula in G1.
  3. Group and Count use column G. (There may ways you could find in other answers.)
  4. Notice: I use '_' replaced space cell, to map a number for space, so you can make a_a equal to aa_ (line 4 and 5). Any choice should have a number for space.

Any advice to improve this answer would be appreciated.

Almost a formula only solution, assuming data is in labelled ColumnsA:C, in D2:

=VLOOKUP(A2,weight,2,0)+IFNA(VLOOKUP(B2,weight,2,0),)+IFNA(VLOOKUP(C2,weight,2,0),)  

copied down to suit, where weight (green in the image) is a named range for a lookup table (constructed along the lines suggested by @Jaugar Chang). In E2 and copied down to suit:

 =IF(COUNTIF(D$2:D2,D2)=1,COUNTIF(D:D,D2),"")  

in G1:

=ROW()&MID("thstndrdthstndrdth",MATCH(IF(MOD(ROW(),100)>29,MOD(ROW(),10)+20,MOD(ROW(),100)),{0,1,2,3,4,21,22,23,24},1)*2-1,2)&" place: """&INDIRECT("A"&MATCH(H1,E:E,0))&"+"&INDIRECT("B"&MATCH(H1,E:E,0))&"+"&INDIRECT("C"&MATCH(H1,E:E,0))&""" found"  

in H1:

=LARGE(E:E,ROW())  

in I1:

=IF(H1>2,"times",IF(H1=1,"","twice"))

Each of the last three copied down until just short of an error message.

ColumnH formatted:

[=1] "once";General

Output is highlighted yellow:

SO25070024 example

In this example there is a surplus + and the possibility of ++ surplus.

The way I would do this is using a dictionary to go through the list and count the rows. The key would be the row itself, so I could use the Dictionary.Exists(Key) method of the dictionary to see if I have already encountered that row. The value associated to each key would be an integer which I'd increment every time I come across the same row again.

After parsing the list I would iterate the dictionary to output they key and value to a column in excel. Finally I'd use sort on the range in which I output the results to sort them by frequency.

This is pretty easy stuff, but you need to reference the Microsoft Scripting Runtime to use the dictionary object (see here for example http://www.techbookreport.com/tutorials/vba_dictionary.html).

Hope this helps.

UPDATE

Since you said you might give this method a try in vba. I'd thought I'd add something the always tripped me up when I first used the Collection and Dictionary objects. When iterating through the entries, the iteration variable has to be a Variant. I was used to having to declare the iteration variable of the same type as that of the data I was iterating, but that will give you an error in vba.

Here's my version using array manipulation, then some range manipulation.

Edit1: I've read pnut's comment about handling b only. Btw, this will not handle a+a

Sub Test()
    Dim arr, unq
    Dim orng As Range, rng As Range, srng As Range
    Dim i As Long, k As Long
    Dim check As Boolean: check = False
    Dim freq As String
    '~~> pass range data to array
    Set orng = Sheet1.Range("A1", _
        Sheet1.Range("A" & Sheet1.Rows.Count).End(xlUp))
    For Each rng In orng
        If Not IsArray(arr) Then
            arr = Array(RngToArr(rng.Resize(, 3)))
        Else
            ReDim Preserve arr(UBound(arr) + 1)
            arr(UBound(arr)) = RngToArr(rng.Resize(, 3))
        End If
    Next
    '~~> pass unique combination and count to another array
    For i = LBound(arr) To UBound(arr)
        If IsEmpty(unq) Then
            ReDim unq(1 To 2, 1 To 1)
            unq(1, 1) = arr(i)
            unq(2, 1) = unq(2, 1) + 1
        Else
            For k = LBound(unq, 2) To UBound(unq, 2)
                If CompArr(arr(i), unq(1, k)) Then
                    check = False
                    unq(2, k) = unq(2, k) + 1
                    Exit For
                Else
                    check = True
                End If
            Next
            If check Then
                ReDim Preserve unq(1 To 2, 1 To UBound(unq, 2) + 1)
                unq(1, UBound(unq, 2)) = arr(i)
                unq(2, UBound(unq, 2)) = unq(2, UBound(unq, 2)) + 1
            End If
        End If
    Next
    '~~> Transpose and tidy up the array
    ReDim tally(1 To UBound(unq, 2), 1 To 2)
    For i = LBound(unq, 2) To UBound(unq, 2)
        tally(i, 1) = Join$(unq(1, i), "+")
        tally(i, 2) = unq(2, i)
    Next
    '~~> sort in worksheet, easier than sorting array
    With Sheet1
        Set srng = .Range("E1:F" & UBound(tally, 1))
        srng = tally
        .Sort.SortFields.Clear
        .Sort.SortFields.Add Key:=srng.Offset(0, 1).Resize(, 1), _
            SortOn:=xlSortOnValues, Order:=xlDescending, _
            DataOption:=xlSortNormal
        With .Sort
            .SetRange srng
            .Header = xlGuess
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With
    End With
    '~~> do some manipulation to make it closer to what you want
    For Each rng In srng.Offset(0, 1).Resize(, 1)
        Select Case rng.Value
        Case 1: freq = "found once"
        Case 2: freq = "found twice"
        Case Else: freq = "found " & rng.Value & " times"
        End Select
        rng.Value = freq
    Next
End Sub

Private Function CompArr(list1, list2) As Boolean
    Dim j As Long: CompArr = True
    For j = LBound(list1) To UBound(list1)
        With Application
            If IsError(.Match(list1(j), list2, 0)) _
                Then CompArr = False
        End With
    Next
End Function

Private Function RngToArr(r As Range) As Variant
    Dim c As Range, a
    For Each c In r
        If Len(c.Value) <> 0 Then
            If Not IsArray(a) Then
                a = Array(c.Value)
            Else
                ReDim Preserve a(UBound(a) + 1)
                a(UBound(a)) = c.Value
            End If
        End If
    Next
    RngToArr = a
End Function

Result:

enter image description here

Not exactly the way you want it, I was not able to come-up how to set up 1st Place, 2nd Place, etc. dynamically.
Also, I did not go deep on the plus(+) sign. If there are blanks, result maybe +b+c, or a+c+ or a++c.
Anyways, HTH.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top