Question

I have two columns from which I need to remove duplicate rows. For this example:

A                b
sport 1          pippo
sport 1          pippo
sport 1          pluto
sport 2          paperino
sport 2          paperino   
sport 3          gastone

my required output is:

A                b
sport 1          pippo
sport 1          pluto
sport 2          paperino  
sport 3          gastone

I'm new to Excel so don't know what kind of formula or VBA to use.

How might I achieve this?

Solution 2

You can use Remove Duplicates manually, from the Data tab (you should do this to get an understanding of how it works)

Or, if you really want to automate it, try this

Sub Demo()
    Dim ws As Worksheet
    Dim rng As Range

    ' Get a reference to the sheet your data is on
    Set ws = ActiveSheet  '<-- change to suit

    With ws
        ' Get a reference to your data
        ' (note the leading dot: .Range is qualified by ws, like .Cells)
        Set rng = .Range(.Cells(1, 2), .Cells(.Rows.Count, 1).End(xlUp))

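        ' Apply Remove Duplicates, comparing columns 1 and 2 together so that
        ' only repeated A/b pairs are removed (use Header:=xlYes if row 1 holds labels)
        rng.RemoveDuplicates Columns:=Array(1, 2), Header:=xlNo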
    End With
End Sub

OTHER TIPS

There are several ways to achieve what you want, of which the simplest has to be as mentioned by @chris neilsen:

Remove Duplicates

Just select your two columns, then Data > Data Tools > Remove Duplicates, accept the defaults (probably) and click OK. As indicated in a comment, however, not all versions of Excel have this feature (it was introduced in Excel 2007).

Note the word Remove: the duplicates are gone for good once the Undo stack is overwritten.

Beware also that Remove Duplicates is not totally reliable (see link in Advanced Filter below).

Advanced Filter

I tend to prefer this, as Remove Duplicates may have a defect (though one that shows up extremely rarely).

It is mandatory (or at least highly advisable) to ensure your columns are labelled for this. Again select your two columns, then Data > Sort & Filter > Advanced, select Copy to another location, choose a Copy to range (one cell is sufficient) and, obviously, check Unique records only.

Here Copy to is a giveaway that your entire original list (duplicates and all) is preserved, as may sometimes be required, without the bother of creating a copy to work on first.
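
If the Advanced Filter steps need to be repeated, they could be scripted roughly as below. This is only a sketch: it assumes the data sits in columns A:B of the active sheet with labels in row 1, that column A has no gaps, and that D1 onwards is free for the output. The procedure name and the output cell are illustrative choices, not anything from the question.

```vba
Sub CopyUniquePairs()
    Dim ws As Worksheet
    Dim src As Range

    Set ws = ActiveSheet  ' assumption: the data is on the active sheet

    With ws
        ' Assumption: labels in row 1, data in columns A:B, no gaps in column A
        Set src = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(, 2)

        ' Copy each distinct A/b row to D1 onwards; the source list is untouched
        src.AdvancedFilter Action:=xlFilterCopy, _
                           CopyToRange:=.Range("D1"), _
                           Unique:=True
    End With
End Sub
```

Advanced Filter treats the first row of the range as labels, which is why the columns should be labelled before running this.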

COUNTIF

A formula solution may be more appropriate when the removal of duplicates is to be taken to mean both of a pair, etc.

Something like:

=COUNTIF(B:B,B1)  

in C1 (any spare column will do; entered in column B itself it would create a circular reference), assuming your labels are in Row 1, and copied down to suit will identify pairs or other multiples. Having obtained the counts, filter on them and delete the rows you choose.

COUNTIF is generally available in all Excel versions (I don't recall whether in the very earliest ones!)

COUNTIFS

This function is only available in more recent versions of Excel (2007 onwards), but it allows a more complicated definition of "duplicate", such as one spanning several columns. That is more than your example strictly needs.
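
As a rough sketch of such a multi-column definition, the macro below writes a COUNTIFS formula that counts how often each whole A/b pair occurs. It assumes labels in row 1, data from row 2 in columns A:B of the active sheet, and a free column C; the procedure name and the "PairCount" label are made up for illustration.

```vba
Sub CountPairs()
    Dim ws As Worksheet
    Dim lastRow As Long

    Set ws = ActiveSheet  ' assumption: the data is on the active sheet

    With ws
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        If lastRow < 2 Then Exit Sub  ' nothing below the label row

        .Range("C1").Value = "PairCount"  ' hypothetical label for the helper column

        ' The relative references adjust row by row, as if C2 had been filled down
        .Range("C2:C" & lastRow).Formula = "=COUNTIFS(A:A,A2,B:B,B2)"
    End With
End Sub
```

Any row showing a count above 1 belongs to a pair that occurs more than once.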

PivotTable

PivotTables aggregate Row Labels values as a matter of course, so they give the appearance of removing duplicates. PivotTables are so useful that one may well be wanted for other reasons anyway, in which case removing duplicates is no extra bother.

The Show in Tabular Form layout may be most convenient (here with A above b in Row Labels). This displays one row for each distinct A/b pair, but the A value appears only on the first row of each group: the rows beneath are left blank to imply "same as above" until A changes. I think the most recent versions of Excel have a feature for displaying the A values on every row (Repeat All Item Labels), but it is quite easy to 'make allowances' in earlier versions.

The problem, though, is that the contents of a PivotTable cannot be altered in the way proposed below, so this needs to be done on a copy of the data showing in the PivotTable (pasted as values, not merely another version of the PivotTable!)

Select the column with values only at the start of each 'section', then Home > Editing > Find & Select > Go To Special..., Blanks. With the blank cells still selected, type =, press the Up arrow, then CTRL+Enter.
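
The same fill-down could be sketched in VBA as follows. It assumes the pasted copy is on the active sheet with the sparsely filled values in column A starting at a non-empty A1, and that column B has a value on every row so it can be used to find the last row. The procedure name is illustrative.

```vba
Sub FillBlanksFromAbove()
    Dim ws As Worksheet
    Dim rng As Range
    Dim blanks As Range
    Dim lastRow As Long

    Set ws = ActiveSheet  ' assumption: the pasted copy is on the active sheet

    With ws
        If IsEmpty(.Cells(1, 1).Value) Then Exit Sub  ' A1 must hold a value to copy down
        lastRow = .Cells(.Rows.Count, 2).End(xlUp).Row  ' column B is filled on every row
        If lastRow < 2 Then Exit Sub
        Set rng = .Range(.Cells(1, 1), .Cells(lastRow, 1))
    End With

    ' SpecialCells raises an error when there are no blanks, so trap it
    On Error Resume Next
    Set blanks = rng.SpecialCells(xlCellTypeBlanks)
    On Error GoTo 0
    If blanks Is Nothing Then Exit Sub

    ' Equivalent of typing =, Up arrow, CTRL+Enter into the selected blanks
    blanks.FormulaR1C1 = "=R[-1]C"

    ' Replace the formulas with their results
    rng.Value = rng.Value
End Sub
```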

VBA

This is a solution to almost anything "Excel" and viable for removing duplicates, though would probably not be 'cost effective' for a complete data sample of the size in your question - unless the process is required often.
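
For completeness, one possible VBA approach is sketched below. It walks the rows once and keeps the first occurrence of each A/b pair, using a Scripting.Dictionary to remember the pairs already seen. The assumptions are that the data is in columns A:B of the active sheet starting at row 1, that columns D:E are free for the output, and that the code runs on Windows (the Dictionary object is not available on Mac). The procedure name is hypothetical.

```vba
Sub UniquePairsWithDictionary()
    Dim ws As Worksheet
    Dim data As Variant
    Dim seen As Object
    Dim key As String
    Dim i As Long, outRow As Long, lastRow As Long

    Set ws = ActiveSheet  ' assumption: the data is on the active sheet
    Set seen = CreateObject("Scripting.Dictionary")  ' Windows only
    seen.CompareMode = vbTextCompare  ' ignore case when comparing keys

    With ws
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        data = .Range("A1:B" & lastRow).Value  ' read both columns in one go

        outRow = 0
        For i = 1 To UBound(data, 1)
            ' The tab separator keeps "ab" + "c" distinct from "a" + "bc"
            key = data(i, 1) & vbTab & data(i, 2)
            If Not seen.Exists(key) Then
                seen.Add key, True
                outRow = outRow + 1
                .Cells(outRow, 4).Value = data(i, 1)  ' column D
                .Cells(outRow, 5).Value = data(i, 2)  ' column E
            End If
        Next i
    End With
End Sub
```

Unlike Remove Duplicates, this leaves the original list in place and writes the unique pairs alongside it.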

I have probably missed a few other options - but no doubt they have been mentioned by others before I had barely started with this screed.

Edit: Seems like, as chris neilsen suggests, Remove Duplicates in the data tab handles "pairs" of duplicates automatically; I figured it would just do simple removing from each column, but it does appear to group first & then remove. But if you want an excessively manual way of doing it, read on ;)


Could be a heavy operation, but this would be a pretty easy way to get what you want, especially if you've only got a few hundred rows:

A         B         C                D
sport 1   pippo     =CONCAT(A1,B1)   =COUNTIF(C$1:C1,C1)
sport 1   pippo     =CONCAT(A2,B2)   =COUNTIF(C$1:C2,C2)
sport 1   pluto     =CONCAT(A3,B3)   =COUNTIF(C$1:C3,C3)
sport 2   paperino  =CONCAT(A4,B4)   =COUNTIF(C$1:C4,C4)
sport 2   paperino  =CONCAT(A5,B5)   =COUNTIF(C$1:C5,C5)
sport 3   gastone   =CONCAT(A6,B6)   =COUNTIF(C$1:C6,C6)

This results in:

A         B         C                D
sport 1   pippo     sport 1pippo     1
sport 1   pippo     sport 1pippo     2
sport 1   pluto     sport 1pluto     1
sport 2   paperino  sport 2paperino  1
sport 2   paperino  sport 2paperino  2
sport 3   gastone   sport 3gastone   1

Any number greater than 1 in column D is a duplicate. Then you can highlight / select all 4 columns, and sort on column D "smallest to largest":

A         B         C                D
sport 1   pippo     sport 1pippo     1
sport 1   pluto     sport 1pluto     1
sport 2   paperino  sport 2paperino  1
sport 3   gastone   sport 3gastone   1
sport 1   pippo     sport 1pippo     2
sport 2   paperino  sport 2paperino  2

And then delete the duplicate rows, column C, and column D, and you've got your output (You could also just do these calcs on a different tab, and only load the value in where D=1):

A         B        
sport 1   pippo     
sport 1   pluto     
sport 2   paperino  
sport 3   gastone  

The way it works is that column C joins the previous two columns together as a single character string, so any "duplicates" in column C represent a duplicate pair of A & B. Then column D just says, "How many times has the character string to the left occurred so far?" (CONCAT only exists in recent versions of Excel; =A1&B1 does the same job in any version. Putting a separator between the two parts, e.g. =A1&"|"&B1, avoids treating "ab" + "c" and "a" + "bc" as the same pair.)

The $ in C$1 stops Excel from updating that row index (we always want the top of the range to be the first cell in column C). After writing the formula once, you should be able to copy-paste or drag it down the length of your data and the other row references will update accordingly.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow