Question

I have read in a large data file into R using the following command

data <- as.data.set(spss.system.file(paste(path, file, sep = '/')))

The data set contains columns that should not be there and that contain only blanks. The issue has to do with R creating new variables based on the variable labels attached to the SPSS file (Source).

Unfortunately, I have not been able to determine the options necessary to resolve the problem. I have tried all of foreign::read.spss, memisc::spss.system.file, and Hmisc::spss.get, with no luck.
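For reference, the calls in question look roughly like this (a sketch of the readers mentioned above only; arguments trimmed, not the exact option combinations tried):

### The SPSS readers tried (argument combinations omitted)
foreign::read.spss(paste(path, file, sep = '/'), to.data.frame = TRUE)
memisc::as.data.set(memisc::spss.system.file(paste(path, file, sep = '/')))
Hmisc::spss.get(paste(path, file, sep = '/'))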

Instead, I would like to read in the entire data set (with ghost columns) and remove unnecessary variables manually. Since the ghost columns contain only blank spaces, I would like to remove any variables from my data.table where the number of unique observations is equal to one.

My data are large, so they are stored in data.table format. I would like to determine an easy way to check the number of unique observations in each column, and drop columns which contain only one unique observation.

require(data.table)

### Create a data.table
dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 c = rep(1, times = 10))

### Create a comparable data.frame
df <- data.frame(dt)

### Expected result: the distinct values of column a
unique(dt$a)

### Expected result: the number of distinct values (here, 10)
length(unique(dt$a))

However, I wish to calculate the number of unique observations per column for a large data file, so referencing each column by name is not desirable. I am not a fan of eval(parse()).

### I want to determine the number of unique obs in each variable, for a large list of vars
lapply(names(df), function(x) {
    length(unique(df[, x]))
})

### Unexpected result
length(unique(dt[, 'a', with = F]))  # Returns 1

It seems to me the problem is that

dt[, 'a', with = F]

returns an object of class "data.table". It makes sense that the length of this object is 1, since it is a data.table containing 1 variable. We know that data.frames are really just lists of variables, and so in this case the length of the list is just 1.
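This is easy to verify directly on the example above:

### A data.table, like a data.frame, is a list of columns,
### so length() counts columns rather than values
class(dt[, 'a', with = F])   # "data.table" "data.frame"
length(dt[, 'a', with = F])  # 1  -- one column
length(dt[['a']])            # 10 -- the column vector itself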

Here is pseudo code for how I would remedy the problem, using the data.frame approach:

for (x in names(data)) {
  unique.obs <- length(unique(data[, x]))
  if (unique.obs == 1) {
    data[, x] <- NULL
  }
}

Any insight as to how I may more efficiently ask for the number of unique observations by column in a data.table would be much appreciated. Alternatively, a recommendation for how to drop columns that contain only one unique observation from a data.table would be even better.


Solution

Update: uniqueN

As of data.table version 1.9.6, there is a built-in (optimized) version of this solution, the uniqueN function. Now this is as simple as:

dt[ , lapply(.SD, uniqueN)]
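If the end goal is to drop the constant columns in the same pass, one possible sketch building on uniqueN (not part of the original answer) is:

### Drop every column that has exactly one unique value
drop_cols <- names(dt)[sapply(dt, uniqueN) == 1L]
if (length(drop_cols)) dt[, (drop_cols) := NULL]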

If you want to find the number of unique values in each column, something like

 dt[, lapply(.SD, function(x) length(unique(x)))]
##     a  b c
## 1: 10 10 1

To get your function to work, you need to use with = FALSE within [.data.table, or simply use [[ instead (read fortune(312) as well...).

lapply(names(dt), function(x) nrow(unique(dt[, x, with = FALSE])))  # nrow() counts the unique rows of the one-column data.table

or

lapply(names(dt), function(x) length(unique(dt[[x]])))

will work

In one step

dt[, names(dt) := lapply(.SD, function(x) if (length(unique(x)) == 1) NULL else x)]


or, to avoid calling .SD:

dt[, Filter(names(dt), f = function(x) length(unique(dt[[x]]))==1) := NULL]

Other suggestions

The approaches in the other answers are good. Another way to add to the mix, just for fun:

for (i in names(DT)) if (length(unique(DT[[i]]))==1) DT[,(i):=NULL]

or, if there may be duplicate column names:

for (i in ncol(DT):1) if (length(unique(DT[[i]]))==1) DT[,(i):=NULL]

NB: (i) on the LHS of := is a trick to use the value of i rather than a column named "i".
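A minimal illustration of the difference, on a fresh toy table:

DT <- data.table(a = 1:3, c = rep(1, 3))
i <- "c"
DT[, (i) := NULL]    # deletes the column named "c" (the value stored in i)
# DT[, i := NULL]    # would instead target a column literally named "i"
names(DT)
## [1] "a"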

Here is a solution to your core problem (I hope I got it right).

require(data.table)

### Create a data.table
dt <- data.table(a = 1:10,
                 b = letters[1:10],
                 d1 = "",
                 c = rep(1, times = 10),
                 d2 = "")
dt
     a b d1 c d2
 1:  1 a    1   
 2:  2 b    1   
 3:  3 c    1   
 4:  4 d    1   
 5:  5 e    1   
 6:  6 f    1   
 7:  7 g    1   
 8:  8 h    1   
 9:  9 i    1   
10: 10 j    1   

First, I introduce two columns d1 and d2 that have no values whatsoever. Those you want to delete, right? If so, I just identify those columns and select all other columns in the dt.

only_space <- function(x) {
  length(unique(x))==1 && x[1]==""
}
bolCols <- apply(dt, 2, only_space)
dt[, (1:ncol(dt))[!bolCols], with=FALSE]

Somehow, I have the feeling that you could further simplify it...
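For instance, one possible simplification (a sketch along the same lines, not from the original answer) is to loop over the columns directly with sapply, which avoids the matrix conversion that apply performs:

### sapply() iterates over the columns of a data.table,
### so no intermediate matrix is created
dt[, which(!sapply(dt, only_space)), with = FALSE]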

Output:

     a b c
 1:  1 a 1
 2:  2 b 1
 3:  3 c 1
 4:  4 d 1
 5:  5 e 1
 6:  6 f 1
 7:  7 g 1
 8:  8 h 1
 9:  9 i 1
10: 10 j 1

There is an easy way to do that using the "dplyr" library: use the select function as follows:

library(dplyr)

newdata <- select(old_data, first_variable, second_variable)  # placeholder column names

Note that you can choose as many variables as you like.

Then you will get the type of data that you want.
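If you would rather drop the constant columns than list the ones to keep, a possible sketch (not part of the original answer) using dplyr's where() and n_distinct() helpers (available from dplyr 1.0.0; old_data is a placeholder):

library(dplyr)

### Keep only the columns with more than one distinct value
newdata <- select(old_data, where(~ n_distinct(.x) > 1))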

Many thanks,

Fadhah

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow