How to deal with nonstandard column names (white space, punctuation, starts with numbers)

StackOverflow https://stackoverflow.com/questions/22842232

  •  27-06-2023

Question

df <- structure(list(`a a` = 1:3, `a b` = 2:4), .Names = c("a a", "a b"
), row.names = c(NA, -3L), class = "data.frame")

The data looks like this:

  a a a b
1   1   2
2   2   3
3   3   4
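As an aside, here is a sketch (base R only) of building the same data frame without structure(). By default, data.frame() passes column names through make.names(), so check.names = FALSE is needed to keep the spaces.

```r
# Build the example data frame while keeping the non-syntactic names
df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)
names(df)          # "a a" "a b"

# With the default check.names = TRUE, the names are made syntactic
mangled <- data.frame(`a a` = 1:3, `a b` = 2:4)
names(mangled)     # "a.a" "a.b"
```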

The following call to select

library(dplyr)
select(df, 'a a')

gives

Error in abs(ind[ind < 0]) : 
  non-numeric argument to mathematical function

How can I select "a a" and/or rename it to something without a space using select? I know of the following approaches:

  1. names(df)[1] <- "a"
  2. select(df, a=1)
  3. select(df, ends_with("a"))

But if I am working on a large data set, how can I get an exact match without knowing the column's index number and without pattern helpers that could also match similar column names?


Solution

You can select the variable by quoting its name with backticks (`):

select(df, `a a`)
#   a a
# 1   1
# 2   2
# 3   3
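Backticks are ordinary R syntax rather than a dplyr feature, so the same quoting also works in base R. A minimal sketch, re-creating the df from the question so it runs on its own:

```r
df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)

x1 <- df$`a a`                  # $ with a backtick-quoted name
x2 <- df[["a a"]]               # [[ with an ordinary string
x3 <- with(df, `a a` + `a b`)   # backticks inside an evaluated expression
```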

However, if your main objective is to rename the column, you can use rename from the plyr package, which accepts the old name quoted with either "" or ``.

plyr::rename(df, replace = c("a a" = "a"))
plyr::rename(df, replace = c(`a a` = "a"))

Or in base R:

names(df)[names(df) == "a a"] <- "a"
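To rename several columns by exact name, the base R line above can be wrapped in a small function. rename_exact() below is a hypothetical helper (not part of base R or plyr). It takes a named vector in the same c(old = new) shape as plyr's replace argument, and it stops with an error if an old name is missing instead of silently doing nothing.

```r
# Hypothetical helper: rename columns by exact (not partial) name matching.
# mapping is a named character vector: c("old name" = "new_name", ...)
rename_exact <- function(df, mapping) {
  idx <- match(names(mapping), names(df))   # exact positions, NA if absent
  if (anyNA(idx)) {
    stop("column(s) not found: ",
         paste(names(mapping)[is.na(idx)], collapse = ", "))
  }
  names(df)[idx] <- unname(mapping)
  df
}

df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)
renamed <- rename_exact(df, c("a a" = "a", "a b" = "b"))
names(renamed)   # "a" "b"
```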

For a more thorough description of the various quotes, see ?Quotes. The 'Names and Identifiers' section is especially relevant here:

other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick.

See also ?make.names about valid names.
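This is a quick sketch of what make.names() does to the kinds of names in the title (spaces, punctuation, a leading digit). Invalid characters become ".", and names that start with a digit get an "X" prefix. With unique = TRUE, any names that collide after cleaning are de-duplicated.

```r
make.names(c("a a", "a-b", "1st", "ok"))
# "a.a" "a.b" "X1st" "ok"

# unique = TRUE appends suffixes when two names collide after cleaning
make.names(c("a a", "a.a"), unique = TRUE)
# "a.a" "a.a.1"

# Applying it to a data frame's names makes every column syntactic
df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)
names(df) <- make.names(names(df))
names(df)
# "a.a" "a.b"
```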

See also this post about renaming in dplyr
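For reference, here is a sketch of renaming with dplyr itself, assuming a current dplyr (rename_with() was added in dplyr 1.0.0). In rename(), the new name goes on the left and the backtick-quoted old name on the right. rename_with() applies a function to the names; here it replaces spaces with underscores.

```r
library(dplyr)

df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)

# rename(new = old); backticks quote the non-syntactic old name
df2 <- rename(df, a = `a a`)

# rename_with() transforms names with a function (dplyr >= 1.0.0)
df3 <- rename_with(df, ~ gsub(" ", "_", .x, fixed = TRUE))
```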

Other tips

Some alternatives to backticks, valid as of dplyr 0.5.0 (the current version at the time of writing).

If you want to select a column programmatically (from a name stored in a variable) and you don't want to rename it or paste/sprintf the name into backticks, you can use as.name together with the standard-evaluation version of select, which is select_:

dplyr::select_(df, as.name("a a"))
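as.name() helps because it turns a string into a symbol (a name object). That is the same kind of object R produces when it parses a backtick-quoted name. A base R sketch of that mechanism, using eval() to look a symbol up among a data frame's columns:

```r
df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)

col <- "a a"              # column name held in a variable
sym <- as.name(col)       # convert the string to a symbol

is.name(sym)                    # TRUE
identical(sym, quote(`a a`))    # TRUE: same as writing `a a` in code
vals <- eval(sym, df)           # evaluate the symbol with df's columns in scope
```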

Many dplyr functions have standard-evaluation versions, named with a trailing underscore. For select specifically, you can also use the regular (non-standard evaluation) select together with the select helper one_of. See ?dplyr::select_helpers for documentation:

dplyr::select(df, dplyr::one_of("a a"))
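In later releases, the underscored verbs such as select_() were deprecated (dplyr 0.7.0), and one_of() was superseded by all_of() and any_of() in tidyselect 1.0, which dplyr 1.0 and later use. The following sketch does the same programmatic selection with all_of() and assumes a current dplyr. all_of() also works inside rename().

```r
library(dplyr)

df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)
col <- "a a"   # exact column name, e.g. computed at run time

picked  <- select(df, all_of(col))       # errors if col is not a column
renamed <- rename(df, a = all_of(col))   # new name on the left
```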

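You can wrap a variable name that contains spaces in backticks (`). On a US keyboard layout, the backtick key is located under Esc. Note that the backtick is a different character from the single quote ('):

`a b`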
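The distinction matters because the two quote characters produce different objects. Backticks produce a name (symbol), while single or double quotes produce a character string. A base R sketch:

```r
x <- quote(`a b`)   # backticks: parsed as a symbol
y <- quote('a b')   # single quotes: parsed as a string constant

is.name(x)          # TRUE
is.character(y)     # TRUE
identical(y, "a b") # TRUE: single and double quotes give the same string
```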

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow