Question

I have a data frame A which has a numeric column like:

zip code
00601
00602
00607

and so on.

If I read this into R using read.csv, the values are read as numeric. I want them as factors.

I tried converting them back to factor using

A <- as.factor(A)

But this removes the leading zeros and makes A look like

zip code
601
602
607

I do not want this. I want to keep the leading zeros.


Solution

Use colClasses in your read.csv call to read them in as character or factor: read.csv(*, colClasses="factor").
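
A minimal sketch of this suggestion, assuming a file with a single zip.code column. The inline csv_text string and the column name are made up for illustration and stand in for the real file:

```r
# Hypothetical CSV content standing in for the real file
csv_text <- "zip.code\n00601\n00602\n00607"

# Default behaviour: the column is parsed as a number, so the zeros are lost
lost <- read.csv(text = csv_text)
class(lost$zip.code)

# With colClasses the raw text is kept, so the leading zeros survive
kept <- read.csv(text = csv_text, colClasses = "factor")
levels(kept$zip.code)
```

If only some columns should be treated this way, colClasses also accepts a named vector, for example colClasses = c(zip.code = "factor"), and the remaining columns are then converted as usual.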

OTHER TIPS

You may need to add the leading zeros back, as in this post. The example below first pads the values with sprintf, which gives a character vector. You can then convert that to a factor, which keeps the leading zeros.

Example

A <- data.frame("zip code"=c(00601,00602,00607)) # column name becomes zip.code
class(A$zip.code) # numeric; the leading zeros are already gone
A$zip.code <- sprintf("%05d", A$zip.code) # pad to five digits with zeros
class(A$zip.code) # character
A$zip.code <- as.factor(A$zip.code)
class(A$zip.code) # factor

Resulting in:

> A$zip.code
[1] 00601 00602 00607
Levels: 00601 00602 00607

Writing A as a .csv file

write.csv(A, "tmp.csv")

results in

"","zip.code"
"1","00601"
"2","00602"
"3","00607"

Everything without a text qualifier is read as numeric where possible, so the real question is how your data (00607, for example) is stored in the flat text file. If there is no text qualifier, you can either follow @Hong Ooi's suggestion or use

read.csv(*, colClasses="character")

and then convert each column as needed (in case you don't want or need all of them to be factors). Once you have a character vector (a data.frame column), converting it to a factor is straightforward:

> zipCode <- c("00601", "00602", "00607")
> factor(zipCode)
[1] 00601 00602 00607
Levels: 00601 00602 00607
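
The read-as-character-then-convert approach can be sketched as follows. The two-column layout and the count column are assumptions made up for illustration, not part of the original data:

```r
# Hypothetical two-column file: zip codes plus a numeric count
csv_text <- "zip.code,count\n00601,12\n00602,7\n00607,30"

# Read every column as character so nothing is reinterpreted
raw <- read.csv(text = csv_text, colClasses = "character")

# Convert each column to the type it should actually have
raw$zip.code <- factor(raw$zip.code)
raw$count <- as.integer(raw$count)

str(raw)
```

This keeps the conversion explicit for each column, at the cost of a few extra lines compared with passing the target classes to colClasses directly.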
Licensed under: CC-BY-SA with attribution