Question

I am trying to create a large empty data.frame and insert groups of rows into it. I have seen a few similar questions on various forums; however, I have been unable to apply any of them successfully to the specific formatting issue I am having.

I started with rbind(df, allic), where allic is the data frame I would like to insert into df. However, given the size of my dataset, the operation takes 5 1/2 minutes to complete. I understand that creating the data frame up front and replacing rows improves efficiency, but I have been unable to make it work for my problem. The code is as follows:

Initial data:

  Order.ID                  Product
1    193505              Onion Rings
2    193505 Pineapple Cheddar Burger
3    193623            Fountain Soda
4    193623             French Fries
5    193623                Hamburger
6    193623                  Hot Dog
7    193631             French Fries
8    193631                Hamburger
9    193631                Milkshake 

The products below won't match the sample above; however, since this is a formatting issue, I figured it best to show the steps that brought me to where I am now.

nb$Order.ID <- as.factor(nb$Order.ID)
plist <- aggregate(nb$Product,list(nb$Order.ID),list)
allp <- unique(unlist(plist$x))
allic <- expand.grid(plist$x[[1]], Var2=plist$x[[1]], Var3=1)


                      Var1                     Var2 Var3
1              Onion Rings              Onion Rings    1
2 Pineapple Cheddar Burger              Onion Rings    1
3              Onion Rings Pineapple Cheddar Burger    1
4 Pineapple Cheddar Burger Pineapple Cheddar Burger    1

Now I create an empty dataframe (df) using:

df <- data.frame(factor=rep(NA, rcnt), factor=rep(NA,rcnt), stringsAsFactors=FALSE)

Here rcnt is a large, arbitrary number, and I plan to trim the unused rows once the operation is complete. My issue arises when I try to insert these rows using:

df[1:4,] <- allic
head(df, n=10)


  factor factor.1
1      47       47
2      51       47
3      47       51
4      51       51
5      NA       NA
6      NA       NA
7      NA       NA
8      NA       NA

How can I insert rows in a dataframe without losing the format of my values? I would greatly appreciate any help I can get at this point.

EDIT Per comment below:

>df[i] <- for(i in 1:nrow(plist)) {
>       allic <- expand.grid(plist$x[[i]], Var2=plist$x[[i]], Var3=1) 
>       df[i:nrow(allic),] <- sapply(allic, as.character)

I'm still very new to R; however, this was working when I was using df <- rbind(df, allic). nrow(df) is 4096.


Solution

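The columns of allic are factors, so assigning them into the non-factor columns of df stores the underlying integer level codes (the 47 and 51 above) rather than the labels. Try converting allic to character with sapply(allic, as.character), as follows: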

df[1:4,] <- sapply(allic, as.character)


> df
                     factor                 factor.1
1               Onion Rings              Onion Rings
2  Pineapple Cheddar Burger              Onion Rings
3               Onion Rings Pineapple Cheddar Burger
4  Pineapple Cheddar Burger Pineapple Cheddar Burger
5                      <NA>                     <NA>
6                      <NA>                     <NA>
7                      <NA>                     <NA>
8                      <NA>                     <NA>
9                      <NA>                     <NA>
10                     <NA>                     <NA>
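
For the loop in the edit, one possible approach is to keep a running row offset rather than indexing with i, and to avoid factors from the start. The following is only a minimal sketch under assumptions: `orders` is a hypothetical stand-in for plist$x, the names `next_row` and `last_row` are made up for illustration, and the columns are named Var1 and Var2 instead of factor and factor.1.

```r
# Hypothetical stand-in for plist$x: one vector of products per order
orders <- list(c("Onion Rings", "Pineapple Cheddar Burger"),
               c("Fountain Soda", "French Fries", "Hamburger"))

rcnt <- 20  # deliberately larger than needed; trimmed after the loop

# Preallocate character columns so inserted values keep their text
df <- data.frame(Var1 = rep(NA_character_, rcnt),
                 Var2 = rep(NA_character_, rcnt),
                 stringsAsFactors = FALSE)

next_row <- 1  # first unused row of df
for (items in orders) {
  items <- as.character(items)  # guard in case the products are factors
  # stringsAsFactors = FALSE stops expand.grid from creating factors
  pairs <- expand.grid(Var1 = items, Var2 = items,
                       stringsAsFactors = FALSE)
  last_row <- next_row + nrow(pairs) - 1
  df[next_row:last_row, ] <- pairs
  next_row <- last_row + 1
}

# Drop the unused preallocated rows
df <- df[seq_len(next_row - 1), ]
```

Another common way to avoid repeated rbind calls is to collect the pieces in a list and bind them once at the end with do.call(rbind, ...).
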
Licensed under: CC-BY-SA with attribution