R - Populating an empty data.frame with character() tuples

https://stackoverflow.com/questions/20873205

23-09-2022

Question

I have a list of data frame elements (IDs and sentences) of the form:

(EDIT: The code has to work inside a loop, so I explicitly want to first create an empty data frame, populate it, delete its contents, repopulate it, and so on.)

# creating an empty dataframe
sent.df <- data.frame(ID=character(), Sentences=character()) 

# the IDs look like:
id1 <- "01xx"
id2 <- "02xx"
id3 <- "03xx"
id4 <- "04xx"

# the sentences look like:
sent1 <- "ab"
sent2 <- "bc"
sent3 <- "cd"
sent4 <- "de"

PROBLEM 1) When I populate the dataframe with

sent.df <- rbind(sent.df, c(id1, sent1))
sent.df <- rbind(sent.df, c(id2, sent2)) #*
sent.df <- rbind(sent.df, c(id3, sent3))
sent.df <- rbind(sent.df, c(id4, sent4))

# I get these inexplicable warnings after the second command, marked with "#*":

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = "03xx") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "cd") :
  invalid factor level, NA generated
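These warnings come from factor columns. In R versions before 4.0.0, data.frame() and rbind() converted character strings to factors by default (stringsAsFactors = TRUE), so the columns created by the first rbind() calls most likely ended up as factors whose levels are only the values seen so far. The minimal sketch below, using only base R and a plain factor rather than the question's data frame, reproduces the same warning:

```r
# A factor only accepts values that are already among its levels
f <- factor(c("01xx", "02xx"))
levels(f)

# Assigning a value that is not a level stores NA and raises the warning
# "invalid factor level, NA generated", as in the rbind() calls above
f[3] <- "03xx"
print(f)
```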

PROBLEM 2) Also, the column names are not preserved after the first rbind() line runs:

> sent.df
  X.02xx. X.bc.
1    02xx    bc
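One way to keep the names, sketched here under the assumption that each new row can be wrapped in its own one-row data frame, is to bind data frames whose column names match instead of bare character vectors; rbind() then lines the columns up by name, so ID and Sentences survive:

```r
# Empty data frame with character columns (stringsAsFactors = FALSE
# only matters on R < 4.0.0, where the default was TRUE)
sent.df <- data.frame(ID = character(), Sentences = character(),
                      stringsAsFactors = FALSE)

id1 <- "01xx"; sent1 <- "ab"
id2 <- "02xx"; sent2 <- "bc"

# Bind one-row data frames with matching column names, not bare vectors
sent.df <- rbind(sent.df, data.frame(ID = id1, Sentences = sent1,
                                     stringsAsFactors = FALSE))
sent.df <- rbind(sent.df, data.frame(ID = id2, Sentences = sent2,
                                     stringsAsFactors = FALSE))
print(sent.df)
```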

OBSERVATION: the following code works, although it seems inconsistent (see the comments):

sent.df <- data.frame(ID=numeric(), Sentences=numeric()) # inconsistent class initialization

sent.df[1,] <- c("01xx", "ab")           # rbind() doesn't work here; see above
sent.df <- rbind(sent.df, c(id2, sent2))
sent.df <- rbind(sent.df, c(id3, sent3))
sent.df <- rbind(sent.df, c(id4, sent4))
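A plausible reason this version works: assigning character values with `[<-` coerces the two numeric columns to character, and from then on rbind() only has to place character vectors into existing character columns of a data frame that already has rows and names. The short sketch below checks the column classes at each step:

```r
sent.df <- data.frame(ID = numeric(), Sentences = numeric())
sapply(sent.df, class)   # both columns start out as "numeric"

# Assigning character values into row 1 coerces both columns to character
sent.df[1, ] <- c("01xx", "ab")
sapply(sent.df, class)

# Later rows bound as plain vectors now land in character columns
sent.df <- rbind(sent.df, c("02xx", "bc"))
print(sent.df)
```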

Desired OUTPUT

> sent.df
    ID Sentences
1 01xx        ab
2 02xx        bc
3 03xx        cd
4 04xx        de

Solution 2

Maybe you could create the whole data frame at once:

sent.df <- data.frame( 
   id=c("01xx", "02xx", "03xx", "04xx"),
   sentences=c("ab", "bc", "cd", "de")
)

If you need to work iteratively, this would work:

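sent.df <- data.frame()
adding <- TRUE
while(adding) {
  # placeholder values; in real code these come from your data
  current_id <- "next_id"
  current_sent <- "next_sent"
  # append one row as a data frame so the column names are kept
  sent.df <- rbind(sent.df, data.frame(id=current_id, sentences=current_sent))
  # stop after a single pass in this toy example
  adding <- FALSE
}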

but growing a data frame row by row with rbind() is really slow, because every call copies all existing rows, and it should be avoided.
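A common way around that cost (a different technique from the loop above, not part of the original answer) is to collect the rows in a pre-sized list and call rbind() once at the end. In this sketch the vectors `ids` and `sents` are hypothetical stand-ins for whatever each loop pass produces:

```r
# Hypothetical inputs standing in for the values produced inside the loop
ids   <- c("01xx", "02xx", "03xx", "04xx")
sents <- c("ab", "bc", "cd", "de")

# Collect one-row data frames in a pre-sized list ...
rows <- vector("list", length(ids))
for (i in seq_along(ids)) {
  rows[[i]] <- data.frame(ID = ids[i], Sentences = sents[i],
                          stringsAsFactors = FALSE)
}

# ... and bind them in a single call at the end
sent.df <- do.call(rbind, rows)
print(sent.df)
```

For the delete-then-repopulate cycle described in the question's EDIT, re-create `rows` with `vector("list", n)` at the start of each outer pass.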

OTHER TIPS

sent.df <- data.frame(ID=id1, Sentences=sent1, stringsAsFactors=FALSE)
sent.df <- rbind(sent.df, c(id2, sent2))
sent.df <- rbind(sent.df, c(id3, sent3))
sent.df <- rbind(sent.df, c(id4, sent4))
sent.df
#     ID Sentences
# 1 01xx        ab
# 2 02xx        bc
# 3 03xx        cd
# 4 04xx        de
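For the "delete the content, then repopulate" step, one option, sketched here with hard-coded example values, is `sent.df[0, ]`, which drops every row but keeps the column names and types. After emptying, binding a one-row data frame rather than a bare vector avoids the renaming seen in PROBLEM 2:

```r
sent.df <- data.frame(ID = "01xx", Sentences = "ab", stringsAsFactors = FALSE)
sent.df <- rbind(sent.df, c("02xx", "bc"))

# Drop all rows but keep the column names and column types
sent.df <- sent.df[0, ]
str(sent.df)

# Repopulate with a one-row data frame so the names are matched
sent.df <- rbind(sent.df, data.frame(ID = "03xx", Sentences = "cd",
                                     stringsAsFactors = FALSE))
print(sent.df)
```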

If you want an easy fix, just avoid factors while binding by setting this option before your operations:

options(stringsAsFactors = FALSE)
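Since R 4.0.0 the default for stringsAsFactors is already FALSE, so on current R this option should make no difference here. On older R, a narrower alternative to a session-wide option is to pass stringsAsFactors = FALSE to each call, which leaves other code in the session unaffected. The helper `make_row()` below is hypothetical, written only for this sketch:

```r
# Hypothetical helper: builds one character-only row per call
make_row <- function(id, sent) {
  data.frame(ID = id, Sentences = sent, stringsAsFactors = FALSE)
}

# The argument is set per call instead of through options()
sent.df <- make_row("01xx", "ab")
sent.df <- rbind(sent.df, make_row("02xx", "bc"))
str(sent.df)
```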

Licensed under: CC-BY-SA with attribution
