Replacing NAs in R dataframe with zeros conditional on an id

Question 1

Here is another ddply alternative where you don't have to specify the variable names which the function should be applied on. By using numcolwise, the function operates on all numerical columns.

library(plyr)

myfun <- function(x){
  x[is.na(x) & (sum(!is.na(x) & x == 0) > 0)] <- 0
  x}

ddply(df, .(FIRMID), numcolwise(myfun))

#   FIRMID VAR1 VAR2
# 1  FIRM1    0    1
# 2  FIRM1    0   NA
# 3  FIRM2    1    0
# 4  FIRM2   NA    0

Or in base R, where I assume that the first column contains the grouping variable (dat[ , -1]). You could of course refer to it by name instead.

df2 <- do.call(rbind, by(df, df[ , "FIRMID"], function(dat){
  sapply(dat[ , -1], function(x){
    myfun(x)
  })
}))

data.frame(FIRMID = df$FIRMID, df2)

#   FIRMID VAR1 VAR2
# 1  FIRM1    0    1
# 2  FIRM1    0   NA
# 3  FIRM2    1    0
# 4  FIRM2   NA    0

Update 'myfun' can be written much simpler. Thanks @Arun for the suggestion!

myfun <- function(x){
  x[is.na(x) & any(x == 0)] <- 0
  x}

Question 2

If you don't have only integers you might need to adjust this for comparing floating point numbers:

DF <- read.table(text="FIRMID    VAR1     VAR2        
FIRM1     0        1
FIRM1     NA       NA
FIRM2     1        0
FIRM2     NA       NA", header=TRUE)

na_replace <- function(x) {
  if (any(na.omit(x)==0L)) x[is.na(x)] <- 0L
  x
}

library(plyr)
ddply(DF, .(FIRMID), transform, 
      VAR1=na_replace(VAR1),
      VAR2=na_replace(VAR2))

#  FIRMID VAR1 VAR2
#1  FIRM1    0    1
#2  FIRM1    0   NA
#3  FIRM2    1    0
#4  FIRM2   NA    0

Question 3

you can use ddply here. but it would be very inefficient if the data.frame is very large. if not, then you could try:

your.data.frame<-ddply(your.data.frame,~FIRMID,function(x){
if ( any(x[!is.na(x$VAR1),"VAR1"]==0)){x[is.na(x$VAR1),"VAR1"]<-0}
if ( any(x[!is.na(x$VAR2),"VAR2"]==0)){x[is.na(x$VAR2),"VAR2"]<-0}
x})

but very inelegant

EDIT: my code before didn't worked, so I fixed it :)