Question

I think there's a bug in how MonetDB.R filters NAs; see the example code below.

First, a handy utility function for running general SQL queries on monet.frame objects:

#' Apply general SQL queries to a monet.frame object and return the 
#' result in a new monet.frame.
#' 
#' @note Likely to break if  \code{attr(data, "query")} contains 
#'      LIMIT or OFFSET statements.
#' 
#' @param _data a monet.frame object
#' @param query an SQL query, using "_DATA_" as the placeholder for the
#'     name of the table underlying the \code{_data}-object.
#' @param keep_order should ORDER BY statements in the original query be kept? 
#'     Will break if columns in the ORDER BY statement are not in the returned 
#'     table.
#' @importFrom stringr str_extract_all 
#' @export   
transform.monet.frame <- function(`_data`, query, keep_order=TRUE, ...){
    stopifnot(require(stringr))
    # random alias for the subquery that replaces "_DATA_"
    nm <- paste(sample(letters, 15, replace=TRUE), collapse="")
    oldquery <- attr(`_data`, "query")
    # detect ORDER BY case-insensitively, consistent with the extraction below
    if(has_order <- grepl("ORDER BY", oldquery, ignore.case = TRUE)){
        pattern <- "(ORDER BY[[:space:]]+[[:alnum:]]+((,[[:space:]]*[[:alnum:]]+)*))"
        # stringr::regex() replaces the deprecated ignore.case() helper
        orderby <- str_extract_all(oldquery, stringr::regex(pattern, ignore_case = TRUE))[[1]]
        # strip the ORDER BY clause; it is re-appended to the outer query below
        oldquery <- gsub(pattern, "", oldquery, ignore.case = TRUE)
    } 
    query <- gsub("_DATA_", paste("(", oldquery, ") AS", nm), query)
    if(has_order && keep_order) query <- paste(query, orderby)
    monet.frame(attr(`_data`, "conn"), query)
}

Example:

# library(MonetDB.R); monetdb <- dbConnect( MonetDB.R(), ... etc
set.seed(1212)
tablename <- paste(sample(letters, 10), collapse="")
data  <- data.frame(x=rnorm(100), f=gl(2, 50))

# introduce some NAs ...
data$xna <- data$x
data$xna[1:10] <- NA

dbWriteTable(monetdb, tablename, data)
dm <- monet.frame(monetdb, tablename)   

str(na.omit(dm$xna))
# MonetDB-backed data.frame surrogate
# 1 column, 100 rows
# Query: SELECT xna FROM gcxinabtme WHERE (  NOT (('xna') IS NULL) ) 
# Columns: xna (numeric)

100 rows!? It should be 90.

nrow(transform(dm, "SELECT xna FROM _DATA_ WHERE (xna IS NOT NULL)"))
# 90 
## as it should be
nrow(transform(dm, "SELECT xna FROM _DATA_ WHERE ('xna' IS NOT NULL)"))
# 100
## so quoting the column name seems to mess this up..   

I think I understand why quoting the column name is necessary (so that this works for non-standard column names as well, right?), but why would it change the query result? Shouldn't these two queries be perfectly equivalent? Also, if it's really necessary to quote the column names, why is the first occurrence of xna not quoted in

# Query: SELECT xna FROM gcxinabtme WHERE (  NOT (('xna') IS NULL) ) 

I noticed this because it also makes other monet.frame-methods behave unexpectedly, e.g.:

 quantile(dm$xna, na.rm=TRUE)
 # 0%        25%        50%        75%       100% 
 # NA -0.9974738 -0.3033412  0.4272321  2.6715264 

EDITED to add:

na.fail seems to be broken as well:

It does not raise an error when applied to a column holding NAs. Instead it returns NULL, with a cryptic warning that suggests at first glance that there are, in fact, no NAs:

str(na.fail(dm$xna))
# NULL
# Warning message:
# In monet.frame.internal(attr(x, "conn"), nquery, .is.debug(x), nrow.hint = NA,  :
#   SELECT xna FROM gcxinabtme WHERE ( ('xna') IS NULL )  has zero-row result set.

If there are no NAs, na.fail() should return its argument unchanged according to the generic's documentation, but it doesn't do that either:

str(na.fail(dm$x))
# NULL
# Warning message:
# In monet.frame.internal(attr(x, "conn"), nquery, .is.debug(x), nrow.hint = NA,  :
#   SELECT x FROM gcxinabtme WHERE ( ('x') IS NULL )  has zero-row result set.

Solution

The column name should be quoted with double quotes, not single quotes. Will investigate why it is not doing so.
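
For context: in SQL, single quotes delimit a string literal and double quotes delimit an identifier, so `'xna' IS NULL` tests a constant string that is never NULL, which is why the filter keeps all 100 rows. Below is a minimal sketch of how such a filter could be built with double-quoted identifiers. The helpers `quote_ident` and `not_null_query` are hypothetical names made up for illustration; they are not part of MonetDB.R, and this is not necessarily how the package will fix it.

```r
# Hypothetical helper (not part of MonetDB.R): quote a name as an SQL
# identifier with double quotes, doubling any embedded double quote.
quote_ident <- function(name) {
    paste0('"', gsub('"', '""', name, fixed = TRUE), '"')
}

# Hypothetical helper: build the kind of query na.omit() needs, with the
# column referenced as an identifier rather than as a string literal.
not_null_query <- function(table, column) {
    paste0("SELECT ", quote_ident(column),
           " FROM ", quote_ident(table),
           " WHERE ", quote_ident(column), " IS NOT NULL")
}

q <- not_null_query("gcxinabtme", "xna")
print(q)
```

The resulting string could then be passed to something like `monet.frame(monetdb, q)`, as in the question's `transform` helper. Note that quoted identifiers are generally case-sensitive, so the name has to match the stored column name exactly.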

License: CC-BY-SA with attribution
Not affiliated with StackOverflow