Question

I am receiving the following error:

'pattern' must be a non-empty character string 

when trying to run the following:

rapply(as.list(Database1), function(x) agrep(x,Database2, max.distance=c(cost=1), value=T))

with large databases

> length(Database1)
[1] 15876500

> length(Database2)
[1] 605

But not when I run it with small ones

> length(Database1)
[1] 29

> length(Database2)
[1] 8

I know I should post reproducible code. The databases are just 15-25 character strings of random letters, which can be generated using the following:

Database1<- unlist(replicate(n, paste0(sample(LETTERS, m), collapse="")))

where "n" is the length and "m" is an integer between 15-25.

Solution

Well, I can reproduce that error message by passing "" as the pattern, as seen here, but not with other potentially bad patterns:

agrep("", "hello")
agrep(" ", "hello")
agrep(NA, "hello")
agrep(NULL, "hello")


## > agrep("", "hello")
## Error in agrep("", "hello") : 
##   'pattern' must be a non-empty character string

## > agrep(" ", "hello")
## [1] 1

## > agrep(NA, "hello")
## [1] NA

## > agrep(NULL, "hello")
## Error in agrep(NULL, "hello") : invalid 'pattern' argument

So I'm guessing you have a "" in Database1. To check, use:

which(Database1 == "")
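
As a slightly broader check, the sketch below looks for both empty strings and missing values in one pass. The small Database1 vector here is a made-up stand-in for the real data, and the names empty_idx and na_idx are only illustrative.

```r
# Hypothetical stand-in for the real data: one empty string and one NA mixed in.
Database1 <- c("ABCDEFGHIJKLMNOP", "", "QRSTUVWXYZABCDEF", NA)

# Positions of zero-length strings, which agrep() rejects as a pattern.
empty_idx <- which(!is.na(Database1) & !nzchar(Database1))

# Positions of missing values, which do not error but only return NA.
na_idx <- which(is.na(Database1))

empty_idx
na_idx
```

If both come back as integer(0) on the full data, the empty-string guess is probably wrong and the per-element approach further down is the next thing to try.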

EDIT:

Use:

rapply(as.list(Database1), function(x) {
    # try() catches the error so the loop continues past bad elements
    try(agrep(x, Database2, max.distance=c(cost=1), value=TRUE))
})

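With try(), the loop keeps going past failing elements instead of stopping at the first one, and each failure shows up as an error message in the output. You can then home in on those elements and figure out what's causing the problem. I'd try it on several subsets of the data first.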
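
If the positions of the failing elements are what you are after, one possible variation is to use tryCatch() and collect the indices afterwards. This is only a sketch on made-up inputs: the two small vectors and the names results and failed are assumptions for illustration, and it uses lapply() rather than rapply() so that each result stays aligned with its input position.

```r
# Hypothetical small inputs; the "" element triggers the error from the question.
Database1 <- c("ABCDEFGHIJKLMNOP", "", "QRSTUVWXYZABCDEF")
Database2 <- c("ABCDEFGHIJKLMNOQ", "ZZZZZZZZZZZZZZZZ")

# Run agrep() once per pattern; on error, return NULL instead of stopping.
results <- lapply(Database1, function(x) {
  tryCatch(
    agrep(x, Database2, max.distance = c(cost = 1), value = TRUE),
    error = function(e) NULL
  )
})

# Positions in Database1 whose pattern raised an error.
failed <- which(vapply(results, is.null, logical(1)))
failed

# Inspect the offending elements directly.
Database1[failed]
```

A pattern with no matches returns character(0) rather than NULL, so only genuine errors end up in failed.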

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow