Question

I am receiving the following error:

'pattern' must be a non-empty character string 

when trying to run the following:

rapply(as.list(Database1), function(x) agrep(x,Database2, max.distance=c(cost=1), value=T))

with large databases

> length(Database1)
[1] 15876500

> length(Database2)
[1] 605

But not when I run it with small ones

> length(Database1)
[1] 29

> length(Database2)
[1] 8

I know I should post reproducible code. The databases are just strings of 15-25 random letters, which can be generated with the following:

Database1<- unlist(replicate(n, paste0(sample(LETTERS, m), collapse="")))

where "n" is the length and "m" is an integer between 15-25.
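For a concrete version of the generator above, here is a minimal sketch. It assumes n = 29 (the small case) and draws a separate m for each string; the seed and the use of vapply() are illustrative choices, not part of the original code.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
n <- 29       # assumed number of strings (matches the small case)

# Draw a length m in 15..25 for each string, then sample m distinct letters.
Database1 <- vapply(
  sample(15:25, n, replace = TRUE),
  function(m) paste0(sample(LETTERS, m), collapse = ""),
  character(1)
)

length(Database1)        # number of generated strings
range(nchar(Database1))  # shortest and longest string length
```

Strings built this way are never empty, so on their own they should not trigger the error.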


Solution

I can reproduce that error message by supplying "" as the pattern, as shown here, but not with other potentially bad patterns:

agrep("", "hello")
agrep(" ", "hello")
agrep(NA, "hello")
agrep(NULL, "hello")


## > agrep("", "hello")
## Error in agrep("", "hello") : 
##   'pattern' must be a non-empty character string

## > agrep(" ", "hello")
## [1] 1

## > agrep(NA, "hello")
## [1] NA

## > agrep(NULL, "hello")
## Error in agrep(NULL, "hello") : invalid 'pattern' argument

So I'm guessing you have a "" in Database1. To check, use:

which(Database1 == "")
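A slightly broader check might also flag NA entries and drop the unusable patterns before matching. The sketch below uses a small made-up vector in place of the real Database1; the names is_bad and clean are hypothetical.

```r
# Toy stand-in for Database1 (made-up values) with one empty string and one NA.
Database1 <- c("ABCDEFGHIJKLMNOP", "", "QRSTUVWXYZABCDEF", NA)

# nzchar() is FALSE only for ""; by default it is TRUE for NA,
# so NA entries are tested separately with is.na().
is_bad <- is.na(Database1) | !nzchar(Database1)
which(is_bad)  # positions 2 and 4

# Logical indexing is safe even when nothing is flagged.
clean <- Database1[!is_bad]
```

Passing clean instead of Database1 to the agrep() loop would then avoid the empty-pattern error, assuming empty strings are indeed the cause.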

EDIT:

Use:

rapply(as.list(Database1), function(x) {
    # try() reports the error and lets the loop continue
    try(agrep(x, Database2, max.distance=c(cost=1), value=TRUE))
})

This will show you the errors without stopping the run; you can then home in on the offending element and figure out what's causing it. I'd try it on multiple subsets of the data.
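To get the positions of the failing elements directly, one option is to loop over indices and capture the errors with tryCatch(). This is a sketch on made-up data, not the original approach; results and failed are hypothetical names.

```r
# Toy data (made-up values); the second pattern is empty and will fail.
Database1 <- c("ABCDEFGHIJKLMNOP", "", "QRSTUVWXYZABCDEF")
Database2 <- c("ABCDEFGHIJKLMNOQ", "ZZZZZZZZZZZZZZZZ")

results <- lapply(seq_along(Database1), function(i) {
  tryCatch(
    agrep(Database1[i], Database2, max.distance = c(cost = 1), value = TRUE),
    error = function(e) e  # return the condition object instead of stopping
  )
})

# Indices of the elements whose agrep() call raised an error.
failed <- which(vapply(results, inherits, logical(1), what = "error"))
failed

# Inspect the message of the first failure.
conditionMessage(results[[failed[1]]])
```

On 15 million patterns this keeps every result in memory, so running it on subsets first is probably wise.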

License: CC-BY-SA with attribution
Not affiliated with StackOverflow