Question

In a web crawler/scraper setting, I'd like to dynamically extend my base Reference Class URL so that I can write methods specific to individual hosts/domains. To be clear, by "dynamically" I mean automatically generating class definitions as new domains are encountered (e.g. a class URL_something.com that inherits from class URL).

This works a treat. The only problem is that my class WebPage expects the value of its field url to be of class URL. It accepts objects of class URL_something.com, since that class inherits from URL, but the assignment then turns the object into an instance of class URL, so I lose the information that it is actually of class URL_something.com.

Do you have any idea of how I can prevent losing that crucial information?

Code Example

setRefClass(Class="URL", fields=list(x="character"))
setRefClass(Class="WebPage", fields=list(url="URL"))

obj <- new("WebPage", url=new("URL", x="http://www.something.com/home/index.html"))
obj$url

# Method would recognize that there is no class 'URL_something.com' 
# yet and thus create it:
setRefClass(Class="URL_something.com", contains="URL")

# Another method would take care of mapping field values to 
# an instance of the new class:
url.obj <- new("URL_something.com", x="http://www.something.com/home/index.html")
inherits(url.obj, "URL")
# [1] TRUE

obj$url <- url.obj
class(obj$url)
# [1] "URL"
# So I lose the information that it was actually of class "URL_something.com"

Solution

Picking up on what Martin said in the comments: R 2.14.0 fixes the behavior described above.
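
To check this on your own installation, here is a minimal sketch that reuses the class names from the question. The helper url_for_host() and its regular expression are hypothetical and not part of the original code. The sketch assumes the host can be pulled out of the URL with a simple pattern and that R 2.14.0 or later is installed.

```r
library(methods)

URL <- setRefClass(Class = "URL", fields = list(x = "character"))
WebPage <- setRefClass(Class = "WebPage", fields = list(url = "URL"))

# Hypothetical helper: derive the host from the URL, define the matching
# subclass of URL the first time that host is seen, and return an instance.
url_for_host <- function(x) {
  # Naive host extraction (assumption): drop the scheme and an optional "www."
  host <- sub("^[a-z]+://(www\\.)?([^/:]+).*$", "\\2", x)
  cls <- paste("URL_", host, sep = "")
  if (!isClass(cls)) {
    setRefClass(Class = cls, contains = "URL")
  }
  new(cls, x = x)
}

address <- "http://www.something.com/home/index.html"
obj <- new("WebPage", url = new("URL", x = address))

# Assign an instance of the host-specific subclass to the field
obj$url <- url_for_host(address)

# On R >= 2.14.0 the field should keep the subclass rather than
# being turned into a plain URL
print(class(obj$url))
print(is(obj$url, "URL_something.com"))
```

If the last line reports FALSE, the installation still shows the behavior from the question, and upgrading R is the likely remedy.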

Licensed under: CC-BY-SA with attribution