Question

I tried several times to apply the pmml function from package pmml to a random forest model ('model.rf') created by package randomForest:

> library(randomForest)
> dim(data)
[1]  32000 76
> model.rf <- randomForest(x=data[,2:76],y=data[,1],type='regression',ntree=150)
> library(pmml)
> model.rf.pmml<-pmml(model.rf)

Each time it took several hours on my Windows 8 system (i7-4500U, 8 GB RAM) until R crashed.

The model is quite large. The .RData file (containing only the model) is approx. 10 MB on disk, and:

> model.rf$forest$nrnodes
[1] 5819

Is the crash due to insufficient memory? I noticed that the R process occupied virtually all of the available memory before crashing. If so, what kind of system would be required to convert my model to PMML?
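For reference, a minimal sketch (base R only) of how the in-memory footprint of the model can be checked before attempting the conversion:

# Sketch: report the in-memory size of the fitted forest
format(object.size(model.rf), units = "MB")

# Trigger a garbage collection and report current memory usage
gc()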

Also, from the iris example it seems the size on disk increases by a factor of ~15, since XML is not a compressed format, unlike R data files:

> library(randomForest)
> library(pmml)
> iris.rf <- randomForest(Species ~ ., data=iris, ntree=20)
> save(iris.rf,file='iris.rf.RData')
> iris.rf.pmml<-pmml(iris.rf)
> saveXML(iris.rf.pmml,file='iris.rf.xml')

iris.rf.RData --> 4 KB
iris.rf.xml   --> 59 KB

Is this factor constant? Will the PMML version of my model be ~150 MB on disk?
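The factor from the iris example can be computed directly from the files on disk (a small sketch, assuming the two files created above are in the working directory):

# Sketch: compare on-disk sizes of the compressed .RData file and the XML file
rdata.size <- file.info("iris.rf.RData")$size
xml.size   <- file.info("iris.rf.xml")$size
xml.size / rdata.size   # roughly the ~15x factor noted above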


Solution

Unfortunately, the R pmml package does have both memory and speed limitations. When I released the present version, I did not realize how big "big data" could be! I should add that Windows is not very good at memory efficiency: there have been many models I could not export on a Windows machine, but I was able to produce the exact same model faster and with better memory usage on a Linux or Mac computer.

I have been working on improvements on both fronts for the next release. For now, as a reference point: for an RF model with 500 trees, applied to a dataset with 50 variables and 50,000 rows (~18 MB), creating the PMML model took 5 hours on a Linux machine; the average number of nodes per tree was 4,000. A general rule of thumb is that the memory used to save a pmml object is ~2.5x that of the R object, as you found, and the memory used just to save the object as an XML file is a major factor. In the present (not yet released) state of the package, the same conversion took 1 hour 15 minutes instead of 5 hours.

The numbers above are for a Linux machine; I expect them to be more than double on a Windows machine. Please consider using a non-Windows machine for the analysis of large datasets; I am sure this applies to most R packages, not just pmml!
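If you want to see where the time goes on your own model, a small sketch like the following (using base R's system.time together with the same pmml/saveXML calls from the question) times the in-memory conversion and the on-disk serialization separately:

library(randomForest)
library(pmml)
library(XML)

# Time the in-memory conversion to a PMML object
t.convert <- system.time(model.rf.pmml <- pmml(model.rf))

# Time the serialization of that object to an XML file on disk
t.save <- system.time(saveXML(model.rf.pmml, file = "model.rf.pmml"))

t.convert
t.save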

OTHER TIPS

You could use the r2pmml package when working with large random forest models. This package relies on the Java PMML class model and XML libraries and, as a result, is about a thousand times faster than the standard pmml package. Performance is the same whether you run it on Windows or *NIX. All things considered, your model should be exportable in a couple of seconds.

I have used the r2pmml library to export a 5 GB random forest PMML file in about one minute on my laptop. The trick is to give the JVM enough heap space so that it doesn't need to do much garbage collection:

# Set JVM heap limits before any rJava-based package (such as r2pmml) is loaded
options(java.parameters = c("-Xms8G", "-Xmx16G"))

library("randomForest")
library("r2pmml")

model.rf <- randomForest(x = data[, 2:76], y = data[, 1], type = 'regression', ntree = 150)

# Convert the fitted forest directly to a PMML file on disk
r2pmml(model.rf, "/tmp/rf.pmml")