Question

We have an R program that filters some data in a table and creates a new table with the results. On Windows and OSX, the program runs and our table is created properly. However, on our Linux (Ubuntu 12.04) server, the same R program produces a table with garbage data.

When we compare the garbage data produced on Linux to the proper data, we find two differences:

  • Seemingly arbitrary numbers in columns that should have text values
  • Extra rows

We think the issue is something with encoding, but all of our efforts to change the encoding of the database have failed so far.

Our R script uses RMySQL to connect to a MySQL database, filter the contents of a table, and write the result to a new table (using the dbReadTable and dbWriteTable commands). We know that reading and filtering are not the problem, because the data.frame looks correct when we examine it both before and after filtering. The problem appears to be with dbWriteTable.

These two links seem to be closest to the solution to our problem, but we have to wait for the pull request to go through:

  1. https://github.com/jeffreyhorner/RMySQL/issues/6
  2. https://github.com/gagern/RMySQL/commit/b0fbef105ca61d69992a2ec5a5eafde30530b8d5

And these are also relevant:

  1. http://zee.balogh.sk/?p=928
  2. What does character set and collation mean exactly?

Solution

From past experience, I suggest that this is not a problem in dbWriteTable, and not even an encoding issue!

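It is likely that your data.frame was built with stringsAsFactors = TRUE (the default in R before 4.0.0), so the text columns are factors rather than character vectors, and the numbers you see in the new table are the factors' underlying integer codes.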
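A minimal sketch of how to check for this and work around it, using only base R. The data.frame and its column names here are made up for illustration, and the dbWriteTable call is left as a comment because it assumes an open connection and a table name that are not in the question:

```r
# Hypothetical data standing in for the filtered table.
df <- data.frame(
  id = 1:3,
  city = c("Oslo", "Bergen", "Oslo"),
  stringsAsFactors = TRUE   # forces the old (pre-4.0.0) default behaviour
)

# A factor stores integer codes plus a lookup table of levels.
# These codes are the "arbitrary numbers" that can end up in the database.
codes <- as.integer(df$city)
print(sapply(df, class))    # shows which columns are factors
print(codes)

# Convert every factor column back to character before writing.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

# Assumed usage, not run here: 'con' and "filtered_table" are placeholders.
# dbWriteTable(con, "filtered_table", df, row.names = FALSE)
```

If sapply(df, class) reports "factor" for your text columns on the Linux server but "character" on the other machines, that difference would be consistent with this explanation.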

Licensed under: CC-BY-SA with attribution