Вопрос

I'm trying to run a data science experiment as explained in this tutorial. The tutorial has 5 lessons, and each lesson has a few sub-sections in it.

For this tutorial, I am using

  • a SQL Server 2016 RC3 Virtual Machine on Azure, with R Services enabled.
  • RRE for Windows 8.0.0 as a data science client / R Client (for remote connection to SQL Server)
  • A new SQL Login created with read, write and ddl access to the database - used to connect to the SQL Server via the R client.

I've successfully completed Lesson 1, i.e. Create SQL Server data objects from my R client, Query and Modify SQL Server data, and Define/set compute contexts.

I'm stuck with an error in the beginning of Lesson 2 of the tutorial.
Once I change the compute context from local to sql server, a simple summarize function (rxsummary) is throwing an error.

The error looks like this:

C:\Users\...\Project0\DeepDive Experiment.R(109): Error in try({ : 
  ODBC statement error: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'master..xp_ScaleR_init_job'.
Error in rxInDbJobIdParam(schedulerJobInstance, FALSE) : 
  hpcServerJob object has an invalid id. Ensure it was returned from a prior rxStartClusterJob() call
Error in rxStartClusterJob(hpcServerJob, timesIsValidated = TRUE, continueOnFailure = FALSE) : 
  Error in try({ : 
  ODBC statement error: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'master..xp_ScaleR_init_job'.

Error in rxInDbJobIdParam(schedulerJobInstance, FALSE) : 
  hpcServerJob object has an invalid id. Ensure it was returned from a prior rxStartClusterJob() call


Any help on

  • why this error is occurring?
  • how to find/check stored procedures in the master database - how to check if xp_scaleR_init_job exists?
  • how to add/create the stored procedure if it does not exist?

will be appreciated.


For easy access, here's the full commented script until I reach the error:

###########################################DATA SCIENCE DEEP DIVE TUTORIAL###############################################

##Create the SQL Server Data Objects##

#Provide your database connection string in an R variable. 
    #DDUser01 is a login created on the sql server instance for remote login. 
    #It has read, write and ddl access to the DeepDive database.
sqlConnString <- "Driver=SQL Server;Server=*ip address*; Database=DeepDive;Uid=DDUser01;Pwd=*******"   
#Specify the name of the table you want to create, and save it in an R variable.
sqlFraudTable <- "ccFraudSmall"
#Chunking
sqlRowsPerRead = 5000
#Define a variable to store the new data source
sqlFraudDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlFraudTable, rowsPerRead = sqlRowsPerRead) 
#Create a new R variable, sqlScoreTable, to store the name of the table used for scoring.
sqlScoreTable <- "ccFraudScoreSmall"
#Define a second data source object 
sqlScoreDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlScoreTable, rowsPerRead = sqlRowsPerRead)  

##Load Data into SQL Tables Using R##

#Create an R variable, and assign to the variable the file path for the CSV file.
ccFraudCsv <- file.path(rxGetOption("sampleDataDir"), "ccFraudSmall.csv")
#RxTextData function to specify the text data source.
inTextData <- RxTextData(file = ccFraudCsv,      colClasses = c(   
    "custID" = "integer", "gender" = "integer", "state" = "integer",   
    "cardholder" = "integer", "balance" = "integer",    
    "numTrans" = "integer",   
    "numIntlTrans" = "integer", "creditLine" = "integer",    
    "fraudRisk" = "integer"))  
#Call rxDataStep to insert the data into the SQL Server table
rxDataStep(inData = inTextData, outFile = sqlFraudDS, overwrite = TRUE)   

#Variable for creating a path to the source file - score
ccScoreCsv <- file.path(rxGetOption("sampleDataDir"), "ccFraudScoreSmall.csv") 
#RxTextData function to get the data and save it in the variable
inTextData <- RxTextData(file = ccScoreCsv,      colClasses = c(   
    "custID" = "integer", "gender" = "integer", "state" = "integer",   
    "cardholder" = "integer", "balance" = "integer",    
    "numTrans" = "integer",   
    "numIntlTrans" = "integer", "creditLine" = "integer"))  
#Call rxDataStep to overwrite the current table with the new schema and data.
rxDataStep(inData = inTextData, sqlScoreDS, overwrite = TRUE)

##Query the Data ##

#Use the function rxGetVarInfo and specify the data source you want to analyze
rxGetVarInfo(data = sqlFraudDS)   

##Modify Metadata##

#Mapping of USA State abbreviations (categorical) to their integer identifiers
#Create an R variable that holds the vector of strings to add to it - different states of the USA.
stateAbb <- c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC",     
    "DE", "FL", "GA", "HI","IA", "ID", "IL", "IN", "KS", "KY", "LA",   
    "MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NB", "NC", "ND",   
    "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI","SC",   
    "SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY")  
#Create a column information object that specifies the mapping of the existing integer values to the categorical levels
    #This statement also creates factor variables for gender and cardholder.
ccColInfo <- list(         
    gender = list(   
        type = "factor",    
        levels = c("1", "2"),  
        newLevels = c("Male", "Female")),      
    cardholder = list(type = "factor",    
        levels = c( "1", "2"),     
        newLevels = c("Principal", "Secondary")),     
        state = list(type = "factor", levels = as.character(1:51), newLevels = stateAbb)   
        )  
#Update the SQL Server data source that uses the updated data
sqlFraudDS <- RxSqlServerData(connectionString = sqlConnString,  
table = sqlFraudTable, colInfo = ccColInfo,  
rowsPerRead = sqlRowsPerRead) 
#Query new information
rxGetVarInfo(data = sqlFraudDS)     

##Create and Set a Compute Context##

#Specify the connection string for the instance where computations will take place.
sqlConnString <- "Driver=SQL Server;Server=*ip address*; Database=DeepDive;Uid=DDUser01;Pwd=*******"   
#Specify the location of the shared directory (temp folder for workspace objects) and save it in a variable.
sqlShareDir <- paste("c:AllShare", Sys.getenv("USERNAME"), sep="")   
#Create shared directory if it does not exist
if (!file.exists(sqlShareDir)) dir.create(sqlShareDir, recursive = TRUE)
#Specify how you want the output handled. 
    #Here, you are indicating that the R session on the workstation should always wait for R job results, 
    #but not return console output from remote computations.
sqlWait <- TRUE   
sqlConsoleOutput <- FALSE 
#Define the compute context object
sqlCompute <- RxInSqlServer(  
     connectionString = sqlConnString,        
     shareDir = sqlShareDir,       
     wait = sqlWait,   
     consoleOutput = sqlConsoleOutput)  
#ALTERNATIVE:Enable Tracing on the Compute Context
sqlComputeTrace <- RxInSqlServer(   
    connectionString = sqlConnString,        
    shareDir = sqlShareDir,  
    wait = sqlWait,   
    consoleOutput = sqlConsoleOutput,       
    traceEnabled = TRUE,    traceLevel = 7)  
#Change Compute Context to the Server
rxSetComputeContext(sqlCompute)   

##Compute Summary Statistics##

#Compute summary statistics for several of the variables
##THIS IS WHERE I FACE THE ERROR##
sumOut <- rxSummary(formula = ~gender + balance + numTrans + numIntlTrans + creditLine, data = sqlFraudDS)  
Это было полезно?

Решение

Apparently, SQL Server 2016 needs an R Server Client (RRE) that is atleast 8.0.3. This website talks about it. I also got the same answer from Microsoft Support.

The R server I have is RRE 8.0.0. That was probably the reason for the error that I received. When I installed the Microsoft R Client instead, the script worked (barring rxCube) ! I could push computations to SQL Server and finish the tutorial.

Другие советы

If you type rxGetOption("demoScriptsDir") in your R console you should see the directory where the sample scripts are, eg

enter image description here

Work through the script RevoScaleR_SqlServer_GettingStarted.r as it has all the code - I think you are missing some bits (like the sqlShareDir bit). I would also say these scripts aren't perfect, I had a lot of issues with them, sometimes running them twice helped.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с dba.stackexchange
scroll top