Running ObjectScript programs against Intersystems Cache to provide data with SQL Server

https://stackoverflow.com/questions/11111190

15-06-2021
|

Question

Background

The main application where I work is based heavily on the MUMPS-esque Caché database engine from InterSystems. Everything is stored in global arrays. Getting data out of the system for external reporting ranges from simply being a pain to being egregiously slow and painful.

Caché provides an ODBC driver for the database but unless the global arrays involved happen to be keyed by the selection criteria, it resorts to scans and a simple query will take hours to run. For scale, the entire Caché production namespace is about 100GB. I can write ObjectScript (Intersystems' dialect of MUMPS) programs that pull data much faster than the ODBC driver in these cases.

Part of the problem I think is that the application vendor doesn't use Caché's object persistence support but instead has the SQL tables defined as a façade over the global arrays, and it often doesn't work well for batch requests.

I built a reporting database in MS SQL Server that pulls the most common data (2.5GB worth) and even if it has to scan every table, all results are returned within 3 seconds. Unfortunately, it takes a long time to refresh the data so I can only do a full refresh once a week and an active refresh once a day. This is good enough for most needs, but I want to do better.

I'm on Caché 2007, SQL Server 2008 R2, VS2010 on Windows 7 and Windows Server 2008 R2.

Scope of question

I need a way to integrate live data from the source Caché database with other data on SQL Server. I want to be able to integrate views or table valued functions into a SQL query and have it pull live data from the source db.

The live data must be available within SQL Server for processing. Doing it with a secondary application would be a huge pain and wouldn't work with reporting tools that just expect to push a query over ODBC and get a final dataset in the right format.
I understand that there are ways to get data into SQL Server or accomplish the same general things I want to do. That's not what this question is about.
The data needs to come from ObjectScript programs run on Caché since not all the data I need is exposed through the SQL defined tables and I get the control I need to make the performance usable with ObjectScript.
I am looking for advice on any new options or how I can improve one of the options I've tried or considered or other pros or cons for those approaches.

What I've tried so far

This project has been an exercise in frustration where each promising avenue I've looked into is either terrible or doesn't work for some reason. Often the reason is some unnecessary restriction on SQLCLR assemblies.

Pulling everything through InterSystem's Caché ODBC driver via a linked server. SQL Server often resorts to scans if it can't push conditions to the remote server or has to perform a join locally. A scan of any nontrivial table takes many hours and is unacceptable. Also, the length of many columns is incorrectly defined by the SQL table definitions in Caché; SQL Server doesn't like that and aborts the query. See this SO question. I can't change the table defs and the vendor doesn't think it's a problem because it works with MS Access.
Using OPENQUERY on demand. This works to some extent but I can still have the column length problem from the previous item and there's no way to parameterize OPENQUERY queries so that makes it pretty useless to pull contextual data.
Using SQLCLR to call the ODBC data provider through CLR table valued functions. This takes care of the parameterization and data length issues, although it does require me to define or modify a function each time I need a new piece of data. Unfortunately, not all the data elements I'm interested in are available through SQL. For some things, I need to access the global arrays directly.
Intersystems provides an ActiveX control that lets you run ObjectScript programs over TCP on the server and get the results. This works great in a stand-alone C# app but as soon as I try to make a connection from a SQLCLR assembly I get a ridiculous URI error:

A .NET Framework error occurred during execution of user-defined routine or aggregate "GetActiveAccounts": System.UriFormatException: Invalid URI: The URI is empty. System.UriFormatException: at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind) at System.Uri..ctor(String uriString) at System.ComponentModel.Design.RuntimeLicenseContext.GetLocalPath(String fileName) at System.ComponentModel.Design.RuntimeLicenseContext.GetSavedLicenseKey(Type type, Assembly resourceAssembly) at System.ComponentModel.LicenseManager.LicenseInteropHelper.GetCurrentContextInfo(Int32& fDesignTime, IntPtr& bstrKey, RuntimeTypeHandle rth) at FacsAccess.GetActiveAccounts.Client.connect() at FacsAccess.GetActiveAccounts.Client..ctor() at FacsAccess.GetActiveAccounts.E1.GetEnumerator()

See this unanswered SO question. There are other postings about it on the net but no one seems to have a clue. This is an extremely simple COM wrapper over a C++ DLL; it's not doing anything with licensing and has no reason to be in the managed licensing libraries. I wonder if this is some kind of boilerplate that's trying to get the name for an assembly that doesn't have a name because it's been loaded into the SQL database.
Intersystems also provides a more direct unmanaged interface but those interfaces are all C++, which I can't use through P/Invoke and I can't load a C++/CLI mixed mode impure assembly in SQLCLR.

Options I've considered but seem kind of terrible

I've considered trying the ActiveX control through SQL Server's COM support but that's terribly slow and really cumbersome.
I could create an out of process service to proxy the traffic but I can't use .NET remoting from SQLCLR and you're not supposed to use WCF and it would be really heavy weight for a such a simple interface anyway. I'd sooner roll my own IPC interface.
I could write some kind of extra unmanaged wrapper with a C style interface for the VisM or CacheDirect interfaces and access THAT through P/Invoke.

It doesn't seem like this should be so hard but it's really driving me up the wall and I need some perspective.

Solution

I think you can use ODBC via a linked server accessing stored procedures on the Cache database that are visible to the ODBC driver and that return result sets, but are not implemented using SQL.

I am 100% certain you can create such stored procedures and access them via ODBC, but I have never tried accessing them from SQL server as a linked server. Even if the linked server doesn't work, it seems like it would be preferable to access via the Intersystem's ODBC driver rather than the Active X control or CacheDirect.

I have an example of such a procedure for this question.

In case that link dies, here is the code:

Query OneGlobal(GlobalName As %String) As %Query(ROWSPEC = "NodeValue:%String,Sub1:%String,Sub2:%String,Sub3:%String,Sub4:%String,Sub5:%String,Sub6:%String,Sub7:%String,Sub8:%String,Sub9:%String") [SqlProc]
{
}

ClassMethod OneGlobalExecute(ByRef qHandle As %Binary, GlobalName As %String) As %Status
{
    S qHandle="^"_GlobalName
    Quit $$$OK
}

ClassMethod OneGlobalClose(ByRef qHandle As %Binary) As %Status [ PlaceAfter = OneGlobalExecute ]
{
    Quit $$$OK
}

ClassMethod OneGlobalFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status [ PlaceAfter = OneGlobalExecute ]
{

    S Q=qHandle  
    S Q=$Q(@Q)  b  
    I Q="" S Row="",AtEnd=1 Q $$$OK
    S Depth=$QL(Q)
    S $LI(Row,1)=$G(@Q)
    F I=1:1:Depth S $LI(Row,I+1)=$QS(Q,I)
    F I=Depth+1:1:9 S $LI(Row,I+1)=""
    S AtEnd=0
    S qHandle=Q
    Quit $$$OK
}

FYI, that is not an example to use in production, since it exposes all the data from all the globals.

However, it sounded like you were prepared to write Cache Object Script to directly get the data anyway - this is a template for doing so. The main thing to understand is that qHandle will be passed back by the ODBC driver on each call, so you can use it to store state. If there is a lot of state, make qHandle be an integer index into a temporary global holding the "real" state, and clean that up in the close method.

Since you are concerned about performance, you may want to also implement a

MyQueryFetchRows (ByRef qHandle As %Binary, FetchCount As %Integer = 0, ByRef RowSet As %List, ByRef ReturnCount As %Integer, ByRef AtEnd As %Integer) As %Status

method - see the documentation for %Library.Query for more details.

If you really need this to appear to ODBC as a (read-only) table rather than a stored procedure I think it might be possible - but I've never before tried to see if an arbitrary stored procedure can be exposed as a read-only table and I'm not sure how easy it is, or if it's actually always possible.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow