Global mapping of one subscript dimension to another database

https://stackoverflow.com/questions/16901565

30-05-2022

Question

I have a vendor-defined database (about 140GB total) on Caché 2007. It uses the old-style MUMPS programming environment and accesses globals directly in a hierarchical style. One global accounts for about 75% of the total database size. The first subscript in this global is an artificial integer account number. The next 2-3 subscripts are constant subrecord identifiers that break up blocks of fields and denote repeating subrecord kinds.

One of these repeating subrecords (record type 30) is for notes on an account. Because of the way the system is used, this dimension accounts for a very large portion of the global's total space; I'd estimate it to be at least 50%. Because of the way Caché stores data physically in the database, a scan of this global ends up loading all or most of these notes as a side effect even though they aren't relevant to most operations. It has the effect of greatly increasing the cost of IO operations on the global, especially when you only want one tiny detail from a bunch of accounts.

Example subscript references for this global:

^ACCT(3461,10,1)="SOME^DATA"
^ACCT(3461,10,2)="MORE^DATA"
...
^ACCT(3461,30,1)="NOTE1 blah blah"
^ACCT(3461,30,2)="NOTE2 blah blah"
...
^ACCT(3461,30,100)="NOTE100 blah blah"
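
To make the layout concern concrete, here is a rough Python toy model (not Caché code). It assumes nodes are kept in subscript order and that a scan walks them in that order; it does not model Caché's actual block layout or caching. The second account (3462) is hypothetical and only the nodes shown above are used.

```python
# Toy model of ^ACCT: subscripts as tuples, values as strings.
# Assumption: numeric subscripts, so tuple ordering stands in for
# the global's subscript ordering.
nodes = {
    (3461, 10, 1): "SOME^DATA",
    (3461, 10, 2): "MORE^DATA",
    (3461, 30, 1): "NOTE1 blah blah",
    (3461, 30, 2): "NOTE2 blah blah",
    (3461, 30, 100): "NOTE100 blah blah",
    (3462, 10, 1): "SOME^DATA",  # hypothetical second account
}


def scan(nodes, wanted_type):
    """Walk every node in subscript order, keeping only one record type."""
    visited = 0
    hits = []
    for key in sorted(nodes):
        visited += 1
        if key[1] == wanted_type:
            hits.append(key)
    return visited, hits


visited, hits = scan(nodes, 10)
print(f"visited {visited} nodes to collect {len(hits)} type-10 nodes")
```

In this model the notes for account 3461 sit between that account's type-10 data and the next account's data, so an in-order walk passes over them even when only type-10 records are wanted.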

I can't change the design of the database. It's controlled by an outside vendor, and the code contains a large number of MUMPS-style hardcoded references. I suspect a big reason batch operations are so slow on the system is the high cost of these mostly irrelevant notes coming along for the IO ride whenever account data is accessed. Scanning this whole global (i.e. when there is no useful application-maintained index) takes at least 8 hours.

One thought I had is to move the note data out of the global where it is stored alongside other details and into a separate database file, using the global mapping facility described in the Guide to Using Caché Globals and Guide to System Administration. If I could map all the subscript 30s to a separate database file in the same Caché database, most data operations (the ones that don't care about notes at all) wouldn't bring the notes into memory along with the details they do care about.

In the global structure guide (1st link), this looks plausible, as they show a particular 2nd subscript mapped separately from the 1st subscript. What none of the examples show is the syntax to make that happen. In the "Add a new global mapping" screen in the Caché Management Portal, I should be able to do something like

Global name: ACCT
Subscripts to be mapped: (BEGIN:END)(30)

But whatever variations I try in the syntax, I always get ERROR #657: Invalid subscript in reference 1 subscript #1.

StackExchange note: This question would possibly be better suited to dba.stackexchange.com, but there are apparently zero InterSystems questions there and I don't think it would get any attention.

Solution

Unfortunately, while it's possible to map 2nd level subscripts of a particular node, it's not possible to map 2nd level subscripts of all nodes.
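
One way to see the limitation, sketched in Python under the assumption that a subscript-level mapping always covers a single contiguous range of the global in subscript order (the helper `is_one_range` and account 3462 are hypothetical, for illustration only):

```python
def is_one_range(all_keys, selected):
    """True if `selected` occupies one unbroken run of the sorted keys."""
    ordered = sorted(all_keys)
    positions = [i for i, key in enumerate(ordered) if key in selected]
    return bool(positions) and positions[-1] - positions[0] + 1 == len(positions)


keys = [
    (3461, 10, 1), (3461, 10, 2),
    (3461, 30, 1), (3461, 30, 2),
    (3462, 10, 1),  # hypothetical second account
    (3462, 30, 1),
]

# Notes of one particular account: ^ACCT(3461,30,*)
notes_one_account = {k for k in keys if k[0] == 3461 and k[1] == 30}
# Notes of every account: ^ACCT(*,30,*)
notes_all_accounts = {k for k in keys if k[1] == 30}

one_account_ok = is_one_range(keys, notes_one_account)
all_accounts_ok = is_one_range(keys, notes_all_accounts)
print(one_account_ok, all_accounts_ok)
```

Under that assumption, the notes of a single account form one range, but the notes of all accounts are interrupted by each following account's other subrecords, so no single range can describe them.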

The InterSystems Worldwide Response Center (WRC) has an experienced performance team. Have you tried contacting them?

Licensed under: CC-BY-SA with attribution
