Question

Looking for a way to speed up reading and processing a large text file (basically csv; stream_lf).

Should I bypass RMS? Solution may be asynchronous or synchronous.

Current implementation is synchronous, but is too slow.

Implementation is in HP Pascal, using the Pascal run-time library (OPEN/READLN/EOF/CLOSE). Bypassing the Pascal run-time library is acceptable.

Examples may be in C or Pascal.


Solution

The system-wide block count was set to 32. I tried SET RMS/BLOCK=32/BUF=8. That already gave an improvement.

[edit: If there is no process setting, then the system setting is used. So the test added buffers, but did not make them bigger.]

32 blocks is just 16KB. Great for 1992, lame for 2012. If more buffers already helped, then larger buffers are likely to help even more. The larger the better. Multiples of 8KB may help just a bit extra. Thus try 128, and also try 255 at the SET RMS process level. If it brings happiness, then you may want to adapt the process to select its own RMS settings and not rely on DCL settings.
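As a rough sketch of a program selecting its own buffering instead of relying on DCL settings: the helper names below (`rms_buffer_bytes`, `open_for_fast_read`) are made up for illustration. The `__VMS` branch assumes the RMS keyword arguments (`mbc=`, `mbf=`, `rop=rah`) that the HP C run-time library's `fopen` accepts, with 127 blocks and 8 buffers picked as example values; check them against your compiler's documentation. The other branch is only a portable stand-in using `setvbuf`.

```c
#include <stdio.h>

#define BLOCK_BYTES 512L   /* one disk block */

/* Bytes covered by a multiblock count, e.g. 32 blocks -> 16384 bytes. */
static long rms_buffer_bytes(int block_count)
{
    return block_count * BLOCK_BYTES;
}

/* Open a text file for sequential reading with larger buffers. */
static FILE *open_for_fast_read(const char *path)
{
#ifdef __VMS
    /* Assumption: HP C RTL keyword arguments; example values only.
       mbc = blocks per buffer, mbf = number of buffers, rah = read-ahead. */
    return fopen(path, "r", "mbc=127", "mbf=8", "rop=rah");
#else
    /* Portable stand-in: ask stdio for a 64 KB fully buffered stream. */
    FILE *fp = fopen(path, "r");
    if (fp != NULL)
        (void)setvbuf(fp, NULL, _IOFBF, 65536);
    return fp;
#endif
}
```

With this, `rms_buffer_bytes(128)` gives 65536 and `rms_buffer_bytes(255)` gives 130560, which is the size range being suggested above.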

The RMS $GET call will normally only get a single record, but you could 'lie' about the file with SET FIL/ATTR=(RFM=UDF) or perhaps (RFM=FIX,LRL=8192). You can do that temporarily in a program using SYS$MODIFY. After that you can read in big chunks, but your program will need to decode the real records inside the spoofed records. That will be much like using SYS$READ / SYS$QIOW (block I/O), but sticking to record mode will give you free 'read ahead'. Yeah, you can code that yourself with async IO, but that's a hassle.
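The decoding step might look something like the sketch below. It is plain portable C using `fread` on a `FILE *`, not RMS calls, and the names (`for_each_record`, `record_fn`, `tally_record`) are invented for the example. It assumes LF-terminated records, as in a stream_lf file, and carries a record that straddles a chunk boundary over to the next read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Called once per decoded record; rec is NOT NUL-terminated. */
typedef void (*record_fn)(const char *rec, size_t len, void *ctx);

/* Example callback: counts records and payload bytes. */
struct tally { long records; long bytes; };

static void tally_record(const char *rec, size_t len, void *ctx)
{
    struct tally *t = (struct tally *)ctx;
    (void)rec;
    t->records++;
    t->bytes += (long)len;
}

/*
 * Read fp in chunks of chunk_size bytes and pass each LF-terminated
 * record to cb. Returns the number of records, or -1 on error.
 */
static long for_each_record(FILE *fp, size_t chunk_size, record_fn cb, void *ctx)
{
    size_t cap, used = 0, got;
    long count = 0;
    char *buf;

    if (chunk_size == 0) return -1;
    cap = chunk_size * 2;
    buf = (char *)malloc(cap);
    if (buf == NULL) return -1;

    for (;;) {
        char *start, *end, *nl;

        /* Grow the buffer when a long record leaves too little room. */
        if (cap - used < chunk_size) {
            char *bigger = (char *)realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return -1; }
            buf = bigger;
            cap *= 2;
        }
        got = fread(buf + used, 1, chunk_size, fp);
        if (got == 0) break;
        used += got;

        /* Emit every complete record now in the buffer. */
        start = buf;
        end = buf + used;
        while ((nl = (char *)memchr(start, '\n', (size_t)(end - start))) != NULL) {
            cb(start, (size_t)(nl - start), ctx);
            count++;
            start = nl + 1;
        }
        /* Move the unfinished tail to the front for the next chunk. */
        used = (size_t)(end - start);
        memmove(buf, start, used);
    }
    if (ferror(fp)) { free(buf); return -1; }
    /* A final record without a trailing LF still counts. */
    if (used > 0) { cb(buf, used, ctx); count++; }
    free(buf);
    return count;
}
```

A caller would open the file, pick a chunk size such as 8192 to match the spoofed record length, and pass its own callback in place of `tally_record` to parse the CSV fields.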

Btw... don't go crazy on the number of buffers. In benchmarks (many years ago) I saw little or negative benefit with more than 10 or so. The reason is that RMS does 'read ahead' but not 'keep ahead'. It fills all buffers asynchronously, but then posts no additional reads as buffers get processed. Only when all data is consumed will it re-issue IOs for all buffers, instead of trying to keep ahead as buffers are processed. Those 'waves' of IOs can confuse the storage subsystem, and the first IO in the wave may be slowed down by the rest of the wave... so the program waits.

How much data is in play? Tens of megabytes or gigabytes? Will the XFC cache have a chance to cache it between the exports and the processing?

Kind regards, Hein.

OTHER TIPS

For sequential files: Take a look into the WASD or VWCMS code (look at http://www.vsm.co.au/wasd). I know these bypass RMS as well in favour of speed for web services, but I don't know in which sources that is done.

For relative files: take sequence into account. Records could be nonexistent (empty?).

For indexed files: DON'T. Use RMS instead, because of the internal structure (indexes are interwoven with data in these files). In a new or reorganized file it's OK, but lack of maintenance will cause problems for access outside RMS.

Hmm, not much concrete to go on here.

Do you know for a fact that RMS is slowing you down?

Compare your processing time (IO, CPU, Elapsed) with SEARCH/STAT/WIN=0. One indication could be low USER mode, high EXEC mode, high IDLE. Use MONI MODE, or GETJPI with EXECTIM, USERTIM.

Optimal RMS reading through PASRTL, or directly, would probably mean several large buffers with read-ahead and NO SHARING or read-only sharing.

Re-try after:

$ SET RMS/BLO=127/BUF=8
$! Block=255 or 128 for recent OpenVMS versions (8.4, or 8.3+patches)

If there are many small records, and EXEC mode is high, then there may be too much time spent getting in and out of RMS to extract the records. In that case try C or COBOL to read the file non-shared. The RTL (run-time library) for both will use block IO, not record IO, to avoid RMS overhead. They still honor the RMS buffer settings mentioned above. Try?

Good luck and... let us know how you make out? Some before/after numbers perhaps.

Cheers, Hein

Use C. Bypass RMS.

fopen the file.

fseek to end.

ftell to get file size

malloc a chunk of memory that size

fread it in one go.

One might suspect if your file is a lot larger than your working set, that paging may be what is eating your wall clock.
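A minimal sketch of those steps, for what it's worth. The function name `slurp_file` is made up here, and two things are added that the list leaves out: a `rewind` before the read (after seeking to the end, the position has to go back to the start) and a NUL terminator for convenience. Note that `ftell` returns a `long`, so this only works for files whose size fits in a `long` on your platform.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read the whole file at path into one malloc'd buffer.
 * On success returns the buffer (caller frees it) and stores the
 * number of bytes read in *out_len. Returns NULL on any failure.
 */
static char *slurp_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    char *data;
    long size;
    size_t got;

    if (fp == NULL) return NULL;

    /* Seek to the end and ask for the position to learn the size. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* One allocation for the whole file, plus a terminating NUL. */
    data = (char *)malloc((size_t)size + 1);
    if (data == NULL) { fclose(fp); return NULL; }

    /* One read for the whole file. */
    got = fread(data, 1, (size_t)size, fp);
    if (ferror(fp)) { free(data); fclose(fp); return NULL; }
    data[got] = '\0';
    *out_len = got;

    fclose(fp);
    return data;
}
```

The caller then walks the buffer looking for line feeds to find the records, and frees it when done. If the file is much bigger than the memory the process can keep resident, the paging concern above applies and reading in fixed-size chunks is probably the safer route.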

Licensed under: CC-BY-SA with attribution