Oracle analytic function window defined by data
-
22-09-2019 - |
Question
I have a table which represents a line-by-line dump of the data read from a particular text file format. Each line may represent a "master" or a "detail" line, indicated via rec_type
code. I'd like to write a query that gets the "master" lines alongside the associated detail lines. I've come up with something that does the job, but it seems a bit hackish and am interested in better ways if any.
CREATE TABLE mdtest
(rec_seq NUMBER PRIMARY KEY
,rec_type VARCHAR2(3) NOT NULL
,rec_data VARCHAR2(100) NOT NULL);
INSERT INTO mdtest VALUES (1, '100', 'Bill Jones');
INSERT INTO mdtest VALUES (2, '200', '20080115,100.25');
INSERT INTO mdtest VALUES (3, '100', 'John Smith');
INSERT INTO mdtest VALUES (4, '200', '20090701,80.95');
INSERT INTO mdtest VALUES (5, '200', '20091231,110.35');
Desired Result:
SEQ_EMP EMP_NAME SEQ_DATA EMP_DATA
======= ========== ======== ===============
1 Bill Jones 2 20080115,100.25
3 John Smith 4 20090701,80.95
3 John Smith 5 20091231,110.35
Assumptions:
- records are processed in sequence of rec_seq
- first record type is a "
100
" - each "
100
" record has 1 or more "200
" records following
Note: this is for Oracle 9i, however we should be upgrading to 11g R1 this year.
Solution
Here's what I have so far:
SELECT seq_emp
,SUBSTR(emp_seq_name,10) emp_name
,seq_data
,emp_data
FROM (SELECT MAX(CASE WHEN rec_type = '100' THEN rec_seq END)
OVER (ORDER BY rec_seq
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) seq_emp
,MAX(CASE
WHEN rec_type = '100'
THEN TO_CHAR(rec_seq,'fm00000000') || '|' || rec_data
END)
OVER (ORDER BY rec_seq
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) emp_seq_name
,rec_seq seq_data
,rec_type
,rec_data emp_data
FROM mdtest)
WHERE rec_type = '200'
ORDER BY seq_data;
As you can see, I'm using the MAX reporting analytic function with a window starting from the top of the set down to the current row, to get the relevant "100" record for the current "200" record; then in the outer query I'm discarding the unneeded "100" records.
To get emp_name, I've had to append the rec_seq with the data so that the MAX function still picks the correct header record; then in the outer query I chop the rec_seq off.
I've played with other analytic functions and syntax including FIRST_VALUE and the KEEP syntax, but neither of these seem to make this job simpler; the difficulty is that the window is defined by the value of rec_type instead of being some constant offset.
OTHER TIPS
In a desire for simplicity, do you think it's worth loading each record type into a separate import table before processing?
create table mdtest100 as select * from mdtest where rec_type = 100;
create table mdtest200 as select * from mdtest where rec_type = 200;
with mdtest_detail as
(
select
(select max(m.rec_seq) from mdtest100 m
where m.rec_seq < r200.rec_seq) master_rec_seq,
r200.*
from
mdtest200 r200
)
select
m.rec_seq seq_emp,
m.rec_data emp_name,
d.rec_seq seq_data,
d.rec_data emp_data
from
mdtest_detail d
inner join mdtest100 m on m.rec_seq = d.master_rec_seq
order by
seq_emp,
seq_data;
SEQ_EMP EMP_NAME SEQ_DATA EMP_DATA
1 Bill Jones 2 20080115,100.25
3 John Smith 4 20090701,80.95
3 John Smith 5 20091231,110.35
This might lend itself to a more maintainable solution and will let you parse out and validate the comm-separated EMP_DATA field separately.
Just a thought - sorry if you're only looking for an Analytics solution.