This was a bit of a challenge, so here is my attempt to code it up in a reasonable format. First, let's make a dataset that looks like yours.
DATA LIST FREE / x y z (3A1).
BEGIN DATA
* b #
g # i
# * l
+ k ""
"" m ""
END DATA.
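Next, I make a single list of all the symbols, along with one dummy variable per original variable that signals whether the symbol appears in that variable. (By default VARSTOCASES drops the blank cells, so the empty strings do not show up as a symbol.)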
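* Stack x, y and z into one variable V; OrigVar holds the name of the source variable.
VARSTOCASES /MAKE V from x TO z /INDEX OrigVar (V).
* Sort by the ID and index variables before restructuring.
SORT CASES BY V OrigVar.
* One case per symbol, with 0/1 indicators Dx, Dy and Dz.
CASESTOVARS /ID = V /VIND ROOT = "D" /INDEX = OrigVar.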
The data now look like this:
V   Dx  Dy  Dz
-   --  --  --
*   1   1   0
#   1   1   1
+   1   0   0
b   0   1   0
g   1   0   0
i   0   0   1
k   0   1   0
l   0   0   1
m   0   1   0
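If you want to check the restructuring logic outside SPSS, here is a minimal plain-Python sketch (not SPSS, and not part of the original answer; `rows`, `long_pairs` and `indicators` are illustrative names) of what VARSTOCASES followed by CASESTOVARS /VIND does: stack the three columns, drop the blank cells, and build one 0/1 indicator per original variable for each distinct symbol. Python's sort puts # before *, so the rows come out in a slightly different order than in the SPSS listing.

```python
# Example data from the DATA LIST above; "" is a blank cell.
rows = [
    ("*", "b", "#"),
    ("g", "#", "i"),
    ("#", "*", "l"),
    ("+", "k", ""),
    ("", "m", ""),
]
names = ["x", "y", "z"]

# Wide -> long (like VARSTOCASES): one (symbol, source variable) pair per non-blank cell.
long_pairs = [
    (value, name)
    for row in rows
    for name, value in zip(names, row)
    if value != ""
]

# Long -> wide indicators (like CASESTOVARS /VIND): 1 if the symbol occurs in that variable.
symbols = sorted({value for value, _ in long_pairs})
indicators = {
    s: {"D" + n: int((s, n) in long_pairs) for n in names}
    for s in symbols
}

for s in symbols:
    print(s, *indicators[s].values())
```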
Now if you multiply Dx and Dy and then sum that column, the result is the size of the intersection of your two sets. Here I make a macro to make it easy to calculate all of those multiplications over a list. (Unfortunately you cannot use the TO convention here; you will need to list out all 25 of your variables when you call this macro.)
DEFINE !PairInter (!POSITIONAL = !CMDEND).
!DO !I !IN (!1)
!DO !J !IN (!1)
COMPUTE !CONCAT(!I,"_",!J) = !I*!J.
FORMATS !CONCAT(!I,"_",!J) (F3.0).
!DOEND
!DOEND
!ENDDEFINE.
!PairInter Dx Dy Dz.
You now have the variables Dx_Dx, Dx_Dy, Dx_Dz, Dy_Dx, ..., Dz_Dz, which are all of the pairwise products (interactions) of those variables. I intentionally kept the redundant pairs (for example both Dx_Dy and Dy_Dx) because that makes building the table easier later on, although when displaying the table I would suggest showing only the lower half.
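The nested !DO loops in the macro can be mirrored in a short plain-Python sketch (illustrative only; the dictionary `D` simply copies the indicator columns from the table above, and `products` is a made-up name). It shows that every ordered pair gets its own product column, so Dx_Dy and Dy_Dx are both created and are identical.

```python
# Indicator columns copied from the table above (rows *, #, +, b, g, i, k, l, m).
D = {
    "Dx": [1, 1, 1, 0, 1, 0, 0, 0, 0],
    "Dy": [1, 1, 0, 1, 0, 0, 1, 0, 1],
    "Dz": [0, 1, 0, 0, 0, 1, 0, 1, 0],
}

# Like the nested !DO loops: one product column per ordered pair, redundant pairs included.
products = {}
for i in D:
    for j in D:
        products[f"{i}_{j}"] = [a * b for a, b in zip(D[i], D[j])]

print(list(products))
```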
Now if we sum each column we get the cardinality of each set along with each pairwise intersection. Here I use LAG to keep a running total down the cases and then keep only the final case in the dataset.
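* Running total of each product column down the cases.
DO REPEAT D = Dx_Dx TO Dz_Dz.
IF ($casenum<>1) D = LAG(D) + D.
END REPEAT.
* Keep only the last case, which holds the column sums, and drop the other variables.
COMPUTE Order = $casenum.
SORT CASES BY Order (D).
SELECT IF ($casenum = 1).
MATCH FILES FILE = * /DROP V TO Dz Order.
EXECUTE.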
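The LAG block is just a running total, and keeping the last case leaves the column sums. A plain-Python sketch of the same idea (illustrative names; `itertools.accumulate` plays the role of the running total) gives the nine totals that end up in the one-case dataset:

```python
from itertools import accumulate

# Indicator columns copied from the table above (rows *, #, +, b, g, i, k, l, m).
D = {
    "Dx": [1, 1, 1, 0, 1, 0, 0, 0, 0],
    "Dy": [1, 1, 0, 1, 0, 0, 1, 0, 1],
    "Dz": [0, 1, 0, 0, 0, 1, 0, 1, 0],
}
products = {
    f"{i}_{j}": [a * b for a, b in zip(D[i], D[j])]
    for i in D
    for j in D
}

# IF ($casenum<>1) D = LAG(D) + D: each case adds the previous case's running total.
running = {name: list(accumulate(col)) for name, col in products.items()}

# Sorting by case number descending and keeping case 1 = keeping the last running total.
totals = {name: col[-1] for name, col in running.items()}
print(totals)
```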
Now you can write a MATRIX procedure to reshape the dataset and print the table in a nicer format. Here I FLIP the dataset and then recover the original variable names from the last character of each product name (Dx_Dy becomes y).
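* FLIP turns the nine totals into nine cases; CASE_LBL holds the old variable names.
STRING I (A1).
COMPUTE I = "I".
FLIP /NEWNAMES = I.
RENAME VARIABLES (CASE_LBL = V).
* Keep the last character of each name; RTRIM guards against trailing blanks in code page mode.
COMPUTE V = CHAR.SUBSTR(V,LENGTH(RTRIM(V))).
EXECUTE.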
MATRIX.
GET I /FILE = * /VARIABLE = I.
GET V /FILE = * /VARIABLE = V.
COMPUTE I2 = RESHAPE(I,3,3).
COMPUTE V2 = V(1:3).
PRINT I2 /RNAMES =V2 /CNAMES = V2.
END MATRIX.
The printed MATRIX output is then the table of intersections you wanted:
Run MATRIX procedure:

I2
      x   y   z
x     4   2   1
y     2   5   1
z     1   1   3

------ END MATRIX -----
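As a cross-check, the FLIP and MATRIX steps can be mimicked in plain Python (illustrative only; `totals`, `k` and `D` are names I introduced). The sketch also confirms that the table is D'D for the 9x3 indicator matrix D, so the diagonal holds the set sizes and the off-diagonal cells hold the pairwise intersections. RESHAPE fills the matrix row by row, and since the table is symmetric the fill order would not change the result anyway. Note that the 3 in RESHAPE(I,3,3) and V(1:3) is hard-coded for three variables (with 25 variables both become 25), and the last-character trick assumes single-character variable names such as x, y and z.

```python
# Column totals from the previous step, in variable order (Dx_Dx, Dx_Dy, ..., Dz_Dz).
totals = {"Dx_Dx": 4, "Dx_Dy": 2, "Dx_Dz": 1, "Dy_Dx": 2, "Dy_Dy": 5,
          "Dy_Dz": 1, "Dz_Dx": 1, "Dz_Dy": 1, "Dz_Dz": 3}
k = 3  # number of original variables

# FLIP: one case per product variable; V keeps the last character of its name.
I = list(totals.values())
V = [name[-1] for name in totals]

# RESHAPE(I,3,3) fills the matrix row by row; V(1:3) gives the row/column names.
I2 = [I[r * k:(r + 1) * k] for r in range(k)]
V2 = V[:k]

# Cross-check against D'D, where D is the indicator matrix (rows *, #, +, b, g, i, k, l, m).
D = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0],
     [0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
DtD = [[sum(row[a] * row[b] for row in D) for b in range(k)] for a in range(k)]
assert I2 == DtD

print("   " + " ".join(f"{n:>2}" for n in V2))
for name, row in zip(V2, I2):
    print(f"{name:>2} " + " ".join(f"{v:>2}" for v in row))
```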
I've made this into a macro, available here. After defining the macro you can simply run
!InterSet x y z.
and it will print the table.