Save Cluster Variables / Variable PSPP

https://stackoverflow.com/questions/21783727

11-10-2022

Question

I am using PSPP (not SPSS, since I can't get that running on my Ubuntu machine) to run a k-means cluster analysis on my set of ~100k records. What I really need is more detailed output than just how many records are in each cluster. I need the cluster variable saved, i.e.

row 1 => cluster 1

row 2 => cluster 4

row 3 => cluster 1

etc...

Essentially, I need an extra field that stores the cluster each record was assigned to. My current syntax is:

QUICK CLUSTER  cat1 cat2 cat3 cat4 cat5 cat6 cat7 cat8 cat9 cat10 cat11 cat12
/CRITERIA=CLUSTERS(12) MXITER(100000000).

SPSS and PSPP share a lot of the same syntax, so if there is an option for this in SPSS it might work here too.

Solution

SPSS Statistics should run on Ubuntu. In any case, the QUICK CLUSTER command in Statistics has a subcommand

/SAVE CLUSTER

that should do what you want. You can optionally specify a variable name in parentheses after CLUSTER.
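
As a rough sketch, the command from the question with this subcommand added might look like the following in SPSS Statistics. The variable name cluster_id is illustrative and not from the original post. The criteria are copied unchanged from the question, so SPSS may expect slightly different keyword spellings or a smaller MXITER value.

```spss
* Sketch only: the question's command plus the SAVE subcommand.
* cluster_id is a hypothetical name for the new membership variable.
QUICK CLUSTER cat1 cat2 cat3 cat4 cat5 cat6 cat7 cat8 cat9 cat10 cat11 cat12
  /CRITERIA=CLUSTERS(12) MXITER(100000000)
  /SAVE CLUSTER(cluster_id).
```

If it runs, each record should end up with a new variable holding the number of the cluster it was assigned to.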

OTHER TIPS

PSPP does not handle the /SAVE CLUSTER subcommand. Try it and see! The documented PSPP syntax for the command is:

QUICK CLUSTER var_list
      [/CRITERIA=CLUSTERS(k) [MXITER(max_iter)] CONVERGE(epsilon) [NOINITIAL]]
      [/MISSING={EXCLUDE,INCLUDE} {LISTWISE, PAIRWISE}]
      [/PRINT={INITIAL} {CLUSTER}]

See the PSPP page on the GNU website.

I know you're looking for something in PSPP, but your best bet is probably to save the output as an OpenDocument file, open your data file as a .csv in a spreadsheet, and then copy in the cluster memberships (assuming you added /PRINT=CLUSTER to your command).
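
For this workaround, the command from the question with the PRINT subcommand added would look roughly like this. It is a sketch based on the PSPP syntax quoted above, and the exact layout of the printed membership table is not shown here.

```spss
* Sketch only: ask PSPP to print the cluster membership of each case.
QUICK CLUSTER cat1 cat2 cat3 cat4 cat5 cat6 cat7 cat8 cat9 cat10 cat11 cat12
  /CRITERIA=CLUSTERS(12) MXITER(100000000)
  /PRINT=CLUSTER.
```

The printed memberships can then be pasted into the spreadsheet next to the data rows, provided the rows are still in the same order as in the data file.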

Licensed under: CC-BY-SA with attribution
