SPSS - How to isolate the maximum value in a variable so you can use it in syntax?

StackOverflow https://stackoverflow.com/questions/17727903

  •  03-06-2022
  •  | 
  •  

Pregunta

I have a table of areas with data. For a particular operation, I want to exclude the top and bottom 1% of areas, as they include extreme outliers.

Seems to me that a way forward is:

SORT CASES BY theVariableIwantToAnalyse  (A) .
NUMERIC id (F12.0) .  * create a casenum label "id"
COMPUTE id = $CASENUM. * populate it with casenum
EXECUTE.
NUMERIC idmax (F12.4) .   * create a variable to contain the highest value for "id" 
NUMERIC id1perc (F12.4) . * create a variable to contain 1% of the highest value for "id"  
COMPUTE idmax = MAX(id) .    * determine the highest value for id. This 'mock-syntax' line does not work.   
COMPUTE id1perc = idmax / 100 . * 1% of the highest value for "id"  
SELECT CASES WHERE ID >= id1perc or ID <= idmax - id1perc .

Draw graphs etc. I then need to

SORT CASES BY theNextVariableIwantToAnalyse  (A) .
COMPUTE id = $CASENUM. * populate it with the NEW casenum order
EXECUTE.

etc ...

¿Fue útil?

Solución

Try this to simply filter out the top and bottom 1% - just add FILTER BY filter. to turn off all extreme cases, or SELECT IF filter. ... EXECUTE. to delete them

EDIT: note that repeated values will be condensed by the RANK method (specifically the /TIESoption). This might not be ideal if you have the possibility of repeated values. Change the /TIES option if that's the case.

************* GENERATE RANDOM DATA *****************.
INPUT PROGRAM.
-       LOOP #I = 1 TO 1000.
-             COMPUTE Y = RV.NORMAL(100,10).
-           END CASE.
-       END LOOP.
-       END FILE.
END INPUT PROGRAM.

dataset name exampleData WINDOW=front.
EXECUTE.


************* RANK DATA  *************.
DATASET ACTIVATE exampleData.
RANK VARIABLES=Y (A)
  /RFRACTION INTO fractile
  /TIES=CONDENSE.

************* MAKE A FILTER  *************.
COMPUTE filter = (fractile>0.01 AND fractile < 0.99).
EXECUTE.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Y filter MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Y=col(source(s), name("Y"))
  DATA: filter=col(source(s), name("filter"), unit.category())
  GUIDE: axis(dim(1), label("Y"))
  GUIDE: axis(dim(2), label("Frequency"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("filter"))
  ELEMENT: interval.stack(position(summary.count(bin.rect(Y))), color.interior(filter), 
    shape.interior(shape.square))
END GPL.

Otros consejos

A MUCH easier solution is just to use RANK and then select on the ranks you want to exclude.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top