Question

Is there any way to import an SPSS dataset into Python, preferably as a NumPy recarray? I have looked around but could not find an answer.

Joon

Solution

Maybe this will help: a Python reader and writer for SPSS .sav files (Linux, Mac & Windows): http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files-linux-mac-/

Other tips

SPSS has extensive integration with Python, but that is meant to be used from within SPSS (now known as IBM SPSS Statistics). There is also an SPSS ODBC driver that could be used with Python's ODBC support to read a .sav file.

Option 1: As rkbarney pointed out, the Python package savReaderWriter is available via PyPI. I've run into two issues:

  1. It relies on a lot of extra libraries beyond the seemingly pure-Python implementation. SPSS files are read and written in nearly every case by the IBM-provided SPSS I/O modules. These modules differ by platform, and in my experience "pip install savReaderWriter" doesn't get them running out of the box (on OS X).
  2. Development on savReaderWriter is, while not dead, less up to date than one might hope. This complicates the first issue. It relies on some deprecated packages to increase speed and gives some warnings any time you import savReaderWriter if they're not available. Not a huge issue today, but it could be trouble in the future as IBM continues to update the SPSS I/O modules to deal with new SPSS formats (they're on version 21 or 22 already, if memory serves).
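If savReaderWriter does install cleanly for you, a minimal sketch of getting from a .sav file to a NumPy recarray could look like the following. It assumes that SavReader works as a context manager and that returnHeader=True makes the first row the list of variable names; the helper names read_sav, split_header and to_recarray are made up for illustration and are not part of the package. Both savReaderWriter and NumPy are imported lazily, so the pure-Python helper still works without them.

```python
def split_header(rows):
    """Split an iterable whose first item is the header row.

    Variable names may come back as bytes under Python 3, so they are
    decoded to str here. Returns (names, list_of_row_tuples).
    """
    iterator = iter(rows)
    header = next(iterator)
    names = [n.decode("utf-8") if isinstance(n, bytes) else n for n in header]
    return names, [tuple(row) for row in iterator]


def read_sav(sav_path):
    """Read a .sav file through savReaderWriter (assumed to be installed)."""
    import savReaderWriter  # lazy import: needs the IBM SPSS I/O modules

    with savReaderWriter.SavReader(sav_path, returnHeader=True) as reader:
        return split_header(reader)


def to_recarray(names, rows):
    """Build a NumPy recarray from the column names and row tuples."""
    import numpy as np  # lazy import so split_header stays dependency-free

    return np.rec.fromrecords(rows, names=names)
```

Usage would be along the lines of names, rows = read_sav("survey.sav") followed by to_recarray(names, rows), where "survey.sav" is a placeholder path. Columns with missing values may end up with an object dtype, and a file with no rows would need separate handling.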

Option 2: I've chosen to use R as a middleman. Using rpy2, I set up a simple function to read the file into an R data frame and write it out again as a CSV file, which I subsequently import into Python. It's a bit Rube Goldberg, but it works. Of course, this requires R, which may also be a hassle to install in your environment (and has different binaries for different platforms).
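A rough sketch of that round trip, not my exact function: it assumes R's foreign package (read.spss) is installed and that rpy2 can find your R installation. The helper names build_r_script, sav_to_csv, load_csv and csv_to_recarray are made up for illustration, and json.dumps is used only as a convenient way to quote the paths as R string literals, which should be fine for ordinary paths.

```python
import csv
import json


def build_r_script(sav_path, csv_path):
    """Return R code that converts an SPSS .sav file to a CSV file."""
    return (
        "library(foreign)\n"
        f"df <- read.spss({json.dumps(sav_path)}, to.data.frame = TRUE)\n"
        f"write.csv(df, {json.dumps(csv_path)}, row.names = FALSE)\n"
    )


def sav_to_csv(sav_path, csv_path):
    """Run the conversion in R through rpy2 (imported lazily; needs R)."""
    import rpy2.robjects as robjects

    robjects.r(build_r_script(sav_path, csv_path))


def load_csv(csv_path):
    """Read the CSV back as a list of dicts keyed by column name."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def csv_to_recarray(csv_path):
    """Alternative loader: read the CSV straight into a NumPy recarray."""
    import numpy as np  # lazy import; only needed for the recarray route

    data = np.genfromtxt(csv_path, delimiter=",", names=True,
                         dtype=None, encoding="utf-8")
    return data.view(np.recarray)
```

Calling sav_to_csv("survey.sav", "survey.csv") and then csv_to_recarray("survey.csv") would give the recarray the question asks for; both file names are placeholders. Note that write.csv quotes string columns, so text fields containing commas may need the csv-module route (load_csv) rather than genfromtxt.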

gretl claims to import SPSS and export in a variety of formats, as does the R statistical suite. I've never dealt with SPSS data so cannot speak to their relative merits.

You could have Python make an external call to spssread, a Perl script that outputs the content of SPSS files in the way you want.

To be clear, the SPSS ODBC driver does not require an SPSS installation.

Maybe this will be helpful for someone:

http://sourceforge.net/search/?q=python+SPSS

good luck!

Michal

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow