I'm looking to automate the process of converting many .CSV files into .DTA files via Python. .DTA is the file type used by the Stata statistics package.

I have not been able to find a way to go about doing this, however.

The R language has write.dta, which allows a data frame in R to be written to a .dta file, and there is a bridge from Python to R via RPy, but I can't figure out how to use RPy to access the write.dta function in R.

Any ideas?


Solution

You need rpy2 for Python and also the foreign package installed in R. You do that by starting R and typing install.packages("foreign"). You can then quit R and go back to Python.

Then this:

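import rpy2.robjects as robjects
# Load R's foreign package, which provides write.dta
robjects.r("library(foreign)")
# Read the CSV into an R data frame named x
robjects.r('x=read.csv("test.csv")')
# Write the data frame out in Stata format
robjects.r('write.dta(x,"test.dta")')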

You can construct the string passed to robjects.r from Python variables if you want, something like:

robjects.r('x=read.csv("%s")' % fileName)
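To cover the "many files" part of the question, here is a rough sketch of how those command strings could be built for every CSV in a folder. The helper names (r_string, r_convert_commands, convert_all) are made up for illustration and are not part of rpy2. The run argument is assumed to be any callable that takes an R command string, such as robjects.r.

```python
from pathlib import Path


def r_string(text):
    """Quote text as an R string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def r_convert_commands(csv_path):
    """Return the R commands that convert one CSV file to a .dta file beside it."""
    csv_path = Path(csv_path)
    dta_path = csv_path.with_suffix(".dta")
    return [
        "x <- read.csv(%s)" % r_string(csv_path.as_posix()),
        "write.dta(x, %s)" % r_string(dta_path.as_posix()),
    ]


def convert_all(folder, run):
    """Call run(command) for every R command needed to convert folder/*.csv."""
    run("library(foreign)")
    converted = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        for command in r_convert_commands(csv_path):
            run(command)
        converted.append(csv_path.with_suffix(".dta"))
    return converted
```

With rpy2 and the foreign package installed, something like convert_all("my_folder", robjects.r) should then run the conversion in R ("my_folder" is a placeholder). Passing print instead of robjects.r shows the commands without running them.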

Other tips

(copypasting from my answer to a previous question)

pandas DataFrame objects now have a "to_stata" method. So you can do for instance

import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')

DISCLAIMER: the first step is quite slow (in my test, around 1 minute to read a 51 MB .dta file; also see this question), and the second produces a file that can be much larger than the original (in my test, the size goes from 51 MB to 111 MB). Spacedman's answer may look less elegant, but it is probably more efficient.
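For the original CSV-to-DTA task, the same idea can be sketched with pd.read_csv in place of pd.read_stata. This is a minimal sketch, not a tuned solution: the function name csv_folder_to_dta is made up for illustration, and it assumes the CSV column names and types are ones Stata accepts, so real data may need cleaning first.

```python
from pathlib import Path

import pandas as pd


def csv_folder_to_dta(folder):
    """Write a .dta file next to every .csv file in folder; return the new paths."""
    written = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        dta_path = csv_path.with_suffix(".dta")
        df = pd.read_csv(csv_path)
        # write_index=False keeps pandas' row index out of the Stata file
        df.to_stata(dta_path, write_index=False)
        written.append(dta_path)
    return written
```

Calling csv_folder_to_dta("my_folder") would leave my_folder/test.dta next to my_folder/test.csv, and so on for each CSV file ("my_folder" is a placeholder).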

Licensed under: CC-BY-SA with attribution