Question

I have a large dataset in a DBF file and would like to export it to a CSV file. Thanks to SO, I already managed to do that smoothly.

However, when I try to import it into R (the environment I work in), it combines some characters together, making some rows much longer than they should be and consequently breaking the whole database. In the end, whenever I import the exported CSV file I get only about half of the rows. I think the main problem is with quotes inside character strings, but specifying quote="" in R didn't help (and it usually does).

I've searched for any question on how to deal with quotes when exporting from Visual FoxPro, but couldn't find an answer. I wanted to test this, but my computer raises an error saying I don't have enough memory to complete the operation (probably due to the size of the database).

Any help will be highly appreciated. I've been stuck on this problem of exporting from the DBF into R for long enough; I've searched everything I could and am desperately looking for a simple way to import a large DBF into my R environment without any bugs.

In R, I checked whether the imported file has problems, and indeed most columns have much larger nchar values than they should, while the number of rows is halved. Reading the file with read.csv("file.csv", quote="") didn't help. Reading with data.table::fread() returns an error:

Expected sep (',') but '0' ends field 88 on line 77980:

But according to verbose=TRUE, this function reads the right number of rows (read.csv imports only about 1.5 million rows):

Count of eol after first data row: 2811729
Subtracted 1 for last eol and any trailing empty lines, leaving 2811728 data rows
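For diagnosis, the per-line field counts can be tabulated with base R's count.fields; a minimal sketch, assuming the exported file is file.csv:

cf <- count.fields("file.csv", sep = ",", quote = "\"")
table(cf, useNA = "ifany")  # a clean export shows a single field count
head(which(is.na(cf)))      # NA marks lines merged with the next by an open quote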


Solution 2

You might have to write code to do the export, rather than simply using the COPY TO ... DELIMITED command.

SELECT thedbf
mfld_cnt = AFIELDS(mflds)

fh = FCREATE(m.filename)  && FCREATE makes a new file; FOPEN only opens an existing one

SCAN
  FOR aa = 1 TO mfld_cnt
    mcurfld = 'thedbf.' + mflds[aa, 1]
    mvalue = EVAL(mcurfld)
    ** Or use macro substitution: mvalue = &mcurfld

    ** Manipulate the contents of mvalue, based on the field type
    DO CASE
      CASE mflds[aa, 2] = 'D'
        mvalue = DTOC(mvalue)

      CASE mflds[aa, 2] $ 'CM'
        ** Replace the characters that are giving you problems in R
        mvalue = STRTRAN(mvalue, ["], '')

      OTHERWISE
        ** Etc. Convert remaining types to character before writing
        mvalue = TRANSFORM(mvalue)
    ENDCASE

    = FWRITE(fh, mvalue)
    IF aa # mfld_cnt
      = FWRITE(fh, [,])
    ENDIF
  ENDFOR

  = FWRITE(fh, CHR(13) + CHR(10))
ENDSCAN
= FCLOSE(fh)

Note that I'm using [ ] characters to delimit strings that include commas and quotation marks. That helps readability.
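On the R side, a file written by this loop contains no quote characters at all, so quote handling can be disabled when reading it back in. A minimal sketch, assuming the export went to file.csv with no header row:

library(data.table)
dat <- fread("file.csv", header = FALSE, quote = "")
nrow(dat)  # should now match the record count of the DBF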

OTHER TIPS

When exporting with COPY TO ... TYPE DELIMITED, you have some control on the VFP side over how the export formats the output file.

To change the field delimiter (the character that surrounds each field) from quotes to, say, a pipe character, you can do:

copy to myfile.csv type delimited with "|"

so that will produce something like:

|A001|,|Company 1 Ltd.|,|"Moorfields"|

You can also change the field separator from a comma to another character:

copy to myfile.csv type delimited with "|" with character "@"

giving

|A001|@|Company 1 Ltd.|@|"Moorfields"|

That may help in parsing on the R side.
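For example, a sketch of reading that last variant back into R (the file name and absence of a header row are assumptions; sep and quote mirror the WITH CHARACTER and WITH clauses above):

dat <- read.table("myfile.csv", sep = "@", quote = "|",
                  header = FALSE, stringsAsFactors = FALSE)
# embedded double quotes such as "Moorfields" now pass through as ordinary characters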

There are three ways to delimit a string in VFP: with single quotes, with double quotes, or with square brackets. So to strip both quote characters out of the character fields myfield1 and myfield2 in your DBF file, you could do this in the Command Window:

close all
use myfile
copy to mybackupfile  && back up the table before changing data in place
select myfile
replace all myfield1 with chrtran(myfield1, ["'], "")
replace all myfield2 with chrtran(myfield2, ["'], "")

and repeat for other fields and tables.

* create a comma-delimited file with no quotes around the character fields
copy to myfile.csv type delimited with ""  && that is two double quotes, an empty delimiter
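With the quotes stripped from the data and none added by the export, the quote="" approach from the question should now behave as expected; a minimal sketch, again assuming no header row:

dat <- read.csv("myfile.csv", header = FALSE, quote = "")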

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow