Question

I have a tab-delimited text file which was imported into PB using a DataStore with the ImportFile() method. There is no error during the import, but when I checked the table, the dash character had turned into an invalid character (â€). The table's column is of the varchar(300) data type.

Any help / advice is appreciated.

[Screenshot: the imported table data showing the invalid character]

And when I check the database, the result set is:

[Screenshot: the database result set showing the corrupted character]

Below is the import script I've currently implemented.

//Import File Script
IF (ids_edihdr.ImportFile(ls_SourcePath,1,1) = 1 ) AND (ids_edidtl.ImportFile(ls_SourcePath,2) > 0 ) THEN 
    //HEADER
    IF ids_edihdr.RowCount() = 1 THEN 

        ids_edihdr.SetItem(1,'FNAME',Upper(as_file))
        ids_edihdr.SetItem(1,'CREATEDBY',Upper(SQLCA.LogID))    
        ids_edihdr.SetItem(1,'CREATEDDATE',idt_TranDate)    

    END IF

    //DETAIL
    IF ids_edidtl.RowCount() >= 1 THEN
        FOR ll_edidtl = 1 TO ids_edidtl.RowCount()
            ids_edidtl.SetItem(ll_edidtl,'Fname',Upper(as_file))
            ids_edidtl.SetItem(ll_edidtl,'CREATEDBY',Upper(SQLCA.LogID))    
            ids_edidtl.SetItem(ll_edidtl,'CREATEDDATE',idt_TranDate)
        NEXT        
    END IF
END IF
Was it helpful?

Solution

Any chance the file being imported contains data entered in Word or Excel? Have you looked at the data file with a hex editor? Odds are the dash character was "intelligently" substituted with an extended character, and you have a character-set clash going on. My bet is that the fix is in the data file, not the code.
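If you don't have a hex editor handy, a rough sketch like the following (assuming PB 10.5 or later for the byte datatype and GetByte(), and that ls_sourcepath holds your file path) will dump the raw byte values so a substituted dash stands out; a UTF-8 en dash shows up as the sequence 226 128 147:

long ll_file, ll_i
blob lbl_raw
byte lb_byte
string ls_dump

ll_file = FileOpen(ls_sourcepath, StreamMode!, Read!, LockRead!)
FileReadEx(ll_file, lbl_raw)
FileClose(ll_file)

FOR ll_i = 1 TO Len(lbl_raw)
    GetByte(lbl_raw, ll_i, lb_byte)              // read one raw byte from the blob
    ls_dump = ls_dump + String(lb_byte) + ' '    // collect decimal byte values
NEXT
MessageBox('Raw bytes', ls_dump)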

Good luck,

Terry

Other suggestions

I did a little bit of research and learned that the character in question is a Unicode dash (most likely an en dash, U+2013, rather than the plain ASCII hyphen U+002D, which could not be corrupted this way). If the data looked fine in your input file and was corrupt after importing, the problem may be due to PB not handling the data as Unicode, so you can fix the situation using functions in PB.

It could be that the database interface you are using doesn't support the conversion between ANSI and Unicode (see page 7). I'm not sure if you were using a pipeline object or anything else where database drivers come into play.

Either way, knowing it is a character-encoding issue, fixing this should be pretty simple: just use the appropriate encoding enumerated argument (EncodingANSI!, EncodingUTF8!, and so on) on the String and Blob methods prior to importing the text into the DataWindow. If that isn't possible, then you could write a quick routine to read through the file, convert, and save it before importing.
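A rough convert-and-save sketch, assuming the source file is really UTF-8 without a BOM and small enough for a single FileWrite call (under roughly 32 KB); ls_targetpath is illustrative:

long ll_in, ll_out
blob lbl_raw
string ls_text

ll_in = FileOpen(ls_sourcepath, StreamMode!, Read!, LockRead!)
FileReadEx(ll_in, lbl_raw)
FileClose(ll_in)

ls_text = String(lbl_raw, EncodingUTF8!)   // decode the UTF-8 bytes into a PB string

// write a converted copy; EncodingUTF16LE! matches the PB 10+ native encoding
ll_out = FileOpen(ls_targetpath, TextMode!, Write!, LockWrite!, Replace!, EncodingUTF16LE!)
FileWrite(ll_out, ls_text)
FileClose(ll_out)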

If you don't want to convert before importing, you can instead loop through the DataWindow/DataStore and fix the data before actually performing the update to the database.
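For instance, a rough post-import cleanup sketch; the column name DESCR is made up for illustration, and note that this patches the symptom in the data rather than fixing the encoding itself:

long ll_row, ll_pos
string ls_val

FOR ll_row = 1 TO ids_edidtl.RowCount()
    ls_val = ids_edidtl.GetItemString(ll_row, 'DESCR')
    ll_pos = Pos(ls_val, 'â€“')                   // the 3-character artifact left by a misread UTF-8 en dash
    DO WHILE ll_pos > 0
        ls_val = Replace(ls_val, ll_pos, 3, '-')  // swap the artifact for a plain dash
        ll_pos = Pos(ls_val, 'â€“')
    LOOP
    ids_edidtl.SetItem(ll_row, 'DESCR', ls_val)
NEXT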

You can find examples of code on my blog on converting between ANSI and Unicode, but basically you just use one of these encoding parameters on String and Blob functions (a short sketch follows the list):

  • EncodingANSI!
  • EncodingUTF8!
  • EncodingUTF16LE! – UTF-16 Little Endian encoding (PowerBuilder 10 default)
  • EncodingUTF16BE! – UTF-16 Big Endian encoding
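As a minimal sketch of the basic conversion (variable names are illustrative):

blob lbl_raw
string ls_text

ls_text = String(lbl_raw, EncodingUTF8!)     // decode UTF-8 bytes into a PB (UTF-16LE) string
lbl_raw = Blob(ls_text, EncodingUTF16LE!)    // re-encode to a blob if one is needed downstream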

Appreciate all the comments. I'd like to share how I addressed the question: I added a script to handle file-encoding validation and conversion as well.

// Validate the file's encoding and convert to Unicode before importing
long ll_FileNum, ll_FileLength
integer li_bytes
encoding eRet
blob lbl_data
string ls_unicode

ll_FileLength = FileLength(ls_sourcepath)
eRet = FileEncoding(ls_sourcepath)

// A UTF-8 file without a BOM is reported as EncodingANSI!, so decode the
// raw bytes as UTF-8 to recover the dash characters correctly
IF eRet = EncodingANSI! AND ll_FileLength <= 32765 THEN
    ll_FileNum = FileOpen(ls_sourcepath, StreamMode!, Read!, LockWrite!)
    li_bytes = FileReadEx(ll_FileNum, lbl_data)
    ls_unicode = String(lbl_data, EncodingUTF8!)
    FileClose(ll_FileNum)
END IF

IF (ids_edihdr.ImportString(ls_unicode,1,1) = 1 ) AND (ids_edidtl.ImportString(ls_unicode,2) > 0 ) THEN
    <some conditions here....>
END IF
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow