Question

I have a string variable and need to convert every non-digit character to a space (" "). The problem is with Unicode characters: characters outside the basic character set are converted to invalid characters. See the code below for an example.

Is there another way to achieve the same result with a procedure that does not choke on special Unicode characters?

new file.

set unicode = yes.
show unicode.

data list free
 /T (a10).
begin data
1234
5678
absd
12as
12(a
12(vi
12(vī
12āčž
end data.

string Z (a10).
comp Z = T.

loop #k = 1 to char.len(Z).
if ~range(char.sub(Z, #k, 1), "0", "9") sub(Z, #k, 1) = " ".
end loop.

comp Z = normalize(Z).

comp len = char.len(Z).

list var = all.

exe.

The result:

T          Z               len

1234       1234              4
5678       5678              4
absd                         0
12as       12                2
12(a       12                2
12(vi      12                2
12(vī     12   �          6

>Warning # 649
>The first argument to the CHAR.SUBSTR function contains invalid characters.
>Command line: 1939  Current case: 8  Current splitfile group: 1

12āčž   12   �ž        7


Number of cases read:  8    Number of cases listed:  8

Solution 2

Instead of replacing the non-numeric characters in place, how about cycling through the string, pulling out the numeric characters, and rebuilding Z? (Note that my version here predates the CHAR. string functions.)

data list free
 /T (a10).
begin data
1234
5678
absd
12as
12(a
12(vi
12(vī
12āčž
12as23
end data.

* Rebuild Z one character at a time: keep each digit, write a space for anything else.
STRING Z (a10).
STRING #temp (A1).
COMPUTE #len = LENGTH(RTRIM(T)).
LOOP #i = 1 to #len.
  COMPUTE #temp = SUBSTR(T,#i,1).
  DO IF INDEX('0123456789',#temp) > 0.
    COMPUTE Z = CONCAT(SUBSTR(Z,1,#i-1),#temp).
  ELSE.
    COMPUTE Z = CONCAT(SUBSTR(Z,1,#i-1)," ").
  END IF. 
END LOOP.
EXECUTE.
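
For comparison, here is the same rebuild idea as a minimal Python sketch. It is not SPSS syntax, and the function name keep_digits is made up for illustration. Python strings are indexed by character rather than by byte, so a multi-byte character such as ī is always replaced as a whole.

```python
DIGITS = "0123456789"


def keep_digits(text: str) -> str:
    """Rebuild the string: keep each digit, write a space for anything else."""
    return "".join(ch if ch in DIGITS else " " for ch in text)


# The escapes spell out "12āčž" from the question's data.
print(repr(keep_digits("12\u0101\u010d\u017e")))  # '12   '
print(repr(keep_digits("12as23")))                # '12  23'
```

The explicit DIGITS string mirrors the INDEX('0123456789', ...) test above. Python's str.isdigit() would also accept digits from other scripts, which is probably not wanted here.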

OTHER TIPS

The substr function should not be used on the left-hand side of an expression in Unicode mode, because the replacement character may not occupy the same number of bytes as the character(s) being replaced. Instead, use the replace function on the right-hand side.

The corrupted characters you are seeing are due to this size mismatch.
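
To make the size mismatch concrete, here is a small Python sketch. It only illustrates the mechanism, assuming the string is stored as UTF-8; it does not describe SPSS internals, and the helper name overwrite_first_byte is hypothetical. The character ī takes two bytes in UTF-8, so overwriting a single byte with a space leaves an orphaned continuation byte behind, which shows up as the replacement character.

```python
def overwrite_first_byte(text: str, char_index: int) -> str:
    """Overwrite only the FIRST byte of the character at char_index with a space."""
    raw = bytearray(text.encode("utf-8"))
    # Byte offset of the target character = encoded size of everything before it.
    offset = len(text[:char_index].encode("utf-8"))
    raw[offset] = 0x20  # a space is a single byte
    # Any orphaned byte of a multi-byte character decodes as U+FFFD.
    return raw.decode("utf-8", errors="replace")


value = "12(v\u012b"  # "12(vī"; the last character is two bytes in UTF-8

print(len(value), len(value.encode("utf-8")))  # 5 6
print(repr(overwrite_first_byte(value, 4)))    # '12(v \ufffd'
print(repr(overwrite_first_byte(value, 3)))    # '12( \u012b' (single-byte character, no damage)
```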

Licensed under: CC-BY-SA with attribution