Tcl for getting ASCII code for every character in a string

https://stackoverflow.com/questions/1675677

16-09-2019

Question

I need to get the ASCII code for every character in a string. Actually, it's every character in a (small) file. The first three lines below successfully pull all of a file's contents into a string (per this recipe):

set fp [open "store_order_create_ddl.sql" r]
set data [read $fp]
close $fp

I believe I am correctly discerning the ASCII code for the characters (see http://wiki.tcl.tk/1497). However, I'm having a problem figuring out how to loop over every character in the string.

First of all, I don't think the following is an especially idiomatic way of looping over the characters in a string with Tcl. Second, and more importantly, it behaves incorrectly, inserting an extra element between every character.

Below is the code I've written to act on the contents of the "data" variable set above, followed by some sample output.

CODE:

for {set i 0} {$i < [string length $data]} {incr i} {
  set char [string index $data $i]
  scan $char %c ascii
  puts "char: $char (ascii: $ascii)"
}

OUTPUT:

char: C (ascii: 67)
char:  (ascii: 0)
char: R (ascii: 82)
char:  (ascii: 0)
char: E (ascii: 69)
char:  (ascii: 0)
char: A (ascii: 65)
char:  (ascii: 0)
char: T (ascii: 84)
char:  (ascii: 0)
char: E (ascii: 69)
char:  (ascii: 0)
char:   (ascii: 32)
char:  (ascii: 0)
char: T (ascii: 84)
char:  (ascii: 0)
char: A (ascii: 65)
char:  (ascii: 0)
char: B (ascii: 66)
char:  (ascii: 0)
char: L (ascii: 76)
char:  (ascii: 0)
char: E (ascii: 69)

Solution

The following code should work:

set data {CREATE TABLE}
foreach char [split $data ""] {
    lappend output [scan $char %c]
}
set output ;# 67 82 69 65 84 69 32 84 65 66 76 69

As for the extra characters in your output, the problem seems to be with the input data from the file. Is there some reason there would be null characters (\0) between every character in the file?
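One way to check for stray null characters is to count them. The following is a minimal sketch, not part of the original answer; the sample string is a hypothetical stand-in for the file contents, built to mimic the output shown in the question.

```tcl
# Hypothetical sample: "CREATE" with a null character between each letter,
# mimicking what the question's loop printed.
set data [join [split "CREATE" ""] "\0"]

# Count nulls by comparing the length before and after removing them.
set stripped [string map [list "\0" ""] $data]
set nulls [expr {[string length $data] - [string length $stripped]}]

puts "nulls: $nulls of [string length $data] characters"
```

If roughly half of the characters are nulls, the file is very likely stored in a two-byte encoding rather than a single-byte one.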

OTHER TIPS

I came across this older question while looking for something else. I'm going to answer it for the benefit of anyone else who may be looking for an answer to this question.

First off, understand what character encodings are. The source data in the example is not ASCII-encoded, so the ASCII character codes (0-127) have no meaning on their own. In this example the encoding appears to be UTF-16, which stores each ASCII character as two bytes, one of them zero; read one byte at a time, that produces the extra 0 entries in the output. What you probably want is the full range of "character" codes from 0 to 255, but depending on your system, the source of the data, and so on, codes 128-255 may be ANSI, ISO, or some other code page. What you want to do is convert the data into a format you know how to handle, such as the very common ISO 8859-1 (encoding "iso8859-1"), which is very similar to the Windows-1252 standard encoding (encoding "cp1252"), or UTF-8 (encoding "utf-8"), with the "encoding" command:

set data [encoding convertto utf-8 $data] ;# For UTF-8

set data [encoding convertto iso8859-1 $data] ;# For ISO 8859-1

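and so on. Note that "encoding convertto" turns a Tcl string into bytes in the named encoding, while "encoding convertfrom" goes the other way and decodes bytes into a Tcl string. If you're reading the data from a file, you may want to set the file encoding (via fconfigure) prior to reading the data as well, to make sure you're reading the file data correctly. Look up the man pages for "encoding" (and "fconfigure") for more details on handling character set encoding.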
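To illustrate the two directions of the "encoding" command, here is a small sketch. It assumes the encoding name "unicode" (Tcl's name for two-byte Unicode in native byte order in Tcl 8.x); the sample string is hypothetical.

```tcl
# String -> bytes: each of the 6 characters becomes 2 bytes.
set bytes [encoding convertto unicode "CREATE"]
puts [string length $bytes]   ;# 12

# Bytes -> string: decoding restores the original 6 characters.
set text [encoding convertfrom unicode $bytes]
puts [string length $text]    ;# 6
```

Looping over $bytes character by character would show a zero next to every letter, much like the question's output; looping over $text would not.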

Once you have the encoding of the data under control, the rest of the example code should work as expected.
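Putting the pieces together, a sketch of the whole flow might look like this. It assumes Tcl 8.6 or later (for "file tempfile") and the "unicode" encoding name; the temporary file is a hypothetical stand-in for store_order_create_ddl.sql, written here only so the example is self-contained.

```tcl
# Create a small two-byte-encoded sample file (stand-in for the real file).
set fp [file tempfile path]
fconfigure $fp -encoding unicode
puts -nonewline $fp "CREATE TABLE"
close $fp

# Read it back, declaring the file's encoding BEFORE reading.
set fp [open $path r]
fconfigure $fp -encoding unicode
set data [read $fp]
close $fp
file delete $path

# Collect the code of every character.
set codes {}
foreach char [split $data ""] {
    lappend codes [scan $char %c]
}
puts $codes   ;# 67 82 69 65 84 69 32 84 65 66 76 69
```

For a real file, the right value for -encoding depends on how the file was written, so treat "unicode" here as an assumption to verify against your own data.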

Licensed under: CC-BY-SA with attribution
