Question

So I'm trying to convert a binary to a string. This code:

t = [{<<71,0,69,0,84,0>>}]
String.from_char_list(t)

But I'm getting this when I try this conversion:

** (ArgumentError) argument error
    (stdlib) :unicode.characters_to_binary([{<<70, 0, 73, 0, 78, 0>>}])
    (elixir) lib/string.ex:1161: String.from_char_list/1

I'm assuming the <<70, 0, etc. is likely a list of graphemes (it's the return from an API call and the API is not quite documented) but do I need to specify the encoding somehow?

I know I'm likely missing something obvious (maybe that's not the right function to use?) but I can't seem to figure out what to do here.


EDIT:

For what it's worth, the binary above is the return value of an Erlang ODBC call. After a little more digging I found that the binary in question is actually a "Unicode binary encoded as UTF16 little endian" (see here: http://www.erlang.org/doc/apps/odbc/odbc.pdf pg. 9 re: SQL_WVARCHAR) Doesn't really change the issue but it does add some context.

Was it helpful?

Solution

There's a couple of things here:

1.) You have a list with a tuple containing one element, a binary. You can probably just extract the binary and have your string. Passing the current data structure to to_string is not going to work.

2.) The binary you used in your example contains 0, an unprintable character. In the shell, this will not be printed properly as a string, due to the fact that Elixir can't tell the difference between just a binary, and a binary representing a string, when the binary representing a string contains unprintable characters.

3.) You can use pattern matching to convert a binary to a particular type. For instance:

iex> raw = <<71,32,69,32,84,32>>
...> Enum.join(for <<c::utf8 <- raw>>, do: <<c::utf8>>)
"G E T "
...> <<c::utf8, _::binary>> = raw
"G"

Also, if you are getting binary data from a network connection, you probably want to use :erlang.iolist_to_binary, since the data will be an iolist, not a charlist. The difference is that iolists can contain binaries, nested lists, as well as just be a list of integers. Charlists are always just a flat list of integers. If you call to_string, on an iolist, it will fail.

OTHER TIPS

I made a function to convert binary to string

def raw_binary_to_string(raw) do
   codepoints = String.codepoints(raw)  
      val = Enum.reduce(codepoints, 
                        fn(w, result) ->  
                            cond do 
                                String.valid?(w) -> 
                                    result <> w 
                                true ->
                                    << parsed :: 8>> = w 
                                    result <>   << parsed :: utf8 >>
                            end
                        end)

  end

Executed on iex console

iex(6)>raw=<<65, 241, 111, 32, 100, 101, 32, 70, 97, 99, 116, 117, 114, 97, 99, 105, 111, 110, 32, 65, 99, 116, 117, 97, 108>>
iex(6)>raw_binary_to_string(raw)
iex(6)>"Año de Facturacion Actual"

Not sure if OP has since solved his problem, but in relation to his remark about his binary being utf16-le: for specifically that encoding, I found that the quickest (and to those more experienced with Elixir, probably-hacky) way was to use Enum.reduce:

# coercing it into utf8 gives us ["D", <<0>>, "e", <<0>>, "v", <<0>>, "a", <<0>>, "s", <<0>>, "t", <<0>>, "a", <<0>>, "t", <<0>>, "o", <<0>>, "r", <<0>>]
<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>  
|> String.codepoints()
|> Enum.reduce("", fn(codepoint, result) ->
                     << parsed :: 8>> = codepoint
                     if parsed == 0, do: result, else: result <> <<parsed>>
                   end)

# "Devastator"
|> IO.puts()

Assumptions:

  • utf16-le encoding

  • the codepoints are backwards-compatible with utf8 i.e. they use only 1 byte

Since I'm still learning Elixir, it took me a while to get to this solution. I looked into other libraries people made, even using something like iconv at a bash level.

Ecto.UUID.load/1 will convert a binary to string and return a tuple:

binary = Ecto.UUID.bingenerate()
<<99, 148, 189, 126, 144, 154, 71, 236, 160, 110, 149, 143, 67, 162, 177, 192>>

Ecto.UUID.load(binary)
{:ok, "6394bd7e-909a-47ec-a06e-958f43a2b1c0"}

credit: https://stackoverflow.com/a/43530427/2091331

The last point definitely does change the issue, and explains it. Elixir uses binaries as strings but assumes and demands that they are UTF8 encoded, not UTF16.

In reference to http://erlang.org/pipermail/erlang-questions/2010-December/054885.html

You can use :unicode.characters_to_list(binary_string, {:utf16, :little}) to verify result and store too

IEX eval

iex(1)> y                                                
<<115, 0, 121, 0, 115, 0>>
iex(2)> :unicode.characters_to_list(y, {:utf16, :little})
'sys'

Note : Value printed as sys for <<115, 0, 121, 0, 115, 0>>

You can use Comprehensions

    defmodule TestModule do
      def convert(binary) do
        for c <- binary, into: "", do: <<c>>
      end
    end
    TestModule.convert([71,32,69,32,84,32]) |> IO.puts
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top