Question

I am reading PostgreSQL protocol document. The document specifies message flow and containment format, but doesn't mention about how actual data fields are encoded in text/binary.

For the text format, there's no mention at all. What does this mean? Should I use just SQL value expressions? Or there's some extra documentation for this? If it's just SQL value expression, does this mean the server will parse them again?

And, which part of source code should I investigate to see how binary data is encoded?

Update

I read the manual again, and I found a mention about text format. So actually there is mention about text representation, and it was my fault that missing this paragraph.

The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type.

Was it helpful?

Solution

There are two possible data formats - text or binary. Default is a text format - that means, so there is only server <-> client encoding transformation (or nothing when client and server use same encoding). Text format is very simple - trivial - all result data is transformed to human readable text and it is send to client. Binary data like bytea are transformed to human readable text too - hex or Base64 encoding are used. Output is simple. There is nothing to describing in doc

 postgres=# select current_date;
     date    
 ────────────
  2013-10-27
 (1 row)

In this case - server send string "2013-10-27" to client. First four bytes is length, others bytes are data.

Little bit difficult is input, because you can separate a data from queries - depends on what API you use. So if you use most simple API - then Postgres expect SQL statement with data together. Some complex API expected SQL statement and data separately.

On second hand a using of binary format is significantly difficult due wide different specific formats for any data type. Any PostgreSQL data type has a two functions - send and recv. These functions are used for sending data to output message stream and reading data from input message stream. Similar functions are for casting to/from plain text (out/in functions). Some clients drivers are able to cast from PostgreSQL binary format to host binary formats.

Some information:

  • libpq API http://www.postgresql.org/docs/9.3/static/libpq.html
  • you can look to PostgreSQL src to send/recv and out/in function - look on bytea or date implementation src/backend/utils/adt/date.c. Implementation of libpq is interesting too src/interfaces/libpq
  • -

OTHER TIPS

The things closest to a spec of a PostgreSQL binary format I could find were the documentation and the source code of the "libpqtypes" library. I know, a terrible state of the documentation for such a huge product.

The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type. In the transmitted representation, there is no trailing null character; the frontend must add one to received values if it wants to process them as C strings. (The text format does not allow embedded nulls, by the way.)

Binary representations for integers use network byte order (most significant byte first). For other data types consult the documentation or source code to learn about the binary representation. Keep in mind that binary representations for complex data types might change across server versions; the text format is usually the more portable choice.

(quoted from the documentation, link)

So the binary protocol is not stable across versions, so you probably should treat it as an implementation detail and not use the binary representation. The text representation is AFAICT just the format of literals in SQL queries.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top