Question

When loading data into a table, I get the following error:

ERROR:  row is too big: size 8680, maximum size 8160

The table has 1000+ columns in it, which appears to be the problem. The general internet advice is "refactor!" or "normalize!". For instance, this post. Unfortunately, I don't believe such advice applies to my situation.

The table is to store data collected from a device. The device produces a PNG image as part of an analysis. The PNG consists of 1024 pixels. Each pixel has an associated numeric value. Along with the pixel data are various other fields related to the analysis. Breaking the table into parts doesn't really make sense. The fields are all logically associated with the particular object being analyzed.

Postgres doesn't seem to like that each pixel has its own field. The table has fields of the form: pixel_1, pixel_2, ..., pixel_1024. Note that this is fundamentally different from the usual example of phone_number_1, phone_number_2, etc. Each pixel is a unique object by virtue of its location. pixel_1 has a different position than pixel_123 and each pixel has an associated value. The common aspect between them is that they both are used to describe the same analysis object. They are the quantitative analog to the visual representation given in the PNG.

  1. Is there a way to increase the row size?
  2. If the table simply cannot have 1000+ columns, how could I refactor this?
  3. Assuming the first two answers are "No.", should I just stick the 1024 columns into an XML and throw that in a text field?

I hope I have made the context clear. I have tried to boil the problem down to its essence; please let me know if clarification is needed.

EDIT: As an experiment, I tried breaking the pixels into a separate table. That seems to be the only possible way to refactor, but the 1024 columns produce the same error.


Solution

I would go for an array:

create table device
(
   id      integer primary key,
   pixels  integer[]
);

The drawback is that you always need to read and write all pixels, as they are stored in a single column.
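For illustration, a minimal sketch of how writing and reading the array could look (the values are made up):

insert into device (id, pixels)
values (1, array[12, 7, 93]);   -- in practice this would be 1024 values

-- a single element can be addressed with a subscript,
-- but the whole array value is still read and written as one unit:
select pixels[42]
from device
where id = 1;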

Note that Postgres does not enforce array limits. Even if you declare the column as integer[1024] you can still store more or less than 1024 pixels in it. If you need to put a constraint on that, you can use a check constraint.
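If, for example, exactly 1024 values must always be present, such a check constraint could look like this (a sketch; the constraint name is arbitrary):

alter table device
   add constraint pixels_count_check check (cardinality(pixels) = 1024);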

An array is stored as a variable-width value, so it can be compressed (and moved out of the row) by TOAST.


Another option would be JSONB, as JSON offers at least some kind of data type information. I wouldn't go for XML nowadays: the JSON support is much better, and the functions to query and manipulate JSON are more flexible and powerful than the XML functions (and given the current JSON hype, there is more momentum there as well). It seems that Postgres 11 will support the JSON functions from the SQL:2016 standard.
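A minimal sketch of the JSONB variant (table and key names are only illustrative):

create table device_json
(
   id      integer primary key,
   pixels  jsonb
);

insert into device_json (id, pixels)
values (1, '{"pixel_1": 42, "pixel_2": 17}');

-- extract a single pixel value as an integer:
select (pixels ->> 'pixel_1')::int
from device_json
where id = 1;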

OTHER TIPS

Raster Data

The table is to store data collected from a device. The device produces a PNG image as part of an analysis. The PNG consists of 1024 pixels. Each pixel has an associated numeric value. Along with the pixel data are various other fields related to the analysis. Breaking the table into parts doesn't really make sense. The fields are all logically associated with the particular object being analyzed.

That's called a "raster", not a PNG image. A PNG is in fact an RGB[A] raster stored with DEFLATE compression, but what you want to store has nothing to do with it being a PNG. If these terms don't make sense, it's like telling someone you want to store a PDF when what you really want to store is the underlying text.

Here, the raster is the "underlying text" of your PNG.

PostGIS

PostgreSQL supports storing actual raster types with PostGIS. I suggest checking out the PostGIS Raster Reference. If your rasters are currently encoded in the PNG channels, you can use GDAL to put them in the database.

I would suggest checking that out for information on how to query, process, modify, and export raster data.
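As a sketch only (the file, database and table names are made up), loading could also be done with the raster2pgsql tool that ships with PostGIS, and individual cell values can then be read with ST_Value:

# shell: convert the PNG into a raster table and load it
raster2pgsql -s 0 -I -C -M analysis.png public.analysis_raster | psql -d mydb

-- read band 1 at column 10, row 20 of the stored raster
SELECT ST_Value(rast, 1, 10, 20) FROM public.analysis_raster;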

Native

Even if you decide not to use PostGIS, I don't believe the other answers are adequate. As mentioned above, an RGB pixel is a 3-channel compound data type, so you'll need at least three dimensions for the pixel data.

In addition, you'll likely want information on each pixel's XY position, or a method to compute that metadata later.

  • You could use a cube (from the cube extension) to represent the pixel data:

    -- white and black pixels (RGB)
    SELECT ARRAY['(255,255,255)','(0,0,0)']::cube[]
    
  • You could use an array to store each channel (likely not useful for processing), and importing and exporting will be difficult.

  • You could use a composite type for the pixel, though a record type even of "char" fields is substantially larger than the solution below.

  • You could store each pixel as an int and ignore the channels (assuming you have at most four 1-byte channels). If you need to process the data, this will get very ugly very fast, make analysis nearly impossible, and make exporting and querying very difficult.

  • Or you could create your own DOMAIN, but at this point you're basically recreating PostGIS's raster. With this approach you get to store all four bands as a single int, access those bands with minimal cost, and construct the pixel from the bands. You can easily expand this to int8 if you need 8 bands.

    CREATE DOMAIN pixel AS int4;
    
    CREATE OR REPLACE FUNCTION get_band(p pixel, b int)
    RETURNS int AS $$
      SELECT ((p::bit(32)<< ((4-b)*8)) >> (3*8))::int
    $$ LANGUAGE sql
    IMMUTABLE;
    
    CREATE OR REPLACE FUNCTION create_pixel( int, int, int, int )
    RETURNS pixel AS $$
      SELECT (
        $1::bit(32) |
        ($2::bit(32)<<8) |
        ($3::bit(32)<<16) |
        ($4::bit(32)<<24)
      )::pixel;
    $$ LANGUAGE sql
    IMMUTABLE;
    
    -- Returns 128.
    SELECT get_band(create_pixel(255,0,128,32),3)::int;
    
    -- You can use this type in an array too 
    CREATE TABLE f ( ps pixel[] );
    

As a side note, if you are working with LIDAR data, you should check out PG Point Cloud.

See https://www.postgresql.org/docs/10/static/storage-toast.html :

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or “the best thing since sliced bread”). The TOAST infrastructure is also used to improve handling of large data values in-memory.

So you have no way of going over 8 kB, except by recompiling PostgreSQL with a different page size, which is a bad idea. Indeed, having 1000 columns in a table is a huge design smell...

What are the current data types of your pixel columns? I do not think integers and other fixed-width numeric types are TOAST-able, hence the limit you run into. As written in https://wiki.postgresql.org/wiki/TOAST, do a \d+ to see how your current columns are handled storage-wise. And indeed it says:

Also, fixed-width field types such as integers don't support being toasted, so having lots of columns of those types can cause problems with the row width limit too.
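To illustrate the \d+ suggestion above: for a table made up of integer pixel columns, the Storage column would show plain for every one of them, meaning none of them can be moved out of the row by TOAST (the table and column names here are hypothetical):

\d+ measurements
 Column     |  Type   | Storage
------------+---------+---------
 pixel_1    | integer | plain
 pixel_2    | integer | plain
 ...
 pixel_1024 | integer | plain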

The refactoring you need depends on your application and how it accesses this data. For things that are not naturally relational, like files, you always have the option of storing them on disk instead of in the database. See https://wiki.postgresql.org/wiki/BinaryFilesInDB for many good points.

You could also use a bytea type. Or a separate table with one row per pixel, since you have far fewer constraints on the number of rows than on the number and types of columns, and it makes partitioning easy. For example: id, device_id, measurement_id, position_x, position_y, pixel_value.
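A sketch of that one-row-per-pixel table (the column names come from the list above; the types are assumptions):

CREATE TABLE pixel_values (
    id             bigserial PRIMARY KEY,
    device_id      int NOT NULL,
    measurement_id int NOT NULL,
    position_x     int NOT NULL,
    position_y     int NOT NULL,
    pixel_value    int NOT NULL
);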

But this all depends on the kind of queries you run on the data. For example, do you always need all pixels at once, or do you need to be able to query only a specific one?

Going the XML route (or even the JSON one) is clearly not the recommended way.

I’d go with two tables - one for PNG data:

CREATE TABLE images (
    images_id SERIAL PRIMARY KEY NOT NULL,
    raw_image BYTEA NOT NULL
);

...and another one for metadata:

CREATE TABLE image_metadata (
    image_metadata_id SERIAL PRIMARY KEY NOT NULL,
    images_id INT NOT NULL REFERENCES images (images_id) ON DELETE CASCADE,
    pixel_x INT NOT NULL,
    pixel_y INT NOT NULL,

    -- Some more columns for the metadata, or JSONB for a freeform metadata definition
);

That way:

  • You don’t go overboard with column / row count
  • Fetching the whole (PNG compressed) image and extracting the right pixel is going to be faster than fetching 1024 individual uncompressed pixels and “concatenating” them into an image
  • Foreign keys will take care of the cleanup
  • Overall rather straightforward IMHO
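For illustration only, pulling an image together with its per-pixel metadata from the two tables above could look like this:

SELECT i.raw_image, m.pixel_x, m.pixel_y
FROM images i
JOIN image_metadata m ON m.images_id = i.images_id
WHERE i.images_id = 1;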
Licensed under: CC-BY-SA with attribution