Problem

I have a table with 1699 columns and when I'm trying to insert more columns I get,

Error Code: 1117. Too many columns

In this table I have only 1000 rows. For me the most important thing is the number of columns. Are there any limitations on the table? I want to create 2000 columns. Is that possible?

Solution

Why would you need to create a table with even 20 columns, let alone 2000 ???

Granted, denormalized data can prevent having to do JOINs to retrieve many columns of data. However, if you have over 10 columns, you should stop and think about what would happen under the hood during data retrieval.

If a 2000-column table undergoes SELECT * FROM ... WHERE, you would generate large temp tables during processing, fetch columns that are unnecessary, and create many scenarios where communication packets (max_allowed_packet) would be pushed to the brink on every query.
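
To make that concrete, here is a rough diagnostic sketch (mydb and wide_table are placeholder names): it counts the columns and gives a crude upper bound on the string bytes a single SELECT * row could carry, which you can compare against max_allowed_packet.

SELECT COUNT(*)                    AS column_count,
       SUM(CHARACTER_OCTET_LENGTH) AS max_string_bytes  -- NULL for numeric columns, ignored by SUM
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME   = 'wide_table';

SHOW VARIABLES LIKE 'max_allowed_packet';  -- the ceiling every communication packet must fit under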

In my earlier days as a developer, I worked at a company back in 1995 where DB2 was the main RDBMS. The company had a single table that had 270 columns, dozens of indexes, and had performance issues retrieving data. They contacted IBM and had consultants look over the architecture of their system, including this one monolithic table. The company was told "If you do not normalize this table in the next 2 years, DB2 will fail on queries doing Stage2 Processing (any queries requiring sorting on non-indexed columns)." This was told to a multi-trillion dollar company, to normalize a 270 column table. How much more so a 2000 column table.

In terms of MySQL, you would have to compensate for such bad design by raising options comparable to DB2 Stage2 Processing: chiefly the per-session sort, join, and read buffers, plus the temporary-table size caps.
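
A hedged sketch of what that compensation looks like (the variable names are real MySQL settings; the values are invented for illustration, not recommendations):

SET GLOBAL sort_buffer_size     = 256 * 1024 * 1024;   -- per-session sort buffer
SET GLOBAL join_buffer_size     = 256 * 1024 * 1024;   -- buffer for joins that cannot use indexes
SET GLOBAL read_buffer_size     = 64 * 1024 * 1024;    -- sequential-scan buffer
SET GLOBAL read_rnd_buffer_size = 128 * 1024 * 1024;   -- random-read buffer after sorts
SET GLOBAL tmp_table_size       = 1024 * 1024 * 1024;  -- in-memory temp table ceiling
SET GLOBAL max_heap_table_size  = 1024 * 1024 * 1024;  -- commonly kept equal to tmp_table_size
SET GLOBAL max_allowed_packet   = 512 * 1024 * 1024;   -- wider rows need bigger packets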

Tweaking these settings to make up for the presence of dozens, let alone hundreds, of columns works well if you have TBs of RAM.

This problem multiplies geometrically if you use InnoDB as you will have to deal with MVCC (Multiversion Concurrency Control) trying to protect tons of columns with each SELECT, UPDATE and DELETE through transaction isolation.

CONCLUSION

There is no substitute or band-aid that can make up for bad design. Please, for the sake of your future sanity, normalize that table today!!!

Other tips

I'm having trouble imagining anything where the data model could legitimately contain 2000 columns in a properly normalised table.

My guess is that you're probably doing some sort of "fill in the blanks" denormalised schema, where you're actually storing all different sorts of data in the one table, and instead of breaking the data out into separate tables and making relations, you've got various fields that record what "type" of data is stored in a given row, and 90% of your fields are NULL. Even then, though, to want to get to 2000 columns... yikes.

The solution to your problem is to rethink your data model. If you're storing a great pile of key/value data that's associated with a given record, why not model it that way? Something like:

CREATE TABLE master (
    id INT PRIMARY KEY AUTO_INCREMENT
    -- ... plus whatever fields really do relate to the
    -- master records on a 1-to-1 basis
);

CREATE TABLE sensor_readings (
    id INT PRIMARY KEY AUTO_INCREMENT,
    master_id INT NOT NULL,   -- The id of the record in the
                              -- master table this field belongs to
    sensor_id INT NOT NULL,
    value VARCHAR(255)
);

CREATE TABLE sensors (
    id INT PRIMARY KEY AUTO_INCREMENT
    -- ... plus whatever fields relate to the sensors themselves
);

Then to get all of the sensor entries associated with a given "master" record, you can just SELECT sensor_id,value FROM sensor_readings WHERE master_id=<some master ID>. If you need to get the data for a record in the master table along with all of the sensor data for that record, you can use a join:

SELECT master.*,sensor_readings.sensor_id,sensor_readings.value
FROM master INNER JOIN sensor_readings on master.id=sensor_readings.master_id
WHERE master.id=<some ID>

And then further joins if you need details of what each sensor is.
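
If you also need the sensor details, one more join brings in the sensors table (a sketch; sensors.name is a hypothetical column standing in for whatever fields that table really has):

SELECT master.*,
       sensors.id   AS sensor_id,
       sensors.name AS sensor_name,   -- hypothetical descriptive column
       sensor_readings.value
FROM master
INNER JOIN sensor_readings ON sensor_readings.master_id = master.id
INNER JOIN sensors         ON sensors.id = sensor_readings.sensor_id
WHERE master.id = <some ID>;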

It's a measurement system with 2000 sensors

Ignore all the comments shouting about normalization - what you are asking for could be sensible database design (in an ideal world) and perfectly well normalized; it is just very unusual, and as pointed out elsewhere, RDBMSs are usually simply not designed for this many columns.

Although you are not hitting the MySQL hard limit, one of the other factors discussed in the documentation quoted below is probably preventing you from going higher.

As others have suggested, you could work around this limitation by having a child table with id, sensor_id, sensor_value; or, more simply, you could create a second table containing just the columns that will not fit in the first (and using the same PK).
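
A minimal sketch of that second approach, with made-up sensor column names; both tables share the same primary key value for a given reading:

CREATE TABLE readings_part1 (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    sensor_0001 FLOAT,
    sensor_0002 FLOAT
    -- ... up to the column limit
);

CREATE TABLE readings_part2 (
    id          INT PRIMARY KEY,   -- same value as readings_part1.id
    sensor_1001 FLOAT,
    sensor_1002 FLOAT,
    -- ... the remaining columns
    CONSTRAINT fk_part2 FOREIGN KEY (id) REFERENCES readings_part1 (id)
);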

MySQL 5.0 Column-Count Limits (emphasis added):

There is a hard limit of 4096 columns per table, but the effective maximum may be less for a given table. The exact limit depends on several interacting factors.

  • Every table (regardless of storage engine) has a maximum row size of 65,535 bytes. Storage engines may place additional constraints on this limit, reducing the effective maximum row size.

    The maximum row size constrains the number (and possibly size) of columns because the total length of all columns cannot exceed this size.

...

Individual storage engines might impose additional restrictions that limit table column count. Examples:

  • InnoDB permits up to 1000 columns.

First some more flaming, then a real solution...

I mostly agree with the flames already thrown at you.

I disagree with key-value normalization. Queries end up being horrible; performance even worse.

One 'simple' way to avoid the immediate problem (the limit on the number of columns) is to 'vertically partition' the data. Have, say, 5 tables with 400 columns each. They would all have the same primary key, except that one of them might declare it as AUTO_INCREMENT.

Perhaps better would be to decide on the dozen fields that are most important, put them into the 'main' table. Then group the sensors in some logical way and put them into several parallel tables. With the proper grouping, you might not have to JOIN all the tables all the time.

Are you indexing any of the values? Do you need to search on them? Perhaps you search on datetime?

If you need to index lots of columns -- punt.

If you need to index a few -- put them into the 'main' table.
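
A minimal sketch of the "main table plus grouped parallel tables" layout described above (all names are illustrative): the handful of searchable fields live, indexed, in the main table, and most queries only join the one group they care about.

CREATE TABLE reading_main (
    id         INT PRIMARY KEY AUTO_INCREMENT,
    station_id INT NOT NULL,
    taken_at   DATETIME NOT NULL,
    KEY idx_station_time (station_id, taken_at)   -- index only what you search on
);

CREATE TABLE reading_temps (      -- one logical group of sensors
    id      INT PRIMARY KEY,      -- same id as reading_main
    temp_01 FLOAT,
    temp_02 FLOAT
);

CREATE TABLE reading_pressures (  -- another group
    id       INT PRIMARY KEY,
    press_01 FLOAT,
    press_02 FLOAT
);

SELECT m.taken_at, t.temp_01, t.temp_02
FROM reading_main m
JOIN reading_temps t ON t.id = m.id
WHERE m.station_id = 7
  AND m.taken_at >= '2024-01-01';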

Here's the real solution (if it applies)...

If you don't need the vast array of sensors indexed, then don't make columns! Yes, you heard me. Instead, collect them into JSON, compress the JSON, and store it in a BLOB field. You will save a ton of space; you will have only one table, with no column-limit problems; etc. Your application will uncompress it and then use the JSON as a structure. Guess what? You can have structure -- you can group the sensors into arrays, multilevel stuff, etc., just like your app would like. Another 'feature' -- it is open-ended. If you add more sensors, you don't need to ALTER the table. JSON is flexible that way.

(Compression is optional; if your dataset is huge, it will help with disk space, hence overall performance.)
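
A sketch of that idea, with made-up table and column names; MySQL's COMPRESS()/UNCOMPRESS() stand in here for compression the application could equally do itself:

CREATE TABLE sensor_snapshots (
    id        INT PRIMARY KEY AUTO_INCREMENT,
    device_id INT NOT NULL,
    taken_at  DATETIME NOT NULL,
    readings  BLOB NOT NULL,                      -- compressed JSON document of all sensor values
    KEY idx_device_time (device_id, taken_at)     -- index only the fields you search on
);

INSERT INTO sensor_snapshots (device_id, taken_at, readings)
VALUES (42, NOW(),
        COMPRESS('{"temp": [21.4, 22.0], "humidity": 55, "valve_open": true}'));

SELECT device_id, taken_at, UNCOMPRESS(readings) AS readings_json
FROM sensor_snapshots
WHERE device_id = 42
ORDER BY taken_at DESC
LIMIT 1;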

I see this as a possible scenario in the world of big data, where you may not be performing the traditional SELECT * type of queries. We deal with this in the predictive-modeling world at a customer level, where we model a customer across thousands of dimensions (all of them having values of 0 or 1). Storing the data this way makes the downstream model-building activities easier when the risk factors and the outcome flag are in the same row. It can be normalized from a storage standpoint with a parent-child structure, but the predictive model downstream will need to convert it back into a flat schema. We use Redshift, which does columnar storage, so when you load up the data your 1000+ columns are actually stored in a columnar format...

There is a time and place for this design. Absolutely. Normalization is not the solution for every problem.

License: CC-BY-SA with attribution
Not affiliated with dba.stackexchange