UTF8 vs ASCII or custom binary format: high-performance optimization on very big tables

dba.stackexchange https://dba.stackexchange.com/questions/43162

  •  31-10-2019
  •  | 
  •  

문제

The summary of my question is whether there are advantages, even if minimal, to use ASCII, or even a smaller format designed yet created specifically for the situation, instead of UTF-8 for strings.

Is possible to use webserver who will access the data translate one ASCII string to a more compact format.

On this question, just 1-3 tables on database handle a massive amount of data, so any byte can make difference if could fit on memory instead of access disk. The information will be accessed via a RESTfull service

Read vs Writes:

Project need much more read than writes. But writing has a peculiarity: every 10 seconds an amount between 40-300 new rows are inserted into the main table. These could be written in parallel, since do not depend of each other.

Memory vs Disk usage:

Recent inserted rows, that will be used immediately, will be inserted also on a cache for the webservice use, so no need to read they again. But for search on old records, the database will need, and it should be fast.

This is why I suppose that use less bytes to store some fields will make diference: even for larger amount of data, will be more easy for fit on memory.

If I cannot fit data on memory, and database cannot abstract to me some speed, or I will need to force slow table/partition scans each 10 seconds for just one user, or I will be forced to do single select and cache it on Webserver, but this break concept of "stateless" from REST concept.

Characters that must be supported

0-9,A-Z, "-", "_". Maybe will need "a-z". Just 38 or 64 caracters, and never more than this.

For now, most columns are

CHAR(3), CHAR(6), VARCHAR(8), VARCHAR(10). 

Examples:

  • ABC, XYZ
  • AB-EFG, XY-XPT
  • ABC123457
  • E47F6C, 34210A, E48D37 (hex strings, maybe exist a format specific to this?)

Technologies used

Database will be MariaDB. Maybe part of RAW data will be on some NoSQL database. The language of webservice does not really makes diference here, but will be PHP 5.4 with framework Phalcon PHP.

Different types of cache could be used, from Varnish-cache to APC/Mencached until caches inside database.

My initial hypothesis:

One field with VARCHAR(10) needs 80bits. If I use one custom table to use just 64 or less instead 255 characters, I save around 4 times, so instead of 80bits will use just 20bits.

올바른 솔루션이 없습니다

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 dba.stackexchange
scroll top