wagnerbianchi.com

The MariaDB Storage-Engine Independent Column Compression

One of the features DBAs have on their sleeves is the compression of the data living on databases. For the MariaDB Server, this is not any different. Among some options, we see available for compressing data and save sometimes a bunch of space, one of them certainly is the Storage-Engine Independent Column Compression, which makes it possible to compress data on the column level.

The motivation to analyse this feature on MariaDB Community Server came by after reading MDEV-22367, which claims that MariaDB should retain tables created with the InnoDB having the ROW_FORMAT=COMPRESSED as read-only by default, as mentioned on the notable changes for the MariaDB 10.6 (InnoDB). So, the Colum Compression appears to be, at the initial moment, an alternative if you want to make the compression a little more granular instead of running your databases with the innodb_read_only_compressed as OFF.

You must understand that compressing the whole table with the InnoDb ROW_FORMAT as COMPRESSED is different from compressing the columns of a table with the Storage-Engine Independent Column Compression.

So, the Storage-Engine Independent Column Compression will help compress columns of tables of one of the following data types: TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, VARCHAR, and VARBINARY. You only need to worry about adding the COMPRESSED work to the columns, initially, to get it compressed. I would also like to call the attention that you’re pretty much covered if you are using the JSON data type added to the MariaDB Server at its version 10.2.7. The JSON data type is mapped out to the LONGTEXT data type mentioned previously.

Before we start creating tables and adding compression, let’s see system and status variables available:

MariaDB [(none)]> show global variables where variable_name in ('column_compression_threshold','column_compression_zlib_level','column_compression_zlib_strategy','column_compression_zlib_wrap');
+----------------------------------+------------------+
| Variable_name                    | Value            |
+----------------------------------+------------------+
| column_compression_threshold     | 100              |
| column_compression_zlib_level    | 6                |
| column_compression_zlib_strategy | DEFAULT_STRATEGY |
| column_compression_zlib_wrap     | OFF              |
+----------------------------------+------------------+
4 rows in set (0.003 sec)

MariaDB [(none)]> show global status where variable_name in ('Column_compressions','Column_decompressions');
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Column_compressions   |  0    |
| Column_decompressions |  0    |
+-----------------------+-------+
2 rows in set (0.001 sec)

OK, having said that, let’s create a simple table having three columns and one of them that we are supposed to add compression soon issuing an ALTER TABLE command. All the exercises here will be done using the MariaDB Community Server 10.6.3 on Ubuntu 20.04.2 LTS (mariadb:latest docker image), but the Columnar Compression sed here is supported since MariaDB Server 10.3.2.

CREATE TABLE `t1` (
`a` int(11) NOT NULL AUTO_INCREMENT,
`b` varchar(255) DEFAULT NULL,
`c` blob DEFAULT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=84525 DEFAULT CHARSET=utf8mb4;

Let’s add some rows to the table:

root@e7bc0381525d:/# for i in {01..120176}; do mariadb -e 'INSERT INTO test.t1 set a=null, b=REPEAT('b',255), c=REPEAT('c', 65535);'; done
root@e7bc0381525d:/#

Let’s check the size of the table t1 tablespace:

root@e7bc0381525d:/# mariadb -e 'select count(*) from test.t1'
+----------+
| count(*) |
+----------+
| 120176 |
+----------+
root@e7bc0381525d:/# ls -lh /var/lib/mysql/test
total 12M
-rw-rw---- 1 mysql mysql 67 Aug 2 15:32 db.opt
-rw-rw---- 1 mysql mysql 2.0K Aug 2 17:39 t1.frm
-rw-rw---- 1 mysql mysql 11M Aug 2 18:52 t1.ibd

Let’s add compression to columns b and c:

root@e7bc0381525d:/# mariadb -e 'alter table test.t1 change b b varchar(255) compressed, change c c blob compressed;'

root@e7bc0381525d:/# mariadb -e 'show create table test.t1'
CREATE TABLE `t1` (
 `a` int(11) NOT NULL AUTO_INCREMENT,
 `b` varchar(255) /*!100301 COMPRESSED*/ DEFAULT NULL,
 `c` blob /*!100301 COMPRESSED*/ DEFAULT NULL,
 PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=120620 DEFAULT CHARSET=utf8mb4;
root@e7bc0381525d:/# ls -lh /var/lib/mysql/test
total 4.2M
-rw-rw---- 1 mysql mysql 67 Aug 2 15:32 db.opt
-rw-rw---- 1 mysql mysql 2.0K Aug 2 19:00 t1.frm
-rw-rw---- 1 mysql mysql 4.0M Aug 2 19:00 t1.ibd

Let’s check our status variables:

MariaDB [test]> show global status where variable_name in ('Column_compressions','Column_decompressions');
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Column_compressions   | 22    |
| Column_decompressions | 22    |
+-----------------------+-------+
2 rows in set (0.001 sec)

So, from 11MB to 4 MB, we’re talking about a compression rate of ~63%. The remaining question here is that this rate can vary considering the size of the tablespace? Maybe you can share your experience by adding your comment – any comments are really welcome.

Attention:

This blog post isn’t to encourage or discourage anything; this is meant to mainly exercise the column compression, which appears to be a good feature and must be more used to offer insights for improving it.

 


You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

AddThis Social Bookmark Button

Leave a Reply