1

Topic: Question: Why not use utf8mb4 as default?

==== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release): 1.3.1
- Deployed with iRedMail Easy or the downloadable installer? installer
- Linux/BSD distribution name and version: Debian 10
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MariaDB
- Web server (Apache or Nginx):Nginx
- Manage mail accounts with iRedAdmin-Pro? yes
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====

Hello Zhang,

is there any specific reason not to use utf8mb4 by default in MySQL/MariaDB?

I ask because just recently I got an error where an invalid string is reported which was exported from another database:

ERROR 1366 (22007) at line 16: Incorrect string value: '\xF0\x9F\xA4\xAB N...' for column `amavisd`.`msgs`.`from_addr` at row 101

The reason is that \xF0\x9F\xA4\xAB is a utf8mb4 character. In my source table the field was VARBINARY(255) but when importing the data into a field with VARCHAR(255) above error is generated.

When altering the field to use utf8mb4, all works fine.

Above error was encountered in amavisd.msgs table.

Some funny people start using emojis or other special unicode characters in the subject to attract the readers attention.
Saving those in the database should be done with the correct charset IMHO.

I'd recommend using utf8mb4 all over the place.

What do you think?

Any remarks?

Is there any drawback to use utf8mb4? At least in amavisd.msgs table?

Another point of view: When using MySQL Administrator or any other tool, the column is shown as "BLOB", the contents hidden from my eye.
I'd wish to actually see what is saved in the column smile


Best regards,

Bernhard

----

Spider Email Archiver: On-Premises, lightweight email archiving software developed by iRedMail team. Supports Amazon S3 compatible storage and custom branding.

2

Re: Question: Why not use utf8mb4 as default?

Good catch, will fix it in next release.

3

Re: Question: Why not use utf8mb4 as default?

Thanks a lot!

I will implement a temporary fix in amavis table and wait for your next update instruction to get the rest of the fields fixed.

4

Re: Question: Why not use utf8mb4 as default?

Fix it by running MySQL commands as SQL root user:

USE amavisd;
ALTER TABLE msgs MODIFY COLUMN from_address VARBINARY(255) NOT NULL DEFAULT '';

5

Re: Question: Why not use utf8mb4 as default?

VARBINARY does actually work well but creates a little issue when analyzing data via SQL.

That problem was caused by my old amavisd table structure which used VARBINARY.

New iRedMail-1.3.1 installation uses

...
`from_addr` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '',
...

which makes more sense but may cause troubles when importing data from an older installation where fields are VARBINARY

IMHO at least the field from_addr should be utf8mb4, e.g.

...
`from_addr` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT '',
...

On a global level I'd still suggest to use utf8mb4 as default character set.


At the end the data still stays the same but we change the way MySQL/MariaDB treats the data for queries, display and verification.

6

Re: Question: Why not use utf8mb4 as default?

So you suggest to create db like this?

CREATE DATABASE amavisd DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

7

Re: Question: Why not use utf8mb4 as default?

Yes, this looks fine for amavisd database.
Please note that field-specific configuration should be omitted, e.g. for 'from_addr' the type VARCHAR(n) is enough.




broth wrote:

On a global level I'd still suggest to use utf8mb4 as default character set.

I think there are no drawbacks to use utf8mb4 as default for all databases where

- no workarounds are in place (like defining field as VARBINARY)
- customer input is possible (comment fields in iredadmin, contact lists, names)

Fields, which for technical reasons do not permit utf8, should be defined as ascii.
This was already done here:

CREATE TABLE `share_folder` (
  `from_user` varchar(255) CHARACTER SET ascii NOT NULL,
...

A quick quote from a blog article:

https://medium.com/@adamhooper/in-mysql … 761243e434

If you need to use MySQL or MariaDB, never use “utf8”. Always use “utf8mb4” when you want UTF-8

8 (edited by broth 2020-07-31 12:13:18)

Re: Question: Why not use utf8mb4 as default?

BTW: Just try by yourself, inserting some emoji or other special characters to your "Company/Organization Name" in iRedAdmin-Pro, e.g.:

Edit: Forum software does not support UTF8MB4 wink

9

Re: Question: Why not use utf8mb4 as default?

@Zhang: Thanks for your iRedMail 1.3.2 release smile

I saw that in the update guide (SQL section) that the from_addr shall be set to VARBINARY(255)

This works but makes it impossible to properly analyze SQL data via MySQL-Administrator or any other tool.
It will just show binary data.
Please check my previous posts.

As mentioned the proper way IMHO would be to use "utf8mb4"

Thanks for your consideration!

10

Re: Question: Why not use utf8mb4 as default?

There's an error during my testing while switching to utf8mb4 (sorry i cannot remember the detailed error sad ), so we stay with utf8 first.

11

Re: Question: Why not use utf8mb4 as default?

Thanks for your quick reply, hopefully this will get solved in one of the next releases smile

THANKS!