1 (edited by srosoftware 2017-11-10 20:43:06)

Topic: amavisd SQL backup fills entire server

======== Required information ====
- iRedMail version: 0.9.7 MARIADB edition.
- Linux distribution : Centos 7.4.1708
- Backend: MySQL (version 5.5.52)
- Web server: Apache
- Manage iRedAdmin-Pro?  No
====

Began experiencing errors "Server error: '451 4.3.0  Temporary lookup failure'" last night.  SSH'd into server and found disk 100% full.  Quick search showed disk was filled by the amavisd SQL backup (/var/vmail/backup/mysql/2017/11/10/amavisd-2017-11-10-01-01-01.sql).  Amavisd SQL dump was over 82GB before the server ran out of disk space and  MariaDB server crashed.  Amavisd table is only 9.3 MiB (12,494, InnoDB, utf8_general_ci, 9.3 MiB), so it seems it's an issue with mysqldump.

==== 2017-11-10-01-01-01.log  ====
* Starting backup: 2017-11-10-01-01-01.
* Backup directory: /var/vmail/backup/mysql/2017/11/10.
* Backing up databases: mysql vmail roundcubemail amavisd iredadmin sogo iredapd.
* File size:
----
82G     amavisd-2017-11-10-01-01-01.sql
104K    mysql-2017-11-10-01-01-01.sql.bz2
8.0K    roundcubemail-2017-11-10-01-01-01.sql.bz2
24K     vmail-2017-11-10-01-01-01.sql.bz2
----
* Backup completed (Success? NO).
================================

Ever seen this before?  Issue just began last night.  The only changes made recently were a few custom Spamassassin rules.  I Googled "mysqldump runs forever", "mysqldump massive sql file" and found nothing so far.
 
Would appreciate any help as I must temporarily disable Amavisd database backup until this is resolved.

----

Spider Email Archiver: On-Premises, lightweight email archiving software developed by iRedMail team. Supports Amazon S3 compatible storage and custom branding.

2

Re: amavisd SQL backup fills entire server

*) Didn't get similar report before.
*) Can you reproduce this issue by running the backup script manually? If not, try to wait for one more night and see whether it occurs tomorrow.

3

Re: amavisd SQL backup fills entire server

ZhangHuangbin wrote:

*) Didn't get similar report before.
*) Can you reproduce this issue by running the backup script manually?

Yes. I can also confirm the same issue happens manually running mysqldump.

# mysqldump -uroot --events --skip-comments amavisd > amavisd_test.sql

It appears any mysqldump of our amavisd database creates this issue.

I will check backup of individual tables next.

4

Re: amavisd SQL backup fills entire server

Seems to be limited to the amavisd msgrcpt table.  All others dump normally. 

# mysqldump -uroot --events --skip-comments amavisd msgrcpt > amavisd.msgrcpt.sql     (crashes)

5 (edited by srosoftware 2017-11-13 04:51:48)

Re: amavisd SQL backup fills entire server

Looks like that was just a symptom of a larger problem.  The mysql server is now crashing when doing inserts on the same table (in bold below):

171110 19:20:44 [ERROR] mysqld got signal 6 ;

Server version: 5.5.52-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=28
max_threads=153
thread_count=25
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466712 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7febcac8cda0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7feb9d94edc0 thread_stack 0x48000
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x7febc75742dd]
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x7febc71898d5]
/lib64/libpthread.so.0(+0xf370)[0x7febc68b9370]
/lib64/libc.so.6(gsignal+0x37)[0x7febc4fee1d7]
/lib64/libc.so.6(abort+0x148)[0x7febc4fef8c8]
/usr/libexec/mysqld(+0x66611f)[0x7febc734f11f]
/usr/libexec/mysqld(+0x666775)[0x7febc734f775]
/usr/libexec/mysqld(+0x66fa32)[0x7febc7358a32]
/usr/libexec/mysqld(+0x74088f)[0x7febc742988f]
/usr/libexec/mysqld(+0x744a0a)[0x7febc742da0a]
/usr/libexec/mysqld(+0x744c97)[0x7febc742dc97]
/usr/libexec/mysqld(+0x619ad4)[0x7febc7302ad4]
/usr/libexec/mysqld(+0x5ff88c)[0x7febc72e888c]
/usr/libexec/mysqld(_ZN7handler12ha_write_rowEPh+0xb6)[0x7febc7191c16]
/usr/libexec/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x6f)[0x7febc7040d0f]
/usr/libexec/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0xcb4)[0x7febc7044304]
/usr/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x2cc0)[0x7febc705b970]
/usr/libexec/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x125)[0x7febc7060315]
/usr/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1753)[0x7febc7062373]
/usr/libexec/mysqld(_Z24do_handle_one_connectionP3THD+0x1c2)[0x7febc7114022]
/usr/libexec/mysqld(handle_one_connection+0x4a)[0x7febc71140ca]
/lib64/libpthread.so.0(+0x7dc5)[0x7febc68b1dc5]
/lib64/libc.so.6(clone+0x6d)[0x7febc50b076d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7feb7c04f088): INSERT INTO msgrcpt (partition_tag, mail_id, rseqnum, rid, is_local, content, ds, rs, bl, wl, bspam_level, smtp_resp) VALUES ('0','uIvRucHxC_Vc','1','23','Y','C','P',' ','N','N','-7.398','250 2.0.0 from MTA(smtp:[127.0.0.1]:10025): 250 2.0.0 Ok: queued as C0BA418542')
Connection ID (thread ID): 174
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
171110 19:20:45 mysqld_safe Number of processes running now: 0
171110 19:20:45 mysqld_safe mysqld restarted
171110 19:20:45 [Note] /usr/libexec/mysqld (mysqld 5.5.52-MariaDB) starting as process 24531 ...
171110 19:20:45 InnoDB: The InnoDB memory heap is disabled
171110 19:20:45 InnoDB: Mutexes and rw_locks use GCC atomic builtins
171110 19:20:45 InnoDB: Compressed tables use zlib 1.2.7
171110 19:20:45 InnoDB: Using Linux native AIO
171110 19:20:45 InnoDB: Initializing buffer pool, size = 128.0M
171110 19:20:45 InnoDB: Completed initialization of buffer pool
171110 19:20:45 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 871863934
171110 19:20:45  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 871865424
171110 19:20:46  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
171110 19:20:46  InnoDB: Waiting for the background threads to start
171110 19:20:47 Percona XtraDB (http://www.percona.com) 5.5.49-MariaDB-38.0 started; log sequence number 871865424
171110 19:20:47 [Note] Plugin 'FEEDBACK' is disabled.
171110 19:20:48 [Note] Server socket created on IP: '0.0.0.0'.
171110 19:20:48 [Warning] Found an entry in the 'db' table with empty database name; Skipped
171110 19:20:48 [Note] Event Scheduler: Loaded 0 events
171110 19:20:48 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.52-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
171110 19:20:48  InnoDB: Assertion failure in thread 140515964544768 in file btr0btr.c line 2504
InnoDB: Failing assertion: btr_page_get_next(prev_block->frame, mtr) == buf_block_get_page_no(block)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/ … overy.html
InnoDB: about forcing recovery.

Do you know if anything will be hurt by completely dumping and restoring the msgrcpt table?

6 (edited by srosoftware 2017-11-13 04:51:00)

Re: amavisd SQL backup fills entire server

RESOLVED

ran mysqlcheck --all-databases
mysqlcheck failed (hung) checking the amavisd database, so I followed standard innodb repair process:

* stop mariadb service
* backup everything in the /var/lib/mysql folder
* add innodb_force_recovery = 1 in the my.cnf file and restart mariadb service
* attempt to dump the amavisd table (hung/failed)
* drop the affected table (amavisd)
* stop mariadb service and remove innodb_force_recovery from my.cnf file
* restart mariadb and import week old iRedmail amavisd sql backup from /var/vmail/backup/mysql folder.

mysql error logs are clean for a full day since import.

This was an interesting failure.  We had a disk fail a week before this.  We imported a backup VM and ran checks on the filesystem and databases, but this innodb corruption didn't show up until a week later. 

Very glad iRedmail configured daily SQL backups by default!  Without those, I would have needed to restore the entire mail server VM from a week old backup, then resync mail folders.