1

Topic: Slow response & degraded performance at 09:00 UTC every day

==== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release): 1.6.8 MARIADB edition.
- Deployed with iRedMail Easy or the downloadable installer? Downloadable
- Linux/BSD distribution name and version: Debian GNU/Linux 12 (bookworm), Kernel: Linux 6.1.0-18-cloud-amd64
- Store mail accounts in which backend (LDAP/MySQL/PGSQL):  MySQL
- Web server (Apache or Nginx): Nginx
- Manage mail accounts with iRedAdmin-Pro? No
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====

Hello,

I am noticing reduced responsiveness and performance of my iRedMail server at precisely 09:00 UTC every day.

I'm unsure what is causing this. I've looked through crontab and /etc/cron.d entries and can't find anything that starts at this time.
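A first-pass check of the cron files can be scripted. The following is only a sketch: the function name `cron_jobs_at_hour` is made up for illustration, and it matches only a literal hour or `*` in the hour field, so ranges, lists and steps such as `8-10` or `*/3` are not expanded.

```shell
# List system crontab lines whose hour field is the given hour (default 9)
# or "*". Hypothetical helper; ranges/lists/steps in the hour field are
# NOT expanded, so treat the output as a first pass only.
cron_jobs_at_hour() {
    hour=9
    if [ "$#" -gt 0 ]; then hour="$1"; shift; fi
    # With no files given, scan the usual system-wide locations.
    [ "$#" -gt 0 ] || set -- /etc/crontab /etc/cron.d/*
    awk -v h="$hour" '
        /^[ \t]*(#|$)/      { next }   # comments and blank lines
        $1 ~ /^[A-Za-z_]+=/ { next }   # VAR=value assignments
        $1 ~ /^@/           { next }   # @daily, @reboot and similar
        $2 == "*" || ($2 ~ /^[0-9]+$/ && $2 + 0 == h + 0) {
            print FILENAME ": " $0
        }
    ' "$@" 2>/dev/null
}
```

Calling `cron_jobs_at_hour 9` scans /etc/crontab and /etc/cron.d. Cron is not the only scheduler, so `systemctl list-timers --all` is worth checking as well, and a long-running daemon that keeps its own schedule will not show up in either place.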

My iRedMail server is all defaults & nothing extra with the exception of CertBot, though CertBot doesn't run at 09:00 UTC.

Would anyone know of a process that starts at this time?


2 (edited by wayne.workman 2024-03-27 01:55:36)

Re: Slow response & degraded performance at 09:00 UTC every day

Interestingly, although 09:00 UTC had been the exact problem time for more than a week, the most recent occurrence was at 09:51 UTC.

I've been continuing to dig into this and have set up some process-level CPU and RAM monitoring. The issue appears to be related to fail2ban, but I am not sure exactly how. The logging I set up becomes inconsistent and out of sync during the problem time.
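Process-level memory monitoring of this kind can be approximated with `ps` and `sort`. This is a minimal sketch, not the setup actually used here: the function names and the log path /var/log/top-rss.log are assumptions.

```shell
# Print the N largest entries (default 5) from "rss_kb pid command" lines
# read on stdin, biggest resident set first.
top_rss() {
    sort -k1,1nr | head -n "${1:-5}"
}

# Append a UTC-timestamped snapshot of the biggest processes to a log file.
# The default log path is an assumption; pick whatever suits the system.
log_top_rss() {
    log="${1:-/var/log/top-rss.log}"
    {
        date -u '+%Y-%m-%dT%H:%M:%SZ'
        ps -eo rss=,pid=,comm= | top_rss 5
    } >> "$log"
}
```

Running `log_top_rss` once a minute from cron gives a rough timeline of which process grows just before the slowdown.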

Looking into fail2ban logs, I see a big gap during the problem period starting 10 minutes before the issue. There were no logs from this period written by fail2ban that I can find. The logs within /var/log/fail2ban.log are silent from 9:40 UTC until an automated recovery reboot which occurred at 10:12 UTC.
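Silent periods like this can be located mechanically. The sketch below assumes every log line starts with a timestamp shaped like `YYYY-MM-DD HH:MM:SS` and that `date -d` is available (GNU coreutils); the function name `log_gaps` is made up. It forks `date` once per line, so it is slow on large logs.

```shell
# Report gaps longer than a threshold (seconds, default 300) between
# consecutive timestamped lines read on stdin. Lines that do not start
# with "YYYY-MM-DD HH:MM:SS" are skipped. Timestamps are treated as UTC.
log_gaps() {
    threshold="${1:-300}"
    prev_ts=""
    prev_epoch=""
    while IFS= read -r line; do
        ts=$(printf '%s\n' "$line" | cut -c1-19)
        case "$ts" in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\ [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) ;;
            *) continue ;;
        esac
        epoch=$(date -u -d "$ts" +%s 2>/dev/null) || continue
        if [ -n "$prev_epoch" ] && [ $((epoch - prev_epoch)) -gt "$threshold" ]; then
            echo "gap of $((epoch - prev_epoch))s between $prev_ts and $ts"
        fi
        prev_ts=$ts
        prev_epoch=$epoch
    done
}
```

For example, `log_gaps 600 < /var/log/fail2ban.log` lists every silence longer than ten minutes.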

It's strange, I am continuing to dig. I'm also evaluating replacing fail2ban with CrowdSec.

3

Re: Slow response & degraded performance at 09:00 UTC every day

Is it running a cron job to back up SQL databases at that time?

4

Re: Slow response & degraded performance at 09:00 UTC every day

ZhangHuangbin wrote:

Is it running a cron job to back up SQL databases at that time?

I see inside crontab, the mysql backup occurs at 3:30 UTC. I forgot to mention this issue lasts for 20 minutes straight every night, until an automated reboot happens to fix it. I'm continuing to dig, but would appreciate any pointers.

5

Re: Slow response & degraded performance at 09:00 UTC every day

Every day I've been conducting a different test in an attempt to determine what the problem is and what it is not. I think the test that ran overnight finally showed the cause: ClamAV.

Now I need to dig deeper and figure out how to keep ClamAV without it bringing my system down every night with scans.

6

Re: Slow response & degraded performance at 09:00 UTC every day

i guess it is freshclam then

7 (edited by wayne.workman 2024-03-30 23:14:34)

Re: Slow response & degraded performance at 09:00 UTC every day

Cthulhu wrote:

i guess it is freshclam then

You are exactly right, it is. After more digging, I've identified that clamav-freshclam is what initiates the problem. I am unsure, but I think that clamav-daemon is the actual culprit.

There has been a change in default configuration values for ClamAV between 0.104 and 0.105. There are several open GitHub issues about this, where people report performance problems and increased scan times. The iRedMail server I built a couple of years ago didn't have this issue, but the one I built about a month ago has the issue I've been digging into in this thread.

I found a tutorial online about fixing high resource usage via a systemd method.

I've modified this file:
/lib/systemd/system/clamav-daemon.service

And have added the following lines underneath the [Service] section of the file.

IOSchedulingPriority = 7
CPUSchedulingPolicy = 5
MemoryLimit=256M
CPUQuota=30%
Nice = 19

Then, I've reloaded configurations and restarted clamav-daemon with the following commands:

systemctl daemon-reload
systemctl restart clamav-daemon
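Files under /lib/systemd/system belong to the package and can be overwritten on upgrade, so the same directives are usually placed in a drop-in file instead. The sketch below writes the values from this post, unchanged and untuned, into a drop-in; the function name and the file name limits.conf are assumptions.

```shell
# Write the resource limits quoted in this thread into a systemd drop-in.
# $1 is the drop-in directory; for the real unit that would be
# /etc/systemd/system/clamav-daemon.service.d (needs root).
# The values are copied from the post, not recommendations.
write_clamd_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/limits.conf" <<'EOF'
[Service]
IOSchedulingPriority=7
CPUSchedulingPolicy=5
MemoryLimit=256M
CPUQuota=30%
Nice=19
EOF
}
```

After `write_clamd_dropin /etc/systemd/system/clamav-daemon.service.d`, the same `systemctl daemon-reload` and `systemctl restart clamav-daemon` apply, and `systemctl cat clamav-daemon` shows the merged unit. Note that a memory cap lower than what clamd needs for its signature database may get the daemon killed rather than slowed down.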


I will report back if this has helped or not in the coming days.

8

Re: Slow response & degraded performance at 09:00 UTC every day

Also I just came across this post from another website that could be related. I'll have to test each potential solution one at a time:

So, after days of testing how to keep clamd from overloading the server, I found my solution. My server went down every morning, and I found this was because during a database reload clamd loads the new database (from freshclam) first and only then drops the old one. This reload strategy allows scanning to continue while the new database loads, but during that time clamd requires twice as much memory, or more, as during normal operation. So I changed ConcurrentDatabaseReload to no in /etc/clamd.conf, and after that I had no issues.

9

Re: Slow response & degraded performance at 09:00 UTC every day

Clamd 0.103.0 release notes say this:

Major changes

    clamd can now reload the signature database without blocking scanning. This multi-threaded database reload improvement was made possible thanks to a community effort.
        Non-blocking database reloads are now the default behavior. Some systems that are more constrained on RAM may need to disable non-blocking reloads, as it will temporarily consume double the amount of memory. We added a new clamd config option ConcurrentDatabaseReload, which may be set to no.

10

Re: Slow response & degraded performance at 09:00 UTC every day

Thanks for sharing.

11 (edited by wayne.workman 2024-04-02 07:22:42)

Re: Slow response & degraded performance at 09:00 UTC every day

My issue is now resolved.

ROOT CAUSE:

ClamAV 0.103.0 introduced a feature that lets clamd reload the signature database without blocking scanning. While a reload is in progress, clamd's memory usage temporarily doubles. The feature is enabled by default. My iRedMail server has 4 GB of RAM; clamd was using this feature and consuming all available memory, which made the system unresponsive.
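Whether a given server has room for the doubled memory can be estimated roughly. This sketch assumes the extra memory needed during a reload is about equal to clamd's current resident size, which is only an approximation; both function names are made up.

```shell
# Succeed (exit 0) if a second copy of clamd's current resident memory
# would fit into the memory currently available. Both values are in kB.
# Assumption: the reload needs roughly one extra copy of the current RSS.
reload_fits() {
    rss_kb="$1"
    avail_kb="$2"
    [ "$rss_kb" -le "$avail_kb" ]
}

# Read MemAvailable (kB) from a meminfo-style file, /proc/meminfo by default.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}
```

A possible check: `reload_fits "$(ps -o rss= -C clamd | head -n 1 | tr -d ' ')" "$(mem_available_kb)" || echo "not enough headroom for a concurrent reload"`.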


CHOSEN SOLUTION:

Turn off the new clamd feature.

Modify this file:
/etc/clamav/clamd.conf

Add this line:
ConcurrentDatabaseReload false

Save the file. Then do the following to restart clamd:
systemctl restart clamav-daemon
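The edit can also be scripted so that running it twice does not add a duplicate line. This is a sketch only: the function name is hypothetical, it writes the value as `no` (the form the release notes use), and it leaves commented-out lines alone.

```shell
# Ensure the clamd config contains "ConcurrentDatabaseReload no":
# rewrite an existing uncommented setting, otherwise append the line.
disable_concurrent_reload() {
    conf="${1:-/etc/clamav/clamd.conf}"
    if grep -Eq '^[[:space:]]*ConcurrentDatabaseReload[[:space:]]' "$conf"; then
        sed -i -E 's/^[[:space:]]*ConcurrentDatabaseReload[[:space:]].*/ConcurrentDatabaseReload no/' "$conf"
    else
        # Add a newline first if the file does not already end with one.
        if [ -n "$(tail -c 1 "$conf")" ]; then
            echo >> "$conf"
        fi
        printf '%s\n' 'ConcurrentDatabaseReload no' >> "$conf"
    fi
}
```

Run it as root with no argument to edit /etc/clamav/clamd.conf, then restart clamd with `systemctl restart clamav-daemon` as above.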

OTHER SOLUTION:

Increase system memory to allow for the new clamd feature to operate.

12

Re: Slow response & degraded performance at 09:00 UTC every day

"ConcurrentDatabaseReload no" will be the default setting in next iRedMail release.