1

Topic: 504 Gateway Timeout / uwsgi: IOError

==== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release):  0.9.9
- Deployed with iRedMail Easy or the downloadable installer? installer
- Linux/BSD distribution name and version: Ubuntu 16.04
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MySql
- Web server (Apache or Nginx): nginx
- Manage mail accounts with iRedAdmin-Pro? Yes
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====

Woke up today to learn that iredadmin interface is now returning an /'504 Bad Gateway' error.

Digging thru logs, I find in syslog:
Aug 28 10:23:41 mail-cluster-1 uwsgi: Wed Aug 28 10:23:41 2019 - uwsgi_response_write_body_do(): Broken pipe [core/writer.c line 331] during GET /iredadmin/dashboard?checknew (x.x.x.x)
Aug 28 10:23:41 mail-cluster-1 uwsgi: IOError
Aug 28 10:23:41 mail-cluster-1 uwsgi: :
Aug 28 10:23:41 mail-cluster-1 uwsgi: write error
Aug 28 10:23:41 mail-cluster-1 uwsgi:
Aug 28 10:23:41 mail-cluster-1 uwsgi: my.server [pid: 2134|app: 1|req: 7/15] x.x.x.x () {62 vars in 1164 bytes} [Wed Aug 28 10:22:34 2019] GET /iredadmin/dashboard?checknew => generated 21742 bytes in 66790 msecs (HTTP/1.1 200) 2 headers in 146 bytes (0 switches on core 0)

In other forum posts it was suggested to kill/restart uwsgi... did that, no change. I also have restarted this entire server instance, and the problem persists. I fail to see what it could possibly be hanging on. Any suggestions?

Mike-

----

Spider Email Archiver: On-Premises, lightweight email archiving software developed by iRedMail team. Supports Amazon S3 compatible storage and custom branding.

2

Re: 504 Gateway Timeout / uwsgi: IOError

I should add that I have restored the virtual server image from 2 weeks prior, at a time when I know for a fact that iredadmin was in fact working, and it still even fails on this, so this would preclude any sort of software or config file change. I think it means then there is some sort of db issue affoot, but I can't seem to isolate calls to the database from iredadmin alone, the constant hammering of active users of the mail system gets this lost in the noise.

Would love some debugging help here.

Mike-

3

Re: 504 Gateway Timeout / uwsgi: IOError

I think I found the issue and a workaround.

Entering the iredadmin, the code executes an sql query in order to gather statistics for the dashboard:

   -- Get number of incoming mails.
                        SELECT COUNT(msgs.mail_id) AS total
                        FROM msgs
                        LEFT JOIN msgrcpt ON (msgs.mail_id = msgrcpt.mail_id)
                        LEFT JOIN maddr AS sender ON (msgs.sid = sender.id)
                        LEFT JOIN maddr AS recip ON (msgrcpt.rid = recip.id)
                        WHERE msgs.quar_type <> 'Q'  AND recip.domain IN (list of my hosted domains)  AND msgs.time_num > UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 86400 SECOND))

Doing query profiling, I discovered on my system that this query was taking just over 30 seconds to complete. I don't know why it's so dammed dog assed slow, (there are > 550,000 records however, but I still don't think it should be this bad). However, the issue is that there is an interaction between uswgi and nginx; If uswgi does not return in less than 30 seconds, nginx will assume that the script has an error of some kind and it will take it upon itself to close the connection and report a gateway timeout.

A workaround is to edit /etc/nginx/templates/iredadmin.tmpl and add the following to the section "location ~ ^/iredadmin(.*)":

uwsgi_read_timeout 600;
uwsgi_send_timeout 600;
uwsgi_connect_timeout 60;


and reload nginx.

What this does it to tell nginx to give uwsgi more time to return results before deciding that its not working and aborting it itself.

Now, the next issue is - why is this sql taking so long to execute?

4

Re: 504 Gateway Timeout / uwsgi: IOError

- Try to clean up old SQL records with script "tools/cleanup_amavisd_db.py":

cd /opt/www/iredadmin/tools/
python cleanup_amavisd_db.py

- It's hard to know why without digging. But 550000 records is not that much, it shouldn't take 30 seconds.

5

Re: 504 Gateway Timeout / uwsgi: IOError

ZhangHuangbin wrote:

- Try to clean up old SQL records with script "tools/cleanup_amavisd_db.py":

cd /opt/www/iredadmin/tools/
python cleanup_amavisd_db.py

- It's hard to know why without digging. But 550000 records is not that much, it shouldn't take 30 seconds.


The cleanup already happens in cron and I checked running it by hand and yeah, 550000 was current count. Something just makes the DB dog assed slow and the problem doesn't show up until you get to this amount of entries. For the curious, my DB is mysql+wsrep with 4 hosts on the lan @ 10gbps and serving the db from local disk.

At least we now have a resolution for 'bad gateway'.

6

Re: 504 Gateway Timeout / uwsgi: IOError

Did you keep old records for a long time? Check parameters below in both /opt/www/iredadmin/settings.py and /opt/www/iredadmin/libs/default_settings.py:

AMAVISD_REMOVE_MAILLOG_IN_DAYS = 
AMAVISD_REMOVE_QUARANTINED_IN_DAYS = 

Try to reduce the days and run "python /opt/www/iredadmin/tools/cleanup_amavisd_db.py" manually to remove old records.

7

Re: 504 Gateway Timeout / uwsgi: IOError

ZhangHuangbin wrote:

Did you keep old records for a long time? Check parameters below in both /opt/www/iredadmin/settings.py and /opt/www/iredadmin/libs/default_settings.py:

AMAVISD_REMOVE_MAILLOG_IN_DAYS = 
AMAVISD_REMOVE_QUARANTINED_IN_DAYS = 

Try to reduce the days and run "python /opt/www/iredadmin/tools/cleanup_amavisd_db.py" manually to remove old records.

AMAVISD_REMOVE_MAILLOG_IN_DAYS = 3
AMAVISD_REMOVE_QUARANTINED_IN_DAYS = 7


As I said, the cleanup does run from cron already and the above are the default settings I find here.

The issue I found was that the database was just taking a very long time to produce the data for the dashboard, and very specfically, the database query I referenced above:;

  SELECT COUNT(msgs.mail_id) AS total
                        FROM msgs
                        LEFT JOIN msgrcpt ON (msgs.mail_id = msgrcpt.mail_id)
                        LEFT JOIN maddr AS sender ON (msgs.sid = sender.id)
                        LEFT JOIN maddr AS recip ON (msgrcpt.rid = recip.id)
                        WHERE msgs.quar_type <> 'Q'  AND recip.domain IN (list of my hosted domains)  AND msgs.time_num > UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 86400 SECOND))

This was taking > 30 seconds to complete. My database a replicated SQL w/wsrep. The performance problem was that somehow I had bungled the innodb_buffer_pool_size setting and had left it at a paltry 122M. Bumping this single parameter to 6G immediately lowered the time for this query down to ~3 seconds. I still feel it should run a lot faster than even that but it's acceptable and nginx doesn't time out any longer, although I did up those timeouts to as per above.