1

Topic: Activesync connection fail and web mail bad gateway after upgrade SOGo

==== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release): 0.9.9
- Deployed with iRedMail Easy or the downloadable installer? downloadable
- Linux/BSD distribution name and version: CentOS 7
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): LDAP
- Web server (Apache or Nginx): Nginx
- Manage mail accounts with iRedAdmin-Pro? Yes
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====

After recently (May 2019) upgrade to SOGo 4.0.7, the Activesync will always fail after running for few hours to 1-2 days. Outlook and iPhone users will lost connection to mail server. When the issue occurs, the web mail page also cannot display and will show Bad Gateway. Restart server or sogo service will resolve the issue, but it will come back again after several hours.

When checking sogo.log, we noticed there will be always error message of resource unavailable prior and during the issue occurs, and one child process will keep changing the process id by terminating and spawning :

Jun 09 22:10:40 sogod [14755]: [ERROR] <0x0x55db8660ac60[WOWatchDogChild]> FAILURE receiving status for child 27238

Jun 09 22:10:40 sogod [14755]: [ERROR] <0x0x55db8660ac60[WOWatchDogChild]>   socket: <NGActiveSocket[0x0x55db8668cd60]: mode=<closed> address=(null) receive-timeout=5.000s>

Jun 09 22:10:40 sogod [14755]: [ERROR] <0x0x55db8660ac60[WOWatchDogChild]>   exception: <NGSocketShutdownDuringReadException: 0x55db8660c680> NAME:NGSocketShutdownDuringReadException REASON:the socket was shutdown INFO:{errno = 11; error = "Resource temporarily unavailable"; stream = "{object = 0x55db8668cd60;}"; }

Jun 09 22:10:40 sogod [14755]: <0x0x55db8660ac60[WOWatchDogChild]> sending terminate signal to pid 27238

Jun 09 22:10:40 sogod [14755]: <0x0x55db8660ac60[WOWatchDogChild]> child 27238 exited

Jun 09 22:10:40 sogod [14755]: <0x0x55db86631ff0[WOWatchDog]> child spawned with pid 27253
Jun 09 22:10:40 sogod [27253]: <0x0x55db86354120[WOHttpAdaptor]> notified the watchdog that we are ready
Jun 09 22:10:50 sogod [14755]: [ERROR] <0x0x55db8660ac60[WOWatchDogChild]> FAILURE receiving status for child 27253

Jun 09 22:10:50 sogod [14755]: [ERROR] <0x0x55db8660ac60[WOWatchDogChild]>   socket: <NGActiveSocket[0x0x55db865fb900]: mode=rw address=(null) connectedTo=<0x0x55db8672bd70[NGLocalSocketAddress]: /tmp/_ngsocket_14755_0x55db8635b290_000> receive-timeout=5.000s>

Jun 09 22:10:50 sogod [14755]: [ERROR] <0x0x55db8660ac60[WOWatchDogChild]>   exception: <NGSocketTimedOutException: 0x55db8662b790> NAME:NGSocketTimedOutException REASON:the socket was shutdown INFO:{errno = 11; error = "Resource temporarily unavailable"; stream = "{object = 0x55db865fb900;}"; }

Jun 09 22:10:50 sogod [14755]: <0x0x55db8660ac60[WOWatchDogChild]> sending terminate signal to pid 27253

Below is Nginx error log when the issue occurs, these lines will continuously show with different users:

2019/06/10 15:14:11 [error] 26028#0: *376 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.1.9, server: mail1.cehgroup.com, request: "POST /Microsoft-Server-ActiveSync?User=xxx@cehgroup.com&DeviceId=M899P2LBO92439O3965GVQ7EP0&DeviceType=iPhone&Cmd=Ping HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/Microsoft-Server-ActiveSync?User=xxx@cehgroup.com&DeviceId=M899P2LBO92439O3965GVQ7EP0&DeviceType=iPhone&Cmd=Ping", host: "mail1.cehgroup.com"

2019/06/10 15:14:13 [error] 26028#0: *224 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 36.88.53.251, server: mail1.cehgroup.com, request: "POST /Microsoft-Server-ActiveSync?Cmd=FolderSync&User=yyy@flexindomas.cehgroup.com&DeviceId=A8DED9D0A32C4C0FAAB6DEBE4FACBE8F&DeviceType=WindowsOutlook15 HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/Microsoft-Server-ActiveSync?Cmd=FolderSync&User=yyy@flexindomas.cehgroup.com&DeviceId=A8DED9D0A32C4C0FAAB6DEBE4FACBE8F&DeviceType=WindowsOutlook15", host: "mail1.cehgroup.com"

2019/06/10 15:14:13 [error] 26028#0: *216 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.1.129, server: mail1.cehgroup.com, request: "PROPFIND /SOGo/dav/zzz%40cehgroup.com/Calendar/personal/ HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/dav/zzz%40cehgroup.com/Calendar/personal/", host: "mail1.cehgroup.com"

2019/06/10 15:14:18 [error] 26028#0: *227 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.1.156, server: mail1.cehgroup.com, request: "PROPFIND /SOGo/dav/aaa%40cehgroup.com/Calendar/personal/ HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/dav/aaa%40cehgroup.com/Calendar/personal/", host: "mail1.cehgroup.com"

2019/06/10 15:14:36 [error] 26028#0: *110 connect() failed (110: Connection timed out) while connecting to upstream, client: 52.125.132.69, server: mail1.cehgroup.com, request: "POST /Microsoft-Server-ActiveSync?User=bbb%40in.cehgroup.com&DeviceId=3D68816F2758A3A0&DeviceType=Outlook&Cmd=Settings HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/Microsoft-Server-ActiveSync?User=bbb%40in.cehgroup.com&DeviceId=3D68816F2758A3A0&DeviceType=Outlook&Cmd=Settings", host: "mail1.cehgroup.com"

2019/06/10 15:14:36 [error] 26028#0: *111 connect() failed (110: Connection timed out) while connecting to upstream, client: 52.125.132.69, server: mail1.cehgroup.com, request: "POST /Microsoft-Server-ActiveSync?User=bbb%40in.cehgroup.com&DeviceId=3D68816F2758A3A0&DeviceType=Outlook&Cmd=Sync HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/Microsoft-Server-ActiveSync?User=bbb%40in.cehgroup.com&DeviceId=3D68816F2758A3A0&DeviceType=Outlook&Cmd=Sync", host: "mail1.cehgroup.com"

2019/06/10 15:14:38 [error] 26028#0: *119 connect() failed (110: Connection timed out) while connecting to upstream, client: 110.54.200.239, server: mail1.cehgroup.com, request: "POST /Microsoft-Server-ActiveSync?User=ccc@ph.cehgroup.com&DeviceId=ABHR60DPA17SNCHDLRJ254KQ6C&DeviceType=iPhone&Cmd=Ping HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/Microsoft-Server-ActiveSync?User=ccc@ph.cehgroup.com&DeviceId=ABHR60DPA17SNCHDLRJ254KQ6C&DeviceType=iPhone&Cmd=Ping", host: "mail1.cehgroup.com"

2019/06/10 15:14:48 [error] 26028#0: *107 connect() failed (110: Connection timed out) while connecting to upstream, client: 52.125.132.69, server: mail1.cehgroup.com, request: "POST /Microsoft-Server-ActiveSync?User=bbb%40in.cehgroup.com&DeviceId=3D68816F2758A3A0&DeviceType=Outlook&Cmd=ResolveRecipients HTTP/1.1", upstream: "http://127.0.0.1:20000/SOGo/Microsoft-Server-ActiveSync?User=bbb%40in.cehgroup.com&DeviceId=3D68816F2758A3A0&DeviceType=Outlook&Cmd=ResolveRecipients", host: "mail1.cehgroup.com"

We tried to adjust sogo and nginx configures but still cannot resolve the issue.
Anyone has the similar issue? Any solutions or suggestion? Thanks!

----

Spider Email Archiver: On-Premises, lightweight email archiving software developed by iRedMail team. Supports Amazon S3 compatible storage and custom branding.

2

Re: Activesync connection fail and web mail bad gateway after upgrade SOGo

Did you try to increase preforked SOGo child processes (/etc/sysconfig/sogo)?

3

Re: Activesync connection fail and web mail bad gateway after upgrade SOGo

Yes, we have increased it to 200 since using iRedMail and all along it is working fine. We have also increased vMem size limit to 1024, but seems no improvement.

4

Re: Activesync connection fail and web mail bad gateway after upgrade SOGo

We have about 300 EAS devices. We have set preforked SOGo child processes to 200 but it is still not working.

5

Re: Activesync connection fail and web mail bad gateway after upgrade SOGo

You have 300 EAS devices, the processes should be set to slightly higher than 300.

BTW, 1 EAS connection (in SOGo) will use 1 long run http connection, 2 SQL connections, 20-50MB memory.