1 (edited by BeforeTheFall 2023-01-21 16:20:32)

Topic: Connection Timeout errors shortly after installing Let's Encrypt Certs

==== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release): 1.6.2
- Deployed with iRedMail Easy or the downloadable installer? Downloadable
- Linux/BSD distribution name and version: Ubuntu 20.04
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MariaDB
- Web server (Apache or Nginx):Nginx
- Manage mail accounts with iRedAdmin-Pro? No
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====
LOGS
====

Following are excerpts from the logs I knew to check...  The only thing I can make of them is that there seems to be a bad cert/key combination (or is that the DKIM key?)
------
# cat /var/log/mail.log | grep error | more
Jan 21 04:08:52 casa postfix/smtps/smtpd[3086]: SSL_accept error from unknown[152.231.17.195]: lost connection

-----
cat /var/log/nginx/ | grep error | more
2023/01/21 03:48:59 [error] 757#757: *1489 open() "/var/www/html/media/wp-includes/wlwmanifest.xml" failed (2: No such file or directory), client: 159.223.45.106, server: _, request: "GET //media/wp-includes/wlwmanifest.xml HTTP/1.1", host: "emailaddress.one"
2023/01/21 03:48:59 [error] 757#757: *1489 open() "/var/www/html/wp2/wp-includes/wlwmanifest.xml" failed (2: No such file or directory), client: 159.223.45.106, server: _, request: "GET //wp2/wp-includes/wlwmanifest.xml HTTP/1.1", host: "emailaddress.one"
2023/01/21 03:48:59 [error] 757#757: *1489 open() "/var/www/html/site/wp-includes/wlwmanifest.xml" failed (2: No such file or directory), client: 159.223.45.106, server: _, request: "GET //site/wp-includes/wlwmanifest.xml HTTP/1.1", host: "emailaddress.one"
2023/01/21 03:49:00 [error] 757#757: *1489 open() "/var/www/html/cms/wp-includes/wlwmanifest.xml" failed (2: No such file or directory), client: 159.223.45.106, server: _, request: "GET //cms/wp-includes/wlwmanifest.xml HTTP/1.1", host: "emailaddress.one"
2023/01/21 03:49:00 [error] 757#757: *1489 open() "/var/www/html/sito/wp-includes/wlwmanifest.xml" failed (2: No such file or directory), client: 159.223.45.106, server: _, request: "GET //sito/wp-includes/wlwmanifest.xml HTTP/1.1", host: "emailaddress.one"
2023/01/21 04:37:12 [error] 711#711: *860 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 194.38.20.161, server: _, request: "GET /joobi/inc/openflashchart/php-ofc-library/ofc_upload_image.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9999", host: "ninesongmedia.com"
2023/01/21 04:51:49 [error] 717#717: *155 open() "/var/www/html/actuator/health" failed (2: No such file or directory), client: 198.199.95.27, server: _, request: "GET /actuator/health HTTP/1.1", host: "170.187.142.226"

-----
cat /var/log/dovecot/*.log |grep error|more
Jan 21 01:23:58 casa dovecot: pop3-login: Disconnected (no auth attempts in 0 secs): user=<>, rip=24.239.78.182, lip=170.187.142.226, TLS handshaking: SSL_accept() failed: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate: SSL alert number 42, session=<J4M2BbzyGQUY7062>

-----
cat /var/log/syslog | grep error | more
Jan 21 04:08:52 casa postfix/smtps/smtpd[3086]: SSL_accept error from unknown[152.231.17.195]: lost connection
Jan 21 04:44:41 casa systemd[1]: Condition check resulted in Process error reports when automatic reporting is enabled (file watch) being skipped.
Jan 21 04:44:41 casa kernel: [   16.902720] EXT4-fs (sda): re-mounted. Opts: errors=remount-ro

====
END OF LOGS
====

The visible problem: after a basic install, before installing Let's Encrypt certificates, one can access the admin and webmail pages (after approving the exception for the self-signed SSL certs), add/edit domains, add/edit users, and even send email through the web interface to outside domains that aren't as persnickety as Google, Microsoft, and so on. Then, following a successful certbot install and Let's Encrypt certificate generation, upon backing up the self-signed certs,

# mv /etc/ssl/certs/iRedMail.crt{,.bak}
# mv /etc/ssl/private/iRedMail.key{,.bak}

then creating the symbolic links,

# ln -s /etc/letsencrypt/live/mx.mydomain.tld/fullchain.pem /etc/ssl/certs/iRedMail.crt
# ln -s /etc/letsencrypt/live/mx.mydomain.tld/privkey.pem /etc/ssl/private/iRedMail.key

the /mail and /iredadmin pages load with the SSL locks appearing (suggesting that the SSL certs are making the browser happy), and one can still log in a few times/for a few minutes, THEN all the pages start returning "Connection Timed Out" errors.
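
Since the logs raise the question of a bad cert/key combination, one way to rule that out after the symlink step is to compare the public key embedded in the certificate with the one derived from the private key. This is only a minimal sketch, assuming the openssl command-line tool is installed; the function name cert_matches_key is made up for illustration.

```shell
# Hypothetical helper: exits 0 only when the certificate and the private key
# carry the same public key, 1 on a mismatch, 2 if either file cannot be read.
cert_matches_key() {
    cert=$1
    key=$2
    # For a fullchain.pem, openssl x509 reads the first (leaf) certificate.
    cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || return 2
    key_pub=$(openssl pkey -in "$key" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Example, run as root with the paths used in the symlink step above:
#   cert_matches_key /etc/ssl/certs/iRedMail.crt /etc/ssl/private/iRedMail.key \
#       && echo "cert and key match" || echo "mismatch or unreadable file"
```

If the pair matches, the "bad certificate" alerts in the logs are more likely coming from scanners or clients rejecting the certificate than from a broken pair on the server.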

I wondered if, given the inconsistent behavior, perhaps a service was stopping, so I did this :
systemctl status
State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Sat 2023-01-21 04:44:38 UTC; 2h 22min ago
   CGroup: /
           ├─user.slice
           │ └─user-0.slice
           │   ├─session-4.scope
           │   │ ├─  754 /bin/login -p --
           │   │ ├─ 2640 -bash
           │   │ └─10887 systemctl status
           │   └─user@0.service …
           │     └─init.scope
           │       ├─2632 /lib/systemd/systemd --user
           │       └─2635 (sd-pam)
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─fail2ban.service
             │ └─700 /usr/bin/python3 /usr/bin/fail2ban-server -xf start
             ├─haveged.service
             │ └─504 /usr/sbin/haveged --Foreground --verbose=1 -w 1024
             ├─clamav-daemon.service
             │ └─706 /usr/sbin/clamd --foreground=true
             ├─systemd-networkd.service
             │ └─526 /lib/systemd/systemd-networkd
             ├─amavis.service
             │ ├─10741 /usr/sbin/amavisd-new (master)
             │ ├─10745 /usr/sbin/amavisd-new (virgin child)
             │ ├─10746 /usr/sbin/amavisd-new (virgin child)
             │ ├─10747 /usr/sbin/amavisd-new (virgin child)
             │ ├─10748 /usr/sbin/amavisd-new (virgin child)
             │ ├─10749 /usr/sbin/amavisd-new (virgin child)
             │ ├─10750 /usr/sbin/amavisd-new (virgin child)
             │ ├─10751 /usr/sbin/amavisd-new (virgin child)
             │ └─10752 /usr/sbin/amavisd-new (virgin child)
             ├─systemd-udevd.service
             │ └─439 /lib/systemd/systemd-udevd
             ├─cron.service
             │ └─671 /usr/sbin/cron -f
             ├─nginx.service
             │ ├─716 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             │ └─717 nginx: worker process
             ├─mariadb.service
             │ └─846 /usr/sbin/mysqld
             ├─polkit.service
             │ └─687 /usr/lib/policykit-1/polkitd --no-debug
             ├─networkd-dispatcher.service
             │ └─684 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
             ├─ModemManager.service
             │ └─737 /usr/sbin/ModemManager
             ├─systemd-journald.service
             │ └─394 /lib/systemd/systemd-journald
             ├─atd.service
             │ └─696 /usr/sbin/atd -f
             ├─unattended-upgrades.service
             │ └─743 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
             ├─ssh.service
             │ └─841 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
             ├─snapd.service
             │ └─689 /usr/lib/snapd/snapd
             ├─clamav-freshclam.service
             │ └─809 /usr/bin/freshclam -d --foreground=true
             ├─rsyslog.service
             │ └─688 /usr/sbin/rsyslogd -n -iNONE
             ├─netdata.service
             │ ├─2001 /opt/netdata/bin/srv/netdata -P /opt/netdata/var/run/netdata/netdata.pid -D
             │ ├─2030 /opt/netdata/bin/srv/netdata --special-spawn-server
             │ ├─2202 /opt/netdata/usr/libexec/netdata/plugins.d/go.d.plugin 3
             │ ├─2204 /opt/netdata/usr/libexec/netdata/plugins.d/ebpf.plugin 3
             │ ├─2210 /opt/netdata/usr/libexec/netdata/plugins.d/apps.plugin 3
             │ ├─2216 /usr/bin/python3 /opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin 3
             │ └─9128 bash /opt/netdata/usr/libexec/netdata/plugins.d/tc-qos-helper.sh 3
             ├─iredadmin.service
             │ ├─718 /usr/bin/uwsgi --ini /opt/www/iredadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/iredadmin/iredadmin.pid
             │ ├─960 /usr/bin/uwsgi --ini /opt/www/iredadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/iredadmin/iredadmin.pid
             │ ├─961 /usr/bin/uwsgi --ini /opt/www/iredadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/iredadmin/iredadmin.pid
             │ ├─962 /usr/bin/uwsgi --ini /opt/www/iredadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/iredadmin/iredadmin.pid
             │ ├─964 /usr/bin/uwsgi --ini /opt/www/iredadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/iredadmin/iredadmin.pid
             │ └─966 /usr/bin/uwsgi --ini /opt/www/iredadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/iredadmin/iredadmin.pid
             ├─system-postfix.slice
             │ └─postfix@-.service
             │   ├─ 1981 /usr/lib/postfix/sbin/master -w
             │   ├─ 1983 qmgr -l -t unix -u
             │   ├─ 3852 tlsmgr -l -t unix -u
             │   ├─ 7727 pickup -l -t unix -u -o content_filter=smtp-amavis:[127.0.0.1]:10026
             │   └─10064 showq -t unix -u
             ├─dovecot.service
             │ ├─814 /usr/sbin/dovecot -F
             │ ├─871 dovecot/lmtp -L
             │ ├─872 dovecot/anvil
             │ ├─873 dovecot/log
             │ ├─874 dovecot/lmtp -L
             │ ├─875 dovecot/lmtp -L
             │ ├─876 dovecot/lmtp -L
             │ ├─877 dovecot/lmtp -L
             │ ├─878 dovecot/config
             │ └─883 dovecot/stats
             ├─iredapd.service
             │ └─1108 /usr/bin/python3 /opt/iredapd/iredapd.py
             ├─systemd-resolved.service
             │ └─3672 /lib/systemd/systemd-resolved
             ├─php7.4-fpm.service
             │ ├─ 686 php-fpm: master process (/etc/php/7.4/fpm/php-fpm.conf)
             │ ├─9690 php-fpm: pool inet
             │ ├─9692 php-fpm: pool inet
             │ ├─9694 php-fpm: pool inet
             │ ├─9697 php-fpm: pool inet
             │ ├─9699 php-fpm: pool inet
             │ └─9892 php-fpm: pool inet
             ├─dbus.service
             │ └─672 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
             ├─systemd-timesyncd.service
             │ └─491 /lib/systemd/systemd-timesyncd
             ├─system-getty.slice
             │ └─getty@tty1.service
             │   └─815 /sbin/agetty -o -p -- \u --noclear tty1 linux
             ├─mlmmjadmin.service
             │ ├─719 /usr/bin/uwsgi --ini /opt/mlmmjadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/mlmmjadmin/mlmmjadmin.pid
             │ ├─898 /usr/bin/uwsgi --ini /opt/mlmmjadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/mlmmjadmin/mlmmjadmin.pid
             │ ├─899 /usr/bin/uwsgi --ini /opt/mlmmjadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/mlmmjadmin/mlmmjadmin.pid
             │ ├─900 /usr/bin/uwsgi --ini /opt/mlmmjadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/mlmmjadmin/mlmmjadmin.pid
             │ ├─901 /usr/bin/uwsgi --ini /opt/mlmmjadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/mlmmjadmin/mlmmjadmin.pid
             │ └─902 /usr/bin/uwsgi --ini /opt/mlmmjadmin/rc_scripts/uwsgi/debian.ini --pidfile /var/run/mlmmjadmin/mlmmjadmin.pid
             └─systemd-logind.service
               └─693 /lib/systemd/systemd-logind

THEN I thought maybe the machine was running out of memory, so I rescaled my Linode from 4GB 2 Core to 8GB 4 Core--No Change.

Next I read up on and tried various tweaks of the /etc/hosts file... it may have mistakes, BUT wouldn't they either work or not, rather than work for a little while and then stop?

Then I scoured back and forth through the DNS records: DMARC, DKIM, SPF, MX, A... I just can't see anything wrong, and even if there is, again, why would it work fine for a bit and then fail? I'm sure there are ways, but I'm just not guessing what they might be.

In the process I tried many rebuilds of the node from scratch, employing Ubuntu 18.04, 20.04, and 22.04; I tried different sources for certbot; and several restarts ago I tried all sorts of config file changes I read about in different places on this forum and elsewhere. Many times I ran the command below and it yielded no errors...  Nevertheless, I just tried again an hour or so ago and got this...  A clue perhaps?

#amavisd-new testkeys
TESTING#1 MyDomain.TLD: dkim._domainkey.mydomain.tld => fail (bad RSA signature)
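
A "bad RSA signature" result from testkeys commonly means the DKIM public key published in DNS is not the one that belongs to the private key amavisd signs with, which can happen after a rebuild generates a fresh key while the old TXT record stays in DNS. The sketch below is one way to compare the two; it assumes the usual TXT record layout, and the helper name dkim_pubkey is hypothetical.

```shell
# Hypothetical helper: reads a DKIM TXT record on stdin (in the multi-line
# format printed by "amavisd-new showkeys" or the quoted chunks returned by
# "dig +short TXT") and prints only the base64 public key from the p= tag.
dkim_pubkey() {
    tr -d ' \t\n"()' | grep -o 'p=[A-Za-z0-9+/]*=*' | head -n 1 | cut -c 3-
}

# Example (the selector and domain are the placeholders used in this thread;
# with several domains configured, showkeys prints several keys and this
# only picks up the first one):
#   local_key=$(amavisd-new showkeys | dkim_pubkey)
#   dns_key=$(dig +short TXT dkim._domainkey.mydomain.tld | dkim_pubkey)
#   [ "$local_key" = "$dns_key" ] && echo "DNS matches" || echo "DNS is stale"
```

If the two values differ, republishing the record printed by showkeys and waiting for the DNS change to propagate would be the next thing to try.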

I put up iRedMail 1.4.0 a few months ago on an Ubuntu instance, with a Let's Encrypt cert and five domains.  It took all of two days and ran trouble-free until Jan 2, then suddenly started rejecting SMTP connections. I came to find that one of my registrars had reset the DNS records for three of the domains the mail server had been serving, so I blamed that for the crash at first, even though the troubles weren't limited to those three. Then I thought something in the network policy at Linode may have changed, perhaps via their Network Helper feature, which is on by default and actually changes config files on client machines, or perhaps it was attributable to some aspect of the gestapo ramrodding of IPv6 services on the world, but, again, none of that seemed to pan out.

So, as of this writing, I've been at this night and day for about 20 days now...  I can't shake the feeling there is something super simple which I am daftly missing, perhaps due in part to a dose of fatigue, adding to my already generous endowment of innate ineptitude.

'Be most grateful if someone will be so kind as to enlighten me.  Thanks to all in advance!


2

Re: Connection Timeout errors shortly after installing Let's Encrypt Certs

Well, those error logs are very common for ANY mail server connected to the internet:

- bots checking for WordPress and/or any other exploitable kit and trying to RCE it
- bots port scanning and/or trying to brute-force the mail server as an open relay (or just gathering information, likely the Censys scanner)

For your other problems, it COULD be that you are getting auto-banned by fail2ban, but I can't tell you exactly what is wrong.

I could look into your config with the proper domain information if you wish, but overall I don't think it will be hard to fix.

3

Re: Connection Timeout errors shortly after installing Let's Encrypt Certs

Most grateful for your reply.

Yes, I'm certainly no expert, but, the stuff in the logs looked pretty familiar, I just wondered if the mention of a bad certificate might be a clue.

Clearly my troubleshooting skills are currently little hit and mostly miss, so please let me know what you need me to do so as for you to have a better look. :-)

4

Re: Connection Timeout errors shortly after installing Let's Encrypt Certs

Naturally, I've now started reading up on fail2ban...

I have a hunch that I was indeed getting my own IP banned, which would be consistent with it working briefly then stopping dead, as well as periodically reenabling.

- At the time of the prior server's crash, I had a couple dozen email accounts being accessed via my local Thunderbird mail client
- When I switched my DNS away from the registrars and created the new iRedMail instance, I recreated all the accounts by hand and changed all the passwords server-side.
- I was slowly working through the accounts on the local mail client, updating the passwords and other connection settings.
- Meanwhile, some number of the accounts were regularly attempting to autoconnect behind the scenes while awaiting their turn to be updated to the new settings and credentials, and thus may very well have been provoking an IP ban, which would presumably affect everything going through my local router's IP as seen by the outside world.

So, I FINALLY did the obvious thing and checked the site via my phone, while NOT using my local wireless connection, and sure enough the /mail and /iredadmin pages came right up with SSL locks in place.

Coincidentally, just after doing so, and successfully logging in, checking, and sending mail, I jumped back to the desktop machine and it worked there as well, perhaps because I had NOT launched the local Thunderbird client to trigger more failed login attempts and re-ban my IP.

I ran this

fail2ban-client status sshd
Status for the jail: sshd
|- Filter
|  |- Currently failed: 3
|  |- Total failed:     463
|  `- File list:        /var/log/auth.log
`- Actions
   |- Currently banned: 2
   |- Total banned:     50
   `- Banned IP list:   218.152.37.209 211.192.41.14

Neither of those IPs is my own, but then I did not expect either to be, given that the web pages are again answering at my local IP.

Debating whether I should whitelist my home IP, given that while it seldom changes, it certainly can at any given moment.

Would it be appropriate to do so in /etc/fail2ban/jail.local by adding the IP to this setting?
ignoreip    = 127.0.0.1 127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
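
For reference, ignoreip is the fail2ban option that lists addresses which are never banned, so appending the home IP to that line is one way to do it. The following is only a sketch, assuming a single "ignoreip =" line in the file; the function name add_ignoreip and the address 203.0.113.7 are placeholders, not values from this server.

```shell
# Hypothetical helper: append an address to the ignoreip line of a fail2ban
# config file, unless it is already listed. A backup is kept as FILE.bak.
add_ignoreip() {
    file=$1
    ip=$2
    # Already whitelisted? Then leave the file untouched.
    if grep -E '^ignoreip' "$file" | grep -qwF -- "$ip"; then
        return 0
    fi
    # Append the address to the end of the existing ignoreip line.
    sed -i.bak -E "s|^(ignoreip[[:space:]]*=.*[^[:space:]])[[:space:]]*\$|\1 ${ip}|" "$file"
}

# Example (203.0.113.7 stands in for the home IP):
#   add_ignoreip /etc/fail2ban/jail.local 203.0.113.7
#   fail2ban-client reload
```

Because a residential IP can change at any moment, the entry would need updating whenever it does, and a stale entry whitelists whoever holds the old address next.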

5

Re: Connection Timeout errors shortly after installing Let's Encrypt Certs

Mail logins do not trigger the sshd jail, so you won't find yourself there.

6 (edited by BeforeTheFall 2023-01-22 07:52:45)

Re: Connection Timeout errors shortly after installing Let's Encrypt Certs

ah yes, duh me, I meant to post the postfix one which was also devoid of my home IP..

The only hit was on the

fail2ban-client status dovecot
which DID reveal my home IP on the list

SO I did
fail2ban-client unban [my home IP]
and for now I added my home IP to the ignoreip argument in /etc/fail2ban/jail.local and restarted the machine
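
Rather than checking one jail at a time, the same lookup can be run across every jail. A rough sketch, assuming the "Jail list:" and "Banned IP list:" lines that fail2ban-client prints in its status output; the function name jails_banning and the F2B variable are made up for illustration.

```shell
# Command used to query fail2ban; overridable so the function can be tried
# against a stand-in script (F2B is a hypothetical variable name).
F2B=${F2B:-fail2ban-client}

# Hypothetical helper: print the name of every jail whose
# "Banned IP list" currently contains the given address.
jails_banning() {
    ip=$1
    for jail in $("$F2B" status | sed -n 's/.*Jail list:[[:space:]]*//p' | tr ',' ' '); do
        if "$F2B" status "$jail" | grep -F 'Banned IP list' | grep -qwF -- "$ip"; then
            echo "$jail"
        fi
    done
}

# Example (run as root; 203.0.113.7 stands in for the home IP):
#   jails_banning 203.0.113.7
```

Any jail name it prints is one to look at more closely before unbanning.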

Webmail (Roundcube) is now working inbound and outbound, with the web pages showing the valid SSL lock.  And, of course, Google and Outlook are rejecting mail, citing my IP address as being in a banned Linode IP address block.  Fortunately, I've kept the IP address from the previously working server instance, and I see there is a way to transfer that IP from the old node to the new one, potentially reclaiming the earlier efforts at site reputation improvement and the associated registrations I went through to open the path to those One Ring to Rule Them All, Mordorian domains.

I'm guessing any future issues with getting the client SMTP connections going should be dealt with in a new post, or in other existing posts, if necessary?

Not sure what the etiquette is here as far as marking it solved or marking a comment as "the answer", BUT checking out fail2ban, as Cthulhu (https://forum.iredmail.org/user51448.html) suggested, was the clue.