1

Topic: iRedMail trouble with antiSpamWithSieve

======== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release): 0.9.8 MARIADB edition
- Linux/BSD distribution name and version: CentOS 7
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MySQL
- Web server (Apache or Nginx): Nginx
- Manage mail accounts with iRedAdmin-Pro? not yet, trying to get basic functionality working.
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====

Hi,

New iRedMail user, trying to get user spam training working. I followed the instructions here:

https://forum.iredmail.org/topic13615-u … pdate.html

I initialized the database with the sample spam, and afterwards wanted to check if it was actually training. I used my IMAP client to move a few messages into the Junk folder, then tried the command:

 sa-learn --dump 

but got error

 dbg: bayes: unable to initialize database for root user, aborting! 

Then tried

 # sudo --user vmail sa-learn --dump 

and received this at the top of the output

 0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0         65          0  non-token data: ntokens

The non-token data:nspam does not seem to increment itself past 1, and ntokens does not seem to increment past 65. I am assuming these came from the sample spam.

I'm not sure what to look at here. I turned on the $sa_debug = 1; in /etc/amavis/amavis.conf and that does seem to indicate some activity in /var/log/maillog with incoming SMTP connections, but I am not noticing anything from the training at all. I'm really at a loss for what to look at/change to make user training work. I am simply trying to get spam/ham training working for users using IMAP by having them manually move mails in/out of their Junk folders.

Are there any thoughts what I might look at here and how to get more logging/confirmation of the learning process taking place? Where am I going wrong?

Thanks so much

2

Re: iRedMail trouble with antiSpamWithSieve

Zhang, I think there was some trouble submitting the post. The web server kept throwing an error about something going wrong while sending an email, so when I tried again (back, then click submit) it posted a duplicate while throwing the same error, never indicating that it had actually posted the first (or second) time. I deleted the duplicate post.

3

Re: iRedMail trouble with antiSpamWithSieve

You didn't mention how you configured SpamAssassin and Dovecot, so it's hard to understand how it works on your system.

- Check which system user account is used to run 'sa-learn'
- Run sa-learn with correct system user account name: 'sa-learn -u <user> ...'

4

Re: iRedMail trouble with antiSpamWithSieve

ZhangHuangbin wrote:

You didn't mention how you configured SpamAssassin and Dovecot, so it's hard to understand how it works on your system.

- Check which system user account is used to run 'sa-learn'
- Run sa-learn with correct system user account name: 'sa-learn -u <user> ...'

Ah-ha! Thank you, that was my missing link. I needed to use 'sa-learn -u user@domain.com --dump' to get the Bayes stats for the particular user (me) I was testing to train.

So that shows it is learning at the moment, but now I have run into a bad situation while training it.
I tried about 10 at a time, and that seemed to work okay. I then did an IMAP move of about 300 emails into two training folders I made (TrainGood, TrainSpam). Four hours later, it was still moving/training them. CPU usage was between 30-60% the whole time, the I/O charts were very slow, and the sa-learn processes themselves (I noted up to 2 running at a time) seemed to take only 5-15% CPU (viewed with top). The mail server, however, effectively halted its other processing, and during this time it was too slow to use at all. This seemed very strange since the machine did not appear fully loaded, and I couldn't figure out why this wasn't working.

I decided to stop the process and revert to a non-learning mode to see what effect that had - I commented out the modified plugin section of dovecot.conf, restarted dovecot and amavisd, and all went back to normal. The IMAP client seemed to complete its pending operations, and I was able to use the mail server again.

I tested by moving a few hundred emails, and they went almost instantly.

Then I re-enabled the anti-spam portion of dovecot.conf, restarted dovecot and amavisd again, and tested a little at a time. Once I was confident enough, I tried moving 90 emails for it to train as spam. This operation took about 60 seconds (~1.5 messages/second). The system is running on 4 CPUs x Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz, running ESXi with only one VM, this mail server on CentOS 7. Somehow I thought it would be much faster, as my much slower 5-year-old Mac mini seemed to train faster than that. Is this normal? I'm not sure why that would be so slow.

But then came the real kicker - I looked at the spam training folder that messages were copying to when I had to first stop dovecot after 4 hours. It has a bunch of spam it had trained, and then it had 256 copies of a spam in the middle. Then there were 44 more unique spams, and then another 256 of a different spam in the middle. Then some more unique ones.
I can't imagine what went wrong initially to cause that, or how to ensure it does not happen again...

Have you ever seen anything like this?  I don't know what to look at or monitor to see what is going on here.

Thank you for your help here. My goal is to get this working well/reliably enough to make this my permanent home server solution for at least the next few years with the iRedAdmin-Pro. This has just been a painful struggle for some reason.

5

Re: iRedMail trouble with antiSpamWithSieve

If I am understanding correctly, as far as training speed goes, it is possibly so slow because of sa-learn's long startup time, and it is started once for each individual email moved.

It is nice to be able to train one at a time in small quantities, but perhaps someone knows of a better way for all users to train a bunch of emails at once?

The training time, however, does not explain the problem of 2 different spams getting turned into 256 copies each. That is probably the most critical issue, since it could be related to whatever turned the IMAP move/sa-learn runs into a process that essentially took over the server's resources until I forcibly shut down those processes.

Ideas?

6

Re: iRedMail trouble with antiSpamWithSieve

I expected you to read my reply in the link you posted in first post:
https://forum.iredmail.org/post59958.html#p59958

If you call 'sa-learn' every time you move a message (and multiple times if you move multiple messages), it's very slow, and webmail may hang and stop responding.

A better way is: configure the plugin to save a copy of each moved message, and call 'sa-learn' periodically, e.g. every 10 minutes, to learn all moved messages at the same time (only one 'sa-learn' process). This way sa-learn runs fast.
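
A minimal sketch of that batch approach, under stated assumptions: copies of moved messages are saved into a hypothetical spool laid out as <spool>/<user>/spam and <spool>/<user>/ham. The layout, the function name learn_spool and the SA_LEARN override variable are illustrative only and not part of iRedMail. It starts one sa-learn process per non-empty directory instead of one per message, and deletes the copies only after sa-learn succeeds.

```shell
# learn_spool SPOOL
# Assumed (hypothetical) layout: SPOOL/<user>/spam and SPOOL/<user>/ham,
# each holding one saved message per file.
learn_spool() {
    spool=$1
    # SA_LEARN lets a different binary (or a test stub) be substituted.
    sa_learn=${SA_LEARN:-sa-learn}
    for userdir in "$spool"/*/; do
        [ -d "$userdir" ] || continue
        user=$(basename "$userdir")
        for kind in spam ham; do
            dir=$userdir$kind
            # Skip directories that are missing or empty.
            [ -d "$dir" ] || continue
            [ -n "$(ls -A "$dir")" ] || continue
            # One sa-learn run for the whole directory; remove the saved
            # copies only if learning succeeded.
            "$sa_learn" -u "$user" "--$kind" "$dir" && rm -f "$dir"/*
        done
    done
}
```

It could be run from cron every 10 minutes with the spool path as its argument, e.g. learn_spool /path/to/spool (path hypothetical). The -u value here is the per-user name used earlier in this thread; whether per-user or shared training is wanted depends on how scanning is configured.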

7

Re: iRedMail trouble with antiSpamWithSieve

I looked at this again, and while I have figured out how to make sieve copy them into a different folder for later training, a script to train the folders of all the virtual users completely eludes me.

That however does not seem to be my issue at the moment. 

When I select a few messages for training, for example simply moving them in/out of the Junk folder, something seems to start looping on particular spam messages, causing what I now see as a gradual ramp-up to 100% CPU usage. For example, a move of ~30 emails into Junk just now caused the mail client to quickly balloon into moving 3000+ messages, and even when it says it has completed them, it is still looping and waiting. It will not stop until I quit the IMAP client (Mail.app), delete the files that store the still-pending move operations, delete the envelope indexes to force it to rebuild (not sure why, but that's not important right now), and then restart dovecot/amavisd on the mail server. Only then do the sa-learn processes finally stop.

After starting the IMAP client again and reloading/rebuilding the envelope index, I now notice that certain emails in the Inbox or Junk folders now exist with up to 256 copies of each.

imapsync is now reporting that in my account alone I have 86511 duplicates. I am in the process of backing up all the mailboxes, and will then attempt to use imapsync to remove the duplicates from them.

Obviously something is very wrong, but I do not see where the problem could be.
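
One rough way to measure the duplication directly on the server is to count the Message-ID values that occur more than once among a folder's message files. The sketch below assumes Maildir-style storage with one message per file and unfolded Message-ID headers; the function name count_duplicate_ids is made up for illustration. It only reports a number and deletes nothing.

```shell
# count_duplicate_ids DIR
# Prints how many distinct Message-ID values appear in more than one
# file in DIR (for example a Maildir folder's "cur" directory).
count_duplicate_ids() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # Read headers only: stop at the first blank line, and print the
        # value of the first Message-ID header (name is case-insensitive).
        awk 'NF == 0 { exit }
             tolower($1) == "message-id:" { print $2; exit }' "$f"
    done | sort | uniq -d | wc -l | tr -d ' '
}
```

Usage would look like count_duplicate_ids /path/to/Maildir/cur, where the path is a placeholder for the real mailbox location.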

8

Re: iRedMail trouble with antiSpamWithSieve

I'm afraid that others cannot help without any details of your implementation.

- Is it a problem with your sieve rule/script?
- Or a problem in some other part?

9

Re: iRedMail trouble with antiSpamWithSieve

Yes, I need to explore this further. It may be due to rules I tried adding to allow SPAM and HAM training from 2 alternate IMAP boxes.

In the meantime I have 379 trained HAM and over 400 trained SPAM under my username now, but I noticed that Bayes is still not being used. I changed the log levels in amavisd to see, and saw this:

 Jul 10 02:47:38 dark amavis[54164]: SA dbg: bayes: using username: amavis
Jul 10 02:47:38 dark amavis[54164]: SA dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x59765a8)
Jul 10 02:47:38 dark amavis[54164]: SA dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x5911d38) implements 'learner_is_scan_available', priority 0
Jul 10 02:47:38 dark amavis[54164]: SA dbg: bayes: database connection established
Jul 10 02:47:38 dark amavis[54164]: SA dbg: bayes: found bayes db version 3
Jul 10 02:47:38 dark amavis[54164]: SA dbg: bayes: Using userid: 3
Jul 10 02:47:38 dark amavis[54164]: SA dbg: bayes: not available for scanning, only 12 spam(s) in bayes DB < 200

It looks like the routines are being run by user amavis instead of mine, and looking at the database it shows:

 $ sa-learn -u amavis --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0         12          0  non-token data: nspam
0.000          0        611          0  non-token data: nham

Maybe this is where all the auto-training is going, because my username shows only what I trained with moving in IMAP with the sieve rules:

 0.000          0          3          0  non-token data: bayes db version
0.000          0        451          0  non-token data: nspam
0.000          0        379          0  non-token data: nham

I can't seem to find how this is happening. I never changed the config files to use a common user name for Bayes.
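
The "not available for scanning" log line refers to SpamAssassin's minimum training counts: with the default settings (bayes_min_spam_num and bayes_min_ham_num, both 200), Bayes is not used for scanning until the database being consulted holds at least 200 spam and 200 ham. A small sketch to check that for any Bayes user from the dump output follows; the function name bayes_ready is made up for illustration, and the default threshold assumes those two settings have not been changed locally.

```shell
# bayes_ready [MIN]
# Reads `sa-learn --dump magic` output on stdin and succeeds only when
# both nspam and nham are at least MIN (default 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }   # third column holds the count
        $NF == "nham"  { nham  = $3 }
        END { exit !(nspam + 0 >= min + 0 && nham + 0 >= min + 0) }
    '
}
```

For example: sa-learn -u amavis --dump magic | bayes_ready && echo "Bayes active" || echo "Bayes not yet active". With the figures posted above, the amavis database (12 spam) fails the check while the per-user database (451 spam, 379 ham) passes it.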

10 (edited by elf 2018-07-11 03:41:46)

Re: iRedMail trouble with antiSpamWithSieve

Okay, I think I possibly made a mistake in this config. Can anyone look at this and tell me what's wrong here? I'm afraid I don't understand this part very well.

use_bayes          1
bayes_auto_learn   1
bayes_auto_expire  1

# Store bayesian data in MySQL.
# Please make sure you have correct server address, port and database name.
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn      DBI:mysql:sa_bayes:127.0.0.1:3306

# Store bayesian data in PostgreSQL.
# Please make sure you have correct server address, port and database name.
#bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL
#bayes_sql_dsn      DBI:Pg:sa_bayes:127.0.0.1:5432

# SQL username and password.
bayes_sql_username sa_bayes_admin
bayes_sql_password RemovedMyAdminPasswordForPosting

# Override the username used for storing data in the database.
# This could be used to group users together to share bayesian filter data.
# You can also use this config option to trick sa-learn to learn data as a specific user.
#bayes_sql_override_username vmail

[Edit- hit submit too early]
I noticed that my last line, the override username, is still commented out and shows user vmail. I believe this is how the iRedMail config file shipped. The example I was trying to follow from https://docs.iredmail.org/store.spamass … n.sql.html shows the line uncommented, with the username specified as amavis.

11

Re: iRedMail trouble with antiSpamWithSieve

To clarify, the reason I never uncommented this line is that I wanted each user's spam/ham training to be separate, so that their own individual Bayes database would be used for scanning their own email. Yet what seems to be happening is this: if they train manually via IMAP copies to folders that trigger the sieve scripts, the training happens under their own username, but when mail comes in, it looks like it gets scanned under username amavis, with auto-training. So the amavis user's HAM database is growing into the hundreds, while Bayes is not used for scanning because that database has only 12 spam emails.

I'm trying to figure out how I get the incoming mail to be scanned with the respective user's database.

12

Re: iRedMail trouble with antiSpamWithSieve

Did you read these lines in the tutorial?

# In iRedMail, SpamAssassin is called by Amavisd, so we must set it to be
# same as Amavisd daemon user:
#   - on Linux, it's user `amavis`.
#   - on FreeBSD, it's user `vscan`.
#   - on OpenBSD, it's user `_vscan`.
bayes_sql_override_username amavis