1

Topic: Verify SpamAssassin

==== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release): 0.9.8
- Linux/BSD distribution name and version: Ubuntu 18.04
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MySQL
- Web server (Apache or Nginx): Nginx
- Manage mail accounts with iRedAdmin-Pro? No
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====

Hi

I have had my mail server running for a few months now and have been trying to use SpamAssassin to minimize spam. I followed this guide: https://docs.iredmail.org/store.spamass … n.sql.html (successfully, I believe), and I can see that the number of records in the bayes_token table is increasing:

mysql> SELECT COUNT(*) FROM bayes_token;
+----------+
| COUNT(*) |
+----------+
|   230412 |
+----------+
1 row in set (0.00 sec)

But how can I check and verify that new emails are being correctly scanned and that this is actually working?

I have added the spam status to the mail headers; below is an email that has correctly been identified as spam. But is this because of what was learned from the mails marked as spam, or due to other factors? How can I verify that SpamAssassin is actually learning and using what it has learned, and maybe even adjust how much weight that gets when classifying spam mails?

Hope someone can help me. Thanks!

X-Spam-Flag: YES
X-Spam-Score: 4.461
X-Spam-Level: ****
X-Spam-Status: Yes, score=4.461 tagged_above=-100 required=2
    tests=[DKIM_SIGNED=0.1, HTML_IMAGE_RATIO_02=0.805, HTML_MESSAGE=0.001,
    REMOVE_BEFORE_LINK=1.587, SPF_PASS=-0.001, T_DKIM_INVALID=0.01,
    T_KAM_HTML_FONT_INVALID=0.01, URIBL_ABUSE_SURBL=1.948,
    URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
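
One way to answer the "is it Bayes or something else?" question from the headers alone: SpamAssassin's Bayes verdict shows up as a rule whose name starts with BAYES_ (for example BAYES_00 or BAYES_99) in the tests=[...] list. A minimal Python sketch that pulls the rules out of the X-Spam-Status value shown above, assuming the header keeps this layout; the function names are hypothetical:

```python
import re

def parse_spam_tests(status_header: str) -> dict:
    """Return {rule_name: score} from the tests=[...] part of X-Spam-Status."""
    match = re.search(r"tests=\[(.*?)\]", status_header, re.S)  # list may span lines
    if not match:
        return {}
    tests = {}
    for item in match.group(1).split(","):
        item = item.strip()
        if "=" in item:
            name, score = item.split("=", 1)
            tests[name.strip()] = float(score)
    return tests

def bayes_rules(tests: dict) -> dict:
    """Keep only Bayes verdict rules (names starting with BAYES_)."""
    return {name: score for name, score in tests.items() if name.startswith("BAYES_")}

header = """Yes, score=4.461 tagged_above=-100 required=2
    tests=[DKIM_SIGNED=0.1, HTML_IMAGE_RATIO_02=0.805, HTML_MESSAGE=0.001,
    REMOVE_BEFORE_LINK=1.587, SPF_PASS=-0.001, T_DKIM_INVALID=0.01,
    T_KAM_HTML_FONT_INVALID=0.01, URIBL_ABUSE_SURBL=1.948,
    URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no"""

tests = parse_spam_tests(header)
print(len(tests), bayes_rules(tests))  # prints: 9 {}
```

Here no BAYES_ rule fired, so this message was tagged by other rules such as URIBL_ABUSE_SURBL and REMOVE_BEFORE_LINK rather than by anything learned.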

2

It's better to turn on debug mode in Amavisd, for both Amavisd and SpamAssassin, to verify it.
FYI: http://www.iredmail.org/docs/debug.amavisd.html

3

I have updated the log levels, but what should I be looking for in the log afterwards?

4

There's a lot of logging going on now, so it isn't easy without knowing what to look for. That said, I did find this one, which worries me:

Sep 20 08:20:20 mail amavis[28151]: (28151-07) SA dbg: bayes: not available for scanning, only 2 spam(s) in bayes DB < 200
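
That line means Bayes was skipped for this scan because too few spam messages had been learned. A small Python sketch for pulling such lines out of a busy debug log, assuming the message format matches the line shown; the ham variant is assumed to be worded the same way, and the regex and function name are illustrative:

```python
import re

# Matches e.g. "bayes: not available for scanning, only 2 spam(s) in bayes DB < 200"
NOT_AVAILABLE = re.compile(
    r"bayes: not available for scanning, only (\d+) (spam|ham)\(s\) in bayes DB < (\d+)"
)

def bayes_skips(log_lines):
    """Yield (kind, learned, required) for each 'not available' debug line."""
    for line in log_lines:
        m = NOT_AVAILABLE.search(line)
        if m:
            yield m.group(2), int(m.group(1)), int(m.group(3))

log = [
    "Sep 20 08:20:20 mail amavis[28151]: (28151-07) SA dbg: bayes: "
    "not available for scanning, only 2 spam(s) in bayes DB < 200",
]
for kind, learned, required in bayes_skips(log):
    print(f"Bayes skipped: {learned} {kind} learned, {required} required")
```

The same generator accepts an open file object, so it can be pointed at whichever file your syslog writes Amavisd messages to.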

5

SpamAssassin will start using Bayes to judge spam once it has had enough training.

6

ZhangHuangbin wrote:

SpamAssassin will start using Bayes to judge spam once it has had enough training.

That makes sense, but when is it "enough"? The log now says "only 3 spam(s) in bayes DB < 200", but I have 500 mails in my spam folder, and at least half of them were manually moved from the inbox, which should have forced learning.

Is the "3 spam(s)" a count of something other than mails?

7

200 emails, if I remember correctly.

Learning the same email multiple times only counts as one time.
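
In other words, the counters track distinct messages, not learn operations; as far as I know, SpamAssassin records which messages it has already learned (the SQL schema has a bayes_seen table for this). A simplified Python model of that counting behaviour, assuming that relearning a message as the other class replaces its old classification; the class and message IDs are hypothetical and this is not SpamAssassin's code:

```python
class BayesCounter:
    """Toy model: counts distinct learned messages per class."""

    def __init__(self):
        self.seen = {}        # msgid -> "spam" or "ham"
        self.spam_count = 0
        self.ham_count = 0

    def learn(self, msgid: str, kind: str) -> bool:
        """Learn msgid as kind; return True if the counters changed."""
        previous = self.seen.get(msgid)
        if previous == kind:
            return False                  # already learned as this class: no change
        if previous == "spam":
            self.spam_count -= 1          # switching class forgets the old verdict
        elif previous == "ham":
            self.ham_count -= 1
        self.seen[msgid] = kind
        if kind == "spam":
            self.spam_count += 1
        else:
            self.ham_count += 1
        return True

c = BayesCounter()
for _ in range(5):
    c.learn("<abc@example.com>", "spam")  # same message moved to Junk five times
c.learn("<def@example.com>", "spam")
print(c.spam_count, c.ham_count)          # 2 0
```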

8 (edited by Runberg 2018-09-25 18:20:13)

ZhangHuangbin wrote:

200 emails, if I remember correctly.

That matches the log:

Sep 20 08:20:20 mail amavis[28151]: (28151-07) SA dbg: bayes: not available for scanning, only 2 spam(s) in bayes DB < 200

But I'm sure I have moved far more than 2-3 mails from the inbox to the Junk/Spam folder! I haven't counted, but at least 100 mails have been moved for sure.

*Update:*
So I had a look in the database:

mysql> select * from bayes_vars;
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
| id | username                | spam_count | ham_count | token_count | last_expire | last_atime_delta | last_expire_reduce | oldest_token_age | newest_token_age |
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
|  1 | amavis                  |          3 |      2912 |      155549 |  1537831443 |          2764800 |               1629 |       1535063221 |       1537869763 |
|  2 | user1@mydomain.com      |          1 |         0 |         123 |           0 |                0 |                  0 |       1532499417 |       1532499417 |
|  3 | user2@mydomain.com      |        200 |        18 |       18949 |           0 |                0 |                  0 |       1532507124 |       1537845479 |
|  4 | user3@mydomain.com      |        713 |         4 |       58495 |           0 |                0 |                  0 |       1534133423 |       1537858363 |
|  5 | user4@mydomain.com      |          0 |         3 |        1005 |           0 |                0 |                  0 |       1537809279 |       1537811821 |
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
5 rows in set (0.00 sec)
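
The 200 in the log corresponds to SpamAssassin's bayes_min_spam_num setting; there is a matching bayes_min_ham_num, and both default to 200. My understanding is that Bayes is only used for a username once its row has reached both minimums. Applying that rule to the rows above, as a Python sketch that assumes the default thresholds (they may differ if overridden in local.cf):

```python
MIN_SPAM = 200  # default bayes_min_spam_num
MIN_HAM = 200   # default bayes_min_ham_num

# (username, spam_count, ham_count) copied from the bayes_vars output above
rows = [
    ("amavis", 3, 2912),
    ("user1@mydomain.com", 1, 0),
    ("user2@mydomain.com", 200, 18),
    ("user3@mydomain.com", 713, 4),
    ("user4@mydomain.com", 0, 3),
]

def bayes_ready(spam: int, ham: int, min_spam: int = MIN_SPAM, min_ham: int = MIN_HAM) -> bool:
    """Bayes scoring needs at least min_spam spam AND min_ham ham learned."""
    return spam >= min_spam and ham >= min_ham

for user, spam, ham in rows:
    missing = []
    if spam < MIN_SPAM:
        missing.append(f"{MIN_SPAM - spam} more spam")
    if ham < MIN_HAM:
        missing.append(f"{MIN_HAM - ham} more ham")
    status = "ready" if bayes_ready(spam, ham) else "needs " + ", ".join(missing)
    print(f"{user:20} {status}")
```

On these numbers no row qualifies: even user3, with 713 spam, has only 4 ham learned, so its own row would not be used for scoring yet either.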

So it looks like the spam count in the log comes from ID 1 (amavis), while the learning is stored per user. Any ideas how to fix this?

9

Do you have this line in /etc/mail/spamassassin/local.cf?

bayes_sql_override_username amavis

It's mentioned in our tutorial:
https://docs.iredmail.org/store.spamass … n.sql.html

10 (edited by Runberg 2018-09-25 20:06:55)

ZhangHuangbin wrote:

Do you have this line in /etc/mail/spamassassin/local.cf?

Nope, I'm afraid I had missed that one. I have updated the config and restarted Amavis.

If I understand this correctly, it will look up tokens/spam counts for the default user, but I assume the learning is done per user, so will this work? Sorry, I don't think I fully understand how this works. All help is truly appreciated!

I believe this is the guide I used: https://forum.iredmail.org/topic8169-ir … assin.html

11

Runberg wrote:

If I understand this correctly, it will look up tokens/spam counts for the default user, but I assume the learning is done per user, so will this work? Sorry, I don't think I fully understand how this works. All help is truly appreciated!

As the parameter name says, it "overrides" the user name, so it's always the "amavis" user.
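
Put differently, as I understand the SQL Bayes store, bayes_sql_override_username decides which bayes_vars row (and its tokens) is read and written, regardless of which username the caller supplies. A toy Python model of that lookup; the function is hypothetical and only illustrates the rule, it is not SpamAssassin code:

```python
from typing import Optional

def effective_bayes_user(requested_user: str, override: Optional[str]) -> str:
    """Return the bayes_vars.username actually used for a lookup or a learn."""
    return override if override else requested_user

# With the override line in local.cf, every caller lands on the same row:
print(effective_bayes_user("user3@mydomain.com", "amavis"))  # amavis
# Without it, the caller's username is used as-is:
print(effective_bayes_user("user3@mydomain.com", None))      # user3@mydomain.com
```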

Runberg wrote:

I believe this is the guide I used: https://forum.iredmail.org/topic8169-ir … assin.html

This is another way to call sa-learn to learn spam/clean messages.
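
For reference, sa-learn takes --spam or --ham plus the message files or folders to learn from, and --username to choose which SQL username the result is stored under. A minimal Python sketch that only builds the command line (it does not run it); the helper name and the Maildir path are hypothetical:

```python
from typing import List, Optional

def sa_learn_command(kind: str, path: str, username: Optional[str] = None) -> List[str]:
    """Build an sa-learn argv list; kind is 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", f"--{kind}"]
    if username:
        # Selects the bayes_vars row used by the SQL store
        cmd.append(f"--username={username}")
    cmd.append(path)
    return cmd

cmd = sa_learn_command("spam", "/var/vmail/example/Maildir/.Junk/cur", "amavis")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

If bayes_sql_override_username is set in local.cf, it should take precedence over --username.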

12 (edited by Runberg 2018-09-26 14:45:34)

As the parameter name says, it "overrides" the user name, so it's always the "amavis" user.

That is what I assumed. I have commented this out again in order to do the learning per user, and not with the default user.

This is another way to call sa-learn to learn spam/clean messages.

I double-checked my configuration, and it seems to have "disappeared". Either I'm looking in the wrong place (/etc/dovecot/dovecot.conf) or, more likely, my custom configuration was overwritten during an update of Dovecot.
I'll try to walk through the guide again and see if I can restore a working setup.

13 (edited by Runberg 2018-09-26 16:22:11)

I have now confirmed that when a mail is moved from the inbox to the spam folder, the numbers in the database do in fact go up for that specific user.

Before:

mysql> select * from bayes_vars;
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+-
| id | username                | spam_count | ham_count | token_count | last_expire | last_atime_delta | last_expire_reduce | 
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+-
|  1 | amavis                  |         29 |      2946 |      153707 |  1537917844 |          2764800 |               1963 | 
|  2 | user1@mydomain.com      |          1 |         0 |         123 |           0 |                0 |                  0 | 
|  3 | user2@mydomain.com      |        200 |        18 |       18949 |           0 |                0 |                  0 | 
|  4 | user3@mydomain.com      |        713 |         4 |       58495 |           0 |                0 |                  0 | 
|  5 | user4@mydomain.com      |          0 |         3 |        1005 |           0 |                0 |                  0 | 
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+-

Now:

+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+-
| id | username                | spam_count | ham_count | token_count | last_expire | last_atime_delta | last_expire_reduce | 
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+-
|  1 | amavis                  |         29 |      2958 |      155579 |  1537917844 |          2764800 |               1963 | 
|  2 | user1@mydomain.com      |          1 |         0 |         123 |           0 |                0 |                  0 | 
|  3 | user2@mydomain.com      |        200 |        18 |       18949 |           0 |                0 |                  0 | 
|  4 | user3@mydomain.com      |        719 |         4 |       58679 |           0 |                0 |                  0 | 
|  5 | user4@mydomain.com      |          0 |         3 |        1005 |           0 |                0 |                  0 | 
+----+-------------------------+------------+-----------+-------------+-------------+------------------+--------------------+-
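
To make the before/after comparison less error-prone than eyeballing two tables, a short Python sketch that diffs the counters per username (values copied from the output above; the function name is illustrative):

```python
FIELDS = ("spam_count", "ham_count", "token_count")

before = {
    "amavis": (29, 2946, 153707),
    "user1@mydomain.com": (1, 0, 123),
    "user2@mydomain.com": (200, 18, 18949),
    "user3@mydomain.com": (713, 4, 58495),
    "user4@mydomain.com": (0, 3, 1005),
}
after = {
    "amavis": (29, 2958, 155579),
    "user1@mydomain.com": (1, 0, 123),
    "user2@mydomain.com": (200, 18, 18949),
    "user3@mydomain.com": (719, 4, 58679),
    "user4@mydomain.com": (0, 3, 1005),
}

def diff_counters(before, after):
    """Return {username: {field: delta}} for rows whose counters changed."""
    changes = {}
    for user in sorted(set(before) | set(after)):
        old = before.get(user, (0, 0, 0))
        new = after.get(user, (0, 0, 0))
        deltas = {f: n - o for f, o, n in zip(FIELDS, old, new) if n != o}
        if deltas:
            changes[user] = deltas
    return changes

for user, deltas in diff_counters(before, after).items():
    print(user, deltas)
```

On this data user3 gained 6 spam, while only the amavis row gained ham (+12, possibly from autolearning during scanning), which fits the picture of learning and scanning landing on different rows.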

But it still seems that Amavis only looks at ID 1 (amavis) when scanning mails. Here is an excerpt from the log:

Sep 26 10:03:42 mail amavis[24995]: (24995-20) SA dbg: bayes: database connection established
Sep 26 10:03:42 mail amavis[24995]: (24995-20) SA dbg: bayes: found bayes db version 3
Sep 26 10:03:42 mail amavis[24995]: (24995-20) SA dbg: bayes: Using userid: 1
Sep 26 10:03:42 mail amavis[24995]: (24995-20) SA dbg: bayes: not available for scanning, only 29 spam(s) in bayes DB < 200

Any ideas on how I can get Amavis to use the individual user's row?
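
The "Using userid: 1" line refers to the id column of bayes_vars. A small Python sketch that resolves it to a username, with the id-to-username mapping copied from the tables in this thread; the function and regex are illustrative:

```python
import re

USERID = re.compile(r"bayes: Using userid: (\d+)")

# id -> username, from the bayes_vars output in this thread
BAYES_USERS = {1: "amavis", 2: "user1@mydomain.com", 3: "user2@mydomain.com",
               4: "user3@mydomain.com", 5: "user4@mydomain.com"}

def scanning_users(log_lines, users=BAYES_USERS):
    """Return the set of bayes_vars usernames that scans actually used."""
    found = set()
    for line in log_lines:
        m = USERID.search(line)
        if m:
            uid = int(m.group(1))
            found.add(users.get(uid, f"<unknown id {uid}>"))
    return found

log = [
    "Sep 26 10:03:42 mail amavis[24995]: (24995-20) SA dbg: bayes: found bayes db version 3",
    "Sep 26 10:03:42 mail amavis[24995]: (24995-20) SA dbg: bayes: Using userid: 1",
]
print(scanning_users(log))  # {'amavis'}
```

If this only ever reports amavis over a day of traffic, every scan is reading the amavis row, and per-user learning will not influence scoring unless the learning also targets that row.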

14

Can anyone help with this issue? How do I verify that mails are actually scanned based on what was learned from the mails the user moved to Junk/Spam, preferably per individual user?

As mentioned in my previous post, I have confirmed that the spam count is increasing in the database for the individual user, but according to the log I still have fewer than 200 spam mails, so Bayes is not used...

Or am I misunderstanding something here?

Any help would be appreciated!

Thanks

15

I suggest posting to Amavisd mailing list to get help from developers and other users:
https://amavis.org/#support