1

Topic: spam training as one user, but scanned and auto trained by another

==== REQUIRED BASIC INFO OF YOUR IREDMAIL SERVER ====
- iRedMail version (check /etc/iredmail-release): 0.9.8 MARIADB edition
- Linux/BSD distribution name and version: CentOS 7
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MySQL
- Web server (Apache or Nginx): Nginx
- Manage mail accounts with iRedAdmin-Pro? evaluating for purchase
- [IMPORTANT] Related original log or error message is required if you're experiencing an issue.
====

I'm experiencing a serious spam/ham filtering and training issue.

Spam/ham data is trained and stored in SQL under each individual username, such as "SomeEmail@MyDomain.com", using the sieve rules I created by following the links in the iRedMail docs.

This works for user training but this is not used for scanning incoming email. Instead, all incoming email seems to be scanned with and auto-trained under user "amavis" instead of "SomeEmail@MyDomain.com".

This means users cannot correct training errors, and incoming mail is scanned against an error-filled Bayes database under "amavis".

From my /etc/mail/spamassassin/local.cf:

# Override the username used for storing data in the database.
# This could be used to group users together to share bayesian filter data.
# You can also use this config option to trick sa-learn to learn data as a specific user.
#bayes_sql_override_username vmail

I do not override this because I would like it to use individual user training, and indeed it does: when users train via IMAP sieve, the training goes to their own DB.
But when email arrives, that username is not used for scanning or auto-training; "amavis" is used instead.

Does anyone have this working properly?
Can anyone tell me what/where to set this so it uses the recipient SQL for Bayes scanning and auto-learning?

Many, many thanks.

2

I have spent many hours researching this over the past couple of days and still have not found a solution.

I'm really confused, because if incoming email cannot be scanned with its designated recipient's Bayes SQL DB, then there is no point in trying to train anything. Even the iRedMail writeups I see refer to users training their own per-user Bayes SQL data, yet incoming email is still scanned with the amavis user's data. Why? That will never work. All it does is waste CPU and disk space on training that is never used anywhere.

Anyone?

3

elf wrote:

This works for user training but this is not used for scanning incoming email. Instead, all incoming email seems to be scanned with and auto-trained under user "amavis" instead of "SomeEmail@MyDomain.com".

Is this the tutorial you followed?
https://docs.iredmail.org/store.spamass … n.sql.html

How did you verify that it's not used for scanning incoming email?

4

ZhangHuangbin wrote:

Is this the tutorial you followed?
https://docs.iredmail.org/store.spamass … n.sql.html

How did you verify that it's not used for scanning incoming email?

It is not the exact tutorial I used, as I have no use for the Roundcube plugin: the home users here will almost never use Roundcube. I did, however, do the entire first part of it (except the Roundcube-specific steps), apart from the last line in spamassassin/local.cf, "bayes_sql_override_username amavis".

Since everyone here primarily uses IMAP clients, I wanted to enable IMAP training, so I ended up following the updated thread "User based antispam via Dovecot and Spamassassian (update)" after looking through the original that you link to in the docs section (which uses the deprecated Dovecot antispam plugin).

In that recent updated thread you talk about merging this into a future iRedMail release, but it does not seem to be in the current release, so I set it up myself because it looked like what I had been looking for all this time: per-user Bayes training via IMAP. The Sieve portion works fine for letting users train (or correct bad training) by moving messages, and I have verified this: the Dovecot sieve training works on a per-user basis. If a user moves an email in or out of the Junk folder over IMAP, the appropriate training occurs, which can be verified by dumping that user's Bayes "magic" (as you were kind enough to remind me before, with sa-learn -u username).

The problem appears to be that although each user's Bayes corpus is trained individually, Amavis does not scan inbound email using the recipient's username to select the matching per-user Bayes SQL data, even though no bayes_sql_override_username is defined in spamassassin/local.cf. Instead, Amavis appears to always scan as user amavis, which SpamAssassin takes to mean a separate amavis user Bayes DB. I verified this in the verbose logs generated by amavis and spamassassin (I raised the log level for this). Incoming email is scanned as user amavis, whose Bayes DB does not contain enough spam/ham to enable Bayes, and this too is reflected in the logs. Even worse, some incoming emails are misclassified by the scan and then auto-trained, again into the user amavis Bayes DB, so future scans will potentially use tainted Bayes data once it finally receives enough bad "nspam" auto-training. It already has 800+ auto-trained ham, which is also tainted, since the user train/correct sieve routines listed here only train the individual users' Bayes DBs.

The ideal solution, of course, would be to have amavis/spamassassin scan and auto-train incoming email with the appropriate recipient's Bayes DB, but I am completely at a loss as to how to do that. The writeup at the link above seems to be an attempt at it, which also seems not to work because of the mismatch between the training user DB and the scanning user DB.

I'm not sure whether the writeup I am following can be made to work, but as far as I can tell, it does not work as it currently stands.

Can you tell me - Is there a way with iRedMail to have incoming email scanned by the appropriate user's BayesDB (per-user training+scanning)? How? Or is it only possible to have incoming email scanned by one predefined user's DB (in this case amavis, unless overridden by the bayes_sql_override_username parameter)? I hope this is not the case, but if it is, it would seem completely non-productive to have any training under user accounts as the writeup suggests, as the training would never be used, and it would be incapable of correcting any bad "global" learning under user amavis.

Am I missing something here? It seems hard to believe that I would be the first to notice this. I really hope I am missing something.

Many thanks for your reply

5

FYI:

- https://lists.amavis.org/pipermail/amav … 01842.html
- https://lists.amavis.org/pipermail/amav … 01843.html

6

Yes! That seems to be it! It took me all day, but after some puzzling, that second link led me to the right things I needed to get this working. In particular:

- In the policy table, the last field is called "sa_username". This is the user that SA will be run as and will be used in the SA per-user SQL config.
- Don't use the sa_userconf field. Leave it empty, set it to NULL, whatever.
- In the users table you can put username (email address, email domain) together with a policy_id (and priority), so that the correct policy is used when an email is received.

The answer seems to lie in the SQL DB for amavis. I have never used SQL before, so to keep myself from doing something stupid I installed MySQL Workbench on my laptop and used it to SSH into the server and open a local DB connection (which also avoids firewall issues). From there I explored the various SA and amavis tables and figured out that the relevant ones were amavis.users and amavis.policy, and that they are linked by amavis.policy id <-> amavis.users policy_id.

So I duplicated some records in amavis.policy to copy over the existing handling policies, incremented the id fields, changed the policy_name fields to the users' email addresses, and put the same email addresses in the sa_username fields. I left all the other duplicated fields at the defaults that existed for the @. entry.

Then I went to the amavis.users table and added a new row for each user, again incrementing each id. I assigned unique priorities greater than that of the @. entry so they are processed first (I am not sure whether they need to be unique, but I did that to be sure I wasn't going to cause issues), entered each user's email address, and set a matching policy_id so that each row refers to the appropriate amavis.policy record for that user.

And magically, incoming email started being scanned (and auto-trained) with each individual recipient's own Bayes DB in the SA database!
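
For anyone following along, the relationship between the two tables can be sketched roughly as below. This is only a simplified illustration, not the real setup: it uses an in-memory SQLite database instead of MySQL/MariaDB, keeps only the columns mentioned above, and the helper names (build_demo_db, sa_username_for), the priorities 0 and 10, the lowercasing of the address, and the "amavis" fallback for a NULL sa_username are all assumptions made for the demo.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in for the policy and users tables (demo only)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE policy (
            id INTEGER PRIMARY KEY,
            policy_name TEXT,
            sa_username TEXT            -- user that SA runs as; NULL = not set
        );
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            priority INTEGER NOT NULL,  -- higher value is preferred
            policy_id INTEGER NOT NULL REFERENCES policy(id),
            email TEXT NOT NULL UNIQUE
        );
        """
    )
    db.executemany(
        "INSERT INTO policy (id, policy_name, sa_username) VALUES (?, ?, ?)",
        [
            (1, "@.", None),  # catch-all policy, no per-user SA username
            (2, "someemail@mydomain.com", "someemail@mydomain.com"),
        ],
    )
    db.executemany(
        "INSERT INTO users (id, priority, policy_id, email) VALUES (?, ?, ?, ?)",
        [
            (1, 0, 1, "@."),                       # catch-all row
            (2, 10, 2, "someemail@mydomain.com"),  # per-user row, higher priority
        ],
    )
    return db


def sa_username_for(db, recipient, default="amavis"):
    """Return the SA username of the highest-priority matching policy."""
    row = db.execute(
        """
        SELECT policy.sa_username
        FROM users LEFT JOIN policy ON users.policy_id = policy.id
        WHERE users.email IN (?, '@.')
        ORDER BY users.priority DESC
        LIMIT 1
        """,
        (recipient.lower(),),  # lowercasing is an assumption of this sketch
    ).fetchone()
    if row is None or row[0] is None:
        return default  # assumed fallback when no per-user name is set
    return row[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(sa_username_for(db, "SomeEmail@MyDomain.com"))
    print(sa_username_for(db, "other@mydomain.com"))
```

In this sketch a recipient with its own users row resolves to its own sa_username, while any other recipient falls through to the @. row.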

Now because of what I just did, I have 2 important iRedAdmin-Pro questions:

1- I did read that iRedAdmin-Pro can modify the policies. If I use iRedAdmin-Pro, will it then use the records I created in amavis.policy and be okay? Or will it not work well with the changes I made?

2- Since I think I just found that some of what I was setting up (policies) overlaps what iRedAdmin-Pro does, I am wondering - Can iRedAdmin-Pro handle creating these policy and user tables to do all this setup automatically? Or would I still need to manually enter in SQL records under the amavis.users and amavis.policy tables in order to complete a per-user Bayes setup such as this?

I still have, I think, 4 more email accounts to do this for. Fortunately my family (this is a home server) isn't too big, but automating this or being able to do it from a control panel would be so much better than having to learn or remember all of this for such changes.

Lastly, I have one possibly non-iRedAdmin-Pro question, but it is certainly an iRedMail question having to do with the per-user Bayes setup. Obviously any user that does NOT have an amavis.users entry and a mapped amavis.policy entry will not have incoming mail checked against an individual per-user Bayes DB. It appears the catch-all for that is still the user amavis Bayes DB. So if spam or ham is misclassified for those users, there appears to be no way to train/correct it. Is there a solution to this? Also, does this problem affect incoming aliases of a user who has entries in the DB for per-user Bayes?

Thank you Thank you for pointing me in the right direction. I can't believe how many days I have put into trying to make this all work! I was almost ready to give up on this solution.

7

elf wrote:

1- I did read that iRedAdmin-Pro can modify the policies. If I use iRedAdmin-Pro, will it then use the records I created in amavis.policy and be okay? Or will it not work well with the changes I made?

Not sure; this needs testing.

*) iRedAdmin-Pro queries amavisd.{user,policy}, but it requires column "policy.name" to be one of '@.' (catch-all), '@domain.com' (per-domain) or "full email" (per-user), to indicate different policies for different accounts. If you follow this rule, iRedAdmin-Pro should work fine with it.

*) iRedAdmin-Pro doesn't manage the "policy.sa_username" column, so you cannot set per-user sa username with iRedAdmin-Pro.
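
As a rough illustration of the naming rule above, manually created policy names could be sanity-checked with something like the following. It is only a sketch: classify_policy_name is a hypothetical helper, not part of iRedAdmin-Pro, and the checks are deliberately simple (they do not validate that the domain or address is otherwise well-formed).

```python
def classify_policy_name(name: str) -> str:
    """Classify a policy name as catch-all, per-domain or per-user."""
    if name == "@.":
        return "catch-all"
    # '@domain.com': a leading '@' followed by a domain with no further '@'
    if name.startswith("@") and len(name) > 1 and "@" not in name[1:]:
        return "per-domain"
    # 'user@domain.com': non-empty local part, one '@', non-empty domain
    local, sep, domain = name.partition("@")
    if sep and local and domain and "@" not in domain:
        return "per-user"
    raise ValueError(f"policy name does not follow the expected rule: {name!r}")


if __name__ == "__main__":
    for candidate in ["@.", "@mydomain.com", "someemail@mydomain.com"]:
        print(candidate, "->", classify_policy_name(candidate))
```

A name that fits none of the three forms, such as a bare username without a domain, raises ValueError in this sketch.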

elf wrote:

2- Since I think I just found that some of what I was setting up (policies) overlaps what iRedAdmin-Pro does, I am wondering - Can iRedAdmin-Pro handle creating these policy and user tables to do all this setup automatically? Or would I still need to manually enter in SQL records under the amavis.users and amavis.policy tables in order to complete a per-user Bayes setup such as this?

Currently you need to create them manually with iRedAdmin-Pro: go to the user's profile page, open the "Spam Policy" tab, then set the policy.

elf wrote:

Lastly, I have one possibly non-iRedAdmin-Pro question, but it is certainly an iRedMail question having to do with the per-user Bayes setup. Obviously any user that does NOT have an amavis.users entry and a mapped amavis.policy entry will not have incoming mail checked against an individual per-user Bayes DB. It appears the catch-all for that is still the user amavis Bayes DB. So if spam or ham is misclassified for those users, there appears to be no way to train/correct it. Is there a solution to this? Also, does this problem affect incoming aliases of a user who has entries in the DB for per-user Bayes?

No solution.

OK, time for me to ask you a question:

The more emails you train it with, the better the results. You mentioned this is "a home server", so I suppose it's safe to guess that this server doesn't accept many emails every day, and only a few of them will be fed to training. In that case, the training results may not be good enough for a long time. And if this is a home server, why not simply use a catch-all user for training to get better results?

8

Thanks Zhang,

I don't know how many emails per day counts as [normal/few/a lot] for a home server, but this is the middle section of the last LogWatch report in the postmaster account (last Tuesday).

--------------------- Amavisd-new Begin ------------------------ 

      478   Total messages scanned ------------------  100.00%
    9.130M  Total bytes scanned                      9,573,131
 ========   ==================================================
 
      478   Passed ----------------------------------  100.00%
      238     Spam passed                               49.79%
        2     Spammy passed                              0.42%
      238     Clean passed                              49.79%
 ========   ==================================================
 
      240   Spam ------------------------------------   50.21%
        2     Spammy passed                              0.42%
      238     Spam passed                               49.79%
 
      238   Ham -------------------------------------   49.79%
      238     Clean passed                              49.79%
 ========   ==================================================
 
 
 
 **Unmatched Entries**
        2   No ext program for   .lrz, tried: lrzip -q -k -d -o -, lrzcat -q -k
 
 ---------------------- Amavisd-new End ------------------------- 

 
 --------------------- Postfix Begin ------------------------ 

      448   Miscellaneous warnings  
 
    9.062M  Bytes accepted                           9,501,742
    9.327M  Bytes delivered                          9,779,741
   10.887K  Bytes forwarded                             11,148
 ========   ==================================================
 
      238   Accepted                                    99.58%
        1   Rejected                                     0.42%
 --------   --------------------------------------------------
      239   Total                                      100.00%
 ========   ==================================================
 
        1   5xx Reject unknown user                    100.00%
 --------   --------------------------------------------------
        1   Total 5xx Rejects                          100.00%
 ========   ==================================================
 
       22   4xx Reject HELO/EHLO                         1.30%
     1674   4xx Reject recipient address                98.70%
 --------   --------------------------------------------------
     1696   Total 4xx Rejects                          100.00%
 ========   ==================================================
 
     1796   Connections             
        4   Connections lost (inbound) 
     1797   Disconnections          
      481   Removed from queue      
      239   Delivered               
        3   Forwarded               
     5347   Postscreen              
 
        7   Timeouts (inbound)      
      590   Hostname verification errors (FCRDNS) 
      138   TLS connections (server) 
 
 
 ---------------------- Postfix End ------------------------- 

 
 --------------------- Connections (secure-log) Begin ------------------------ 

 
 **Unmatched Entries**
    gdm-password]: gkr-pam: unlocked login keyring: 2 Time(s)
    polkitd: Registered Authentication Agent for unix-process:24122:51654372 (system bus name :1.21156 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8): 1 Time(s)
    polkitd: Registered Authentication Agent for unix-process:24394:51654734 (system bus name :1.21158 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8): 1 Time(s)
    polkitd: Unregistered Authentication Agent for unix-process:24122:51654372 (system bus name :1.21156, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus): 1 Time(s)
    polkitd: Unregistered Authentication Agent for unix-process:24394:51654734 (system bus name :1.21158, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus): 1 Time(s)
 
 ---------------------- Connections (secure-log) End ------------------------- 

For some reason I have not received any reports since about a week ago. Instead I get:

/etc/cron.daily/0logwatch:

postdrop: warning: uid=0: File too large
sendmail: fatal: root(0): message file too big

To answer your question about why I wanted per-user training: I was thinking that if one or more users are not good about training, or about correcting the auto-training of incoming email, this would likely affect the other users' results negatively (mine included). I thought this might be a concern with, for instance, the kids' email accounts. I am pretty sure that the amavis user already has some "bad" training from incoming messages that were scanned and auto-trained, never corrected, and have since been deleted from the accounts so that they can no longer be reclassified. I was under the impression that a common Bayes DB will only be effective if everyone corrects auto-training and trains appropriately. Do you think my approach of keeping them separate is the wrong way to go?

The only disadvantage I can think of at the moment is for users that receive very little email (there are 2 or 3 "vanity accounts", so to speak), or indeed for any user whose mailbox I do not set up with the full multi-step process (add via the iRedAdmin GUI, then manually add rows to amavis.users and amavis.policy). Those accounts may suffer from poor spam filtering, because Sieve training updates the per-user DB while filtering still uses the (now uncorrected) auto-trained "amavis" user DB.

9

Oh, it looks like my logwatch issue is caused by all the SA debug output, haha. I'd better disable that now that I know what's going on.

10

elf wrote:

      478   Total messages scanned ------------------  100.00%

With 478 emails per day, the traffic is not so low. You can keep the per-user setting if you want. :)
