"Fossies" - the Fresh Open Source Software Archive
Member "amavisd-new-2.11.1/README_FILES/README.performance" (12 Aug 2005, 13284 Bytes) of package /linux/misc/amavisd-new-2.11.1.tar.bz2:
As a special service "Fossies" has tried to format the requested text file into HTML format (style: standard
) with prefixed line numbers.
Alternatively you can here view
the uninterpreted source code file.
This file README.performance is part of the amavisd-new distribution,
which can be found at http://www.ijs.si/software/amavisd/

Updated: 2002-05-13, 2002-08-01, 2003-01-09, 2005-01-19

Here are some excerpts from my mail(s) on the topic of performance.

| What I use now is FreeBSD+Postfix+amavisd+Sophie,

Good choice in my opinion. (P.S.: add clamd to the mix)

Hopefully the hardware matches expectations:
fast disks and enough memory are paramount.

You may want to put the Postfix spool on a different disk than /var/amavis,
where amavisd does its mail unpacking.

| is there any suggested configuration for this
| environment? Especially if my server is a highly loaded,
| busy mail hub/gateway? Any parameters for performance tuning?

| Do I need to increase this number to fit a busy server?
| Are there any other related parameters I should pay attention to?

How many messages per day are we talking about?

Both the amavisd child processes and (to a much lesser degree) the Postfix
smtpd services consume sizable chunks of memory, so the amount of memory
can determine how many parallel processes you can run.

Note that the Perl interpreter in amavisd-new processes occupies the same
memory if fork on a Unix system uses copy-on-write for memory pages,
as most modern Unixes do. This, however, does not apply to memory allocated
after the child processes have forked.

I would start small, e.g. with 2 or 3 child processes per CPU
(parameter $max_servers), then see how the machine behaves.
If you see heavy swapping, or the load regularly going beyond 2 or 3 (per CPU),
decrease the number of parallel streams; otherwise increase it - gradually.
This number is probably the most important tuning parameter.
Going beyond 10 usually brings no further improvement in overall system
throughput, it just wastes memory.
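
A minimal amavisd.conf sketch of the settings involved (the values are
illustrative starting points for a dual-CPU host, not recommendations):

```perl
# excerpt from amavisd.conf -- tune $max_servers by measurement
$max_servers  = 4;    # e.g. 2 child processes per CPU on a 2-CPU host
$max_requests = 20;   # recycle a child after 20 tasks to curb memory growth
```

Keep the process limit of the Postfix service that feeds amavisd in step
with $max_servers, so the MTA never opens more connections than there are
child processes to serve them.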
If this does not come close to your needs, you may want to place
amavisd-new with Sophie on a different host than Postfix.
They talk via SMTP, so there is no particular advantage in having
both the MTA and amavisd on the same host.

Actually there are now three quite independent modules,
which can share the same host, or not:

  incoming Postfix (MTA-IN) -> amavisd+Sophie -> outgoing Postfix (MTA-OUT)

Both MTA-IN and MTA-OUT can be the same single Postfix, but need not be.
If you decide to split MTA-IN and MTA-OUT, you can place
one of them on the same host as amavisd, although I guess it
would be better either to have three boxes, or to have MTA-IN
and MTA-OUT be a single Postfix, as in the normal setup,
while optionally moving amavisd+Sophie to a different host.

As amavisd-new is just a regular SMTP server/client to Postfix,
one can use the usual load-sharing mechanisms available for
normal mail delivery, such as having multiple MX records for the
content filter (this applies to feeding amavisd through the Postfix
service smtp, but not through lmtp, which does not care about MX records).
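
A sketch of the MTA-IN side of such a setup; the filter hostname is
hypothetical, and the idea is that resolving it through MX records (the
smtp transport, unlike lmtp, honors them) spreads the load over several
amavisd hosts:

```
# main.cf on MTA-IN -- hand all mail to the content filter over SMTP
content_filter = smtp:filter.example.com:10024
```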
| I would like to know about the possibility of email loss, especially
| unnoticed loss! What if amavisd or Sophie suddenly/abnormally
| terminates? Are there any recovery procedures that should be taken?

Mail loss should not be possible (except on failure of the disk holding
the MTA spool directories). I am continually testing awkward situations
like a full disk, process restarts, a dying child, even programming errors :) ...
Amavisd never takes the responsibility for mail delivery away from the MTA,
it just acts as an intermediary between MTA-IN and MTA-OUT.
Only when MTA-OUT confirms it has received the mail does MTA-IN
close the SMTP session with a success status code. All breakdowns
and connection losses are handled by the MTA, and Postfix is very good
at doing this in a reliable way.

The only cause for concern is a DoS in some of the unpackers. This part of
the code in amavisd-new is still mostly the same as in the amavisd version,
and although it does exercise some care, there is still a lot
to be desired.

Let me tell a heretic secret: if your AV scanner (e.g. Sophie)
can handle all archive formats used by current viruses (except MIME decoding,
which is done by amavisd), it is reasonably safe, good and fast
to set $bypass_decode_parts to 1 (see amavisd.conf).

And more: later Postfix versions can do MIME syntax checking
and enforce the 7-bit header requirements of RFC 2822 (see the parameters
listed by: $ postconf | egrep 'mime|bit' ), so you can block invalid MIME
even before it hits the MIME::Parser Perl module.
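
A sketch of the corresponding main.cf settings (both parameters exist in
Postfix 2.x; whether to reject such mail outright is a policy decision):

```
# main.cf -- reject MIME-damaged mail at the Postfix border
strict_7bit_headers  = yes   # reject 8-bit characters in message headers
strict_8bitmime_body = yes   # reject 8-bit body text sent without 8BITMIME
```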
Instead of wasting 5 minutes on some particularly nasty archive,
Sophie can do it in 5 seconds !!! I have yet to see a virus (in the
wild) that Sophos would ONLY detect if first unpacked by amavisd.
(P.S. not always true, but most of the time this is so)

This does not take care of deliberate malicious intent,
but then one can always bring in a virus on a floppy, or download it
some other way (e.g. PGP-encrypted), if one really wants to.

See the article by Cor Bosman for a high-end installation:

Limit the number of AV-scanning processes; don't let the MTA run an
arbitrary number of AV-scanning processes (P.S. this is easy to ensure with
Postfix, hard to do with pre-queue content filtering like the sendmail milter
or the Postfix smtp proxy). Also, limiting based on CPU load (as in sendmail)
is not a good idea in my opinion - set a fixed limit based on the number
of concurrent AV-checking processes your host (memory, disk, CPU) can handle,
not on the current load or mail rate; otherwise when the situation goes
bad, it is more likely it will go bad all the way - disk and memory
thrashing is the last thing you desire when the load goes high.

| I have a question about how to distribute amavisd-new directories across
| different disks for optimal performance. There are usually 4 directories
| in the amavisd-new mail path:
| 1) The amavis TEMPBASE directory (where incoming emails are scanned)
| 2) The postfix queue directory
| 3) The directory for amavis and mail system logs
| 4) The directory where mail is delivered
| What would be the best distribution of these directories over multiple disks?
| Obviously, having each one on a different disk would be best. However, if
| you only have 3 disks to use, which two services should be combined? If you
| have only two disks, which services should be put together?

!!! Let amavisd-new log via syslog, and make sure your syslogd does
not call a sync for every log entry !!! (as Linux syslogd does by default,
but this is configurable per log file). This way the disk with the log files
becomes non-critical.
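
With the stock Linux syslogd this is done by prefixing the file name with
'-' in syslog.conf (the log path is an example; adjust to your setup):

```
# /etc/syslog.conf -- the leading '-' disables the sync after each entry
mail.*    -/var/log/maillog
```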
The disk with the Postfix mail queue is likely to be the one most heavily
beaten by file creates/deletes. I would put it on its own disk.

The $TEMPBASE (amavis work directory) is probably not as heavily
exercised (in the SMTP-in/SMTP-out amavisd-new setup as with Postfix),
unless your mail messages often contain many MIME parts that need
to be decoded. If you can afford it, it can even reside on a
RAM disk / tmpfs, or use delayed syncing, without risking any mail loss.
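
For example, a tmpfs mount for the work directory might look like this
(the path must match $TEMPBASE in amavisd.conf, and the size is an
assumption - it has to hold a few unpacked messages at once):

```
# /etc/fstab -- keep the amavisd unpacking area in RAM
tmpfs  /var/amavis/tmp  tmpfs  defaults,size=256m,mode=750  0 0
```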
Perl running in Unicode mode is reported to be noticeably slower
than otherwise. It is wise to disable it, e.g. by setting the environment
variable LC_ALL=C before starting amavisd on systems where this
is not the default (e.g. Red Hat Linux 8.0).
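
In the script that starts amavisd this can be a simple sketch (the point
is that every child process, amavisd included, inherits the C locale):

```shell
# force the C locale so Perl avoids its slower Unicode-aware mode
LC_ALL=C
export LC_ALL
# any process started from here on sees it:
sh -c 'echo "$LC_ALL"'
```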
See also 'Speed up amavisd-new by using a tmpfs filesystem for $TEMPDIR'
at http://www.stahl.bau.tu-bs.de/~hildeb/postfix/amavisd_tmpfs.shtml
by Ralf Hildebrandt.

| define(`confMAX_DAEMON_CHILDREN', 20)
| should we limit MaxDaemonChildren in MTA-RX? ... what would be a magic
| formula to define it? I assume it should be based on the number of
| amavisd-new child processes (which should match the queue runners)
| and the max no. of msgs per connection?

Here the charm of the dual-sendmail setup (or the Postfix setup) is most
apparent.

The MaxDaemonChildren sendmail option is almost completely independent
of the number of amavisd-new child processes.

MaxDaemonChildren in MTA-RX should be sufficiently large that
most of the time every incoming mail connection can get its own
sendmail process willing to accept the mail trickle.
These smtp server processes are relatively lightweight (hopefully
sharing the program code in memory), so they don't cost much.
The upper limit is the number of sendmail receiving processes
the host can comfortably handle, including the disk I/O they produce.
One may set this value high, observe the usual number of parallel
incoming SMTP sessions during normally busy hours, then set the limit
comfortably above that value.
This applies to Postfix as well (maxproc for the smtpd service on port 25).
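
In Postfix the limit is the maxproc column of the smtpd entry in
master.cf (100 is an illustrative value, not a recommendation):

```
# master.cf -- cap the number of receiving smtpd processes on port 25
# service type  private unpriv  chroot  wakeup  maxproc command + args
smtp      inet  n       -       n       -       100     smtpd
```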
The number of amavisd-new child processes and the number of
queue runners is another matter. Since content filtering (especially
with SA enabled) is CPU- and memory-intensive, the number of content
filtering processes is limited by the host's power and its memory.
Never set this number so high that swapping occurs, or that
the time for each individual mail check gets too long, say over
a couple of seconds. Long content checking times can also increase
the locking contention on the SA Bayes database. P.S.: It is advisable to
move the Bayes database to an SQL server; it need not be on a separate host.

A very rough rule of thumb may be that MaxDaemonChildren
can easily be 10 times the number of content filtering processes.

> How did you determine this optimal number of child processes?
> Is there a nice scientific way to do it, or is it simple trial and error?

Measure and plot a diagram of the maximum sustained mail throughput (msgs/h)
for a couple of values of max_proc (and a matching value of $max_servers).
About 5 or 10 strategically placed data points can already give a useful
picture. Measuring the throughput for each max_proc value takes a restart of
amavisd and a postfix reload (and possibly a postfix flush); the measuring
period should last, say, 10 minutes or preferably more, assuming that
the supply of mail to be processed does not run out and keeps mail processing
saturated, i.e. that there are plenty of mail messages in the mail queue
waiting to be processed by the content filter.

An opportunity for such an experiment on a production machine arises
when some backlog of mail accumulates, e.g. after a network outage
or some other problem that stopped the mail flow. It is certainly
possible to create synthetic mail traffic. As a mail sink (if needed)
one could use src/smtpstone/smtp-sink.c from a Postfix distribution,
but for generating synthetic mail, smtp-source.c is not realistic
regarding the mail contents, so it is best to use a real ham/spam/virus
mail mix, either on a running system, or mail collected and saved
for the purpose during normal business hours.

The exact mail rate can be deduced from the mail log for each measurement
period. I usually choose to plot the cumulative message count vs.
wall-clock time, which makes it easier to find the slope and to ignore
startup or other anomalies, as they stand out more obviously in the plot.
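
The arithmetic for one data point is trivial; a sketch (the message count
and window length are illustrative - in practice both come from the log):

```shell
# 300 messages handled in a 600-second saturated window -> msgs/hour
msgs=300; secs=600
awk -v m="$msgs" -v s="$secs" 'BEGIN { print m / s * 3600 }'
```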
A good enough mail rate figure is given by the amavisd-agent program,
which for the first couple of screens shows the counter averages since
the start of amavisd (which is probably what we need here), then after
about 2.5 minutes it starts reporting exact 5-minute running averages.
The usual counter of interest is 'InMsgs' (which also happens to be the
same as 'CacheAttempts', the first counter on the reported list).

One should get a diagram like:

  msgs/h
    |
    |                     *    *    *    *
    |              *      |
    |         *           |
    |    *              best
    |
    +-----------------------------> max_proc

The optimal max_proc is where the function starts to level off;
it gives the best throughput with a minimum of memory wasted
on excess processes that bring no benefit.

> We're running dual 3.0 GHz Xeons (64-bit Fedora Core 3) with 2 GB of RAM -
> what do you think the optimal number should be?
> We have local copies of surbl.org and a local DCC server. Not using razor.

I wouldn't dare to guess; you just have to measure it.
It depends on too many factors. The more network-based, high-latency
SA tests you have enabled, the higher the optimal max_proc value will be.
The exact spot varies with daily network conditions and usage patterns,
so don't bother to narrow it down too precisely.

LDAP lookup note from Michael Hall, in reply to Matt Juszczak, on Aug 12, 2005:

> > You might try to make better indexes on the LDAP server before
> > upgrading the hardware.

> Yep, tried this yesterday :) that was the problem. Added another index
> for mailRoutingAddress (I already had one for mailLocalAddress, but I
> guess mailRoutingAddress needed one too) and now we're experiencing
> instant mail delivery with no queues :)
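
In OpenLDAP's slapd.conf the indexes mentioned above would be declared
roughly like this (equality indexes, since the MTA does exact-match
lookups on these attributes):

```
# slapd.conf -- index the attributes used in mail routing searches
index  mailLocalAddress    eq
index  mailRoutingAddress  eq
```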
You definitely want to make sure you have indexes on any attribute used
in searches; it can make a huge difference, as you found out. If you're
using OpenLDAP with 'bdb' databases, you also want to be sure to
configure the Berkeley DB environment with a DB_CONFIG file, and use
db_stat to check things. Below is an excerpt from one of our mail
servers at work:

  $ db_stat-4.2 -h /var/db/openldap-data -m
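
A minimal DB_CONFIG sketch for the database directory; the cache size
(here 64 MB, in one contiguous region) is an illustrative value - size it
so that db_stat -m shows a high cache hit rate:

```
set_cachesize 0 67108864 1
```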
Correctly sizing the cache can make a big difference, as answers can then be
pulled from it instead of being fetched from the disks.