Russell Coker’s Documents

12 Jun

Postal

This is a program I wrote to benchmark SMTP servers. I started work on this because I need to know which mail server will give the best performance with more than 1,000,000 users. I have decided to release it under the GPL because there is no benefit in keeping the source secret, and the world needs to know which mail servers perform well and which don’t!

At the OSDC conference in 2006 I presented a paper on mail relay performance based on the new BHM program that is now part of Postal.

I have a Postal category on my main blog that I use for a variety of news related to Postal. This post (which will be updated periodically) will be the main reference page for the software. Please use the comments section for bug reports and feature requests.

It works by taking a list of email addresses to use as FROM and TO addresses. I originally used a template to generate the list of users because if each email address takes 30 bytes of storage then 3,000,000 accounts would take 90M of RAM which would be more than the memory in the test machine I was using at the time. Since that time the RAM size in commodity machines has increased far faster than the size of ISP mail servers so I removed the template feature (which seemed to confuse many people).

When sending the mail the subject and body will be random data. A header field X-Postal will be used so that procmail can easily filter out such email just in case you accidentally put your own email address as one of the test addresses. ;)

I have now added two new programs to the suite, postal-list, and rabid. Postal-list will list all the possible expansions for an
account name (used for creating a list of accounts to create on your test server). Rabid is the mad Biff, it is a POP benchmark.

Postal now adds a MD5 checksum to all messages it sends (checksum is over the subject and message body including the “\r\n” that ends each line of text in the SMTP protocol). Rabid now checks the MD5 checksum and displays error messages when it doesn’t match.

I have added rate limiting support in Rabid and Postal. This means that you can specify that these programs send a specific number of messages and perform a specific number of POP connections per minute respectively. This should make it easy to determine the amount of system resources that are used by a particular volume of traffic. Also if you want to run performance analysis software to determine what the bottlenecks are on your mail server then you could set Postal and Rabid to only use half the maximum speed (so the CPU and disk usage of the analysis software won’t impact on the mail server).

I will not release a 1.0 version until the following features are implemented:


  • Matching email sent by Postal and mail received by BHM and Rabid to ensure that each message is delivered correctly (no repeats and no corruption)

  • IMAP support in Rabid that works

  • Support for simulating large numbers of source addresses in Postal. This needs to support at least 2^24 addresses so it is entirely impractical to have so many IP addresses permanently assigned to the test machine.

  • Support for simulating slow servers in Postal and BHM (probably reducing TCP window size and delaying read() calls)

  • Making BHM simulate the more common anti-spam measures that are in use to determine the impact that they have on list servers

  • Determining a solution to the problem of benchmarking DNS servers. This may mean just including documentation on how to simulate the use patterns of a mail server using someone else’s DNS benchmark, but may mean writing my own DNS benchmark.

Here are links to download the source:


  • postal-0.70.tgz - tidied up the man pages and made it build without SSL support.

  • postal-0.69.tgz - fixed some compile warnings, and really made it compile with GCC 4.3

  • postal-0.68.tgz - fixed some compile warnings, made it compile with GCC 4.3, and I think I made it compile correctly with OpenSolaris.

  • postal-0.67.tgz - changed the license to GPL 3

  • postal-0.66.tgz - made GNUTLS work in BHM and added MessageId to Postal.

  • postal-0.65.tgz - significant improvement, many new features and many bugs fixed!

  • postal-0.62.tgz - Slightly improved the installation documents and made it build with GCC 3.2.

  • postal-0.61.tgz - version 0.61. Fixed the bug with optind that stopped it working on BSD systems, and a few other minor bugs.

  • postal-0.60.tgz - version 0.60. Fixed the POP deletion bug, made it compile with GCC 3.0, and added logging of all network IO to disk.

  • postal-0.59.tgz - version 0.59.

  • postal-0.58.tgz - version 0.58. Added some new autoconf stuff, RPM build support, and the first steps to OS/2 and Win32 portability.

  • postal-0.57.tgz - version 0.57. Fixed lots of trivial bugs and some BSD portability issues.

  • postal-0.56.tgz - version 0.56. Added Solaris package manager support. Made it compile without SSL. Added heaps of autoconf stuff.

  • postal-0.55.tgz - version 0.55. Made Rabid work with POP servers that support the CAPA command. Fixed some compile problems on Solaris.

  • postal-0.54.tgz - version 0.54. Added a ./configure option to turn off name expansion (for systems with buggy regex). Fixed a locking bug that allowed Rabid to access the same account through two threads.

  • postal-0.53.tgz - version 0.53. Don’t use NIS domain name etc for SMTP protocol.

  • postal-0.52.tgz - version 0.52. Better portability with autoconf.

  • postal-0.51.tgz - version 0.51. Supports compiling without SSL and some hacky Solaris support.

  • postal-0.50.tgz - version 0.50. Adds SSL support to Postal (Rabid comes next).

12 Jun

SE Linux Magic

Here is a complete list of entries for /etc/magic related to SE Linux.

# SE Linux policy database for Fedora versions less than 5, RHEL 4, and Debian before Etch
# http://doc.coker.com.au/computers/selinux-magic
0      lelong  0xf97cff8c      SE Linux policy
>16    lelong  x              v%d
>20    lelong  1      MLS
>24    lelong  x      %d symbols
>28    lelong  x      %d ocons

# SE Linux policy modules *.pp reference policy for Fedora 5 to 9,
# RHEL5, and Debian Etch and Lenny.
# http://doc.coker.com.au/computers/selinux-magic
0      lelong  0xf97cff8f      SE Linux modular policy
>4      lelong  x      version %d,
>8      lelong  x      %d sections,
>>(12.l) lelong 0xf97cff8d
>>>(12.l+27) lelong x          mod version %d,
>>>(12.l+31) lelong 0          Not MLS,
>>>(12.l+31) lelong 1          MLS,
>>>(12.l+23) lelong 2
>>>>(12.l+47) string >\0        module name %s
>>>(12.l+23) lelong 1          base

# for SE Linux policy source for reference policy
# http://doc.coker.com.au/computers/selinux-magic
0      string  policy_module(  SE Linux policy module source
1      string  policy_module(  SE Linux policy module source
2      string  policy_module(  SE Linux policy module source

0      string ##\ <summary>    SE Linux policy interface source

0      search  gen_context(    SE Linux policy file contexts

0        search        gen_sens(        SE Linux policy MLS constraints source

07 Jun

How to Debug POP

POP (Post Office Protocol) is the most used protocol for receiving mail from a server to a MUA (Mail User Agent) for reading. It is specified in RFC1939.

But the way it works (in most cases) is quite simple and doesn’t require reading the RFC, connect to port 110 (the standard port for POP3) and a basic session transcript is as follows (data sent by the client is prefixed with C: and data sent by the server is prefixed with S:):
S:+OK POP3 Ready [HOST NAME]
C:user ABC
S:+OK USER ABC set, mate
C:pass asecret
S:+OK Mailbox locked and ready
C:list
S:+OK scan listing follows
S:1 2989
S:.
C:quit
S:+OK

When the server successfully completes an operation it will precede it’s response with “+OK“, when it fails it will precede it’s response with “-ERR“. The data after the OK or ERR statement is for humans not machines, so in most cases your MUA will discard it. Therefore connecting to the service manually is required to properly debug problems. The unfortunate thing is that often on big mail servers it takes time for the sys-admin to do such tests. If the user can do it for them and give a bug report saying “your POP server said -ERR user unknown” then things will get fixed a lot faster than if the report is “the POP server didn’t work”.

One thing that is quite important is the initial greeting string, on any system of moderate size you will have multiple back-end servers and the greeting will tell you which server you are connecting to. If POP sometimes works and sometimes fails then your ISP might have one server failing so making a note of this greeting string in a transcript of a failed session can really help in tracking down problems.

When the list of messages is displayed, the first column is message numbers (starting at one and going up sequentially) and the second column is message sizes. If you have a POP session timing out and you have an extremely large message then that might be the cause.

A commonly used program for testing POP (and other Internet services) is telnet. So start the above process you would type telnet mail.example.com 110.

There are methods of hashing POP passwords (which make things a little more complex), but they often aren’t used - and in any case don’t encrypt the data. So it’s common to run POP servers with SSL, and the standard port for this is 995. This makes testing a little more complicated (but actually no more difficult).

To make an SSL connection you can use the program stunnel, it is included in many (most?) distributions of Linux, and Windows binaries are apparently at this link (NB I’ve never tested the Windows binaries as I don’t use Windows).

The command stunnel -c -r mail.example.com:995 will connect you to your mail server via SSL and you can then type in the POP commands as normal.

06 Jun

Log Tools

The Logtools package contains a number of programs for managing log files (mainly for web servers).


  • clfmerge will merge a number of Common Logfile Format web log files into a single file while also re-ordering them in a sliding window to cope with web servers that generate log entries with the start-time of the request and write them in order of completion.

  • logprn operates like tail -f but will (after a specified period of inactivity) spawn a process and write the new data in the log file
    to it’s standard input.

  • clfsplit will split up a single CLF format web log into a number of files based on the client’s IP address.

  • funnel will write it’s standard-input to a number of files or processes.

  • clfdomainsplit split a CLF format web log containing fully qualified URLs (including the host name) into separate files, one for each host.

Download:

05 Jun

Polyinstantiation of directories in an SE Linux system

Notes

I presented this paper at the 2006 SAGE-AU conference.

Abstract

This paper describes the problems related to shared directories such as /tmp and /var/tmp as well as problems related to having multiple SE Linux security contexts used for accessing a single home directory. It then provides detailed information on the solution to this problem that has been implemented with polyinstantiated directories by using the pam_namespace module.

Introduction

It is a long-standing Unix tradition that the directories /tmp and /var/tmp are used for temporary storage by all programs and on behalf of all users. This used to not be considered a problem, however in recent times it has been recognised that the use of such a shared directory is vulnerable to race-condition attacks with symbolic links.

Another problem is that in some situations a file name may convey secret information. If the file in question is in a public directory such as /tmp or /var/tmp (which may be an unintended result of a command by the user) then this will represent an information leak if there are any less privileged processes running on the machine.

Past attempts to deal with these problems have included restrictions on creating sym-links and hiding file names, which have both been inadequate. The solution chosen for use with SE Linux (which is also designed to work without SE Linux) is to have polyinstantiated directories based on Unix account name and/or SE Linux context. This means that every user will see a different version of the directory in question based on their context.

In the past this feature has been implemented as part of Multi-Level Security (Dr. Rick Smith [HREF2]) systems under the name multi-level directories. I believe that the multi-level directory variant of this solution was based on file system support, while the Linux support for this type of operation that I will describe is based in the VFS layer and thus does not require modification to any of the file systems that may be used.

Summary of Attacks that can be Prevented by Poly-Instantiated Directories

In this paper I am considering the following attack scenarios:

  1. Attack by user on user (including the case of a non-PI user as attacker or victim)

  2. Attack by user on daemon (including the case of a non-PI user as attacker)

  3. Attack by non-root daemon on user

  4. Attack by root daemon on user (will always succeed without SE Linux)

Each of the above four attack scenarios may occur with one of the following three attacks:

  1. Race-condition attacks on the integrity of processes and data (sym-link attacks, race conditions on renaming objects, or pre-creating a file to take ownership of data)

  2. Leaks of confidential data via secrets in file names

  3. Denial Of Service (DOS) attacks based on race conditions and pre-allocating file/directory names

Other Solutions

One attempt at solving this problem that has been implemented in some Linux security systems is to hide file names. This can work as long as it is not possible to guess any of the file names in question. If the file name can be guessed then the hostile party can attempt to create a new file of the same name, failure to create the file in question indicates existence. But this only solves the problem of secret data in file names.

Another partial attempt at dealing with this problem is controlling the ability to create hard-links and/or sym-links to try and prevent race conditions. A well-known implementation of this is in the OpenWall kernel patch [HREF3] which prevents the user from creating hard-links to files to which they have no write access and from creating sym-links in a +t directory (a directory such as /tmp or /var/tmp) which point to a file that they don’t own. It also prevents writing to named pipes in +t directories which are owned by a different user. This deals with some of the issues related to race-condition attacks but there are potential issues that it does not address, such as a hostile user creating sym-links to their own files to divert output or creating a file with no write permissions as a denial of service against a program that uses a fixed file name.

But this only deals with the case of race conditions used to attack system integrity. It does not prevent DOS attacks or protect secret data when it is used in a file name.

SE Linux Requirements for Shared Directories

SE Linux does not attempt to hide file names in a directory, if the name of a file contains secret data then this can be a security problem on shared directories such as /tmp and /var/tmp, this is an issue that has to be solved outside of the core SE Linux code base.

The SE Linux strict and mls policies provide good protection against most race condition attacks. Most domains are not permitted to create hard links to privileged files (types such as etc_t). Daemons are all protected from sym-link attacks by each other due to being denied access to sym-links created by other daemons and by users, and users are given similar protection against attack by daemons (both root and non-root). The main benefit for PI directories in strict and mls SE Linux systems is for protection against users attacking other users, in most cases large numbers of users will have the same SE Linux domain and therefore there will not be any effective protection against such attacks in the domain-type model (the integrity protection part of SE Linux).

When a Unix account is associated with more than one SE Linux context it is necessary to have multiple instances of the home directory to match the SE Linux context. If there is only one instance of the home directory and different SE Linux contexts are used for user logins then one of the contexts may be denied access to shared files such as .bashrc and .bash_history, or they may serve as information leaks. This use creates a requirement for PI home directories in SE Linux that does not exist for non-SE systems.

The problem of multiple logins with different contexts can occur in the older version of SE Linux (known as the example policy) that was used in Red Hat Enterprise Linux 4 and Fedora Core versions 2 to 4 when running the strict policy that permits multiple roles to be allocated to a user. But this is more of an issue with the newer versions of SE Linux policy that have functional support for MLS labels and the new MCS policy that permits different sets of categories to be assigned to a user session.

Non-SE Linux Requirements for Shared Directories

Polyinstantiation of shared directories also provides benefits for non-SE Linux systems, in fact there are probably more benefits to be gained from using this on non-SE systems. The SE Linux strict policy provides protection against sym-link race condition attacks launched by users against users in different roles, attacks by users against daemons, and attacks by daemons against users. The SE Linux MLS policy provides these benefits and also protects against attack from programs running at different levels, for example a process running at sensitivity level s2 could not be tricked into leaking data to a program running at level s1, even if the two programs ran in the same domain and with the same UID. Also SE Linux prevents unprivileged processes from creating hard links to files that are important to system integrity or data confidentiality (which is almost a complete solution to hard-link based attacks).

A non-SE system has none of the above protections and only has the Unix UID to protect both system integrity and confidentiality of data.

Linux Kernel Support for Poly-Instantiated Directories

In recent versions of Linux the current list of mounted file systems is available from the /proc/mounts file which is a sym-link to /proc/self/mounts, this permits displaying the name-space which applies to the current process. If /etc/mtab is a sym-link to /proc/mounts then programs such as df will display information on the mount points that are associated with the name-space for the process.

The initial support for PI directories was via the CLONE_NEWNS flag to the clone() system call. This flag causes the child process to be allocated a separate name space. That process and each child process that it launched would have a separate name space to the process which called clone(), and to any process that resulted from another call to clone() with the CLONE_NEWNS flag. The problem with this was the requirement that applications be modified to use clone() with this flag instead of using fork().

To solve this problem a new system call sys_unshare [HREF4] was added to the Linux kernel. The unshare system call can create a separate name-space for mounted file systems among other things (the set of kernel datra structures that can be unshared has been steadily increased since the introduction of unshare).

The unshare system call requires the SYS_ADMIN capability but does not require a fork, exec, or other operation. So it can be called from a PAM module and thus work with unmodified login programs. Also it is possible for multiple PAM modules to unshare different kernel data structures.

Shared Subtrees

One obvious problem with the functionality described in the previous section is the situation where the administrator wants to mount file systems and have all users see them, or have daemons mount file systems (such as autofs).

The solution to this is a development known as Shared Subtrees [HREF5]. This gives the option of specifying that certain subtrees will not be shared. For example if the directories /tmp and /var/tmp are being instantiated
then the following commands could be run from a system boot script to cause all other mount operations to propagate to all users:


mount –make-shared /
mount –bind /tmp /tmp
mount –make-private /tmp
mount –bind /var/tmp /var/tmp
mount –make-private /var/tmp

The above commands make the root of the name-space shared and then make /tmp and /var/tmp private. Note that the --make-private option to the mount command only applies to mount points. As on my test system both /tmp and /var/tmp are on the root file system I have to bind mount them to themselves to have a mount point that can be made private. Be aware that if you don’t correctly exclude the PI directories from the shared name space then each user who logs in may get PI directories under another user’s directories, and things generally won’t work.

Design Overview of PI Directories in Linux

The initial design for PI directories was based on having them only created for user sessions at login time by PAM [HREF6] or similar mechanisms. To implement this the PAM module will create a directory under the directory that is being instantiated, create an unshared name space, and then bind mount the new directory over the PI directory. For example if /tmp is to be PI for user rjc then the directory /tmp/tmp.inst-rjc-rjc would be created as the instance of /tmp for the user rjc. After the directory is created an unshared name space would be created via the unshare system call. Finally in the new name space a bind mount would be used to replace /tmp with /tmp/tmp.inst-rjc-rjc, the bind mount operation would be equivalent to the command:


mount --bind /tmp/tmp.inst-rjc-rjc /tmp

The directory that was created was given the Unix permission mode 1777 (all users can create files and directories, but it is only permitted to remove files or directories that you own). This solved many of the problems related to users attacking users and users attacking daemons. But it does not solve the problem of a daemon attacking a user as the daemon has access to the parent of the PI directories. Also there is a configuration option to have a user excluded from the PI directory system, a user who is granted such access (either deliberately or accidentally) would also be able to attack other users. As all directories were created under /tmp with mode 1777 there was no protection of secret file names from daemons and users who were outside the PI system (for most systems I expect that there wil be some users who will be excluded from the PI configuration).

Another problem with the initial implementation was that the directories were all created at login time, therefore a hostile process could guess the names and pre-allocate directories to allow taking over ownership and potentially allowing other race condition attacks. For example any privileged process which relies on files not being unlinked or renamed for correct operation would operate incorrectly (and possibly be subject to attack) if run in a situation where the /tmp directory did not have the mode 1777 to prevent such rename and unlink operations.

Finally the initial implementation did not have a fall-back case for when the desired name for a PI directory had been taken by a file and would cause the login process to abort, this could be used as a DOS attack against user login sessions.

I have identified two possible solutions to the problem of DOS attacks against the pam_namespace module. One solution is to have it check whether the PI directory already exists, if it exists but has the wrong permissions (either Unix or SE Linux) or if there is an object other than a directory using that name then it would try creating the directory under a different name (maybe the original name with “.1″ appended) and keep trying different names until it finds one that is available. This solution does not solve the problem of protecting secret file names.

The other solution I have identified solves the problems of DOS attacks and race conditions as well as the leaks of secret data in file names. This requires that a directory be pre-allocated on the system to contain all PI directories. So instead of a PI directory having the name /tmp/tmp.inst-rjc-rjc it might have the name /tmp/.inst/tmp.inst-rjc-rjc. The /tmp/.inst directory would be created and/or verified at system boot time and would have Unix permission mode 000 (the capability dac_override which every login program posesses would be requred to access it) and would also have a SE Linux context that permits only very restrictive access. Therefore non-root daemons will be denied access to /tmp/.inst and therefore would not be able to launch attacks on users via the /tmp directory. On SE Linux systems root daemons will also be denied such access. If a user session is launched with a shared system name space (through misconfiguration or unusual requirements) then they would also be denied access to the instance of /tmp used by other users.

In the initial design of PI directories the aim was to confine users to prevent them from attacking the rest of the system, and such a confined user was still vulnerable to attack from outside. The second of the two solutions that I propose above is the one that I believe to be the best, it will protect users who have a PI version of a shared directory from attack by non-root daemons on a non-SE system and from attack by root daemons as well on a SE Linux system. It would be a viable option for the sys-admin to give a single user a PI version of /tmp to protect the files for that user while allowing all other users access to the system shared name space.

At the time of writing we have agreement on the concept of using a naming system somewhat like /tmp/.inst/tmp.inst-something where the directory /tmp/.inst will have Unix mode 000 and restrictive SE Linux access controls. This will prevent daemons and users that are not included in the PI configuration from attacking daemons and users that have it enabled. This makes PI protect the user who has such a PI directory as well as protecting the rest of the system from that user. Note that at the time of writing there was no final agreement on the directory names, while the concept of a two-level directory is agreed the actual name of the directory in the default configuration is still to be resolved.

A feature that has been discussed and agreed in concept is to have the pam_namespace.so module check the permissions of the /tmp/.inst directory and abort the login process if the directory does not have Unix permission mode 000, root ownership, and a suitable SE Linux label (if SE Linux is enabled). There will be a configuration option to disable this functionality as not all systems will need this level of protection (and not all administrators will want a system to fail-closed on such a minor security issue).

Currently Released Code

As of the time of writing Fedora Rawhide has a shared object named pam_namespace.so that implements the basic functionality. To use it the PAM configuration files in the /etc/pam.d directory must be modified to have the following line at the end:


session required pam_namespace.so

The system will work if the pam_namespace object is not the last in the list, but the creation of the namespace may interfere with some other PAM modules (for example if a PAM module wanted to access files in the /tmp directory) and in general it is safest to have it last. The only situation in which you might not want to have pam_namespace as the last session module is if you are using pam_mkhomedir and also using pam_namespace to provide PI home directories. But currently pam_mkhomedir does not work correctly in situations where PI home directories are desired so this should not be an issue.

The most noteworthy parameter for the pam_namespace module is the optional parameter unmnt_remnt. This is used by programs that run from an unshared namespace and need to create another unshared namespace. The primary example of this is su, all other programs that perform actions which are similar in concept (IE they are run from a user session and launch a new session on behalf of another user) will have the same requirement.

The pam_namespace module uses the configuration file /etc/security/namespace.conf. This file currently has four parameters, the first gives a directory that should be instantiated (there is an option of $HOME for instantiating the user home directory). The second is the name of the real directory to be used for the instance which has variables $USER and $HOME to represent the user-name and the home directory of the user. The third parameter may have as it’s value user, context, or both to indicate whether the instantiation should be based on user-name, SE Linux context, or both. The final parameter is a list of comma-separated user-names for accounts that are exempt from poly-instantiation of the directory in question. I believe that it will be standard practice to include root in this list of accounts (usually there will be no need for other users to be excluded).

Preventing Daemons From Attacking Each Other

In the currently released code there is no protection against daemons attacking each other. I believe that to take advantage of the full benefits offered by PI directories most daemons that run as non-root need the same protection so that they can not attack each other.

In Fedora there is a new program called runuser that will start a daemon as a user other than root. It is linked against PAM and can be configured to call the pam_namespace module. When I finish the debugging then every time it launches a daemon as non-root it will be able to create a new unshared namespace. Non-root daemons that require the system shared name space will need to have their user-names specified in the namespace.conf file.

In Debian daemons are started via a program named start-stop-daemon. I plan to modify this program to have the necessary name-space functionality.

Interesting Features

There is no requirement that the PI directory be a sub-directory of the directory it replaces. In fact it can be on a different filesystem. If you have a separate filesystem for /tmp and don’t want to have a separate filesystem for /var/tmp too then you could just configure namespace.conf such that /var/tmp is instantiated under /tmp. The following is one sample configuration:


/tmp /tmp/.inst/tmp.inst-$USER- both rjc,root
/var/tmp /tmp/.inst/var-tmp.inst-$USER- both rjc,root

Conclusion

With a two-level directory configuration (such as /tmp/.inst/whatever) we protect against all the attack scenarios that I consider (including an attack launched by a root-owned daemon on users when running SE Linux). The protection provided by the PI shared directory works in two ways, it protects the process with the unshared namespace and it also protects all other processes on the system against attack from that process.

Most operations that are described in this paper are usable in Fedora Rawhide as of the 31st of May 2006. The only operation that is not usable at this time is PI support for daemons. I hope to have PI working in Debian and have PI support for daemons in both Debian and Fedora Rawhide by the time this paper is published, I will describe my success in these efforts when I present this paper.



Hypertext References

HREF1 http://www.coker.com.au/selinux/
HREF2 http://www.cs.stthomas.edu/faculty/resmith/r/mls/index.html
HREF3 http://www.openwall.com/linux/README.shtml
HREF4 http://marc2.theaimsgroup.com/?l=linux-kernel&m=112350785026703&w=2

HREF5
http://lwn.net/Articles/159092/


HREF6
http://www.kernel.org/pub/linux/libs/pam/


Copyright

The System Administrators Guild of Australia© 2006. The authors assign to The System Administrator’s Guild of Australia and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to The System Administrators Guild of Australia to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.

03 Jun

Maildir Bulletin

This program is designed to deliver bulletin messages to thousands of users on a system. If you want to deliver mail to a large number of people to be read through POP or a local email program (such as mutt) then the traditional approach has been to setup an alias to map to all the users. The problem with this is that mail delivery is very slow and even delivering to 1000 users on a fast machine can take a significant amount of time and system resources. Also if the message is large then you use a lot of disk space.

This program solves that problem by creating a single file with the message data and creating links to it from the ~/Maildir/new directory of every user who is in the group (it delivers mail based on Unix groups). This is fast (can deliver a bulletin to 30000 users in minutes on a slow machine), saves disk space, works with all Maildir client software, and makes it very easy to undeliver or modify a bulletin.

Download links:

28 May

RAM Speed according to Memtest86+

Here are some speed results for RAM according to Memtest86+ on some machines that I have run. Note that the reported speed varies between runs by a few percent.












Compaq Athlon 1GHz PC133 RAM (3 DIMMs)219MB/s
Compaq P3-800MHz PC133 RAM (1 DIMM)270MB/s
Compaq P3-800MHz PC133 RAM (3 DIMMs, 2*128 + 256)240MB/s
Compaq P4 1.5GHz PC133 RAM (3 DIMMs)486MB/s
Compaq P4 1.5GHz PC133 RAM (1 or 2 DIMMs)490MB/s
HP Celeron 1.8GHz PC2100/DDR266 (1 DIMM)824MB/s
HP Celeron 2.4GHz PC2100/DDR266 RAM (2 DIMMs)984MB/s
Celeron D (32bit) 2.93GHz PC2400/DDR300 PC3200 RAM1,140MB/s
Dell PowerEdge T105 Dual-core Opteron 1212 (2GHz) single DDR2-667 ECC RAM1,670MB/s
Dell PowerEdge T105 Dual-core Opteron 1212 (2GHz) pair of DDR2-667 ECC RAM1,826MB/s
NEC Pentium D 2.8GHz DDR533 (paired DIMMS)2,600MB/s
NEC Pentium D 2.8GHz DDR 533 (unpaired DIMMS)1,600MB/s
NEC Pentium E2160 1.8GHz DDR663 (single DIMM)11,700MB/s

The Wikipedia page on SDRAM lists the theoretical speeds and the various names of the different types of DDR RAM (each type seems to have at least two names).

DDR266 theoretically can do 2100MB/s, but I’ve only seen it do 984MB/s (with two DIMMs).

26 May

Benchmarking Mail Relays and Forwarders

Notes

I presented this paper at the OSDC conference in 2006.

The main page for my Postal benchmark is at http://doc.coker.com.au/projects/postal/. My blog posts about benchmarking can be found at http://etbe.coker.com.au/category/benchmark/.

Abstract

Postal is a mail server benchmark that I wrote. The main components of it are postal for testing the delivery of mail via SMTP, rabid for testing mail download via POP, and a new program I have just written called bhm (for Black Hole Mailer) which listens on port 25 and sends all mail to /dev/null.

The new BHM program makes it possible to test the performance of mail relay systems. This means outbound smart host gateway systems, list servers, and forwarding services. The testing method is to configure three machines, one running Postal for sending the mail, the machine to be tested running a mail server or list server configuration, and the target machine running BHM.
The initial aim of this paper was to use artificial delays in the BHM program to simulate slow network performance and also to simulate various anti-spam measures and to measure how they impact a mail relay system. However I found other issues along the way which were interesting to analyse and will be useful to other people.

Description of Postal

The first component of the benchmark suite is Postal, this program sends mail at a controlled rate. When using it you have a list of addresses for senders and a separate list of recipients that will be used for sending random messages to a mail server. It sends the mail to a specified IP address to save the effort of configuring a test DNS server because in the most common test scenario you have a single mail server that you want to test.

Postal sends at a fixed rate because in most MTAs an initial burst of mail will just go to the queue and will be actually delivered much more slowly. Often mail servers will take two minutes or more of sustained load to show the full performance impact. So I designed the programs in the Postal suite to display their results once per minute so you can watch the performance of the system over time and track the system load.

The most important thing to observe is that the load (in all areas) is below 100%. If any system resource (CPU, network IO, or disk IO) is used to 100% capacity then the queue will grow without limit. Such unlimited queue growth leads to timeouts which increases the queue and causes the system to break down. An SMTP server has a continual load from the rest of the Internet and if it goes more slowly the load will not decrease in the short-term. So a server that falls behind can simply become unusable, an unmanaged mail server can easily accumulate a queue of messages as old as a week through not having the performance required to deliver them as fast as they arrive.

The second program in the Postal suite is Rabid, a benchmark for POP servers. The experiments I document in this paper do not involve Rabid.

The most recent program is BHM which is written as an SMTP sink for testing mail relays. The idea is that a mail relay machine will have mail sent to it by Postal and then send it on to a machine running BHM. There are many ways in which machines that receive mail can delay mail and thus increase the load on the server. Unfortunately I spent all the time available for this paper debugging my code and tracking down DNS server issues so I didn’t discover much about the mail server itself.

Hardware

For running Postal I used my laptop. Postal does much less work than any other piece of software in the system so I’m sure that my laptop is not a performance bottleneck. It is however a 1700MHz Pentium-M and probably the fastest machine in my network.

For the mail relay machine (the system actually being tested) I used a Compaq Evo desktop machine with a 1.5GHz P4 CPU, 384M of RAM, and an 80G IDE disk.

For running BHM I used an identical Compaq system.

The network is 100baseT full duplex with a CableTron SmartSwitch. I don’t think it will impact the performance. During the course of testing I did not notice any reports of packet loss or collisions.

All the machines in question were running the latest Fedora rawhide as of late September 2006.

Preparation

To prepare for the testing I set up a server running BHM with 254 IP addresses to receive email (mail servers perform optimisations if they see the same IP address being used). The following script creates the interfaces:


for n in `seq 1 254`
do ifconfig eth0:$n 10.254.0.$n netmask 255.255.255.0
done

Test 1, BIND and MTA on the Same Server

The script in appendix 1 creates the DNS configuration for the 254 zones and the file of email addresses (one per zone) to use as destinations.
I configured the server as a DNS server and a mail relay. A serious mail server will often have a DNS cache running on localhost so for my purposes
having primary zones under example.com configured on a DNS server on localhost seemed appropriate.

I initially tested with only a single thread of Postal connecting to the server. This means that there was no contention on the origin side and it was all on the MTA side. I tested Sendmail and Postfix with an /etc/aliases file expanding to 254 addresses (one per domain). All the messages had the same sender, and the message size was a random value from 0 to 10K.

The following table shows the amount of CPU time used by the server (from top output) and the load average as well as the mail server in use and the number of messages per minute sent through it.





MTAMsgs/MinuteCPU UseLoad Average
Postfix15~70%9
Postfix18~80%9
Postfix20~90%11
Sendmail10~50%1
Sendmail13~70%2
Sendmail15~95%4.5
Sendmail20100%*

Surprisingly the named process appeared to be using ~10% of the CPU at any given time when running Postfix and 25% of the CPU when running Sendmail (not sure why Sendmail does more DNS work - both MTAs were in fairly default Fedora configurations). As for this operation CPU was the bottleneck it appears that having the named process on the same machine might not be a good optimisation.

When testing 15 and 20 messages per minute with Sendmail the CPU use was higher than with Postfix and in my early tests with 256M of RAM in the kernel started reporting ip_conntrack: table full, dropping packet. which disrupted the tests by deferring connections.

The conntrack errors are because the TCP connection tracking code in the kernel has a fixed number of entries where the default is chosen based on the amount of RAM in the system. With 256M of RAM in the test system the number of connections that could be tracked was just under 15,000. After upgrading the system to 384M of RAM there was support for tracking 24,568 connections and the problem went away. You can change the maximum number of connections by the command echo NUMBER > /proc/sys/net/ipv4/ip_conntrack_max or for a permanent change edit /etc/sysctl.conf and add the line net.ipv4.ip_conntrack_max = NUMBER and then run sysctl -p to load the settings from /etc/sysctl.conf. Note that adding more RAM will increase many system resource limits that affect the operation of the system.

My preferred solution to this problem is to add more RAM because it keeps the machine in a default configuration which decreases the chance of finding a new bug that no-one else has found. Also an extra 128M of RAM is not particularly expensive.

After performing those tests I decided that I needed to add a minimum message size option to Postal (for results that had a lower variance).

I also decided to add an option to specify the list of sender addresses separately from the recipient addresses. When I initially wrote Postal the aim was to test a mail store system. So if you have 100,000 mail boxes then sending mail between them randomly works reasonably well. However for a mail relay system a common test scenario is having two disjoint sets of users for senders and recipients.

Test 2, Analysing DNS Performance

For the second test run I moved the DNS server to the same machine that runs the BHM process which is lightly loaded as the mail relay doesn’t send enough mail to cause BHM to take much CPU time).

I then did a test with Sendmail to see what the performance would be for messages that have a size of exactly 10K for the body which are sent from 254 random sender addresses (one per domain). I noticed that the named process rapidly approached 100% CPU use and was a bottleneck on system performance. It seems that the DNS load for Sendmail is significant!

I then analysed the tcpdump output from the DNS server and saw the following requests:

IP sendmail.34226 > DNS.domain: 61788+ A? a0.example.com. (32)
IP sendmail.34228 > DNS.domain: 22331+ MX? a0.example.com. (32)
IP sendmail.34229 > DNS.domain: 4387+ MX? a0.example.com. (32)
IP sendmail.34229 > DNS.domain: 18834+ A? mail.a0.example.com. (37)

It seems that there are four DNS requests per recipient giving a total of 1016 DNS requests per message. When 15 messages per minute are delivered to 254 recipients that means 254 DNS requests per second plus some extra requests (lookups of the sending IP address etc).

Also one thing I noticed is that Sendmail does a PTR query (reverse DNS lookup) on it’s own IP address for every delivery to a recipient. This added an extra 254 DNS queries to the total for Sendmail. I am sure that I could disable this through Sendmail configuration, but I expect that most people who use Sendmail in production would use the default settings in this regard.

Noticing that the A record is consulted first I wondered whether removing the MX record and having only an A record would change things. The following tcpdump output shows that the same number of requests are sent so it really makes no difference for Sendmail:


IP sendmail.34238 > DNS.domain: 26490+ A? a0.example.com. (32)
IP sendmail.34240 > DNS.domain: 16187+ MX? a0.example.com. (32)
IP sendmail.34240 > DNS.domain: 57339+ A? a0.example.com. (32)
IP sendmail.34240 > DNS.domain: 50474+ A? a0.example.com. (32)

Next I tested Postfix with the same DNS configuration (no MX record) and saw the following packets:


IP postfix.34245 > DNS.domain: 3448+ MX? a0.example.com. (32)
IP postfix.34261 > DNS.domain: 50123+ A? a0.example.com. (32)

The following is the result for testing Postfix with the MX based DNS configuration:

IP postfix.34675 > DNS.domain: 29942+ MX? a0.example.com. (32)
IP postfix.34675 > DNS.domain: 33294+ A? mail.a0.example.com. (37)

It seems that in all cases Postfix does less than half the DNS work that Sendmail does in this regard and as BIND is a bottleneck this means that Sendmail can’t be used. So I excluded Sendmail from all further tests.

Below is the results for the Exim queries for sending the same message, Exim didn’t check whether IPv6 was supported before doing an IPv6 DNS query. I filed a bug report about this and was informed that there is a configuration option to disable AAAA lookups, but it is agreed that looking up an IPv6 entry when there is no IPv6 support on the system (or no support other than link-local addresses) is a bad idea.

IP exim.35992 > DNS.domain: 43702+ MX? a0.example.com. (32)
IP exim.35992 > DNS.domain: 7866+ AAAA? mail.a0.example.com. (37)
IP exim.35992 > DNS.domain: 31399+ A? mail.a0.example.com. (37)

The total number of DNS packets sent and received for each mail server was 2546 for Sendmail, 1525 for Exim, and 1020 for Postfix. Postfix clearly wins in this case for being friendly to the local DNS cache and for not sending pointless IPv6 queries to external DNS servers. For further tests I will use Postfix as I don’t have time to configure a machine that is fast enough to handle the DNS needs of Sendmail.

Exim would also equal Postfix in this regard if configured correctly. However I am making a point of using configurations that are reasonably close to the Fedora defaults as that is similar to the common use on the net.

Test 3 - Postfix Performance

To test Postfix performance I used the DNS server on a separate machine which had the MX records. I decided to test a selection of message sizes to determine the correlation between message size and system load.




Msg SizeMsgs/MinuteCPU UseLoad Average
10K20~95%11
0-1K20~85%7
100K10~80%6

In all the above tests there were a few messages not being sent due to connections timing out. This seems unreasonably poor performance. I had expected Postfix in the most simplistic mailing list configuration to be able to handle more than 5080 outbound messages per minute.

Test 4 - different Ethernet Card

If an Ethernet device driver takes an excessive amount of CPU time in interrupt context then it will be billed to the user-space process that was running at the time, this can result in an application being reported as using a lot more CPU time than it really uses. To check whether that is the case I decided to replace the Ethernet card in the test system and see if that changed the reported CPU use.

I installed a PCI Ethernet card with Intel Corporation 82557/8/9 chipset to use instead of the Ethernet port on the motherboard which had a Intel Corporation 82801BA/BAM/CA/CAM chipset and observed no performance difference. I did not have suitable supplies of spare hardware to test a non-Intel card.

Conclusion

DNS performance is more important to mail servers than I had previously thought. The choice and configuration of the mail server will affect the performance required from local DNS caches and from remote servers. Sendmail is more demanding on DNS servers and Exim needs to be carefully configured to match the requirements.

I am still considering whether it would be more appropriate for Exim to check for IPv4 addresses before checking for IPv6 addresses given that most of the Internet runs on IPv4 only. Maybe a configuration option for this would be appropriate. [After publication it occurred to me that checking for IPv4 first would be bad if you want to migrate to an IPv6 Internet.]

Also other mail servers will have the same issues to face as IPv6 increases in popularity.

The performance of 20 messages per minute doesn’t sound very good, but when you consider the outbound performance it’s more impressive. Every inbound message gets sent to 254 domains, so 20 inbound messages per minute gives 84.6 outbound messages per second on average which is a reasonable number for a low-end machine. Surprisingly there was little disk IO load.

Future Work

The next thing to implement is BHM support for tar pits, gray-listing, randomly dropping connections, temporary deferrals, and generating bounce messages. This will significantly increase the load on the mail server. Administrators of list servers often complain about the effects of being tar-pitted, I plan to do some tests to estimate the performance overhead of this and determine what it means in terms of capacity planning for list administrators.

Another thing I plan to develop is support for arbitrary delays at various points in the SMTP protocol. This will be used for similating some anti-spam measures, and also the effects of an overloaded server which will take a long time to return a SMTP 250 code in response to a complete message. It will be interesting to discover whether making your mail server faster can help the internet at large.

Appendix 1


Script to create DNS configuration


#!/usr/bin/perl

# use:
# mkzones.pl 100 a%s.example.com 10.254.0 10.253.0.7
# the above command creates zones a0.example.com to a99.example.com with an
# A record for the mail server having the IP address 10.254.0.X where X is
# a number from 1 to 254 and an NS record
# with the IP address 10.1.2.4
#
# then put the following in your /etc/named.conf
#include "/etc/named.conf.postal";
#
# the file "users" in the current directory will have a sample user list for
# postal
#

my $inclfile = "/etc/named.conf.postal";
open(INCLUDE, ">$inclfile") or die "Can not create $inclfile";
open(USERS, ">users") or die "Can’t create users";
my $zonedir = "/var/named/data";
for(my $i = 0; $i < $ARGV[0]; $i++)
{
my $zonename = sprintf($ARGV[1], $i);
my $filename = "$zonedir/$zonename";
open(ZONE, ">$filename") or die "Can not create $filename";
print INCLUDE "zone \"$zonename\" {\n type master;\n file \"$filename\";\n};\n\n";
print ZONE "\$ORIGIN $zonename.\n\$TTL 86400\n\@ SOA localhost. root.localhost. (\n";
# serial refresh retry expire ttl
print ZONE " 2006092501 36000 3600 604800 86400 )\n";
print ZONE " IN NS ns.$zonename.\n";
print ZONE " IN MX 10 mail.$zonename.\n";
my $final = $i % 254 + 1;
print ZONE "mail IN A $ARGV[2].$final\n";
print ZONE "ns IN A $ARGV[3]\n";
close(ZONE);
print USERS "user$final\@$zonename\n";
}
close(INCLUDE);
close(USERS);

12 May

Computer Power Use

This table shows the power consumption of some of the computers I own. I use a domestic electricity meter that was certified for use in billing customers to measure this. Any inaccuracies in the measurement will
correspond to inaccuracies in electricity bills of people who use such computers.

Before anyone asks, I am not interested in contributions of data, I believe that doing tests with a different meter or in a different country with a different supply voltage will diminish the accuracy of the results. Also I will provide minimal analysis on this page (the numbers should allow you to perform your own analysis).

Before I started such tests I had significant problems cooling my house in summer. Based on the results of these tests I made changes such as replacing the Compaq 1GHz Athlon machine by an IBM 1GHz P3 machine for a small server I run, this saved 49W of power, 49W of power which mostly ends up as heat makes a significant difference in a small server room when running 24*7!

All the machines below apart from the SMP machine are workstation class machines, they don’t have ECC RAM and their PSUs are designed for small load. The SMP machine has a PSU designed for a desktop machine (I couldn’t easily obtain any other type). If it had a PSU designed for server use it would draw more power.

Unless otherwise noted all machines were idling while running Linux (idling while running DOS uses significantly more power).

The summary of this table is, P3 is a great CPU for power to computer power ratio, the P4 isn’t too good, and the Athlon sucks badly - don’t run an Athlon server if you have heat problems!




Compaq SFF 800MHz P3 512M 10G IDE35W
Compaq SFF 800MHz P3 512M 10G IDE spun-down28W
Compaq 800MHz P3 128M 10G IDE38W
Compaq 1.5GHz P4 256M 20G IDE, idling78W
Compaq 1.5GHz P4 256M 20G IDE, installing85W
IBM 1GHz P3 256M 30G IDE, idling38W
Compaq 1.1GHz Celeron 512M 40G IDE idling46W
Compaq 1GHz Athlon 256M 20G IDE idling87W
SMP 2*P3 1GHz, 1GB RAM, 2*U160 SCSI 18G disks idle81W
SMP 2*P3 1GHz, 1GB RAM, 2*U160 SCSI 18G disks disk busy99W
SMP 2*P3 1GHz, 1GB RAM, 2*U160 SCSI 18G disks CPU busy130W
SMP 2*P3 1GHz, 1GB RAM, 2*U160 SCSI 18G disks CPU and disk busy136W
White-box Athlon XP 1700+, 768M RAM, 2*80G IDE + 46G IDE110W
HP Pavilion 513A Celeron 1.8GHz, 384M RAM, 40G IDE45W
HP Pavilion 513A Celeron 1.8GHz, 768M RAM, 2*80G IDE + 46G IDE58W
Cobalt Qube AMD K6-450MHz, 128M RAM, 10G IDE20W
Packard-Bell (NEC) Celeron-D 2.93GHz, 512M RAM, 2*20G IDE75W
NEC Pentium-D (920) 2.8GHz, 1G RAM, 160G S-ATA98W
NEC Pentium-E2160 1.8GHz, 1G RAM (1 DIMM), 160G S-ATA52W
HP/Compaq Celeron 2.4GHz, 512M RAM, 300G IDE50W
HP/Compaq Celeron 2.4GHz, 512M RAM, no hard disk43W
Thinkpad T41p 1.7GHz idle at 600MHz, screen on and battery charged23W

Here is the Computer Related Power Use page [1] (for switches, filters, and other things).

28 Apr

SE Debian: how to make NSA SE Linux work in a distribution

Notes

I presented this paper at Ottawa Linux Symposium (OLS) 2002. Since that time the acceptance of SE Linux in Debian was significantly less than I expected. But the acceptance in Red Hat Enterprise Linux and Fedora has been quite good.

http://lsm.immunix.org/ is defunct, since about 2004.

I corrected the URLs for the NSA papers I referenced, NSA broke the links some years after I published this.

Crispin Cowan wrote some notes during my talk and sent them to a mailing list at the end, they are archived here.

SE Debian: how to make NSA SE Linux work in a distribution

Russell Coker <russell@coker.com.au>
http://www.coker.com.au/

Abstract


I conservatively expect that tens of thousands of Debian users will be using NSA SE Linux [1] next year. I will explain how to make SE Linux work as part of a distribution, and be managable for the administrator.

Although I am writing about my work in developing SE Linux support for Debian, I am using generic terms as much as possible, as the same things need to be done for RPM based distributions.

Introduction


SE Linux offers significant benefits for security. It accomplishes this by adding another layer of security in addition to the default Unix permissions model. This is accomplished by firstly assigning a type to every file, device, network socket, etc. Then every process has a domain, and the level of access permitted to a type is determined by the domain of the process that is attempting the access (in addition to the usual Unix permission checks). Domains may only be changed at process execution time. The domain may automatically be changed when a process is executed based on the type of the executable program file and the domain of the process that is executing it, or a privileged process may specify the new domain for the child process.

In addition to the use of domains and types for access control SE Linux tracks the identity of the user (which will be system_u for processes that are part of the operating system or the Unix user-name) and the role. Each identity will have a list of roles that it is permitted to assume, and each role will have a list of domains that it may use. This gives a high level of control over the actions of a user which is tracked through the system. When the user runs SUID or SGID programs the original identity will still be tracked and their privileges in the SE security scheme will not change. This is very different to the standard Unix permissions where after a SUID program runs another SUID program it’s impossible to determine who ran the original process. Also of note is the fact that operations that are denied by the security policy [2] have the identity of the process in question logged.

For a detailed description of how SE Linux works I recommend reading the paper Peter Loscocco presented at OLS in 2001 [1].

The difficulty is that this increase in functionality also involves an increase in complexity, and requires re-authenticating more often than on a regular Unix system (the SE Linux security policy requires that the user re-authenticate for change of role). Due to this most people who could benefit from SE Linux will find themselves unable to use it because of the difficulties of managing it. I plan to address this problem through packaging SE Linux for Debian.

The first issue is getting packages of software that is patched for support of the SE Linux system calls and logic. This includes modified programs for every method of login (/bin/login, sshd, and X login programs), modified cron to run cron jobs in the correct security context, modified ps to display the security context, modified logrotate to keep the correct context on log files, as well as many other modified utilities.

The next issue is to configure the system such that when a package of software is installed the correct security contexts will be automatically applied to all files.

The most difficult problem is ensuring that configuration scripts get run in the correct security context when installing and upgrading packages.

The final problem is managing the configuration files for the security policy.

Once these problems are solved there is still the issue of the SE Linux sample policy being far from the complete policy that is needed in a real network. I estimate that at least 500 new security policy files will need to be written before the sample policy is complete enough that most people can just select the parts that they need for a working system.

Patching the Packages

The task of the login program is to authenticate the user, chown the tty device to the correct UID, and change to the appropriate UID/GID before executing the user’s shell. The SE patched version of the login program performs the same tasks, but in addition changes the security identifier (SID) on the terminal device with the chsid system call and then uses the execve_secure system call instead of the execve system call to change the SID of the child process. The login program also gives the user a choice of which of their authorised roles they will assume at login time.

This is not very different from the regular functionality of the login program and does not require a significant patch.

Typically this adds less than 9K to the object size of the login program, so hopefully soon many of the login programs will have the SE code always compiled in. For the rest we just need a set of packages containing the SE versions of the same programs. So this issue is not a difficult one to solve and most of the work needed to solve it has been done.

A similar patch needs to be applied to many other programs which perform similar operations. One example is cron which needs to be modified so cron jobs will be run in the correct security context. Another example is the suexec program from Apache. An example of a similar program for which no-one has yet written a patch is procmail.

Programs which copy files also need to have suitable options for preserving SIDs, logrotate and the fileutils package (which includes cp) have such patches, cpio lacks such a patch, and there is a patch for tar but it doesn’t apply to recent versions and probably needs to be re-written.

Setting the Correct SID When Installing Files

When a package of software is installed the final part of the installation is running a postinst script which in the case of a daemon will usually start the daemon in question. However if the files in the package do not have the correct SIDs then the daemon may not be able to run, or will be unable to run correctly!

The Debian packaging system does not currently have any support for running a script after the files of a package are installed but before the postinst script. There have been discussions for a few years on how best to do this, as I didn’t have time to properly re-write dpkg I instead did a quick hack to make it run scripts that it finds in /etc/dpkg/postinst.d/ before running the postinst of the package.

When installing an SE Linux system the program setfiles is used to apply the correct SIDs to all files in the system. I have written a patch to make it instead take a list of canonical fully-qualified file names on standard input if run with the -s switch, which is now included in the NSA source release.

The combination of the dpkg patch and the setfiles patch allow me to solve the basic problem of getting the correct SIDs applied to files, my script just queries the package management system for a list of files contained in the package and pipes it through to setfiles to set the SID on each file.

The next complication is setting the correct SID for the setfiles program, by default it gets installed with the security type sbin_t because that is the type of the directory it is installed in. However in my default policy setup I have not given the dpkg_t domain (which is used by the dpkg program when it is run administratively) the privilege of changing the SID of files. So the setfiles program needs to have the type setfiles_exec_t to trigger an automatic domain transition to the setfiles_t domain.

To solve this issue I have the preinst script (the script that is run before the package is installed) of the selinux package rename the /usr/sbin/setfiles to /usr/sbin/setfiles.old on an upgrade. Then the /etc/dpkg/postinst.d/selinux script will run the old version if it exists.

Here’s the relevant section of the selinux.preinst file:

if [ ! -f /usr/sbin/setfiles.old -a \
    -f /usr/sbin/setfiles ]; then
  mv /usr/sbin/setfiles /usr/sbin/setfiles.old
fi

Here’s the contents of /etc/dpkg/postinst.d/selinux. The first parameter to the script is the name of the package that is being installed. Also I have “grep …” included because setfiles currently has some problems with blank lines and /. which dpkg produces.

#!/bin/sh

make -s -C /etc/selinux \
  file_contexts/file_contexts

SETFILES=/usr/sbin/setfiles
if [ -x /usr/sbin/setfiles.old ]; then
  SETFILES=/usr/sbin/setfiles.old
fi
dpkg -L $1 | grep ^/.. | $SETFILES -s \
    /etc/selinux/file_contexts/file_contexts
if [ -x /usr/sbin/setfiles.old \
    -a "$1" = "selinux" ]; then
  rm /usr/sbin/setfiles.old
fi

Running Configuration Scripts in the Correct Context

When a SE Linux system boots the process init is started in the domain init_t. When it runs the daemon start scripts it uses the scripts /etc/init.d/rc and /etc/init.d/rcS on a Debian system (on Red Hat it is /etc/rc.d/rc and /etc/rc.d/rc.sysinit). So these scripts are given the type initrc_exec_t and there is a rule domain_auto_trans(init_t, initrc_exec_t, initrc_t) which causes a transition to the initrc_t domain. The security policy for each daemon will have a rule causing a domain transition from the initrc_t domain to the daemon domain upon execution of the daemon. This all happens as the system_u identity and the system_r role.

When the system administrator wants to start a script manually they use the program run_init which can only be run from the sysadm_t domain, it re-authenticates the administrator (to avoid the possibility of it being called by some malicious code that the administrator accidentally runs) before running the specified script as system_u:system_r:initrc_t.

This works fine when the daemon start script is quite simple (most such start scripts just check whether the daemon is already running and then run it with appropriate parameters). However this doesn’t work for complex scripts, which may copy files, change sysctl entries via /proc, and do many other things. An example of this is the devfsd package where the start script creates device nodes for device drivers that lack kernel support for devfs. Getting this to work correctly required that the code for device node creation be split into a separate file with the same SID as the main daemon (devfsd_exec_t) which causes it to run in the same domain as the daemon (devfsd_t). Such changes will probably have to be made to about 5% of daemon start scripts.

But that is part of the standard proceedure of correctly setting up SE Linux. The package specific part comes when the scripts have to be started from the package installation. To get the correct domain (initrc_t) for the scripts I use the rule domain_auto_trans(dpkg_t, etc_t, initrc_t) which causes the dpkg_t domain to transition to the initrc_t domain when a script of type etc_t is executed. Now the hard part is getting the identity and the role correct when running dpkg. For this purpose I have written a customised version of run_init to change to the context to system_u:system_r:dpkg_t, system_u:system_r:apt_t, or system_u:system_r:dselect_t, for the programs dpkg, dselect, and apt-get respectively.

The apt_t and dselect_t domains are only used for selecting and downloading packages, and then executing dpkg, which triggers an automatic transition to the dpkg_t domain.

Managing the Configuration Files

For normal configuration files in Debian (almost every file under /etc and some files in other locations) the file is registered as a conffile in the packaging system, and the package status file contains the MD5 checksum of the file. If a file is changed from it’s original contents (according to an MD5 check) at the time the package is upgraded and if the new version has a different set of data for the file than that which was provided by the old version of the package (according to MD5) then the user will be asked if they want to replace the old file (with a default of no). However if the new version of the package contains different content and the old content was not changed, then the user will get the new content without even being informed of the fact!

This is OK for many files, but the idea of a file from your audited security configuration being replaced with one you’ve never seen is not a pleasant one! This is only the first problem with managing policy files, the next problem is the size of the database for the sample policy. If you are using an initial RAM disk (initrd) then you must have the policy database on the initrd. The default initrd size of 4 megabytes is not large enough to accomodate the usual modules and the complete sample policy.

So what we need to solve this is a way of having a set of sample policy files (one per domain), of which not all will be used, and when new policy files are added or existing files are changed the user must be prompted as to whether they want to add the new files or apply the changes. Also when adding new policy the matching entries have to be added to the database used by setfiles for setting the file context.

In the latest versions of the sample policy the Makefile creates a configuration file for setfiles to match the program configuration files used. For every application policy file domains/program/%.te the matching file file_contexts/program/%.fc will be used as part of the configuration. This change will solve the issue of determining the configuration for setfiles, but it doesn’t entirely solve the problem. One issue with this is that when a file is added to or removed from the configuration the appropriate changes need to be made to the file system. If you make an addition to the policy before installing a new package (the correct proceedure) then you can usually get away without this as long as none of the files or directories previously existed, however this is not always the case, especially when files are diverted or when dealing with standard directories such as /var/spool/mail which will exist even if you have not installed any software to use them! It should not be that difficult to write a program to relabel the files matching the specifications of the added policy, the question is whether policy additions are common enough to make it worth saving the effort of a relabel. Also there’s the risk that a bug in such a program (or it’s use) could potentially cause a security hole.

The security policy is comprised of one configuration file per application (or class of application, some domains such as the DHCP client domain dhcpd_t are used by multiple programs which perform similar functions). Also sometimes an application requires multiple domains which will therefore be defined in the one file, for example my current policy for Postfix has eleven domains (which is excessive, I plan to reduce it to three or four once I’ve determined exactly what is required). One problem I faced with this is the issue of what to do when one domain needs to interact with another domain, for example the pppd process often needs to run sendmail -q to flush the mail queue when it establishes a connection. This requires the policy statement domain_auto_trans(pppd_t, sendmail_exec_t, sysadm_mail_t), previously such a statement would be put in either the sendmail.te file or the pppd.te file, thus making one of them depend on the other. This is a bad idea because there’s no reason for either of these programs to depend on the other. The solution I devised is based on the M4 macro language (which was already used for simpler macro functionality in producing the policy file). I created a script to define a macro with the name of each application policy file that is used. So the solution to the PPP and Sendmail problem is to put the following in the pppd.te file:


ifdef(`sendmail.te’,
`domain_auto_trans(pppd_t, sendmail_exec_t
                 , sysadm_mail_t)’)

The next problem, is how to effectively manage things so that when I ship a new and improved sample policy the administrator can update it without excessive pain.

The current method involves running diff -ru and then copying files if you like the changes. This is excessively painful even when managing one or two SE Linux machines! So it obviously won’t scale to serious production. I plan to write a Perl script to manage this, the first thing it has to do is track when the administrator doesn’t want a policy file. When a file is removed then the fact that the user has chosen not to have that file installed should be recorded, and they should not be prompted to re-install it on the next upgrade. However if the sample policy is upgraded and a new file has been added then they should be asked if they want to install it. Then when a file in the sample policy changes and it is a file that is installed the user should be asked if they want the new file copied over their existing file (and they should be provided with a diff to show what the changes would be). Finally if such changes involve the file configuration for setfiles then the user should be asked whether they want to relabel the system.

The people who are working on Red Hat packaging are considering other ways of managing the versions of configuration files, one of which involves having symbolic links pointing to the files to be used, if you decide to use your own version instead of one of the supplied policy files then you can change the sym-link.

Managing Device Nodes

In Linux there are two methods of managing device nodes. One is the traditional method of having /dev be a regular directory on the root file system and have device nodes created on it with mknod, the other is the devfs file system which allows the kernel to automatically create device nodes while the devfsd process automatically assigns the correct UID, GID, and permissions to them.

On a traditional (non-devfs) system running SE Linux the device nodes will be labelled in the same way as any other file. On a devfs system things are different, the devfs policy database contains rules for labelling device nodes. However this has some limitations, one being that when the policy database does not have an entry for the device node at the time it is created, then it will never be labelled. Another is that every type listed in the devfs configuration rules must be defined, which can cause needless dependencies.

To address these issues I wrote a module for devfsd which adds support for SE Linux. This allows you to change the mapping of SIDs to device nodes and re-apply it at any time, and if a security context listed in the configuration file does not exist in the policy then an error will be logged and the system will continue working.

This is especially useful for the case of an initrd as the types for all the possible device nodes won’t need to be in the ram disk.

Work To Be Done

Initial RAM Disk

When using an initrd to boot a modular kernel the security policy database must be stored on the initrd. The problem is that the default initrd size is 4M, which does not leave much space when libc6 is included, often not enough for the policy you want. Also even if the policy does fit you won’t really want to have such a large initrd image. If you are installing SE Linux on a single PC, or even on a network of similar PCs then you are best advised to build a kernel with all modules needed for booting statically linked and not use an initrd. However this is not possible for a distribution vendor who has to support a huge variety of hardware.

Another problem with using an initrd for storing the policy is that when you generate a new policy you then have to regenerate the initrd to avoid having your changes disappear on the next boot, of course a boot script could easily load the updated policy from the root file system before going to multi-user mode. But it is wasteful to have a large policy on the initrd that you then discard before ever using much of it.

The solution is to have a small policy that contains all the settings needed for either the first stage of boot, or alternately for running recovery tools in case a failure prevents the machine from entering multi-user mode. Then after the machine has passed the first stages of the boot process a complete policy can be loaded from the root file system, as long as the two policies don’t conflict in any major way this should work well. NB A Major policy conflict is a situation where the initrd defines domains that aren’t defined in the new policy and processes are executed in such a domain.

The latest release of SE Linux supports automatically re-loading the policy when the real root file system is mounted. Now all that needs to be done is for someone to write a mini-policy to install on the initrd.

Polishing run_init

Stephen Smalley has suggested that we develop a run_init program that incorporates the functionality of my modified program as well as of the original run_init program in a more generic fashion. It is apparent that other people will have similar needs for programs to execute programs under a different domain, role, and maybe identity. It is better that one program do this than to have many people writing programs for such things.

Also currently my program is hard-coded for the names of the Debian administration programs. An improved program should handle the needs of Debian, RPM, and the regular run_init functionality.

Writing Sample Policy Files

Currently any serious system will require policy files that are not in the sample policy. This forces everyone who uses SE Linux to start by writing policy files (which is the most difficult and time consuming task involved with the project). Currently we are writing new sample policy files for the variety of daemons and applications, and developing new macros for writing policy files quickly. With the new macros policy files are on average half the size that they used to be (and I aim to reduce the size again by new macros). The macros allow short policy files which are easy to understand, and therefore the user can easily determine how to make any required changes, or how to write a policy file for a new program based on existing programs.

Obtaining the Source

Currently most of my packages and source are available at http://www.coker.com.au/selinux/ however I plan to eventually get them all into Debian at which time I may remove that site.

I have several packages in the unstable distribution of Debian, the first is the kernel-patch-2.4-lsm and kernel-patch-2.5-lsm packages which supply the Linux Security Modules http://lsm.immunix.org/ kernel patch. That patch includes SE Linux as well as LIDS and some of the OpenWall functionality. When I have time I back-port patches to older kernels and include new patches that the NSA has not officially released, so often my patches will provide more features than the official patches distributed by the NSA from =http://www.nsa.gov/selinux/index.html= or the patches distributed by Immunix. However if you want the official patches then these packages may not be what you desire.

From the selinux-small archive I create the packages selinux and libselinux-dev which are also in the unstable distribution of Debian.

Acknowledgments

I would like to thank Stephen Smalley for being so helpful when I was learning about SE Linux, and Dr. Brian May for checking my early packages and giving me some good advice when I first started.

Also thanks to Dr. May, Stephen Smalley, and Peter Loscocco for reviewing this paper.

Bibliography


© 2008 Russell Coker’s Documents | Entries (RSS) and Comments (RSS)

wordpress logo
Close
E-mail It