Benchmarking Mail Relays and Forwarders

Notes

I presented this paper at the OSDC conference in 2006.

The main page for my Postal benchmark is at http://doc.coker.com.au/projects/postal/. My blog posts about benchmarking can be found at http://etbe.coker.com.au/category/benchmark/.

Abstract

Postal is a mail server benchmark that I wrote. The main components of it are postal for testing the delivery of mail via SMTP, rabid for testing mail download via POP, and a new program I have just written called bhm (for Black Hole Mailer) which listens on port 25 and sends all mail to /dev/null.

The new BHM program makes it possible to test the performance of mail relay systems. This means outbound smart host gateway systems, list servers, and forwarding services. The testing method is to configure three machines, one running Postal for sending the mail, the machine to be tested running a mail server or list server configuration, and the target machine running BHM.
The initial aim of this paper was to use artificial delays in the BHM program to simulate slow network performance and various anti-spam measures, and to measure how they impact a mail relay system. However I found other issues along the way which were interesting to analyse and will be useful to other people.

Description of Postal

The first component of the benchmark suite is Postal, which sends mail at a controlled rate. When using it you have a list of sender addresses and a separate list of recipients that will be used for sending random messages to a mail server. It sends the mail to a specified IP address to save the effort of configuring a test DNS server, because in the most common test scenario you have a single mail server that you want to test.

Postal sends at a fixed rate because in most MTAs an initial burst of mail will just go to the queue and will actually be delivered much more slowly. Often mail servers will take two minutes or more of sustained load to show the full performance impact. So I designed the programs in the Postal suite to display their results once per minute, so that you can watch the performance of the system over time and track the system load.

The most important thing to observe is that the load (in all areas) is below 100%. If any system resource (CPU, network IO, or disk IO) is used to 100% capacity then the queue will grow without limit. Such unlimited queue growth leads to timeouts, which further increase the queue and cause the system to break down. An SMTP server has a continual load from the rest of the Internet, and if it goes more slowly that load will not decrease in the short-term. So a server that falls behind can simply become unusable; an unmanaged mail server can easily accumulate a queue of messages up to a week old through not having the performance required to deliver them as fast as they arrive.

The second program in the Postal suite is Rabid, a benchmark for POP servers. The experiments I document in this paper do not involve Rabid.

The most recent program is BHM which is written as an SMTP sink for testing mail relays. The idea is that a mail relay machine will have mail sent to it by Postal and then send it on to a machine running BHM. There are many ways in which machines that receive mail can delay mail and thus increase the load on the server. Unfortunately I spent all the time available for this paper debugging my code and tracking down DNS server issues so I didn’t discover much about the mail server itself.

Hardware

For running Postal I used my laptop. Postal does much less work than any other piece of software in the system so I’m sure that my laptop is not a performance bottleneck. It is however a 1700MHz Pentium-M and probably the fastest machine in my network.

For the mail relay machine (the system actually being tested) I used a Compaq Evo desktop machine with a 1.5GHz P4 CPU, 384M of RAM, and an 80G IDE disk.

For running BHM I used an identical Compaq system.

The network is 100baseT full duplex with a CableTron SmartSwitch. I don’t think it impacted the performance; during the course of testing I did not notice any reports of packet loss or collisions.

All the machines in question were running the latest Fedora rawhide as of late September 2006.

Preparation

To prepare for the testing I set up a server running BHM with 254 IP addresses to receive email (mail servers perform optimisations if they see the same IP address being used). The following script creates the interfaces:

for n in `seq 1 254`
  do ifconfig eth0:$n 10.254.0.$n netmask 255.255.255.0
done

Test 1, BIND and MTA on the Same Server

The script in appendix 1 creates the DNS configuration for the 254 zones and the file of email addresses (one per zone) to use as destinations. I configured the server as a DNS server and a mail relay. A serious mail server will often have a DNS cache running on localhost, so for my purposes having primary zones under example.com configured on a DNS server on localhost seemed appropriate.

I initially tested with only a single thread of Postal connecting to the server. This means that there was no contention on the origin side and it was all on the MTA side. I tested Sendmail and Postfix with an /etc/aliases file expanding to 254 addresses (one per domain). All the messages had the same sender, and the message size was a random value from 0 to 10K.

The following table shows the mail server in use, the number of messages per minute sent through it, the CPU time used by the server (from top output), and the load average.

MTA       Msgs/Minute  CPU Use  Load Average
Postfix   15           ~70%     9
Postfix   18           ~80%     9
Postfix   20           ~90%     11
Sendmail  10           ~50%     1
Sendmail  13           ~70%     2
Sendmail  15           ~95%     4.5
Sendmail  20           100%     *

Surprisingly the named process appeared to be using ~10% of the CPU at any given time when running Postfix and 25% of the CPU when running Sendmail (I am not sure why Sendmail does more DNS work – both MTAs were in fairly default Fedora configurations). As CPU was the bottleneck for this operation, it appears that having the named process on the same machine might not be a good optimisation.

When testing 15 and 20 messages per minute with Sendmail the CPU use was higher than with Postfix, and in my early tests with 256M of RAM the kernel started reporting ip_conntrack: table full, dropping packet, which disrupted the tests by deferring connections.

The conntrack errors occur because the TCP connection tracking code in the kernel has a fixed number of entries, with the default chosen based on the amount of RAM in the system. With 256M of RAM in the test system the number of connections that could be tracked was just under 15,000. After upgrading the system to 384M of RAM there was support for tracking 24,568 connections and the problem went away. You can change the maximum number of connections with the command echo NUMBER > /proc/sys/net/ipv4/ip_conntrack_max, or for a permanent change edit /etc/sysctl.conf, add the line net.ipv4.ip_conntrack_max = NUMBER, and run sysctl -p to load the settings from /etc/sysctl.conf. Note that adding more RAM will increase many system resource limits that affect the operation of the system.
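
The commands described above, collected into a short shell sketch (the value 65536 is only an example, not a recommendation):

# raise the limit immediately
echo 65536 > /proc/sys/net/ipv4/ip_conntrack_max
# make the change permanent and load it
echo 'net.ipv4.ip_conntrack_max = 65536' >> /etc/sysctl.conf
sysctl -p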

My preferred solution to this problem is to add more RAM because it keeps the machine in a default configuration which decreases the chance of finding a new bug that no-one else has found. Also an extra 128M of RAM is not particularly expensive.

After performing those tests I decided that I needed to add a minimum message size option to Postal (for results that had a lower variance).

I also decided to add an option to specify the list of sender addresses separately from the recipient addresses. When I initially wrote Postal the aim was to test a mail store system. So if you have 100,000 mail boxes then sending mail between them randomly works reasonably well. However for a mail relay system a common test scenario is having two disjoint sets of users for senders and recipients.

Test 2, Analysing DNS Performance

For the second test run I moved the DNS server to the same machine that runs the BHM process (which is lightly loaded, as the mail relay doesn’t send enough mail to cause BHM to take much CPU time).

I then did a test with Sendmail to see what the performance would be for messages with a body of exactly 10K sent from 254 random sender addresses (one per domain). I noticed that the named process rapidly approached 100% CPU use and was a bottleneck on system performance. It seems that the DNS load for Sendmail is significant!

I then analysed the tcpdump output from the DNS server and saw the following requests:

IP sendmail.34226 > DNS.domain: 61788+ A? a0.example.com. (32)
IP sendmail.34228 > DNS.domain: 22331+ MX? a0.example.com. (32)
IP sendmail.34229 > DNS.domain: 4387+ MX? a0.example.com. (32)
IP sendmail.34229 > DNS.domain: 18834+ A? mail.a0.example.com. (37)

It seems that there are four DNS requests per recipient giving a total of 1016 DNS requests per message. When 15 messages per minute are delivered to 254 recipients that means 254 DNS requests per second plus some extra requests (lookups of the sending IP address etc).

Also one thing I noticed is that Sendmail does a PTR query (reverse DNS lookup) on its own IP address for every delivery to a recipient. This added an extra 254 DNS queries to the total for Sendmail. I am sure that I could disable this through Sendmail configuration, but I expect that most people who use Sendmail in production would use the default settings in this regard.
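
For reference, the traces in this section came from tcpdump run on the DNS server; a command along the following lines produces that sort of output (the interface name is an assumption):

tcpdump -n -i eth0 port 53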

Noticing that the A record is consulted first I wondered whether removing the MX record and having only an A record would change things. The following tcpdump output shows that the same number of requests are sent so it really makes no difference for Sendmail:


IP sendmail.34238 > DNS.domain: 26490+ A? a0.example.com. (32)
IP sendmail.34240 > DNS.domain: 16187+ MX? a0.example.com. (32)
IP sendmail.34240 > DNS.domain: 57339+ A? a0.example.com. (32)
IP sendmail.34240 > DNS.domain: 50474+ A? a0.example.com. (32)

Next I tested Postfix with the same DNS configuration (no MX record) and saw the following packets:


IP postfix.34245 > DNS.domain: 3448+ MX? a0.example.com. (32)
IP postfix.34261 > DNS.domain: 50123+ A? a0.example.com. (32)

The following is the result for testing Postfix with the MX based DNS configuration:

IP postfix.34675 > DNS.domain: 29942+ MX? a0.example.com. (32)
IP postfix.34675 > DNS.domain: 33294+ A? mail.a0.example.com. (37)

It seems that in all cases Postfix does less than half the DNS work that Sendmail does in this regard, and as BIND was the bottleneck this means that Sendmail can’t be used here. So I excluded Sendmail from all further tests.

Below are the results for the Exim queries for sending the same message. Exim didn’t check whether IPv6 was supported before doing an IPv6 DNS query. I filed a bug report about this and was informed that there is a configuration option to disable AAAA lookups, but it was agreed that looking up an IPv6 entry when there is no IPv6 support on the system (or no support other than link-local addresses) is a bad idea.

IP exim.35992 > DNS.domain: 43702+ MX? a0.example.com. (32)
IP exim.35992 > DNS.domain: 7866+ AAAA? mail.a0.example.com. (37)
IP exim.35992 > DNS.domain: 31399+ A? mail.a0.example.com. (37)

The total number of DNS packets sent and received for each mail server was 2546 for Sendmail, 1525 for Exim, and 1020 for Postfix. Postfix clearly wins in this case for being friendly to the local DNS cache and for not sending pointless IPv6 queries to external DNS servers. For further tests I will use Postfix as I don’t have time to configure a machine that is fast enough to handle the DNS needs of Sendmail.

Exim would also equal Postfix in this regard if configured correctly. However I am making a point of using configurations that are reasonably close to the Fedora defaults as that is similar to the common use on the net.
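
Totals like the ones above can be counted from a capture saved to a file, for example (the capture file name is an assumption):

# capture the DNS traffic for one test run (stop with ctrl-C at the end),
# then count the packets
tcpdump -n -i eth0 -w sendmail-run.pcap port 53
tcpdump -n -r sendmail-run.pcap port 53 | wc -l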

Test 3 – Postfix Performance

To test Postfix performance I used the DNS server on a separate machine which had the MX records. I decided to test a selection of message sizes to determine the correlation between message size and system load.

Msg Size  Msgs/Minute  CPU Use  Load Average
10K       20           ~95%     11
0-1K      20           ~85%     7
100K      10           ~80%     6

In all the above tests there were a few messages not being sent due to connections timing out. This seems unreasonably poor performance; I had expected Postfix in the most simplistic mailing list configuration to be able to handle more than 5080 outbound messages per minute (20 inbound messages per minute each expanded to 254 recipients).

Test 4 – Different Ethernet Card

If an Ethernet device driver takes an excessive amount of CPU time in interrupt context then that time will be billed to the user-space process that was running at the time, which can result in an application being reported as using a lot more CPU time than it really uses. To check whether that was the case I decided to replace the Ethernet card in the test system and see if that changed the reported CPU use.

I installed a PCI Ethernet card with an Intel Corporation 82557/8/9 chipset to use instead of the Ethernet port on the motherboard, which had an Intel Corporation 82801BA/BAM/CA/CAM chipset, and observed no performance difference. I did not have a suitable supply of spare hardware to test a non-Intel card.

Conclusion

DNS performance is more important to mail servers than I had previously thought. The choice and configuration of the mail server will affect the performance required from local DNS caches and from remote servers. Sendmail is more demanding on DNS servers and Exim needs to be carefully configured to match the requirements.

I am still considering whether it would be more appropriate for Exim to check for IPv4 addresses before checking for IPv6 addresses given that most of the Internet runs on IPv4 only. Maybe a configuration option for this would be appropriate. [After publication it occurred to me that checking for IPv4 first would be bad if you want to migrate to an IPv6 Internet.]

Also other mail servers will have the same issues to face as IPv6 increases in popularity.

The performance of 20 messages per minute doesn’t sound very good, but when you consider the outbound performance it’s more impressive. Every inbound message gets sent to 254 domains, so 20 inbound messages per minute gives 84.6 outbound messages per second on average which is a reasonable number for a low-end machine. Surprisingly there was little disk IO load.
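
As a quick check of the arithmetic behind those figures (the message rate and recipient count are taken from the tests above):

echo $((20 * 254))                  # 5080 outbound messages per minute
echo "scale=1; 20 * 254 / 60" | bc  # 84.6 outbound messages per second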

Future Work

The next thing to implement is BHM support for tar pits, gray-listing, randomly dropping connections, temporary deferrals, and generating bounce messages. This will significantly increase the load on the mail server. Administrators of list servers often complain about the effects of being tar-pitted; I plan to do some tests to estimate the performance overhead of this and determine what it means in terms of capacity planning for list administrators.

Another thing I plan to develop is support for arbitrary delays at various points in the SMTP protocol. This will be used for simulating some anti-spam measures, and also the effects of an overloaded server which will take a long time to return an SMTP 250 code in response to a complete message. It will be interesting to discover whether making your mail server faster can help the Internet at large.

Appendix 1

Script to create DNS configuration

#!/usr/bin/perl
#
# use:
# mkzones.pl 100 a%s.example.com 10.254.0 10.253.0.7
# the above command creates zones a0.example.com to a99.example.com with an
# A record for the mail server having the IP address 10.254.0.X where X is
# a number from 1 to 254 and an NS record with the IP address 10.253.0.7
#
# then put the following in your /etc/named.conf
#include "/etc/named.conf.postal";
#
# the file "users" in the current directory will have a sample user list for
# postal
#
my $inclfile = "/etc/named.conf.postal";
open(INCLUDE, ">$inclfile") or die "Can not create $inclfile";
open(USERS, ">users") or die "Can't create users";
my $zonedir = "/var/named/data";
for(my $i = 0; $i < $ARGV[0]; $i++)
{
  my $zonename = sprintf($ARGV[1], $i);
  my $filename = "$zonedir/$zonename";
  open(ZONE, ">$filename") or die "Can not create $filename";
  print INCLUDE "zone \"$zonename\" {\n type master;\n file \"$filename\";\n};\n\n";
  print ZONE "\$ORIGIN $zonename.\n\$TTL 86400\n\@ SOA localhost. root.localhost. (\n";
  # serial refresh retry expire ttl
  print ZONE " 2006092501 36000 3600 604800 86400 )\n";
  print ZONE " IN NS ns.$zonename.\n";
  print ZONE " IN MX 10 mail.$zonename.\n";
  my $final = $i % 254 + 1;
  print ZONE "mail IN A $ARGV[2].$final\n";
  print ZONE "ns IN A $ARGV[3]\n";
  close(ZONE);
  print USERS "user$final\@$zonename\n";
}
close(INCLUDE);
close(USERS);
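
A hypothetical invocation matching the 254 zone test setup described earlier (it must be run as root so that it can write under /var/named/data and /etc, and the rndc reload step assumes named is already running with the include line from the comments above added to /etc/named.conf):

perl mkzones.pl 254 a%s.example.com 10.254.0 10.253.0.7
rndc reload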

SE Debian: how to make NSA SE Linux work in a distribution

Notes

I presented this paper at Ottawa Linux Symposium (OLS) 2002. Since that time the acceptance of SE Linux in Debian was significantly less than I expected. But the acceptance in Red Hat Enterprise Linux and Fedora has been quite good.

http://lsm.immunix.org/ has been defunct since about 2004.

I corrected the URLs for the NSA papers I referenced, NSA broke the links some years after I published this.

Crispin Cowan wrote some notes during my talk and sent them to a mailing list at the end; they are archived here.

SE Debian: how to make NSA SE Linux work in a distribution

Russell Coker <russell@coker.com.au>
http://www.coker.com.au/

Abstract

I conservatively expect that tens of thousands of Debian users will be using NSA SE Linux [1] next year. I will explain how to make SE Linux work as part of a distribution, and be manageable for the administrator.

Although I am writing about my work in developing SE Linux support for Debian, I am using generic terms as much as possible, as the same things need to be done for RPM based distributions.

Introduction

SE Linux offers significant benefits for security. It accomplishes this by adding another layer of security in addition to the default Unix permissions model. This is accomplished by firstly assigning a type to every file, device, network socket, etc. Then every process has a domain, and the level of access permitted to a type is determined by the domain of the process that is attempting the access (in addition to the usual Unix permission checks). Domains may only be changed at process execution time. The domain may automatically be changed when a process is executed based on the type of the executable program file and the domain of the process that is executing it, or a privileged process may specify the new domain for the child process.

In addition to the use of domains and types for access control SE Linux tracks the identity of the user (which will be system_u for processes that are part of the operating system or the Unix user-name) and the role. Each identity will have a list of roles that it is permitted to assume, and each role will have a list of domains that it may use. This gives a high level of control over the actions of a user which is tracked through the system. When the user runs SUID or SGID programs the original identity will still be tracked and their privileges in the SE security scheme will not change. This is very different to the standard Unix permissions where after a SUID program runs another SUID program it’s impossible to determine who ran the original process. Also of note is the fact that operations that are denied by the security policy [2] have the identity of the process in question logged.

For a detailed description of how SE Linux works I recommend reading the paper Peter Loscocco presented at OLS in 2001 [1].

The difficulty is that this increase in functionality also involves an increase in complexity, and requires re-authenticating more often than on a regular Unix system (the SE Linux security policy requires that the user re-authenticate for change of role). Due to this most people who could benefit from SE Linux will find themselves unable to use it because of the difficulties of managing it. I plan to address this problem through packaging SE Linux for Debian.

The first issue is getting packages of software that is patched for support of the SE Linux system calls and logic. This includes modified programs for every method of login (/bin/login, sshd, and X login programs), modified cron to run cron jobs in the correct security context, modified ps to display the security context, modified logrotate to keep the correct context on log files, as well as many other modified utilities.

The next issue is to configure the system such that when a package of software is installed the correct security contexts will be automatically applied to all files.

The most difficult problem is ensuring that configuration scripts get run in the correct security context when installing and upgrading packages.

The final problem is managing the configuration files for the security policy.

Once these problems are solved there is still the issue of the SE Linux sample policy being far from the complete policy that is needed in a real network. I estimate that at least 500 new security policy files will need to be written before the sample policy is complete enough that most people can just select the parts that they need for a working system.

Patching the Packages

The task of the login program is to authenticate the user, chown the tty device to the correct UID, and change to the appropriate UID/GID before executing the user’s shell. The SE patched version of the login program performs the same tasks, but in addition changes the security identifier (SID) on the terminal device with the chsid system call and then uses the execve_secure system call instead of the execve system call to change the SID of the child process. The login program also gives the user a choice of which of their authorised roles they will assume at login time.

This is not very different from the regular functionality of the login program and does not require a significant patch.

Typically this adds less than 9K to the object size of the login program, so hopefully soon many of the login programs will have the SE code always compiled in. For the rest we just need a set of packages containing the SE versions of the same programs. So this issue is not a difficult one to solve and most of the work needed to solve it has been done.

A similar patch needs to be applied to many other programs which perform similar operations. One example is cron which needs to be modified so cron jobs will be run in the correct security context. Another example is the suexec program from Apache. An example of a similar program for which no-one has yet written a patch is procmail.

Programs which copy files also need suitable options for preserving SIDs. Logrotate and the fileutils package (which includes cp) have such patches, cpio lacks such a patch, and there is a patch for tar but it doesn’t apply to recent versions and probably needs to be re-written.

Setting the Correct SID When Installing Files

When a package of software is installed the final part of the installation is running a postinst script, which in the case of a daemon will usually start the daemon in question. However if the files in the package do not have the correct SIDs then the daemon may fail to run, or may run incorrectly!

The Debian packaging system does not currently have any support for running a script after the files of a package are installed but before the postinst script. There have been discussions for a few years on how best to do this; as I didn’t have time to properly re-write dpkg I instead did a quick hack to make it run scripts that it finds in /etc/dpkg/postinst.d/ before running the postinst of the package.

When installing an SE Linux system the program setfiles is used to apply the correct SIDs to all files in the system. I have written a patch to make it instead take a list of canonical fully-qualified file names on standard input if run with the -s switch, which is now included in the NSA source release.

The combination of the dpkg patch and the setfiles patch allows me to solve the basic problem of getting the correct SIDs applied to files: my script just queries the package management system for the list of files contained in the package and pipes it to setfiles to set the SID on each file.

The next complication is setting the correct SID for the setfiles program. By default it gets installed with the security type sbin_t, because that is the type of the directory it is installed in. However in my default policy setup I have not given the dpkg_t domain (which is used by the dpkg program when it is run administratively) the privilege of changing the SID of files. So the setfiles program needs to have the type setfiles_exec_t to trigger an automatic domain transition to the setfiles_t domain.

To solve this issue I have the preinst script (the script that is run before the package is installed) of the selinux package rename the /usr/sbin/setfiles to /usr/sbin/setfiles.old on an upgrade. Then the /etc/dpkg/postinst.d/selinux script will run the old version if it exists.

Here’s the relevant section of the selinux.preinst file:

if [ ! -f /usr/sbin/setfiles.old -a \
    -f /usr/sbin/setfiles ]; then
  mv /usr/sbin/setfiles /usr/sbin/setfiles.old
fi

Here’s the contents of /etc/dpkg/postinst.d/selinux. The first parameter to the script is the name of the package that is being installed. The grep command is included because setfiles currently has some problems with the blank lines and the "/." entry that dpkg produces.

#!/bin/sh
# rebuild the file_contexts database
make -s -C /etc/selinux \
  file_contexts/file_contexts
SETFILES=/usr/sbin/setfiles
if [ -x /usr/sbin/setfiles.old ]; then
  SETFILES=/usr/sbin/setfiles.old
fi
# label the files of the package that was just installed
dpkg -L $1 | grep ^/.. | $SETFILES -s \
    /etc/selinux/file_contexts/file_contexts
# remove the renamed setfiles once the selinux package itself is installed
if [ -x /usr/sbin/setfiles.old \
    -a "$1" = "selinux" ]; then
  rm /usr/sbin/setfiles.old
fi

Running Configuration Scripts in the Correct Context

When a SE Linux system boots the process init is started in the domain init_t. When it runs the daemon start scripts it uses the scripts /etc/init.d/rc and /etc/init.d/rcS on a Debian system (on Red Hat it is /etc/rc.d/rc and /etc/rc.d/rc.sysinit). So these scripts are given the type initrc_exec_t and there is a rule domain_auto_trans(init_t, initrc_exec_t, initrc_t) which causes a transition to the initrc_t domain. The security policy for each daemon will have a rule causing a domain transition from the initrc_t domain to the daemon domain upon execution of the daemon. This all happens as the system_u identity and the system_r role.

When the system administrator wants to start a script manually they use the program run_init, which can only be run from the sysadm_t domain. It re-authenticates the administrator (to avoid the possibility of it being called by some malicious code that the administrator accidentally runs) before running the specified script as system_u:system_r:initrc_t.
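
For example, an administrator in a sysadm_t shell would start a daemon along these lines (the script name is just an example); run_init prompts for the password and then runs the script in the initrc_t domain as described above:

run_init /etc/init.d/sendmail start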

This works fine when the daemon start script is quite simple (most such start scripts just check whether the daemon is already running and then run it with appropriate parameters). However this doesn’t work for complex scripts, which may copy files, change sysctl entries via /proc, and do many other things. An example of this is the devfsd package where the start script creates device nodes for device drivers that lack kernel support for devfs. Getting this to work correctly required that the code for device node creation be split into a separate file with the same SID as the main daemon (devfsd_exec_t) which causes it to run in the same domain as the daemon (devfsd_t). Such changes will probably have to be made to about 5% of daemon start scripts.

But that is part of the standard procedure of correctly setting up SE Linux. The package specific part comes when the scripts have to be started from the package installation. To get the correct domain (initrc_t) for the scripts I use the rule domain_auto_trans(dpkg_t, etc_t, initrc_t) which causes the dpkg_t domain to transition to the initrc_t domain when a script of type etc_t is executed. Now the hard part is getting the identity and the role correct when running dpkg. For this purpose I have written a customised version of run_init to change the context to system_u:system_r:dpkg_t, system_u:system_r:apt_t, or system_u:system_r:dselect_t, for the programs dpkg, apt-get, and dselect respectively.

The apt_t and dselect_t domains are only used for selecting and downloading packages, and then executing dpkg, which triggers an automatic transition to the dpkg_t domain.

Managing the Configuration Files

For normal configuration files in Debian (almost every file under /etc and some files in other locations) the file is registered as a conffile in the packaging system, and the package status file contains the MD5 checksum of the file. If a file has been changed from its original contents (according to an MD5 check) at the time the package is upgraded, and the new version has a different set of data for the file than that which was provided by the old version of the package (according to MD5), then the user will be asked if they want to replace the old file (with a default of no). However if the new version of the package contains different content and the old content was not changed, then the user will get the new content without even being informed of the fact!

This is OK for many files, but the idea of a file from your audited security configuration being replaced with one you’ve never seen is not a pleasant one! This is only the first problem with managing policy files; the next problem is the size of the database for the sample policy. If you are using an initial RAM disk (initrd) then you must have the policy database on the initrd. The default initrd size of 4 megabytes is not large enough to accommodate the usual modules and the complete sample policy.

So what we need to solve this is a way of having a set of sample policy files (one per domain), of which not all will be used, and when new policy files are added or existing files are changed the user must be prompted as to whether they want to add the new files or apply the changes. Also when adding new policy the matching entries have to be added to the database used by setfiles for setting the file context.

In the latest versions of the sample policy the Makefile creates a configuration file for setfiles to match the program configuration files used. For every application policy file domains/program/%.te the matching file file_contexts/program/%.fc will be used as part of the configuration. This change will solve the issue of determining the configuration for setfiles, but it doesn’t entirely solve the problem. One issue with this is that when a file is added to or removed from the configuration the appropriate changes need to be made to the file system. If you make an addition to the policy before installing a new package (the correct procedure) then you can usually get away without this as long as none of the files or directories previously existed; however this is not always the case, especially when files are diverted or when dealing with standard directories such as /var/spool/mail which will exist even if you have not installed any software to use them! It should not be that difficult to write a program to relabel the files matching the specifications of the added policy; the question is whether policy additions are common enough to make it worth saving the effort of a relabel. Also there’s the risk that a bug in such a program (or its use) could potentially cause a security hole.

The security policy is comprised of one configuration file per application (or class of application, some domains such as the DHCP client domain dhcpd_t are used by multiple programs which perform similar functions). Also sometimes an application requires multiple domains which will therefore be defined in the one file, for example my current policy for Postfix has eleven domains (which is excessive, I plan to reduce it to three or four once I’ve determined exactly what is required). One problem I faced with this is the issue of what to do when one domain needs to interact with another domain, for example the pppd process often needs to run sendmail -q to flush the mail queue when it establishes a connection. This requires the policy statement domain_auto_trans(pppd_t, sendmail_exec_t, sysadm_mail_t), previously such a statement would be put in either the sendmail.te file or the pppd.te file, thus making one of them depend on the other. This is a bad idea because there’s no reason for either of these programs to depend on the other. The solution I devised is based on the M4 macro language (which was already used for simpler macro functionality in producing the policy file). I created a script to define a macro with the name of each application policy file that is used. So the solution to the PPP and Sendmail problem is to put the following in the pppd.te file:


ifdef(`sendmail.te',
`domain_auto_trans(pppd_t, sendmail_exec_t,
                   sysadm_mail_t)')

The next problem is how to effectively manage things so that when I ship a new and improved sample policy the administrator can update it without excessive pain.

The current method involves running diff -ru and then copying files if you like the changes. This is excessively painful even when managing one or two SE Linux machines! So it obviously won’t scale to serious production. I plan to write a Perl script to manage this; the first thing it has to do is track when the administrator doesn’t want a policy file. When a file is removed then the fact that the user has chosen not to have that file installed should be recorded, and they should not be prompted to re-install it on the next upgrade. However if the sample policy is upgraded and a new file has been added then they should be asked if they want to install it. Then when a file in the sample policy changes and it is a file that is installed the user should be asked if they want the new file copied over their existing file (and they should be provided with a diff to show what the changes would be). Finally if such changes involve the file configuration for setfiles then the user should be asked whether they want to relabel the system.

The people who are working on Red Hat packaging are considering other ways of managing the versions of configuration files, one of which involves having symbolic links pointing to the files to be used; if you decide to use your own version instead of one of the supplied policy files then you can change the sym-link.

Managing Device Nodes

In Linux there are two methods of managing device nodes. One is the traditional method of having /dev be a regular directory on the root file system and have device nodes created on it with mknod, the other is the devfs file system which allows the kernel to automatically create device nodes while the devfsd process automatically assigns the correct UID, GID, and permissions to them.

On a traditional (non-devfs) system running SE Linux the device nodes will be labelled in the same way as any other file. On a devfs system things are different, the devfs policy database contains rules for labelling device nodes. However this has some limitations, one being that when the policy database does not have an entry for the device node at the time it is created, then it will never be labelled. Another is that every type listed in the devfs configuration rules must be defined, which can cause needless dependencies.

To address these issues I wrote a module for devfsd which adds support for SE Linux. This allows you to change the mapping of SIDs to device nodes and re-apply it at any time, and if a security context listed in the configuration file does not exist in the policy then an error will be logged and the system will continue working.

This is especially useful for the case of an initrd as the types for all the possible device nodes won’t need to be in the ram disk.

Work To Be Done

Initial RAM Disk

When using an initrd to boot a modular kernel the security policy database must be stored on the initrd. The problem is that the default initrd size is 4M, which does not leave much space when libc6 is included, often not enough for the policy you want. Also even if the policy does fit you won’t really want to have such a large initrd image. If you are installing SE Linux on a single PC, or even on a network of similar PCs then you are best advised to build a kernel with all modules needed for booting statically linked and not use an initrd. However this is not possible for a distribution vendor who has to support a huge variety of hardware.

Another problem with using an initrd for storing the policy is that when you generate a new policy you then have to regenerate the initrd to avoid having your changes disappear on the next boot; of course a boot script could easily load the updated policy from the root file system before going to multi-user mode. But it is wasteful to have a large policy on the initrd that you then discard before ever using much of it.

The solution is to have a small policy that contains all the settings needed for either the first stage of boot, or alternately for running recovery tools in case a failure prevents the machine from entering multi-user mode. Then after the machine has passed the first stages of the boot process a complete policy can be loaded from the root file system; as long as the two policies don’t conflict in any major way this should work well. NB: a major policy conflict is a situation where the initrd defines domains that aren’t defined in the new policy and processes are executed in such a domain.

The latest release of SE Linux supports automatically re-loading the policy when the real root file system is mounted. Now all that needs to be done is for someone to write a mini-policy to install on the initrd.

Polishing run_init

Stephen Smalley has suggested that we develop a run_init program that incorporates the functionality of my modified program as well as of the original run_init program in a more generic fashion. It is apparent that other people will have similar needs for programs to execute programs under a different domain, role, and maybe identity. It is better that one program do this than to have many people writing programs for such things.

Also currently my program is hard-coded for the names of the Debian administration programs. An improved program should handle the needs of Debian, RPM, and the regular run_init functionality.

Writing Sample Policy Files

Currently any serious system will require policy files that are not in the sample policy. This forces everyone who uses SE Linux to start by writing policy files (which is the most difficult and time consuming task involved with the project). Currently we are writing new sample policy files for the variety of daemons and applications, and developing new macros for writing policy files quickly. With the new macros policy files are on average half the size that they used to be (and I aim to reduce the size again by new macros). The macros allow short policy files which are easy to understand, and therefore the user can easily determine how to make any required changes, or how to write a policy file for a new program based on existing programs.

Obtaining the Source

Currently most of my packages and source are available at http://www.coker.com.au/selinux/ however I plan to eventually get them all into Debian at which time I may remove that site.

I have several packages in the unstable distribution of Debian, starting with the kernel-patch-2.4-lsm and kernel-patch-2.5-lsm packages which supply the Linux Security Modules http://lsm.immunix.org/ kernel patch. That patch includes SE Linux as well as LIDS and some of the OpenWall functionality. When I have time I back-port patches to older kernels and include new patches that the NSA has not officially released, so often my patches will provide more features than the official patches distributed by the NSA from http://www.nsa.gov/selinux/index.html or the patches distributed by Immunix. However if you want the official patches then these packages may not be what you desire.

From the selinux-small archive I create the packages selinux and libselinux-dev which are also in the unstable distribution of Debian.

Acknowledgments

I would like to thank Stephen Smalley for being so helpful when I was learning about SE Linux, and Dr. Brian May for checking my early packages and giving me some good advice when I first started.

Also thanks to Dr. May, Stephen Smalley, and Peter Loscocco for reviewing this paper.

Bibliography

Running the Net After a Collapse

I’ve been thinking about what we need in Australia to preserve the free software community in the face of an economic collapse (let’s not pretend that the US can collapse without taking Australia down too). For current practices of using the Internet and developing free software to continue it seems that we need the following essential things:

  1. Decentralised local net access.
  2. Reliable electricity.
  3. Viable DNS in the face of transient outages.
  4. Reasonably efficient email distribution (any message should be delivered in 48 hours).
  5. Distribution of bulk binary data (CD and DVD images of distributions). This includes providing access at short notice.
  6. Mailing lists to provide assistance which are available with rapid turn-around (measured in minutes).
  7. Good access to development resources (test machines).
  8. Reliable and fast access to Wikipedia data.
  9. A secure infrastructure.

To solve these I think we need the following:

  1. For decentralised local net access wireless is the only option at this time. Currently in Australia only big companies such as Telstra can legally install wires that cross property boundaries. While it’s not uncommon for someone to put an Ethernet cable over their fence to play network games with their neighbors or to share net access, this can’t work across roads and is quite difficult when there are intervening houses of people who aren’t interested. When the collapse happens the laws restricting cabling will all be ignored. But it seems that a wireless backbone is necessary to then permit Ethernet in small local areas. There is a free wireless project wireless.org.au to promote and support such developments.
  2. Reliable electricity is only needed for server systems and backbone routers. The current model of having huge numbers of home PCs running 24*7 is not going to last. There are currently a variety of government incentives for getting solar power at home. The cheapest way of doing this is for a grid-connected system which feeds excess capacity into the grid – the down-side of this is that when grid power goes off it shuts down entirely to avoid the risk of injuring an employee of the electricity company. But once solar panels are installed it should not be difficult to convert them to off-grid operation for server rooms (which will be based in private homes). It will be good if home-based wind power generation becomes viable before the crunch comes, server locations powered by wind and solar power will need smaller batteries to operate overnight.
    In the third-world it’s not uncommon to transport electricity by carrying around car batteries. I wonder whether something similar could be done in a post-collapse first-world country with UPS batteries.
    Buying UPSs for all machines is a good thing to do now, when the crunch comes such things will be quite expensive. Also buying energy-efficient machines is a good idea.
  3. DNS is the most important service on the net. The current practice is for client machines to use a DNS cache at their ISP. For operation with severe and ongoing unreliability in the net we would need a hierarchy of caching DNS servers to increase the probability of getting a good response to a DNS request even if the client can’t ping the DNS server in question. One requirement would be the use of DNS cache programs which store their data on disk (so that a transient power outage on a machine which has no UPS won’t wipe out the cache); one such DNS cache is pdnsd [1] (I recall that there were others but I can’t find them at the moment). Even if pdnsd is not the ideal product for such tasks it’s a good proof of concept and a replacement could be written quickly enough.
  4. For reasonably efficient email distribution we would need at a minimum a distribution of secondary MX records. If reliable connections end to end aren’t possible then outbound smart-hosts serving geographic regions could connect to secondary MX servers in the recipient region (similar to the way most mail servers in Melbourne used to use a university server as a secondary MX until the sys-admin of that server got fed up with it and started bouncing such mail). Of course this would break some anti-spam measures and force other anti-spam measures to be more local in scope. But as most spam is financed from the US it seems likely that a reduction in spam will be one positive result of an economic crash in the US.
    It seems likely that UUCP [2] will once again be used for a significant volume of email traffic. It is a good reliable way of tunneling email over multiple hops between hosts which have no direct connectivity.
    Another thing that needs to be done to alleviate the email problem is to have local lists which broadcast traffic from major international lists, this used to be a common practice in the early 90’s but when bandwidth increased and the level of clue on the net decreased the list managers wanted more direct control to allow them to remove bad addresses more easily.
  5. Distributing bulk data requires local mirrors. Mirroring the CD and DVD images of major distributions is easy enough. Mirroring other large data (such as the talks from www.ted.com) will be more difficult and require more manual intervention. Fortunately 750G disks are quite cheap nowadays and we can only expect disks to become larger and cheaper in the near future. Using large USB storage devices to swap data at LUG meetings is a viable option (we used to swap data on floppy disks many years ago). Transferring CD images over wireless links is possible, but not desirable.
  6. Local LUG mailing lists are a very important support channel, the quality and quantity of local mailing lists is not as high as it might be due to the competition with international lists. But if a post to an international list could be expected to take five days to get a response then there would be a much greater desire to use local lists. Setting up local list servers is not overly difficult.
  7. Access to test machines is best provided by shared servers. Currently most people who do any serious free software development have collections of test machines. But restrictions in the electricity supply will make this difficult. Fortunately virtualisation technologies are advancing well, much of my testing could be done with a few DomU’s on someone else’s server (I already do most of my testing with local DomU’s which has allowed me to significantly decrease the amount of electricity I use).
    Another challenge with test machines is getting the data. At the moment if I want to test my software on a different distribution it’s quite easy for me to use my cable link to download a DVD image. But when using a wireless link this isn’t going to work well. Using ssh to connect to a server that someone else runs over a wireless link would be a much more viable option than trying to download the distribution DVD image over wireless!
  8. Wikipedia.org is such an important resource that access to it provides a significant difference in the ability to perform many tasks. Fortunately they offer CD images of Wikipedia which can be downloaded and shared [3].
  9. I believe that it will be more important to have secure computers after the crunch, because there will be less capacity for overhead. Currently when extra hardware is required due to DoS attacks, or systems need to be reinstalled, we have the resources to do so. Due to the impending lack of resources we need to make things as reliable as possible so that they don’t need to be fixed as often, and this requires making computers more secure.

Partitioning a Server with NSA SE Linux

Notes

I presented this paper at Linux Kongress 2002. Since that time virtualisation systems based around VMWare, Xen, and the hardware virtualisation in recent AMD and Intel CPUs have really taken off. The wide range of virtualisation options makes this paper mostly obsolete, and what isn’t obsoleted by that is obsoleted by new developments in SE Linux policy which change the way things work. One thing that’s particularly worth noting is the range of network access controls that are now available in SE Linux.

This is mostly a historical document now.

http://lsm.immunix.org/ has been defunct since about 2004.

Partitioning a Server with NSA SE Linux

Russell Coker <russell@coker.com.au>
http://www.coker.com.au/

Abstract

The requirement to purchase multiple machines is often driven by the need to have multiple administrators with root access who do not trust each other.

Having large numbers of expensive under-utilised servers with the associated management costs is not ideal.

I will describe my solution to this problem using SE Linux [1] to partition a server such that the “root” users can’t access each other’s files, kill each other’s processes, change passwords for each other’s users, etc.

DOS attacks will still be possible by excessive use of memory and CPU time, but apart from that all the benefits of separate hardware will be provided.

Introduction

SE Linux dramatically improves the security of a Linux system by adding another layer of security in addition to the default Unix permissions model. This is accomplished by firstly assigning a type to every file, device, network socket, etc. Then every process has a domain and the level of access permitted to a type is determined by the domain of the process that is attempting the access (in addition to the usual Unix permission checks). Domains may only be changed at process execution time. The domain may automatically be changed when a process is executed based on the type of the executable program file and the domain of the process that is executing it, or a privileged process may specify the new domain for the child process.

In addition to the use of domains and types for access control SE Linux tracks the identity of the user (which will be system_u for processes that are part of the operating system or the Unix username) and the role. Each identity will have a list of roles that it is permitted to assume, and each role will have a list of domains that it may use. This gives a high level of control over the actions of a user which is tracked through the system. However in this paper I am not using any of the identity or role features of SE Linux (they merely give an extra layer of security on top of what I am researching). So I will not mention them again.

For a detailed description of how SE Linux works I recommend reading the paper Peter Loscocco presented at OLS in 2001[1]. For the details of SE Linux policy configuration I recommend Stephen Smalley’s paper[2].

The problem of dividing a hardware resource that is expensive to purchase and manage among many users is an ongoing issue for system administrators. In an ISP hosting environment or a software development environment a common method is to use chroot environments to partition a server into different virtual environments. I have devised some security policies and security server programs to implement a chroot environment on SE Linux with advanced security offering the following features:

  1. An unprivileged user (chroot administrator) can set up their own chroot environment, create accounts, run sshd, let ordinary users log in via ssh, and do normal root things to them without needing any help from the system administrator.
  2. Processes outside the chroot environment can see the processes in the chroot, kill them, strace them, etc. Also the user can change the security labels of files in their chroot to determine which of the files are writable by a process in the chroot environment (this can be done from outside the chroot environment or from a privileged process within the chroot environment).
  3. Processes inside the chroot can’t see any system processes, or any of the user’s processes that are outside the chroot. They can see which PID’s are in use, but that doesn’t allow them to see any information on the processes in question or molest such processes. The chroot processes won’t be allowed to read any files or directories labelled with the type of the user’s home directory. This is so that if the wrong files are accidentally moved into the chroot environment then they won’t be read by chroot processes.
  4. The administrator can set up a user’s account for multiple independent chroot environments. In such a case processes inside one chroot can’t interfere with processes in another in any way, and all their files will have different types. This prohibits a nested chroot, but I think that there’s no good cause for nested chroots anyway. If someone has a real need for nested chroots they could always build them on top of my work – it would not be particularly difficult.
  5. The user will have the option of running a privileged process inside their chroot which can do anything to files or processes in the chroot (even files that are read-only to regular chroot processes) but which can’t do anything outside the chroot. Also if this process runs a program that is writable by a regular chroot process then it runs it in the regular chroot process domain. This is to prevent a hostile user inside the chroot from attacking the chroot administrator after tricking them into running a compromised binary.

A basic chroot environment has several limitations, the most important of which is that a process in the chroot environment can kill any process with the same UID (or in the case of a root process any process with any UID) outside the chroot environment. The second major limitation is that root access is needed to call the chroot() system call to start a chroot environment and this gives access to almost everything else that you might want to restrict. There are a number of other limitations of chroot environments, but they are all trivial compared to these.

One method of addressing this issue is that of GR security [3]. GR Security locks down a chroot environment tightly, preventing fchdir(), mount(), double-chrooting, pivot_root(), and access to non-chroot processes (or processes in a different chroot). It also has the benefit that no extra application configuration is required. However it has the limitation that you have very limited ability to configure the capabilities of the chroot, and it has no solution to the problem of requiring root access to setup the chroot environment.

Another possible solution to the problem is that of BSD Jails [4]. A jail is a chroot environment that prevents escape and also prevents processes in the jail from interfering with processes outside the jail. It is designed for running network servers and allows confining the processes in a particular jail to a single IP address (a very handy feature that is not available in SE Linux at this time and which I have to emulate in a shared object). Also the jail authors are working on a jailinit program which is similar to init but for jails (presumably the main difference is that it doesn’t expect to have PID==1), this program should work well for chroot environments in SE Linux too.

Another possible solution is to use User-Mode Linux [5]. UML is a port of the Linux kernel to the Linux system call interface. So it can run a kernel as a user (UID!=0) application using regular files on the file system for storage. One problem with UML is that it is based around file system images on disk, so to access them without the UML kernel running you have to loopback mount them, which is inconvenient. Also starting or stopping a UML image is equivalent to booting or shutting down a server (which is inconvenient if you just want to run a simple command like last). A final problem with UML for hosting is that using file systems for storage is inefficient: if you have 100 users who each might use 1G of storage (but who use on average 50M) you need 100G of disk space. With chroot based systems you would only need 5G of disk space. Finally backing up a file system image from a live UML setup will give a corrupted file system, while backing up files from a chroot is quite safe.

Isolation of the Chroot

The first aim is to have the chroot environment be as isolated as possible from other chroots, from the chroot administrator, and from the rest of the system. This is quite easy as SE Linux defaults to denying access (apart from the system administration domain sysadm_t which has full access to see and kill processes, read files, etc).

For example, pivot_root is denied by not granting the sys_admin capability. Access to other processes is denied by not permitting access to read from the /proc directories, send signals, ptrace, etc. Escape via fchdir() is prevented because processes in the chroot don’t get access to files or directories outside the chroot (because of the type labels), so a process can only fchdir() back into the chroot! Mount and chroot access are also not granted to the chroot environment, nor is access to configure network interfaces.

The next aim is to make a chroot environment more secure than a non-SE system in full control of the machine. The first step is to have critical resources such as hard drive block devices well out of reach of the chroot environment. With a default SE Linux security policy (which my work is based on) that is already achieved: a regular root process that is not chrooted will be unable to perform such access. This isolation of block devices, the boot manager, and privileged system processes (init, named, automount, and the mail server) from the rest of the system already makes a chroot environment on SE Linux more secure than one on a system without SE Linux; most (hopefully all) attacks against such critical parts of the system that would work on other systems will be defeated by SE Linux.

One problem that I still have not solved to my satisfaction is that of not requiring super-user access to initially create the chroot environment. I have developed policy modifications to allow someone to login as root (UID==0) in a non-chroot environment and not have access to cause any damage, so for my current testing I have started all chroot environments from a root login in a user domain, as the chroot and mount operations require root privileges. For production use you would probably not want to give a user-domain root account to a user, so I am writing a SUID root program for running both mount and chroot to start a chroot environment and to run umount to end such an environment (after killing all the processes). It will take a configuration file listing the target directory, the bind mounts to set up, and the programs to run.

One issue that I initially had with this was to make sure that it only runs on SE Linux (if SE Linux was removed but the application remained installed then you would grant everyone the ability to run programs as root in an unrestricted fashion through chrooting to the root directory). My current method of solving this is to execute /bin/false at the start of the program; if SE Linux blocks that execution then it indicates that SE Linux is in enforcing mode (otherwise the program ends). It also directly checks for the presence of SE Linux (so if the administrator removes /bin/false then it won’t give a false positive). However this still leaves the corner case where an administrator puts the machine in permissive mode and removes /bin/false. To avoid this the program will then try reading /etc/passwd, which will always be readable on a functional Unix machine unless you have SE Linux (or some similar mechanism) in place. Before running chroot the wrapper will change directory to the chroot target and run “chroot .”. My policy prohibits the chroot program from accessing any directories outside the chroot environment to prevent it from chrooting to the wrong directory, so an absolute path will not work. Chroot administrators would be confused by this, so the wrapper hides it from them.

I believe that this method will work, but I have not tested it yet.
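To make the design more concrete, here is a rough sketch in C of what such a wrapper could look like. It is not the finished program: it takes the target directory and the command as arguments instead of reading a configuration file, it omits the bind mount setup, and the direct check for the presence of SE Linux is only noted in a comment. In this sketch both the /bin/false test and the /etc/passwd test must indicate that policy is restricting the wrapper before it will proceed.

/* chroot-wrapper.c: illustrative sketch of the SUID-root chroot starter.
 * Not the finished program: argument handling and error reporting are
 * simplified and the bind mount setup is omitted. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if SE Linux denies execution of /bin/false for this domain
 * (the exec fails with EACCES/EPERM), 0 otherwise. */
static int exec_false_denied(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        execl("/bin/false", "false", (char *)NULL);
        /* exec failed: tell the parent whether it was a permission denial */
        _exit((errno == EACCES || errno == EPERM) ? 42 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}

/* Returns 1 if /etc/passwd can not be opened for reading - on a working
 * Unix system that only happens when something like SE Linux blocks it. */
static int passwd_denied(void)
{
    int fd = open("/etc/passwd", O_RDONLY);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return 1;
}

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s chroot-dir program [args...]\n", argv[0]);
        return 1;
    }
    /* the real program would also check directly for the presence of
     * SE Linux before trusting these tests */
    if (!exec_false_denied() || !passwd_denied()) {
        fprintf(stderr, "SE Linux does not appear to be enforcing, aborting\n");
        return 1;
    }
    /* chdir to the target and run "chroot ." - the policy does not allow
     * the chroot program to reach directories outside the chroot target */
    if (chdir(argv[1]) != 0 || chroot(".") != 0) {
        perror("entering chroot");
        return 1;
    }
    /* the SE Linux domain transition, not the UID, is what confines the
     * programs started here */
    execv(argv[2], &argv[2]);
    perror("exec");
    return 1;
}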

Domains

It would be possible to create a SE Linux policy to allow a chroot to have all the different domains that the full environment has (i.e. a separate security domain for each daemon). However this would be a huge amount of work for me to write the policy and maintain it as new daemons become available and as new versions of daemons are released with different functionality. It would also result in a huge policy (the number of daemons multiplied by the number of chroot environments could give a very large number of domains), which is currently held in non-pageable kernel memory. In a future version of SE Linux they plan to make it pageable, but the fact that a large policy has a performance impact will remain. Also the typical chroot administrator does not want the bother of working with this level of complexity. Finally the limited scope of a chroot environment (no dhcp client, no need to run fsck or administer RAID devices, etc) reduces the number of interactions that can have a security impact, and as a general rule there is a security benefit in simplicity as mistakes are less likely.

In my current policy I have a single domain for the main programs in the chroot environment. This domain has write access to files under /home, /var, /tmp, and anywhere else that the chroot administrator desires (they can change the type of files and directories at any time to determine where programs can write; the writable locations of /home, /var, and /tmp are merely default values that I recommend, not requirements). Typically the entire chroot would default to being writable when it is installed and the chroot administrator would be encouraged to change that according to their own security needs. In the case of a chroot setup with chroot(etbe, etbe_apache) the main domain will be called etbe_apache_t.

I have considered having a separate domain for user processes inside the chroot so that they can only write to files under /home (for example) and not /var (which would only be writable by system processes). To implement this would require either an automatic domain transition rule to transform the domain when running a user shell (as opposed to a shell script run by a system program), or a modified sshd, ftpd, etc. Requiring that chroot administrators use modified versions of standard programs is simply not practical; most users would be unable to manage correct installation and would require excessive help from the system administrator, and it might require significant programming to produce modified versions of some login programs. This leaves the option of automatic transitions. Doing this would require that a chsh inside the chroot not allow changing the shell to /bin/sh or any other shell that would be used by system shell scripts, and that all shells listed in /etc/shells be labelled as user shells. This requires a significant configuration difference between the chroot setup and standard configurations, which is difficult to maintain because distribution packages and graphical system administration tools would tend to change them back.

I conclude that having a separate domain for user processes is not feasible.

The next issue is administration of the chroot. Having certain directories and files be read-only is great for security but a pain when it is time to upgrade! This is a common problem for administrators who use a read-only mount for their chroot environment. To solve this issue I have created a domain which has write privileges for all files in the chroot; for the example above the name of this domain would be etbe_apache_super_t. This domain also has control over all processes in the chroot (the ability to see them, kill them, and ptrace them), while the processes in the chroot can not even determine the existence of such super processes. If this process executes a file that is writable by the main chroot domain then it will transition to the main chroot domain, so that if a trojan is run it will not be able to damage the read-only files. This protection is not perfect however: using the source command or “/bin/sh < script” to execute a shell script will result in it being run in the super domain. I think that this is a small problem, as tricking an administrator into redirecting input for a shell from a hostile script or using the source command is very difficult (but not impossible). However it is quite easy for an administrator to mistakenly give the wrong label to files and allow them to be written by the wrong people (I made exactly this mistake when I first set it up), so preventing the execution of binaries that may have been compromised is a good security measure.

Note that this method of managing read-only files does not require that applications running in the chroot environment be stopped for an upgrade, unlike most other methods of denying write access to files that are in use.

To have a working chroot environment you need a number of device files to be present: pseudo-tty files (for logins and expect), /dev/random (for sshd), and others. The chroot administrator can not be permitted to create their own device nodes, as this would allow them to create /dev/hda and change system data! So I have created a special mount domain, which in this example would be named etbe_mount_t, based on Brian May’s mount policy to allow bind mounts of these device nodes. This does have one potential drawback: if you are denying a domain access to device nodes solely by denying search access to /dev then bind mounts could be used to allow access in contravention of security policy! I could imagine a situation where someone would want to allow a domain to access the pseudo-tty it inherits from its parent but not open other ptys of the same type, and implementing this by denying access to /dev/pts. This is not something that I would recommend however. I believe that this is the best solution, as not having a user mount domain would require that the system administrator either create bind mounts (a process that has risks regarding sym-links etc), create special device nodes (having two different device nodes for the same device with different types is a security issue), or otherwise be involved in the setup of the chroot in a fashion that involves work and security risks.

In summary, if you have the bad idea of restricting access to the /dev directory to prevent access to device nodes then things will break for you; however there are many other ways of breaking such things, so I think that the net result is that security will not be weakened.
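To show the kind of operation involved, here is a small C sketch of the bind mounts that would be performed on behalf of the etbe_mount_t domain. The chroot location /home/etbe/chroot and the list of device nodes are only examples; in practice the mount program running in the mount domain does this work rather than custom code.

/* bind-devnodes.c: sketch of the bind mounts done for a chroot's /dev.
 * The chroot path and the list of device nodes are illustrative only. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    const char *chroot_dev = "/home/etbe/chroot/dev";   /* assumed location */
    const char *nodes[] = { "null", "zero", "random", "urandom", "ptmx" };
    char src[256], dst[256];
    int i;

    for (i = 0; i < (int)(sizeof(nodes) / sizeof(nodes[0])); i++) {
        snprintf(src, sizeof(src), "/dev/%s", nodes[i]);
        snprintf(dst, sizeof(dst), "%s/%s", chroot_dev, nodes[i]);
        /* the target must already exist as an empty plain file to bind onto,
         * so no mknod (and therefore no device creation) is needed */
        if (mount(src, dst, NULL, MS_BIND, NULL) != 0)
            perror(dst);
    }
    /* the pseudo-tty slave devices can be provided by bind mounting the
     * /dev/pts directory in the same way */
    return 0;
}

The equivalent command for each node is simply “mount --bind /dev/null /home/etbe/chroot/dev/null” and so on.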

To actually enter the chroot environment you need to execute the chroot() system call. To allow that I created a domain for chroot, which in this example will be called etbe_chroot_t; this domain is entered when the user domain etbe_t executes the chroot program. Then when this domain executes a file of type etbe_apache_ro_t or etbe_apache_rw_t it will transition to domain etbe_apache_t, and when it executes a file of type etbe_apache_super_entry_t it will transition to domain etbe_apache_super_t.

Types

The primary type used for labelling files and directories is for read/write files, in the case of a chroot setup by chroot(etbe, etbe_apache) the type will be etbe_apache_rw_t. It can be read and written by etbe_apache_super_t and etbe_apache_t domains, and read by the etbe_chroot_t and etbe_mount_t domains. It can also be read and written by the user domain etbe_t.

The type etbe_apache_ro_t is the same but can only be written by the user domain and etbe_apache_super_t.

To enter the domain etbe_apache_super_t I have defined a type etbe_apache_super_entry_t. The aim of this is to allow easy entry of the administration domain by a script; otherwise I might have chosen to have a wrapper program to enter the domain which prompts the user for which domain they want to enter. The wrapper program idea would have the advantage of making it easier for novice chroot administrators, and I may eventually implement that too so that users get a choice.

One problem I found when I initially set up a chroot was that when installing new Debian packages, if a post-installation script (running in the etbe_apache_super_t domain) started a daemon (such as sshd) then the daemon would also run as etbe_apache_super_t, and a user could then login with that domain! To solve this problem I created a new type etbe_apache_dropdown_t; when the etbe_apache_super_t domain executes a program of that type it transitions to etbe_apache_t, so labelling the /etc/init.d directory (and all files it contains) with this type causes the daemons to be executed in the correct domain. The write access for this type is the same as that for etbe_apache_ro_t.

Configuring the Policy

I have based my policy for chroot environments around a single macro that takes two parameters: the name of a user domain, and the name of a chroot. For example if I have an etbe_t domain and I want to create a chroot environment for apache then I could call chroot(etbe, etbe_apache) to create the chroot.

This makes it convenient to set up a basic chroot, as all of the most likely operations that don’t pose a security risk are allowed. Naturally if you have different aims then you will sometimes need to write some more policy. One of my machines runs a chroot environment for the purpose of running Apache; it required an extra fourteen lines of SE policy configuration to allow the Apache log files to be accessed by other system processes outside the chroot (in particular a script that runs a web log analysis program).

The typical chroot environment used for an Internet server will probably require between 5 and 20 lines of extra policy configuration and will take an experienced administrator less than 30 minutes to set up. Of course these could be added to custom macros allowing bulk creation with ease; setting up 500 different chroot environments for Apache should not take more than an hour!

Here is my policy for an Apache web server run in the system_r role by the system init scripts that has read-only access to all files under /home (which is bind mounted inside the chroot), and which allows logrotate to run the web log analysis scripts as a cron job.


# setup the chroot
chroot(initrc, apache_php4)
# allow apache to change UID/GID and to bind to the port
allow apache_php4_t self:capability { setuid setgid net_bind_service };
allow apache_php4_t http_port_t:tcp_socket name_bind;
allow apache_php4_t tmpfs_t:file { read write };
# allow apache to search the /home/user directories
allow apache_php4_t user_home_dir_type:dir search;
# allow apache to read files and directories under the users home dir
r_dir_file(apache_php4_t, user_home_type);
# allow logrotate to enter this chroot (and any other chroot environments
# that are started from initrc_t)
domain_auto_trans(logrotate_t, chroot_exec_t, initrc_chroot_t)
# this chroot is located under a users home directory so logrotate needs to
# search the home directory (x access in directory permissions) to get to it
allow logrotate_t user_home_dir_t:dir search;
# allow logrotate to search through read-only directories (does not need read
# access) and read the directories and files that the chroot can write (the
# web logs). NB I do not need to restrict logrotate this much – but why give
# it more than it needs?
allow logrotate_t apache_php4_ro_t:dir search;
r_dir_file(logrotate_t, apache_php4_rw_t)
# allow a script in the chroot to write back to a pipe created by crond
allow initrc_chroot_t { crond_t system_crond_t }:fd use;
allow initrc_chroot_t { crond_t system_crond_t }:fifo_file { read write };
allow apache_php4_t { crond_t system_crond_t }:fd use;
allow apache_php4_t { crond_t system_crond_t }:fifo_file { read write };

That’s 14 lines because I expanded it to make the policy clearer. Otherwise I would probably compress it to 10 lines.

Networking

The final issue is networking. The BSD Jail facility has good support for limiting network access via a single IP address per jail.

SE Linux lacks support for controlling networking in this fashion; server ports can only be limited by port number. This is OK if different chroot environments provide different services. It also works if you have an intelligent router in front of the server to direct traffic destined for different IP addresses to different ports on the same IP address (in which case the different chroot environments can be given permissions for different ports).

I have an idea for another solution to this problem which is more invasive of the chroot environment. The idea is to write a shared object to be listed in /etc/ld.so.preload which will replace the bind() system call. This will communicate over a Unix domain socket (which would be accessed through a bind mount) with a server process run with system privileges, and will pass it the file descriptor for the socket. The server process will then use the accept_secure() system call to determine the security context of the process that is attempting the bind, examine some sort of configuration database, and decide whether to allow the bind, or whether to modify it. If the bind parameters have to be modified (for example converting a bind to INADDR_ANY into a bind to a particular IP address) then it would do so. Then it would perform the bind() system call and return a success code to the socket that connects to the application.
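Here is a sketch of the client side of that idea; the policy server and its accept_secure() based checks are not shown. The socket path /var/run/bindpolicy is a made-up name for the Unix domain socket that would be bind mounted into the chroot, the single-integer reply is an assumed protocol, and the fall-back behaviour on errors is a design choice rather than a final decision (SE Linux port restrictions still apply either way). It would be compiled with -shared -fPIC -ldl and listed in /etc/ld.so.preload inside the chroot.

/* bindhook.c: sketch of a bind() replacement for /etc/ld.so.preload.
 * It hands the socket to a privileged policy daemon over a Unix domain
 * socket (via SCM_RIGHTS) and lets the daemon perform the real bind,
 * possibly rewriting INADDR_ANY to the address assigned to this chroot.
 * Build (assumed): gcc -shared -fPIC -o bindhook.so bindhook.c -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <unistd.h>

#define POLICY_SOCKET "/var/run/bindpolicy"   /* hypothetical path */

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
{
    int (*real_bind)(int, const struct sockaddr *, socklen_t) =
        dlsym(RTLD_NEXT, "bind");
    if (!real_bind) {
        errno = ENOSYS;
        return -1;
    }
    /* only IPv4 binds are policed in this sketch */
    if (!addr || addr->sa_family != AF_INET)
        return real_bind(sockfd, addr, addrlen);

    int ctrl = socket(AF_UNIX, SOCK_STREAM, 0);
    if (ctrl < 0)
        return real_bind(sockfd, addr, addrlen);

    struct sockaddr_un sun = { .sun_family = AF_UNIX };
    strncpy(sun.sun_path, POLICY_SOCKET, sizeof(sun.sun_path) - 1);
    if (connect(ctrl, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
        /* no policy daemon: fall back to a direct bind, SE Linux port
         * permissions are still enforced by the kernel */
        close(ctrl);
        return real_bind(sockfd, addr, addrlen);
    }

    /* send the requested address and the socket itself to the daemon */
    struct iovec iov = { .iov_base = (void *)addr, .iov_len = addrlen };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &sockfd, sizeof(int));

    int result = -1;
    if (sendmsg(ctrl, &msg, 0) < 0 ||
        read(ctrl, &result, sizeof(result)) != (ssize_t)sizeof(result))
        result = EACCES;            /* treat protocol errors as denial */
    close(ctrl);

    if (result != 0) {
        errno = result;             /* daemon returns 0 or an errno value */
        return -1;
    }
    return 0;                       /* daemon already bound the socket */
}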

This support would be ideal as it would be easiest to automate and would allow setting up hundreds or thousands of chroot environments at the same time with ease.

Unfortunately I couldn’t get this code fully written in time for the publishing deadline.

Conclusion

I believe that I have achieved all my aims regarding secure development environments or other situations where there is no need to run network servers. The policy provides good security and allows easy management.

My current design for entering a chroot environment should work via a SUID root program, but I will have to test it. The current method of allowing unprivileged root logins has been tested in the field and found to work reasonably well.

Currently the only issue I have not solved to my satisfaction is that of binding chroot environments to specific IP addresses. The best option that I have devised involves pre-loading a shared object into all processes in the chroot environment (which will inconvenience the user), but I have not yet implemented this so I am not certain that it will work correctly.

Obtaining the Source

Currently most of my packages and source are available at http://www.coker.com.au/selinux/; however I plan to eventually get them all into Debian, at which time I may remove that site.

I have several packages in the unstable distribution of Debian. The first are the kernel-patch-2.4-lsm and kernel-patch-2.5-lsm packages, which supply the Linux Security Modules (http://lsm.immunix.org/) kernel patch. That patch includes SE Linux as well as LIDS and some of the OpenWall functionality. When I have time I back-port patches to older kernels and include new patches that the NSA has not officially released, so often my patches will provide more features than the official patches distributed by the NSA from http://www.nsa.gov/selinux/ or the patches distributed by Immunix. However if you want the official patches then these packages may not be what you desire.

From the selinux-small archive I create the packages selinux and libselinux-dev which are also in the unstable distribution of Debian.

Acknowledgments

Thanks to Dr. Brian May for reviewing this paper and providing advice.

Bibliography

SE Linux Saves

Here are links to some instances when SE Linux prevented exploits from working or mitigated their damage:

Going Live with a Linux Server

Based on past mistakes by myself and others, here is a check-list before putting a Linux (or other Unix) server online:

  1. Run memtest86+ (or an equivalent program for other architectures) before going live, ideally run it before installing the OS. Run it again every time you upgrade the RAM.
  2. Reboot the machine after every significant change. E.g. if you install a new daemon then reboot the machine to make sure that the daemon starts correctly. It’s better to have 5 minutes of down-time for a scheduled reboot than a few hours of down-time after something goes wrong at 2AM.
  3. Make sure that every account that is used for cron jobs has its email directed somewhere that a human will see it. Make sure that root has its mail sent somewhere useful even if you don’t plan to have any root cron jobs.
  4. Make sure that ntpd is running and has at least two servers to look at. If you have a big site then run two NTP servers yourself and have each of them look to two servers in the outside world or one server and a GPS.
  5. Make sure that you have some sort of daily cron job doing basic log analysis. The Red Hat logwatch program is quite effective, but you also need some way of making sure that you notice if an email stops being sent (getting 11 instead of 12 messages from logwatch in the morning won’t be noticed by most people).
  6. Make sure that when (not if) a hard drive in your RAID array dies then you will notice it.

Any suggestions on other things I can add?

Personal SEO

One problem many people encounter is the fact that they don’t appear on Google and other search engines the way that they would like. If you have an uncommon name which is not referenced in any popular web pages then a single mention on a popular site can immediately become the top hit, and this may not be the post you want (e.g. you send a dirty joke to some friends and it ends up in the archive of a popular mailing list). If you have a more common name then you may have to compete with other people who share the same name and have similar problems. The solution to these problems is known as Search Engine Optimisation (or SEO).

  1. Recently to improve things in this regard I have registered the domain RussellCoker.com. Domains that end in .com are often regarded as authoritative sources of information on the topic in question, and are also often typed in to browsers. If you visit the RussellCoker.com domain then you’ll find that it’s quite bare, just links to some other things that I do. I will make it a little more user-friendly but the main aim is just having links.
  2. One thing that I have been doing for almost a decade is to include URLs of some of my web pages in my signature of email that I send. When I send messages to mailing lists with public archives (or when other people forward my mail to such lists) these links will become visible to search engines. So I now have thousands of links to my web site from hundreds of sites all over the world. Some of these links are old (which may make them more highly rated by search engines). This not only gives benefits for direct hits from search engines, but when people search for terms that are relevant to my work they will often hit archives of my mailing list postings, and then they will often see a benefit in visiting my web site.

Most strategies of SEO that apply to corporate web sites will also apply to individuals (do a web search and you’ll find many pages of advice on these issues). But the above two are the only ones I have discovered which apply specifically to individuals (registering an appropriate .com domain name if possible is a standard practice for companies and something that is done long before SEO is considered).

How to Report an Email Problem

In my work I often receive problem reports from clients regarding their email service, here is some advice to help get the problem fixed as fast as possible:

Firstly, problems sending mail and problems receiving mail are often unconnected and should be reported separately. If only one aspect of a problem is reported then probably only one will be fixed… Also receiving mail via POP or IMAP is separate from the mail transfer process.

For problems with mail transfer the following points should be addressed in any report to increase the speed of problem resolution. Don’t think of this as being for the benefit of the person who fixes your mail server; think of it as being for your own benefit in getting your email working again rapidly.

  1. Make sure that every complaint has a specific example. Saying “email doesn’t work for everyone” is not helpful; often when investigating such complaints I discover that email works for many people. Saying “email doesn’t work for John Smith and other people report problems too” is helpful, as I can investigate John’s problem while knowing that it’s a symptom of a larger problem.
  2. For every report make sure that you list a sender and recipient email address for an instance of the problem. Saying that “alpha@example.com can’t send mail to bravo@example.com” is much more useful than saying “alpha@example.com can’t send mail to some people“.
  3. Whenever possible make sure that the problem can be reproduced by the person who is making the report. A report such as “John Smith can’t send email from his PC” is not nearly as helpful as “I can’t send email from my PC”; the reason is that I often require more information from the person who experienced the problem and going through a third party slows things down.
  4. Make a careful note of the time of the problem. If you report that alpha@example.com can’t send mail to bravo@example.com and the logs show that such a message was correctly delivered at 9AM I will assume that the problem was mistakenly reported and not that maybe there was a problem which first occurred at 9:30AM.
  5. When noting the time make sure you note any discrepancies. If a problem occurs at 9AM and a bounce message reports a problem occurring at 10AM then please make sure to note both times when reporting the problem. Either or both times could be referenced in the log files and the more information you provide the better the chance that the problem will be fixed quickly.
  6. Most serious errors in email delivery have a code numbered from 550 to 559 along with an error message. Any time you see a bounce message which includes text such as “Remote host said: 554” then please report all the rest of the text on that line; it will often tell me precisely what needs to be done to fix the problem.
  7. Any time a bounce or other error message includes a URL then please report it to me. Sometimes it takes me a significant amount of work to discover what I could have learned in a matter of seconds if I had the URL.

Hotel Rooms vs Apartments

One option that is often overlooked is the possibility of staying in an apartment instead of a hotel room. The difference (which is not always reflected in the name) is that a hotel room will usually have a fridge that is mostly filled with over-priced drinks, no facilities for cooking, and often no facilities for cleaning clothes. A recent disappointing trend is hotels that have a “bar fridge” entirely filled with things that you have to pay for, fitted with sensors so that if any of them are moved then you have to pay. This prevents you from moving the rubbish out of the way for the duration of your visit so that you can store your sandwiches there.

An apartment will ideally have a fridge which is entirely empty for you to use for storing food and ingredients for your own cooking (it will have a minimally supplied kitchen). Apartments tend not to have a fry-pan in the kitchen; the reason for this is that they are legally required to have a smoke detector and false alarms cause lots of problems for everyone (often starting with an insanely loud alarm in the room which annoys everyone in the nearby rooms). You can expect to have two pots of different sizes and a toaster for your cooking.

If you are after cheap travel then the thing to do is to have cereal for breakfast and sandwiches for lunch. Finding cheap places to have lunch in a new city can take time, and finding cheap places to eat lunch near tourist attractions is almost impossible.

Some apartments have a washing machine and dryer in the room, but it’s quite common to have communal facilities for washing clothes. In any case there will be some option for washing clothes that doesn’t cost an excessive amount of money.

Another significant difference that is commonly seen between apartments and hotels is the layout (which is dictated to a large degree by the size). An apartment may have walls partially or fully separating it into separate rooms. One apartment I stayed in had two separate bedrooms (with doors), a kitchen area, and a lounge/dining area which was separated from the rest of the apartment. That made it possible to have business meetings in the lounge area while someone slept in one of the bedrooms. Another apartment I once stayed in had the bed area partially separated from the lounge area; the wall didn’t extend the full distance but covered enough that a business associate who walked into the lounge probably wouldn’t notice the bed area. There is a significant difference in principle between inviting a business associate to the lounge area of an apartment and inviting them to a hotel room…

Apartments often tend to be more expensive than hotel rooms, as they need more space for the kitchen and don’t derive revenue from a mini-bar or a hotel restaurant. One possibility for saving money could be to use some inflatable camping beds to sleep four people in an apartment that is designed for two. Apartments try to discourage this by only having enough cutlery for two people and not having spare blankets, towels, etc (I’m sure that they would bill you extra if they caught you). In contrast hotel rooms often have an over-supply of towels and blankets, as it’s easier to just fill every closet with them than to deal with requests from customers at odd times of the night.

Public Transport in Melbourne

Tickets for Melbourne public transport can be purchased on any bus or tram. The trams have vending machines which only accept coins so you must have sufficient coins to pay for your ticket. Neither buses nor trams sell the cheaper tickets.

Tickets have to be validated every time you get on a bus or tram. The first time a ticket is validated it will have its expiry time and date printed on it.

For a Sunday there’s a Sunday Saver ticket which allows you to go everywhere (in both zone 1 and 2) for the entire day for less than the price of a regular daily zone 1 ticket. As of the 6th of January 2008 a Sunday saver ticket costs $2.90 while a ticket for 5 days of travel in zone 1 costs $28.00 (separate daily tickets cost more).

The best value for money is the 10 * 2 hour ticket (which costs the same as a 5 * daily ticket). I have been told that if you use a multiple 2 hour ticket a second time in one day outside the 2 hour interval then it will be valid for the rest of the day. This means that a 10 * 2 hour ticket will give the same result as a 5 * daily ticket if you use it that way. If your journeys tend to be shorter then it can last for more than 5 days. Also a 2 hour ticket that is validated after some time in the evening (I believe it’s 6PM) will be valid for the rest of the night (until 3AM or whenever the transport closes down).