Software vs Hardware RAID

It’s a commonly held myth that hardware RAID is unconditionally better than software RAID. That claim is not true in all cases and is particularly wrong at the low end.

Really Cheap Hardware RAID

The cheapest so-called hardware RAID uses RAID in the BIOS and relies on an OS driver for support when running in protected mode. This is essentially a different sort of software RAID but with BIOS support to boot from it. Using a different disk format to the standard software RAID for your OS can make it more difficult to recover when things go wrong and there’s no benefit to this. If you use software RAID-1 from your OS and set things up correctly then you can boot from either disk. Using software RAID-1 for booting and RAID-5 or RAID-6 for the OS and data is a viable option.

Cheap Hardware RAID

Cheap hardware RAID doesn’t have write-back caching and therefore can’t give any significant performance benefit over software RAID. Note that there are different options for how RAID stripes are laid out which can affect performance, so if a cheap hardware RAID device gives any significant performance benefit over software RAID then it’s probably because the block layout happens to work well with your filesystem, which is of course a benefit you could also get from tuning software RAID.

The Mythical CPU Benefits of Hardware RAID

It’s widely believed that hardware RAID is faster because it takes the processing away from the CPU. But the truth is that for at least the last 10 years CPUs have been fast enough, and in fact it’s often been the case that RAID controllers have been the bottleneck.

When I loaded the Linux RAID-5/RAID-6 driver on my Thinkpad T61, its 2.2GHz T7500 CPU (which isn’t a particularly new or powerful laptop CPU) was tested and shown to be capable of 3227MB/s for RAID-6 calculations. The fastest SATA disk I’ve benchmarked was capable of sustaining almost 120MB/s on its outer tracks. If we assume that newer disks are capable of 150MB/s then my Thinkpad could handle the RAID calculations for an array of 20 such disks.

An old P3-1GHz desktop system I use for a low-end server can do 591MB/s of RAID-6 calculations in software. If I were able to connect SATA disks to that old system then it could drive four of them in a RAID array at full speed!

It’s often claimed that a benefit of hardware RAID is avoiding CPU use. Contiguous IO can use a moderate amount of CPU power; I could potentially use 20% of one core of a T7500 if I had four disks running at once. But contiguous IO usually isn’t that common. If you are using a Gigabit Ethernet port to transfer data then you are limited to something slightly more than 100MB/s, and most applications don’t involve large contiguous data transfers, so the amount of data transferred is lower still.
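The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes the benchmark figure is the throughput of one core and that RAID-6 calculation cost scales linearly with the amount of data. The function names are made up for illustration.

```python
# Figures quoted in the text (all in MB/s).
T7500_RAID6_MBPS = 3227  # Thinkpad T61 with a 2.2GHz T7500
P3_RAID6_MBPS = 591      # old P3-1GHz desktop
DISK_MBPS = 150          # assumed sustained rate of a newer SATA disk


def disks_supported(raid_mbps: float, disk_mbps: float) -> float:
    """Number of disks whose combined contiguous IO the RAID-6 code can keep up with."""
    return raid_mbps / disk_mbps


def core_fraction(num_disks: int, disk_mbps: float, raid_mbps: float) -> float:
    """Fraction of one core spent on RAID-6 calculations when all disks stream at once."""
    return (num_disks * disk_mbps) / raid_mbps


if __name__ == "__main__":
    print(f"T7500: {disks_supported(T7500_RAID6_MBPS, DISK_MBPS):.1f} disks")
    print(f"P3:    {disks_supported(P3_RAID6_MBPS, DISK_MBPS):.1f} disks")
    print(f"T7500 core use with 4 disks: {core_fraction(4, DISK_MBPS, T7500_RAID6_MBPS):.0%}")
```

With the numbers above this gives a little over 21 disks for the T7500, just under four for the P3, and a bit under 20% of one T7500 core for four disks.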

One way that hardware RAID can save CPU time is if the interface to the hard drives was inefficient. The IDE interface didn’t seem particularly efficient and large transfers to IDE disks used to often require more CPU time than was expected. For such disks having them on a RAID controller that emulated a giant SCSI disk could save some CPU time.

Back in 2000 I did some tests on a Mylex DAC 960 hardware RAID controller that was only capable of sustaining 10MB/s. This wasn’t a problem as the applications were seek intensive and the Mylex performed well for that task. But for contiguous IO software RAID would have given much better performance.

The Real Benefits of Hardware RAID

A good hardware RAID system will have NVRAM for a write-back cache. This can dramatically improve write performance which is very important on RAID-5 and RAID-6 systems that perform really badly for small writes.

Good hardware RAID controllers will often support many more disks than a non-RAID controller. If you want to have more than 4 disks then hardware RAID has some serious benefits. But it has to have NVRAM write-back cache, otherwise you get no useful benefits and you might as well use software RAID.

Conclusion

If you can’t afford a high-end RAID system like an HP CCISS then use software RAID. Software RAID will be faster and more reliable than cheap hardware RAID.

If you need more than four disks then you can probably benefit a lot from hardware RAID with write-back caching.

SE Linux Terminology

Security Context is the SE Linux label for a process, file, or other resource. Each process or object that a process may access has exactly one security context. It has four main parts separated by colons: User:Role:Domain/Type:Sensitivity Label. Note that the Sensitivity Label is a compile-time option that all distributions enable nowadays.
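As a rough illustration of the four-part format, here is a minimal Python sketch that splits a context string into its fields. The class and function names are invented for this example, and it does no validation against the loaded policy. The one detail worth noting is that the Sensitivity Label can itself contain colons, so only the first three colons are treated as separators.

```python
from typing import NamedTuple


class SecurityContext(NamedTuple):
    user: str
    role: str
    domain_type: str
    sensitivity: str


def parse_context(context: str) -> SecurityContext:
    """Split User:Role:Domain/Type:Sensitivity Label into its four parts."""
    # Split on the first three colons only, the sensitivity label
    # (for example "s0:c1-s1:c1.c10") may contain further colons.
    parts = context.split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected four colon-separated parts: {context!r}")
    return SecurityContext(*parts)


if __name__ == "__main__":
    # An illustrative context, not taken from any particular system.
    print(parse_context("user_u:user_r:user_t:s0:c1-s1:c1.c10"))
```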

User in terms of SE Linux is also known as the Identity. The program semanage can be used to add new identities and to change the roles and sensitivities assigned to them. System users often end in “_u” (EG user_u, unconfined_u, and system_u) but this is just a convention used to distinguish system users from users that associate directly with Unix accounts – which are typically the same as the name of the account. So the user with Unix account john might have a SE Linux user/identity of john. Note that as the local sysadmin can change the user names with semanage you can’t make any strong assumptions about a naming convention. When a process creates a resource (such as a file on disk) then by default the resource will have the same user as the process.

Role for a process determines the set of domains that may be used for running a child process. Through semanage you can configure which roles may be entered by each user. The default policy has the roles user_r, staff_r, sysadm_r, and system_r. Adding new roles requires recompiling the policy which is something that most sysadmins don’t do. So you can expect that all role names end in “_r”.

Object Class refers to the kind of object that is to be accessed. There are 82 object classes in the latest policy, many of which are related to things such as the X server. Some object classes are file, dir, chr_file, and blk_file. The reason for having an object class is so that access can be granted to an object of one class with a given type label without being granted to an object of a different class with the same label.

Type is the primary label for the Domain/Type or Type-Enforcement model of access control. By tradition a type name ends in “_t”. There is no strong difference between a domain and a type; a domain is the type of a process. In the DT model there is a set of rules which specify what happens when a domain tries to access an object of a certain object class for a particular access (read, write, etc).

MLS stands for Multi Level Security; it’s a hierarchical system for restricting access to sensitive data. Its core principle is that of no write-down and no read-up. In an MLS system you can only write data to a resource with an equal or higher sensitivity label, and only read data from a resource with an equal or lower one.
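The no write-down and no read-up rule can be sketched as two comparisons. The Python below is a simplification that only looks at the hierarchical level (assuming level names of the form s0, s1, and so on, as in the default policy) and ignores categories and everything else a real MLS policy checks. The function names are hypothetical.

```python
def level_number(level: str) -> int:
    """Convert a level name such as "s3" to the number 3."""
    if not level.startswith("s") or not level[1:].isdigit():
        raise ValueError(f"not a sensitivity level: {level!r}")
    return int(level[1:])


def may_read(subject_level: str, object_level: str) -> bool:
    """No read-up: the subject must be at the same level as the object or higher."""
    return level_number(subject_level) >= level_number(object_level)


def may_write(subject_level: str, object_level: str) -> bool:
    """No write-down: the object must be at the same level as the subject or higher."""
    return level_number(object_level) >= level_number(subject_level)


if __name__ == "__main__":
    print(may_read("s2", "s1"), may_write("s2", "s1"))
```

A process at s2 may read an s1 file but may not write to it, which is what stops sensitive data from leaking to a lower level.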

MCS stands for Multi Category Security.

Sensitivity Level is for a hierarchical level of sensitivity in the MLS policy. In the default policy there are 16 levels from s0 to s15. The MCS policy uses some of the mechanisms of MLS but not the level, so in MCS the level is always set to s0. The policy can be recompiled to have different numbers of levels.

Category is a primitive for the MCS and MLS policies. The default policy has 1024 categories from c0 to c1023, the policy can be recompiled to have different numbers of categories.

Sensitivity Label is for implementing MLS and MCS access controls. It may be ranged, in which case it has a form “LOW-HIGH” where both LOW and HIGH are comprised of a Sensitivity Level and a set of categories separated by a colon – EG “s0:c1-s1:c1.c10” means the range from level s0 with category c1 to the level s1 with the set of categories from c1 to c10 inclusive. If it isn’t ranged then it just has a level and a set of categories separated by a colon. In a set of categories a dot is used to indicate a range of categories (all categories between the low one and the high one are included) while a comma indicates a discontinuity in the range. So “c1.c10,c13” means the set of all categories between c1 and c10 inclusive plus the category c13. The kernel will canonicalise category sets, so if it is passed “c1,c2,c3” then it will return “c1.c3”. These raw labels may be translated into a more human readable form by mcstransd.
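To make the dot and comma notation concrete, here is a small Python sketch that expands a category set into numbers and folds it back into the compact form. It is not the kernel’s code, just an illustration with invented function names. How a run of exactly two adjacent categories is written (kept as “c1,c2” here) is an assumption rather than something stated above.

```python
from __future__ import annotations


def _cat_number(name: str) -> int:
    """Convert a category name such as "c13" to the number 13."""
    if not name.startswith("c") or not name[1:].isdigit():
        raise ValueError(f"not a category: {name!r}")
    return int(name[1:])


def expand_categories(spec: str) -> set[int]:
    """Expand a set such as "c1.c10,c13" into the individual category numbers."""
    cats: set[int] = set()
    for part in spec.split(","):
        low, dot, high = part.partition(".")
        start = _cat_number(low)
        end = _cat_number(high) if dot else start
        if end < start:
            raise ValueError(f"bad category range: {part!r}")
        cats.update(range(start, end + 1))
    return cats


def canonicalise(cats: set[int]) -> str:
    """Write a category set in compact form, using a dot for runs of three or more."""
    runs: list[list[int]] = []
    for cat in sorted(cats):
        if runs and cat == runs[-1][1] + 1:
            runs[-1][1] = cat          # extend the current run
        else:
            runs.append([cat, cat])    # start a new run
    pieces = []
    for start, end in runs:
        if start == end:
            pieces.append(f"c{start}")
        elif end == start + 1:
            # Assumption: two adjacent categories stay comma separated.
            pieces.append(f"c{start},c{end}")
        else:
            pieces.append(f"c{start}.c{end}")
    return ",".join(pieces)


def split_label(label: str) -> tuple[str, str]:
    """Split a ranged label into (LOW, HIGH); a non-ranged label is its own LOW and HIGH."""
    low, dash, high = label.partition("-")
    return (low, high if dash else low)


if __name__ == "__main__":
    print(canonicalise(expand_categories("c1,c2,c3")))
    print(split_label("s0:c1-s1:c1.c10"))
```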

Constraint is a rule that restricts access. SE Linux is based on the concept of deny by default and the domain-type model uses rules to allow certain actions. Constraints are used for special cases where access needs to be restricted outside of the domain-type model. MCS and MLS are implemented using constraints.

MySQL Cheat Sheet

This document is designed to be a cheat-sheet for MySQL. I don’t plan to cover everything, just most things that a novice MySQL DBA is likely to need often or in a hurry.

Configuring mysqld

If you are going to provide a database service to other machines edit /etc/mysql/my.cnf and set the bind-address parameter to a suitable value. A value of 0.0.0.0 will cause it to accept connections on any of the server’s addresses. I recommend using a private address range (10.0.0.0/8, 192.168.0.0/16, or 172.16.0.0/12) for such database connections and ideally a back-end VLAN or Ethernet switch that doesn’t carry any public data.

For the purpose of this post let’s consider the MySQL server to have a private IP address of 192.168.42.1. So you want the my.cnf file to have bind-address = 192.168.42.1

To start mysql administration use the command mysql -u root. In Debian the root account has no password by default. On CentOS 5.x starting mysql for the first time gives a message:
PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !
To do so, start the server, then issue the following commands:
/usr/bin/mysqladmin -u root password 'new-password'
/usr/bin/mysqladmin -u root -h server password 'new-password'

That is wrong, for the second mysqladmin command you need a “-p” option (or you can reverse the order of the commands).

There is also the /usr/bin/mysql_secure_installation script that has an interactive dialog for locking down the MySQL database.

Administrative Password Recovery

If you lose the administration password the recovery process is as follows:

  1. Stop the mysqld, this may require killing the daemon if the password for the system account used for shutdown access is also lost.
  2. Start mysqld with the --skip-grant-tables option.
  3. Use SQL commands such as “UPDATE mysql.user SET Password=PASSWORD('password') WHERE User='root';” to recover the passwords you need.
  4. Use the SQL command “FLUSH PRIVILEGES;”.
  5. Restart mysqld in the normal manner.

User Configuration

For an account to automatically login to mysql you need to create a file named ~/.my.cnf with the following contents:
[client]
user=USERNAME
password=PASSWORD
database=DBNAME

Replace USERNAME, PASSWORD, and DBNAME with the appropriate values. They are all optional parameters. This saves using the mysql client parameters “-u” for the username and “-p” for the password, and specifying the database name on the command line. Note that using the “-pPASSWORD” command-line option to the mysql client is insecure on multi-user systems as (in the absence of any security system such as SE Linux) any user can briefly see the password via ps.

Note that the presence of the database= option in the config file breaks mysqlshow and mysqldump for MySQL 5.1.51 (and presumably earlier versions too). So it’s often a bad idea to use it.
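If you want to generate such a file from a script, something like the following Python sketch would do it. The function name is made up, the database= line is left out because of the problem just described, and restricting the file to mode 0600 is my own suggestion (the file holds a plain-text password) rather than something mysql requires.

```python
import configparser
import os


def write_my_cnf(path, user=None, password=None):
    """Write a [client] section with the given user and password to path."""
    # interpolation=None so that a "%" in a password is stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config["client"] = {}
    if user is not None:
        config["client"]["user"] = user
    if password is not None:
        config["client"]["password"] = password

    # Create the file readable and writable by its owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Write "user=NAME" rather than "user = NAME" to match the layout above.
        config.write(handle, space_around_delimiters=False)
    # Tighten the mode too in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


if __name__ == "__main__":
    # Writes an example file in the current directory, not the real ~/.my.cnf.
    write_my_cnf("example.my.cnf", user="USERNAME", password="PASSWORD")
```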

Grants

To grant all access to a new database:
CREATE DATABASE foo_db;
USE foo_db;
GRANT ALL PRIVILEGES ON foo_db.* to 'user'@'10.1.2.3' IDENTIFIED BY 'pass';

Where 10.1.2.3 is the client address and pass is the password. Replace 10.1.2.3 with % if you want to allow access from any client address.

Note that if you use “foo_db” instead of “foo_db.*” then you will end up granting access to foo_db.foo_db (a table named foo_db in the foo_db database) which generally is not what you want.

To grant read-only access replace “ALL PRIVILEGES” with “SELECT”.

To show what is granted to the current user run “SHOW GRANTS;”.

To show the privs for a particular user run “SHOW GRANTS FOR 'user'@'10.1.2.3';”.

To show all entries in the user table (user-name, password, and hostname):
USE mysql;
SELECT Host,User,Password FROM user;

To do the same thing at the command-line:
echo "SELECT Host,User,Password FROM user;" | mysql mysql

To revoke access:
REVOKE ALL PRIVILEGES ON foo_db.* FROM 'user'@'10.1.2.3';

To test a user’s access connect as the user with a command such as the following:
mysql -u user -h 10.1.2.4 -p foo_db

Then test that the user can create tables with the following mysql commands:
CREATE TABLE test (id INT);
DROP TABLE test;

Listing the Databases

To list all databases that are active on the selected server run “mysqlshow”. It uses the same methods of determining the username and password as the mysql client program.

To list all tables in a database run “SHOW TABLES;”. For more detail select from INFORMATION_SCHEMA.TABLES or run “SHOW TABLE STATUS;”.

For example to see the engine that is used for each table you can use the command echo "SELECT table_schema, table_name, engine FROM INFORMATION_SCHEMA.TABLES;" | mysql.

But INFORMATION_SCHEMA.TABLES is only in MySQL 5 and above. For prior versions you can use mysqldump -d to get the schema, or “SHOW CREATE TABLE table_name;” in the mysql client.

Also the mysqlshow program can be used to display the tables in a database via “mysqlshow database” or the columns in a table via “mysqlshow database table”.

To list active connections: “SHOW PROCESSLIST;”

Database backup

The program mysqldump is used to make a SQL dump of the database. EG: “mysqldump mysql” to dump the system tables. The data compresses well (being plain text of a regular format) so piping it through “gzip -9” is a good idea. To backup the system database you could run “mysqldump mysql | gzip -9 > mysql.sql.gz”. To restore simply run “mysql -u user database < file”, in the case of the previous example “zcat mysql.sql.gz | mysql -u root mysql”.

To dump only selected tables you can run “mysqldump database table1 [table2]”.

The option --skip-extended-insert means that a single INSERT statement will be used for each row. This gives a bigger dump file but allows running diff on multiple dump files.

The option --all-databases or -A dumps all databases.

The option --add-locks causes the tables to be locked on insert and improves performance.

Note that mysqldump blocks other database write operations so don’t pipe it through less or any other process that won’t read all the data in a small amount of time.

mysqldump -d DB_NAME dumps the schema.

The option --single-transaction causes mysqldump to use a transaction for the dump (so that the database can be used in the meantime). This only works with InnoDB. To convert a table to InnoDB the following command can be used:
ALTER TABLE tablename ENGINE = INNODB;
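For scripted backups it can help to build the mysqldump command line in one place. The Python below is a minimal sketch using only the options mentioned above. The function names are hypothetical, and it assumes that mysqldump is on the PATH and picks up its credentials from ~/.my.cnf.

```python
import gzip
import shutil
import subprocess


def build_dump_command(database=None, tables=(), *, all_databases=False,
                       single_transaction=False, skip_extended_insert=False,
                       schema_only=False):
    """Return the mysqldump argument list for the requested kind of dump."""
    if all_databases == (database is not None):
        raise ValueError("give either a database name or all_databases=True")
    if all_databases and tables:
        raise ValueError("tables cannot be combined with all_databases")
    cmd = ["mysqldump"]
    if single_transaction:
        cmd.append("--single-transaction")   # only helps for InnoDB tables
    if skip_extended_insert:
        cmd.append("--skip-extended-insert")  # one INSERT per row, diff friendly
    if schema_only:
        cmd.append("-d")                      # dump the schema only
    if all_databases:
        cmd.append("--all-databases")
    else:
        cmd.append(database)
        cmd.extend(tables)
    return cmd


def dump_to_gzip(cmd, path):
    """Run cmd and write its output to path, compressed like "gzip -9"."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            gzip.open(path, "wb", compresslevel=9) as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


if __name__ == "__main__":
    print(build_dump_command("mysql", single_transaction=True))
```

Calling dump_to_gzip(build_dump_command("mysql"), "mysql.sql.gz") would be the scripted equivalent of the gzip pipeline. The output is read as fast as it is produced, so the dump doesn’t hold up other writers for longer than necessary.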

To create a slave run mysqldump with the --master-data=1 option.

When a master’s binary logs get too big a command such as “PURGE MASTER LOGS BEFORE '2008-12-02 22:46:26';” will purge the old logs. An alternate version is of the form “PURGE MASTER LOGS TO 'mysql-bin.010';”. The MySQL documentation describes how to view the slave status to make sure that this doesn’t break replication.

Portslave

Portslave is a getty replacement that is designed to talk to a modem and spawn PPP or SLIP when the modem connects. It authenticates the connection via RADIUS.

thanks.txt on my Play Machine

On my SE Linux Play Machine I have a file in the root home directory named thanks.txt_append_only_dont_edit_with_vi which users can append random comments to. It kept slowly growing from the time of Fedora Core 2 to today. Here is the text. Any text within brackets is my response to a question.

you can send messages to the owner through this file
should I be able to see dmesg output?
Lon was here
Is this a virtual machine? [at that time it wasn't, it is now]
kermit!

nice toy here :)
cool stuff – will you be posting instructions on how to lock down a machine like this? [yes]

Had fun poking around
Impressive stuff, though I’m not exactly a security expert ;)

heheheh
I guess it’s a bit better than LIDS. I’ll give it a try
Does there even have to be a root user? could it have been a ‘John’ instead with no impact on the fedora system? [the user name was never an issue, changing a Unix system to have "John" map to UID 0 is no big deal]
nice toy…
This is my first look at SElinux, very secure but seems broken from a desktop usability standpoint. Is FC2’s policy to be more liberal than this? [SE Linux has been continually improving]
Out of curiosity are you running exec-shield as well [sometimes yes, sometimes no, depends on the distro]

This machine is a little bit more permissive than the Gentoo machine,
I can actually read the security policy files! [by design, you can look and learn]
.
Thanx and have a nice day
I was able to coredump bash and read some history enries. see ./coredumptest Is this expected behaviour? kenny @ jevv.priv.at [you could have just read ~/.bash_history or run the "history" command]
exec-shield what is that? When I ran this command It gives a error: -bash: exec-shield: command not found [exec-shield is a kernel patch to prevent some application exploits which rely on writable and executable memory]
Where are the security policy files? Excelent job here! Thank you for the public root account ;-p

Very interesting.
Russel ! Thank You for work, Thank You for this box. SELinux Rulz ! [s/Russel/Russell/ :-)]
I was able to fill up the filesystem to 100% (/tmp) and I was able to terminate the shells of other root users
[Filling the root fs is a DOS attack, read the MOTD.]
[Killing the shells of other users is expected behavior, they are all using the same account as you!]

The tar program sure gets upset. I untar something that was originally tarred up as UID 1000, and it gets changed to that. Then I try to untar a second portion of the data, and I get all sorts of errors. Had the UID change been blocked, the errors wouldn’t happen when the second tar tries to write to the directories again. Errors look like this:

tar: procps-3.2.1/test/ps/thread-nosort-L/header: Cannot open: No such file or directory
tar: procps-3.2.1/test/ps/thread-nosort-default: Cannot mkdir: No such file or directory
tar: procps-3.2.1/test/ps/thread-nosort-default/setup: Cannot open: No such file or direc

You’re seriously short on RAM. Only about 9 MB are free. Nothing I can view is eating it. Programs are crashing due to lack of memory. [you don't have permission to see most processes]

can’t wait for fedora core 2. this is one sweet security setup. hopefully a howto will come out, plus maybe a gui for the windows folks.

thanks. you’ve inspired me to install fedora. cool stuff.

Thanks very much for setting this box up. It is a great learning tool

I note that I can’t ping, traceroute or telnet off the box. Is this intentional? Is this part of the lockdown to show me that I can’t do things I expect to be able to do with uid 0? My initial impression is that without those functions it is not very useful to have a system. [in the early days I allowed such things, but they were abused too often]

###########
Have you updated the kernel with the information in this

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&c2coff=1&safe=off&threadm=1Jw1G-551-7%40gated-at.bofh.it&rnum=3&prev=/groups%3Fq%3Dluto%2Bgroup:linux.kernel%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26group%3Dlinux.kernel%26c2coff%3D1%26safe%3Doff%26scoring%3Dd%26selm%3D1Jw1G-551-7%2540gated-at.bofh.it%26rnum%3D3

post? Have you tried whether that might be a real exploitable vulnerability?
Sorry about the formatting of the url. [there are kernel vulnerabilities all the time, I keep updating it to the latest kernel]
###########
Its very interesting. Thank you.
bagus juga pengamanan boxnya. salam dari indonesia
##\n thanks from me too\n##
##/nD’oh’/n thanks from a Windows Luser too/n##
hello althepcman was here
Thanks very much for setting this box up. I’ll try the SELinux on Fedora Core 2.
#######
ichtus
thank your for your great job, Fedora is great
######
thanks, from argentina, i really dont like fedora…in fact im a debian or gentoo user…but i think that fedora its kind a cool thing
-=-=-=-=-=-
nice small server with fine security patch. thx for the try-out. greetings from hannover/germany

Thanks from Brazil. I’m studying selinux and ids integration and probably I’m gonna come back here. marciorg at gmail.com
#####################################################
-=-=-=-=-=-=-=-
hi
is it correctly that root can sudo ?

-bash-3.00# ps auxw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 20860 0.0 0.5 5576 1432 pts/42 Ss+ 07:44 0:00 -bash
root 20910 0.0 0.5 4852 1296 pts/43 Ss+ 08:02 0:00 bash -i
root 21033 0.0 0.5 5092 1436 pts/45 Ss+ 08:29 0:00 -bash
root 21105 0.0 0.5 4860 1460 pts/46 Ss 08:39 0:00 -bash
root 21219 0.0 0.2 2708 756 pts/46 R+ 08:55 0:00 ps auxw
-bash-3.00# sudo -u mysql ps auxw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 20860 0.0 0.5 5576 1432 pts/42 Ss+ 07:44 0:00 -bash
root 20910 0.0 0.5 4852 1296 pts/43 Ss+ 08:02 0:00 bash -i
root 21033 0.0 0.5 5092 1436 pts/45 Ss+ 08:29 0:00 -bash
root 21105 0.0 0.5 4860 1460 pts/46 Ss 08:39 0:00 -bash
mysql 21220 0.0 0.0 2476 252 pts/46 R+ 08:56 0:00 sesh /bin/ps auxw
mysql 21221 0.0 0.2 3844 752 pts/46 R+ 08:56 0:00 /bin/ps auxw
-bash-3.00#

and is it realy sudo? AFAIK mysqld was started on this system, but sudo -u mysql ps auxw doesn’t show me other mysql processes…
[sudo doesn't change to the mysqld_t domain...]

-=-=-=-=-=-=-=-

i’ve written kill_rjc.pl script: i tried to kill hidden pids from /proc using sudo -u rjc kill -9 $pid.
does rjc has 2 roles ? why i couldn’t kill his shell ?
[rjc has 2 roles, neither of which is user_r, so neither of them has the domain user_t that you can kill]

-=-=-=-=-=-=-=-

thx in advance :)
#####################################################

##############
rjc, check this out: /root/ls_rjc_home_:)
[fixed - thanks for that, it was due to a bug in locate]
and don’t forget about sudo plz :)
##############
Thanks for the effort to let us experiment with SELinux/Fedora
################################
whatever
Thanks alot for this publicly accessible machine! I recenly snagged a RH-specific file for my Debian GNU/Linux-based server :)
thanks for this nice playbox :)

That’s great! I’ve just typed rm -rf / under root and nothing happened! Fantastic! Still can’t believe it!
Thanks for the opportunity to let Linux enthusiasts learn SELINux hands-on!
———————————
Very cool! Good work!
———————————
Now Thats Cool stuff man. (masud.sp at gmail.com)
COOL! thanks for your work :)
Great service, thanks
Nice :)
Thanks!
test

Excelent\!I will try this at home.

Nice demo, I’ll pass this on to the secuity team to show the concepts
I have been pushing for more SE Linux deployments but policy managment is a big cost
sbrunaso

Nice works\!
SELinux is interesting.
Dear Russell Coker,

Thanks to providing Play machine, this concept is mind blowing.

This will help community to grow.

Thanks again with regards,

Deepak Mahajan
Head – Internet
Jain Irrigation Systems Ltd.
Jalgaon – India
www.jains.com
email: internet@jaindrip.com
thanks
thanks
thanks – very interesting – perfect to get a first look at the selinux features
COOL
Thanks for the demo system, very cool!
This is just cool….Great wok Russell ………. anspuli anspuli@gmail.com

very good!
thanks. my work have a new possibilit now!

thanks for the access – rgds rhp
killroy was here
this really is amazing. thanks for the demo. – db
how do you do normal root admin stuff on a selinux system with strict policies in force? [you do it as sysadm_r:sysadm_t]
Thanks for the access.
thank you — a brave man indead !
Nice box.
Thanks for sharing it with us!
—-[ The OOM Killer ]—-
root can still eat up all memory and the next process that requests memory will be killed by the kernel. that could be something important like apache on a server, or the “top” of the admin trying to figure out what’s going on, etc
:-/
memory usage should be limited
[Limiting the number of processes root can use is impossible, therefore trying to limit memory use is not going to be very productive. So I just make the conditions of use include that DOS attacks are not acceptable. For real servers don't give the root account to hostile users and use SE Linux to help prevent hostile users getting root.]
mcgrof: how about limiting number of open binds maybe?
mcgrof: anyway, thanks , this is cool
mcgrof: I logged out and the listening ports are still here
mcgrof: I killed them for you
[Again, SE Linux isn't about resource limitation. Note that you can't bind to a port that's reserved for some other purpose.]


this server sucks, cant even do a simple rm -rf / :)

mcgrof: re: binds — yeah, makes sense, thanks anyway, this is great

—-

Nice to have a hands-on SE Linux demo available! What has been bugging me for a long time: How does SE Linux compare to RSBAC? I read the mailing list discussions^H^H^Hflamewars, but didn’t get any useful information out of it.

Neat\!
HOW-TO make demo SeLinux machine?DmA@admin.tstu.ru Tambov ,Russia
Hallo Welt
Nice Try
Test
test
Hallo
Saluton
Thanks for this test system. I just copied the thanks.txt_append_only_dont_edit_with_vi file to a different name, which it allowed me to do. It appeared to have the same permissions as the original file. ["ls -lZ" shows the SE Linux contexts of files, the file you copied had a different "type"] I couldn’t delete the original file, but it allowed me to delete the copy. I also tried to shutdown the system and was denied. Good demonstration of SELinux.

thanks, very cool

sweet! with the help of your configuration I managed to set up my Debian box; didn’t try to break it though, looks pretty hopeless concerning my security background. i’ll be back to learn more; thanks

I’m Sorry.I’ve executed it programming continuousness fork.But It’s not being malicious.Sorry really [don't worry, that happens all the time]
Very impressive, thanks for the demonstration
Thanks Russell, xor007 from South Africa
thanks for showing off your excellent work ~Alicia
helo
Kool. a very intrestig demo
iCanMakeAFile in my home directory.
good that root can still do this.
pretty wacky, see what else is around here for me to try to muck up.
Ooh, root can make files in its homedir.

thx! linio

quack

—–
Thanks for setting up a machine like this! Are there any newer packages installed than what comes with Debian Etch? Or can I build myself a machine like this using nothing but the etch packages? [during Etch I had my own repository for updated packages, now I'm doing the same for Lenny]

Rik
—–
neat. – folken from CH
eat meat
Thanks… — Philipp Kern ()(DD)

Nice one Russ.

mlh 2007 11 05 13:48

Very cool. Thanks a lot
From Russia with Fun! Thx u. skynerve
16 nov 2007
—-
Funny to allow strangers root access to your computer, but still be safe. :-)
Still I think a little more documentation for SELinux-newbies could be very useful…
—-

Test

great. just fucking great. russel FTW\!\!\!11one

That’s pretty cool.


something
Thanks
cool do you have an apparmor play machine too? [it would be possible to run an apparmor Play Machine, but no-one bothered]
Thanks for this nice setup, i’m not a security expert but the few things I tried where not allowed ;), way to go
The fact that you feel secure even after giving out the root password has motivated me to finally dive into SELinux – thanks!
Nice to meet you! I am from a university of China.
It’s strange playing on a machine in the future; I’m on the other side of
the international dateline.
amazing do i can do it on my debian too?:)
i will try second time with selinux maybe is not too diffcult for me.
Kelaz was herels

cool! gonna install this on my laptop. ^.^
-dcbunny

Hello Mon Jan 21 15:32:59 EST 2008
r0b3r7
nice… having an open box like this is a ballsy move i really respect that.
if you don’t see the fnords they can’t eat you
nice, you can’t even ls /etc/shadow ;)
Nice one

——

Hi there,

Thanks for the server, the best I can do so far is to have the box connect to itself continuously through ssh port so no one can log in.

Cheers,

Billy
ohls -lsa! i can change passwordls -lsals -lsa [I stuffed up there]
HACKED!
This is pretty cool. Unfortunately, this is only the second time I’m logging into a remote shell so I’m just basking in the novelty and not really contributing anything of worth -George
neat!
something
PRRV-Test from Austria
well, thanx ;) .. i’ll read and learn about selinux i come back ;) .. bodik civ zcu cz
I not able to delete /root/.ssh/authorized_keys, but was able to overwrite it. Should this have been allowed? [no]
sorry for the forkbombs!!!!
~rb
Thanks for the peek inside!
I noticed some crashes in the last logins, what caused the crash?
——————————————————————————
I am internet famousls – Murray.
Would be internet famous if I could spell
[crashes are usually caused by DOS attacks]
i was here
Pretty neat! Thanks Russel
root:user_r:user_t:-s0:c0.c100@play:~# hostname test
hostname: you must be root to change the host name
Nice :)
Mon Jul 7 05:48:55 EST 2008
Thanks for giving me the opportunity to test this machine
SELINUX student from INDIA
—————————-
Hi there!

Nice security. This convinces me to have a beter look at SELinux

Thanks!

JL Lacroix from Belgium
Wed Jul 23 16:15:35 EST 2008
Hi,

SELINUX is really enormous!
pretty cool setup / Henrik
Thx, nice demo!
format:c dont work, maybe a bug
Thanks for the really amusing demo! -e

thanks for this stuff. it is a good starting point for SELINUX. spallares@itsyx.com.
thanks a lot for the opportunity to try this. a big THX from MDQ, Argentina ;) ….zer0
———————————————————————
Thanks for a great demo\nMichel van Deventer, Netherlands\nmichel@het.grote.net

—-
Interesting. I’ve always had reservations about SE Linux, because it introduces another security layer on top of the standard posix model – even with the “normal” model you can sometimes accidentally miss things. I’d be interested to hear how SE Linux has an impact on the daily life of an administrator.

Anyways, thanks.

– Random person from Belgium

Someone wrote “very impressive” in banner art.

##############################

Thanks for the demostration\!\!
I really need to learn more about SELinux
Great job\!
######################################################

Hi!
My name is Alexandre Stefani
What you do is really cool. I_m learning SELinux and will install it on my Debian
Thanks a lot. I_ll purchase a T-Shirt soon.

Thanks for the demo.
please install iptraf and mc. it would be real fun. thanks!
Hello. Leave your handle here:
Malformation – 27/10/08
I don’t remember writing that! -Malformation

Hello from SELinux course from Austria
reerzrzr
SO, root is no more the boss now, \n but you do have a boss i.e SE admin \n root is a normal handicap user on this machine

Ahoy from around the world! This is an amazing demonstration! Are the files in /selinux supposed to be world readable (even though the parent directory restricts access)? Seems to me that a tiny privacy issue exists with concurrent play users and their /proc/${SESSIONPROCNUM}/environ file. Then again, I am a newb…. Thanks & feel free to reply to my comments at vulariter-selinuxplay@yahoo.com! [When two users login with the same UID and context then they can mess with each other, the privacy issue of the environ file is just the tip of the iceberg.]

…. what I meant to say was world-writable… lol.. later -peritus

Cool.Best regards!
best regards from Poland:) Nice work here. When will be the demo how to create this kind of machine? ^_^
Greetings from Chicago, i’m very much interested inlearning SELinux. Thanks for kindly providing this resource
robwuzhere
Thanks from pl.
thank you for providing this. I really want to learn selinux.

Thanks Russell.

[update Oct 2009]
It would be nice if you explained how to setup such a play machine.
[that's on my todo list]
when i grow up ill build such mashines for educational puropses. Necessary docs, tutorials, and an ability to tune the system during one paticular session. And of course – tests: are you sucsessfull. Such a system could be a wonderful alternative to e.g. LPI exams: show me. Communications inside one particular computer system. When i grow up – i’ll know English better =)
hallo
seLinux is fun.
Hi, All.
test
the point is to break the machine?
[the point is to discover security flaws]
Interesting, going to read up on this and maybe set up a VM… sounds like fun! Thanks :)
Hola. Archivo de pruebas.
nice setup
hi??
hi all
oru kundhoom nadakkunnilla
Thanks for making this available – I’m just starting to look into SELinux in the hopes that it offers a usably simple security model…
I am fascinated by the fact that I can append to this file, but not remo
ve or truncate it. I like the fine-grained opermissions!
bla bla bla
selinux looks very cool. thank you for providing this.
enhorabuena
Hello <3 selinux
win, or WIN.
all your base are belong to us
[Section 2 of the MOTD clearly says that DOS attacks are out of scope]
======================================================
Hello Kind Sir,
I am Dr. Adamu Salaam, the the bank manager of bank of africa (BOA) Burkina Faso West
I am sending you this message about the $3.14159 million dollars in bank
account number 2718281828450945. I will give you this money in exchange
for the password to the ‘bofh’ account.
======================================================
[Thanks for the amusing offer. I've been offered stolen credit cards and other
junk for the password, Pi million dollars in the account numbered "e" is a
refreshing change.]
Can you recommend any textbooks that teach selinux? Presumably targeted at a Linux SA.
weird stuff this. doesn’t feel like being root :)
Why no /proc/mtrr ? I want to run exploit!
[/proc/mtrr doesn't exist in a Xen DomU, there wouldn't be much point in it]
muahahahah
—-TONE WAS HERE —-
; DROP DATABASE –
SQL injection doesn’t work on flat files
Hello, boys! :)
Really good
pretty cool…gonna be learning this reall soon. — Glitch
good job! Is this a custom build of selinux policy? wright.keith@gmail.com
[Custom configuration, but the main policy package is the same one that everyone else should be using]
Great setup, Mr Coker. :)
Cool. Thanks for the opportunity to play with this.
good job SELINUX is really great :)
congratulations Sir it’s really good fun to play with Your server.. SELINUX rules cat thanks.txt_append_only_dont_edit_with_vi ! ~kawooem
seems untouchable… please post your SELinux recepies
-Jack
Also thanks for this Testmachine, i could test my ISP if he was allowing ssh over cable network.
Greets JacksOn
thanks… interesting
thanks.. interesting CANARIS
@ CANARIS: Yes, just what I was going to say :) ~gmatht
mmm4m5m: Nice. Thanks. I was here.
Managed to get the server to reboot with your tight selinux … ;)
18:56:43 up 1 min, 1 user, load average: 0.08, 0.06, 0.02
[That was the watchdog responding to your DOS attack. NB DOS attacks are out of scope.]
Cheers
David Jacobson
From South Africa – Down under! [jakes@leet.org]
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
I’m curious about malicious commands, i.e. do you consider malicious commands such as:
rm -rf / or using mkfs on / or using a fork bomb liek :(){:|:&};: is considered a security flaw or a type of DOS, didn’t want to try them just incase.
[A fork bomb is a DOS attack, rm -rf and mkfs are legitimate tests of the
security of the system. I encourage you to use rm and mkfs to test the system.]
I’m also curious, if you log on to the console, not ssh, but physical, as root, are the SEL restrestrictions lifted?
[No, the restrictions are based on the context not the terminal. It is
possible to have pam restrict which accounts can login via various methods, so
accounts that allow higher levels of access could be denied ssh logins.
Also there is a boolean to determine whether the administrator can login via
ssh, I have that turned on but for best security you would turn it off.]
Thanks for letting us play on this box. It is a good demo. Perhaps I should not discount SELinux as just a pain in the butt like I traditionally have.
impressive indeed. -reablettoz
This is really cool stuff, thanks for the demo! Gotta say, the real
“wow” moment for me was when I ran top and couldn’t see any procs but my own.
BTW ssh is a bit laggy for me when logged into this box, moreso that most
machines I ssh into. Would selinux have anything to do with this, or have I
just ended up with a slow/laggy link?
– Daniel Gnoutcheff <gnoutchd@union.edu>
Sun Jul 19 23:24:51 UTC 2009
[I was in the middle of doing a big file transfer when you logged in. But even if I wasn't the link is a SOHO grade connection so you don't expect the same quality as a proper data-center.]
Nice, I’ll have to look into this. Thanks for the demo\!
Herro people :3
nice one
Sorry bout the fork-bomb yesterday :3
you know it works when your instinct is to rerun with sudo before realizeing youralready root lol
test test
Wow, this is cool! SELinux rules! I got to try this on my own machine
BENSON WAS HERE
Hello from Russia
=====================================================
Hello from San Juan, Puerto Rico!
I just found out about this server by reading the SELinux book from O’Reilly. The book is pretty old (2004) and I’m glad to know the URL provided on the book still works!
All the best,
=====================================================
22:09:47 up 21:34, 1 user, load average: 0.00, 0.00, 0.00
Great job with this one, i’ve tried a number of things -
attemtping to get cron to run the files as bofh (no luck, cron transitions to the context im in)
attempting to put hard links in /root so that it relabels key files (no luck, /root is on a different partition)
attempting to mknod a block device (no luck, nodev is set in the mount options and there isnt many places I can write to anyway)
attempting to signal a coredump of “chage” (which doesnt complain when i run it by the way!) so I can read shadow.
attempting to perform sigstop on chage so i can ouput the file descriptor (no luck, chage transitions, i cant read its proc entry nor can i signal it anyway)
attempting to chroot a new environment (no luck, no chroot process privilege)
I think the closest i got was trying to manipulate chage, but i was far far off then. That or being able to write to bofh crontab.
The most effective way to get around the selinux restrictuions would probably be to get read access to /dev/hdc then run debugfs on it to dump the shadow file. But I spent too long on this now anyway!
Great work!
———————–
Matthew

Installing SE Linux on Debian/Lenny

Currently Debian/Lenny contains all packages needed to run SE Linux. Development continues so there are periodic updates which sit in Unstable for a while before migrating to Lenny (testing).

I have set up my own APT repository for SE Linux packages. This has packages that need newer versions than in Lenny but which will be in Lenny eventually (which includes the latest policy packages) as well as my own modified packages to fix bugs that won’t be fixed in Lenny. After Lenny is released I will maintain the repository for i386 and AMD64 for bug fixes and new features above what is in Lenny.

gpg --keyserver hkp://subkeys.pgp.net --recv-key F5C75256
gpg -a --export F5C75256 | apt-key add -

To enable the use of my repository you must first run the above two commands to retrieve and install my GPG key (take appropriate measures to verify that you have the correct key).

deb http://www.coker.com.au lenny selinux

Then add the above line to /etc/apt/sources.list and run “apt-get update” to download the list of packages.

Next run the command “apt-get install selinux-policy-default selinux-basics” to install all the necessary packages. After that is done you need the file /.autorelabel to exist for the next boot to cause the filesystems to be labeled. The file /boot/grub/menu.lst needs to have “selinux=1” on the end of the line which starts with “# kopt=” (and the kernel command-lines for each kernel). You can do this manually but the recommended thing to do is to run the command selinux-activate. If given no parameters it will apply all the necessary tweaks to enable SE Linux (it changes PAM configuration files, GRUB configuration, and creates /.autorelabel).

Note that if you use gdm then the file /etc/pam.d/gdm needs to have the pam_selinux.so line moved to before the GNOME key lines. I need to update the selinux-basics package for this.

Then reboot and the filesystems will be relabeled. The relabel process will cause a second automatic reboot of the machine (it needs to be rebooted so that init gets the correct context). After that is finished the machine will be running in “permissive mode”, which means that SE Linux will log the actions that it would deny, but they will still be performed.

To put the machine in “enforcing mode” you can run the command “setenforce 1”, which means that SE Linux actually controls access to the machine. When you are confident that the machine is working correctly you can edit the file /etc/selinux/config and change the SELINUX= line to specify that it is in “enforcing” mode. The script selinux-config-enforcing will do this for you (with no parameters it configures SE Linux to be in enforcing mode at the next boot). If you need to override this (for example if critical files get the wrong labels and prevent booting) then the kernel command-line option enforcing=0 will override it. I will add a new command selinux-config-enforcing to the selinux-basics package to manage this (it will hopefully be there for Lenny).

If you use Postfix then you need to run it without chroot. The command postfix-nochroot will configure Postfix to not use chroot and will restart it. This script is included in the selinux-basics package but will hopefully be in Postfix for Lenny+1 (I think that many people who don’t use SE Linux will be able to use it).

In summary here are the commands you need:
apt-get install selinux-policy-default selinux-basics
selinux-activate
reboot
postfix-nochroot
(optional)
selinux-config-enforcing

Porting NSA SE Linux to Hand Held devices

Notes

I presented this paper at the 2003 Ottawa Linux Symposium (OLS).

http://lsm.immunix.org/ is defunct, since about 2004, so I removed the link.

The NSA changed the URLs on their web site, so this version of the paper has the new ones.

The SE Linux kernel interfaces have changed, now it’s all through the proc and selinuxfs filesystems and there are no SE Linux specific system calls. Equivalent functionality is provided.

With significant changes to the code base (kernel, policy, and tools) the amounts of memory used will differ. But the methods of saving memory will remain the same.

Abstract

In the first part of this paper I will describe how I ported SE Linux to User-Mode-Linux and to the ARM CPU. I will focus on providing information that is useful to people who are porting to other platforms as well. In the second part I will describe the changes necessary to applications and security policy to run on small devices. This will be focussed on hand-held devices but can also be used for embedded applications such as router or firewall type devices, and any machine that has limited memory and storage.

Introduction

SE Linux offers significant benefits for security. It accomplishes this by adding another layer of security in addition to the default Unix permissions model. This is achieved by firstly assigning a type to every file, device, network socket, etc. Then every process has a domain and the level of access permitted to a type is determined by the domain of the process that is attempting the access (in addition to the usual Unix permission checks). Domains may only be changed at process execution time. The domain may automatically be changed when a process is executed based on the type of the executable program file and the domain of the process that is executing it, or a privileged process may specify the new domain for the child process.

In addition to the use of domains and types for access control SE Linux tracks the identity of the user (which will be system_u for processes that are part of the operating system or the Unix user-name) and the role. Each identity will have a list of roles that it is permitted to assume, and each role will have a list of domains that it may use. This gives a high level of control over the actions of a user which is tracked through the system. When the user runs SUID or SGID programs the original identity will still be tracked and their privileges in the SE security scheme will not change. This is very different to the standard Unix permissions where after a SUID program runs another SUID program it’s impossible to determine who ran the original process. Also of note is the fact that operations that are denied by the security policy [smalley] have the identity of the process in question logged.
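The identity, role, and domain relationship described above can be sketched as two lookup tables. This is only a toy model: the identities alice and bob, and the specific role and domain assignments, are illustrative assumptions and are not taken from any real policy.

```python
# Toy model of the identity -> role -> domain checks.
# Every entry below is a made-up example, not a rule from a real policy.

ROLES_FOR_IDENTITY = {
    "system_u": {"system_r"},
    "alice": {"user_r"},
    "bob": {"user_r", "sysadm_r"},
}

DOMAINS_FOR_ROLE = {
    "system_r": {"init_t", "getty_t"},
    "user_r": {"user_t"},
    "sysadm_r": {"sysadm_t"},
}


def context_is_valid(identity, role, domain):
    """Return True if the identity may assume the role and the role may use the domain."""
    role_ok = role in ROLES_FOR_IDENTITY.get(identity, set())
    domain_ok = domain in DOMAINS_FOR_ROLE.get(role, set())
    return role_ok and domain_ok


print(context_is_valid("bob", "sysadm_r", "sysadm_t"))    # True
print(context_is_valid("alice", "sysadm_r", "sysadm_t"))  # False
```

The identity is never changed by the check, which mirrors the point that running SUID or SGID programs does not alter who the user is in the SE security scheme.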

I often run SE Linux demonstration machines on the Internet which provide root access to the world and an invitation to try and break the security [play-machine].

For a detailed description of how SE Linux works I recommend reading the paper Peter Loscocco presented at OLS in 2001 [ols2001:loscocco-smalley].

SE Linux has been shown to provide significant security benefits for little overhead on servers, desktop workstations, and laptops. However it has not had much use in embedded devices yet.

Some people believe that SE Linux is only needed for server systems. I think that is incorrect, and I believe that in many situations laptops and hand-held devices need more protection than servers. A server will usually have a firewall protecting it, with a small number of running applications which are well maintained and easy to upgrade. Portable computers are often used in hostile environments that servers do not experience, they have no firewalls to protect them, and often they are connected to routers operated by potentially negligent or hostile organizations.

But there are two main factors that cause an increased need for security on portable devices. One is that it is usually extremely difficult and expensive to upgrade them if a new security fix is needed. This means that in commercial use portable computers tend to never have security fixes applied. Another factor is that often the person in possession of a hand-held computer is not authorised to access all the data it contains, and may even be hostile to the owner of the machine.

Naturally for a full security solution for portable computers a strong encryption system will need to be used for all persistent file systems. There are various methods of doing this, but all aspects of such encryption are outside the scope of this project and can be implemented independently.

Kernel Porting

The current stable series of SE Linux is based on the 2.4.x kernels and uses the Linux Security Modules (LSM) [lsm] interface. The current LSM interface has a single sys_security() system call that is used to multiplex all the system calls for all of the security modules. SE Linux uses 52 different system calls through this interface. Due to problems in porting the kernel code to some platforms (particularly those that have a mixed 32 and 64bit memory model) the decision was made to change the LSM interface for kernel 2.6.0. The new interface will make the code fully portable and remove the painful porting work that is currently required. However I needed to have SE Linux working with the 2.4.x kernels so I couldn’t wait for kernel 2.6.0.

The main difficulty in porting the code is the system call execve_secure() which is used to specify the security context for the new process. This calls the kernel function do_exec() to perform the execution, and do_exec() needs a pointer to the stack, thus requiring architecture specific code in the sys_execve_secure() function. The sys_security_selinux_worker() function (which determines which SE Linux system call is desired and passes the appropriate parameters to it) calls sys_execve_secure() and therefore also needs architecture specific code, and so does the main system call sys_security_selinux().

My first port of SE Linux was to User-Mode Linux [uml]. This was a practice effort for the main porting work. It is quite easy to debug kernel code under UML, and as it uses the i386 system call interface I could port the kernel code without any need to port application code.

The main architecture dependent code is in the source file security/selinux/arch/i386/wrapper.c, which has code to look on the stack for the contents of particular registers. This needs to be changed for platforms with different register names, and for UML which does not permit such direct access of registers.

The solution in the case of UML was to not have a wrapper function, as the current structure had a pointer to the stack anyway that could be used inside the sys_execve_secure() function. So I renamed the sys_security_selinux_worker() function to sys_security_selinux() for the UML port and entirely removed all reference to the wrapper. Then I moved the implementation of sys_execve_secure() into the platform specific directory and implemented a different version for each port.

This was essentially all that was required to complete the port, as the core code of SE Linux was all cleanly written and could just be compiled. The only other work involved getting the Makefiles correctly configured, and adding a hook to sys_ptrace().

One thing I did differently with my port to the ARM architecture was that I removed the code to replace the system call entry. When the SE Linux kernel code loads on UML and i386 it replaces the system call with a direct call to the SE Linux code (rather than using the option for LSM to multiplex between different modules). As there is currently no support for having SE Linux be a loadable module there seems to be no benefit in this, and it seems that on ARM there will be more overhead for adding an extra level of indirection for this. So I made the SE Linux patch hard-code the SE system call into the sys-call table.

iPaQ Design Constraints

The CompaQ/HP iPaQ [ipaq] computers are small hand-held devices. The most powerful iPaQ machines on sale have a 400MHz ARM based CPU that is of comparable speed to a 300MHz Intel Celeron CPU, with 64M of RAM and 48M of flash storage.

An iPaQ is not designed for memory upgrades. There are some companies that perform such upgrades, but they don’t support all models, and this will void your warranty. Therefore you are stuck with a memory limit of 64M.

The flash storage in an iPaQ can only be written a limited number of times, this combined with the small amount of storage makes it impossible to use a swap space for virtual memory unless you purchase a special sleeve for using an external hard drive. Attaching an external hard drive such as the IBM/Hitachi Micro Drive is expensive and bulky. Therefore if you have a limited budget then storage expansion (for increased file storage or swap space) is not an option.

For storing files, the 32M file system can contain quite a lot. The Familiar distribution is optimised for low overheads (no documentation or man pages) and all programs are optimised for size not speed. Also the JFFS2 [jffs2] file system used by Familiar supports several compression algorithms including the Lempel-Ziv algorithm implemented in zlib, so more than 32M of files can fit in storage.

For a system such as SE Linux to be viable on an iPaQ it has to take up a small portion of the 32M of flash storage and 64M of RAM, and not require any long CPU intensive operations.

Finally the screen of an iPaQ only has a resolution of 240×320 and the default input device is a keyboard displayed on the screen. This makes an iPaQ unsuitable for interactive tasks that involve security contexts as it takes too much typing to enter them and too much screen space to display them. As a strictly end-user device this does not cause any problems.

CPU Requirements

Benchmarks that were performed on SE Linux operational overheads in the past show that trivial system calls (reading from /dev/zero and writing to /dev/null) can take up to 33% longer to complete when SE Linux is running, but that the overhead on complex operations such as compiles is so small as to be negligible [freenix]. The machines that were used for such tests had similar CPU power to a modern iPaQ.

One time consuming operation related to SE Linux installation is compiling the policy (which can take over a minute depending on the size of the policy and the speed of the CPU). This however is not an issue for an iPaQ as the policy takes over a megabyte of permanent storage and 5 megs of temporary file storage, as well as requiring many tools that are not normally installed (make, m4, the SE Linux policy compilation program checkpolicy, etc). The storage requirements make it impractical to compile policy on the iPaQ, and the typical use involves configuration being developed on other machines for deployment on iPaQ. So the time taken to compile the policy database is not relevant.

The only SE Linux operation which can take a lot of time that must be performed on an iPaQ is labeling the file system. The file system must be relabeled when SE Linux is first installed, and after an upgrade. On my iPaQ (H3900 with 400MHz X-Scale CPU) it takes 29.7 seconds of CPU time to label the root file system which contains 2421 files. For an operation that is only performed at installation or upgrade time 29.7 seconds is not going to cause any problems. Also the setfiles program that is used to label the file system could be optimised to reduce that time if it was considered to be a problem.
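As a rough guide, the labeling figures above work out to a per-file cost as follows. The estimate for other file counts assumes that labeling time scales linearly with the number of files, which is an assumption of this sketch and was not measured.

```python
# Figures from the text: 29.7 seconds of CPU time to label 2421 files.
LABEL_CPU_SECONDS = 29.7
LABELED_FILES = 2421

ms_per_file = LABEL_CPU_SECONDS / LABELED_FILES * 1000


def estimate_label_seconds(file_count):
    """Estimate labeling CPU time, ASSUMING cost is linear in the file count."""
    return file_count * LABEL_CPU_SECONDS / LABELED_FILES


print(f"{ms_per_file:.1f} ms of CPU time per file")  # 12.3 ms of CPU time per file
```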

I conclude that for typical use of a hand-held machine SE Linux requires no more CPU power than an iPaQ provides. In fact the CPU use is small enough that even the older iPaQ machines (which had half the CPU power) should deliver more than adequate performance.

Kernel Resource Use

To compare the amounts of disk space and memory I compiled three kernels. One was 2.4.19-rmk6-pxa1-hh13 with the default config for the H3900 iPaQ. One was a SE Linux version of the same kernel with the options CONFIG_SECURITY, CONFIG_SECURITY_CAPABILITIES, and CONFIG_SECURITY_SELINUX. Another was the same SE Linux kernel with development mode enabled (which slightly increases the size and memory).

For this project I have no need for the multi-level-security (MLS) functionality of SE Linux or the options for labelled networking and extended socket calls. This optional functionality would increase the kernel size. I am focussing on evaluating the choice of whether or not to use SE Linux for specific applications. Once you have decided to use SE Linux you would then need to decide whether the optional functionality provides useful benefits to your use to justify the extra disk space and memory use.

The kernel binaries are 658648 bytes for a non-SE kernel, 704708 bytes for the base SE Linux kernel, and 705560 bytes for the development mode kernel. The difference between the kernel with development mode enabled and the regular one is that the development kernel allows booting without policy loaded, and booting in permissive mode (with the policy decisions not being enforced). For most development work a kernel with development mode enabled will be used, also for this test it allowed me to determine the resource consumption of SE Linux without a policy loaded.

To test the memory use of the different kernels I configured an iPaQ to not load any kernel modules. My test method was to boot the machine, login at the serial console, wait 30 seconds to make sure that all daemons have started, and run free to see the amount of memory that is free. This is not entirely accurate as random factors may result in different amounts of memory usage, however this is not as significant on the Familiar distribution due to the use of devfs for device nodes and tmpfs for /var and /tmp which means that in the normal mode of operation almost nothing is written to the root file system, so two boots will be working on almost the same data.

From the results I looked at the total field in the results (which gives the amount of RAM that is available for user processes after the kernel has used memory in the early stages of the boot process), and the used field which shows how much of that has been used. The kernel message log gives a break-down of RAM that is used by the kernel for code and data in the early stages of boot, however that is not of relevance to this study, as only the total amount that is used matters.

The total memory available was reported as 63412k for the non-SE kernel, 63308k for the SE Linux kernel, and 63300k for the development mode kernel. So SE Linux takes 104k of kernel memory early in the boot process and 112k if you use the development mode option.

The memory reported as used varied slightly with each boot. For the vanilla kernel the value 18256k was reported in two out of four tests, with values of 18252k and 18260k also being reported. I am taking the value 18256k as the working value which I consider accurate to within 8k.

For a standard SE Linux kernel the amount reported as used was 19516k in three out of six tests with the values of 19532k, 19520k, and 19524k also being returned. So I consider 19516k as the working value and the accuracy to be within 16k.

For the SE Linux kernel with development mode enabled the memory used was 19516k in three out of four tests, and the other test was 19524k. So the difference between the development mode kernel and the regular SE Linux kernel is only 8K of kernel memory in the early stages of the boot process.

Finally I did a test of a development mode kernel with no policy loaded. The purpose of this test was to determine how much memory is used on a SE Linux kernel if the SE Linux code is not loading the policy. For this the memory reported as used was 18292k in three out of five tests, with the values of 18296k and 18300k also being returned.
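The way a working value is chosen from repeated boots (the most frequently reported reading, with the spread of readings taken as the accuracy) can be sketched as follows. The readings are the ones quoted above. The order of the individual boots is not given in the text, so the ordering within each list is arbitrary.

```python
from collections import Counter

# "used" readings in kilobytes, as quoted in the text (ordering is arbitrary).
MEASUREMENTS = {
    "non-SE": [18256, 18256, 18252, 18260],
    "SE with policy": [19516, 19516, 19516, 19532, 19520, 19524],
    "SE development mode": [19516, 19516, 19516, 19524],
    "SE no policy": [18292, 18292, 18292, 18296, 18300],
}


def working_value(samples):
    """Return (most frequent reading, max - min) for a list of readings."""
    value, _count = Counter(samples).most_common(1)[0]
    return value, max(samples) - min(samples)


for kernel, samples in MEASUREMENTS.items():
    value, spread = working_value(samples)
    print(f"{kernel}: {value}k (spread {spread}k)")
```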

Kernel memory used
non-SE 18256k
SE no policy 18292k
SE with policy 19516k

So an SE Linux kernel without policy loaded uses approximately 36K more memory after boot than a non-SE kernel in addition to the 104k or 112k used in the early stages of boot.

With a small policy loaded (360 types and 23,386 rules for a policy file that is 583771 bytes in size) the memory used by the kernel is about 1224k for the policy and other SE Linux data structures. The policy could be reduced in size as there are many rules which would only apply to other systems (the sample policy is quite generic and was quickly ported to the iPaQ), although there may be other areas of functionality that are desired which would use any saved space.

So it seems that when using SE Linux the memory cost is 104k when the kernel is loaded, and a further 1260k for SE Linux memory structures and policy when the boot process is complete. The total is 1364k of non-swappable kernel memory out of 64M of total RAM in an iPaQ, which is about 2% of RAM.
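The totals above can be reproduced from the measured values. This sketch only restates the arithmetic already in the text; the variable names are my own.

```python
# Values in kilobytes, taken from the measurements reported above.
total_kb = {"non-SE": 63412, "SE": 63308, "SE development": 63300}
used_kb = {"non-SE": 18256, "SE no policy": 18292, "SE with policy": 19516}

# Memory taken early in boot (reduction in the "total" field).
boot_cost = total_kb["non-SE"] - total_kb["SE"]                  # 104
boot_cost_dev = total_kb["non-SE"] - total_kb["SE development"]  # 112

# Memory taken after boot (increase in the "used" field).
code_cost = used_kb["SE no policy"] - used_kb["non-SE"]            # 36
policy_cost = used_kb["SE with policy"] - used_kb["SE no policy"]  # 1224
runtime_cost = code_cost + policy_cost                             # 1260

total_cost = boot_cost + runtime_cost                              # 1364
ram_kb = 64 * 1024
print(f"{total_cost}k of {ram_kb}k = {total_cost / ram_kb:.1%}")   # 1364k of 65536k = 2.1%
```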

All tests were done with GCC 3.2.3, a modified Linux 2.4.19, and an X-scale CPU. Different hardware, kernel version, and GCC version will give different results.

Porting Utilities

The main login program used on the Familiar [familiar] distribution is gpe-login, which is an xdm type program for a GUI login. This program had to be patched to check a configuration file and the security policy to determine the correct security context for the user and to launch their login shell in that context. The patch for this functionality made the binary take 4556 bytes more disk space in my build (29988 bytes for the non-SE build compared to 34544 bytes for the version with SE Linux support).

The largest porting task was to provide SE Linux support in Busybox [busybox]. Busybox provides a large number of essential utility programs that are linked into one program. Linking several programs into one reduces disk space consumption by spreading the overhead for process startup and termination code across many programs. On ARM it seems that the minimum size of an executable generated by GCC 3.2.3 is 2536 bytes. In the default configuration of Familiar, Busybox is used for 115 commonly used utilities; having them in one program means that the 2.5K overhead is only used once, not 115 times. So approximately 285K of uncompressed disk space is saved by using Busybox if the only saving is from this overhead. The amount of disk space used for initialisation and termination code would probably increase the space used by more than 80% if all the applets were compiled separately (my build of Busybox for the iPaQ is 337028 bytes).
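The overhead estimate above can be checked with a few lines of arithmetic. The sketch uses only the figures quoted in the text and treats the 2536 byte minimum executable size as the full per-program overhead, which is a simplification.

```python
MIN_EXECUTABLE_BYTES = 2536   # smallest ARM executable from GCC 3.2.3
APPLET_COUNT = 115            # utilities provided by Busybox in Familiar
BUSYBOX_BYTES = 337028        # size of the combined Busybox binary

# Startup and termination overhead if every applet were a separate program.
separate_overhead = APPLET_COUNT * MIN_EXECUTABLE_BYTES
overhead_kib = separate_overhead / 1024
increase = separate_overhead / BUSYBOX_BYTES

print(f"{separate_overhead} bytes, about {overhead_kib:.0f}K")  # 291640 bytes, about 285K
print(f"{increase:.1%} of the combined binary size")            # 86.5% of the combined binary size
```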

The programs that are of most immediate note in busybox are ls, ps, id, and login. ls needs the ability to show the security contexts of the files, ps needs to show the security contexts of the running processes, and id needs to show the context of the current process. Also the /bin/login applet had to be modified in the same manner as the gpe-login program. These changes resulted in the binary being 5600 bytes larger (337028 bytes for a non-SE version and 342628 bytes for the version with SE Linux support).

Busybox Wrappers for Domain Transition

In SE Linux different programs run in different security domains. A domain change can be brought about by using the execve_secure() system call, or it can come from an automatic domain transition. An example of an automatic domain transition is when the init process (running in the init_t domain) runs /sbin/getty which has the type getty_exec_t, which causes an automatic transition to the domain getty_t. Another example is when getty runs /bin/login which has the type login_exec_t and causes an automatic transition to the domain local_login_t. This works well for a typical Linux machine where /sbin/getty and /bin/login are separate programs.
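The two automatic transitions used as examples above can be modelled as a lookup keyed on the current domain and the type of the executable file. This is a deliberately simplified sketch: a real policy also has to permit the execution and the transition, and the function name domain_after_exec is my own.

```python
# (current domain, type of the executable file) -> domain of the new process.
# Only the two transitions named in the text are listed.
TRANSITIONS = {
    ("init_t", "getty_exec_t"): "getty_t",
    ("getty_t", "login_exec_t"): "local_login_t",
}


def domain_after_exec(domain, exec_type):
    """Return the new domain, or the unchanged domain if no rule matches."""
    return TRANSITIONS.get((domain, exec_type), domain)


print(domain_after_exec("init_t", "getty_exec_t"))   # getty_t
print(domain_after_exec("getty_t", "login_exec_t"))  # local_login_t
```

Because the lookup depends on the type of the file being executed, two programs can only be given different transitions if they are different files with different types.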

When using Busybox the getty and login programs will both be sym-links to /bin/busybox and the type of the file as used for domain transitions will be the type of /bin/busybox, which is bin_t. SE Linux does not perform domain transitions based on the type of the sym-link, and it assigns security types to the inodes not file names (so a file with multiple hard links will only have one type). This means that we can’t have a single Busybox program automatically transitioning into the different domains.
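The underlying file system behaviour is easy to observe on any POSIX system. The sketch below creates a placeholder file with two sym-links and a hard link in a temporary directory (all names are illustrative) and shows that every name resolves to the same inode, so there is only one place for a type to be stored.

```python
import os
import tempfile


def inode_numbers():
    """Create one file plus sym-links and a hard link; return name -> inode."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "busybox")
        with open(target, "w") as handle:
            handle.write("placeholder for the real binary\n")
        os.symlink(target, os.path.join(tmp, "getty"))
        os.symlink(target, os.path.join(tmp, "login"))
        os.link(target, os.path.join(tmp, "busybox2"))
        names = ("busybox", "getty", "login", "busybox2")
        # os.stat() follows sym-links, so each name resolves to the target file.
        return {name: os.stat(os.path.join(tmp, name)).st_ino for name in names}


inodes = inode_numbers()
print(len(set(inodes.values())) == 1)  # True: every name refers to one inode
```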

There are several possible solutions to this problem. One possible partial solution would be to have Busybox use execve_secure() to run copies of itself in the appropriate domain. Busybox already has similar code for determining when to change UID so that some of the Busybox applets can be effectively SETUID while others aren’t. The SETUID management of Busybox requires that it be SETUID root, and involves some risk (any bug in busybox can potentially be exploited to provide root access). Providing a similar mechanism for transitioning between SE Linux security domains would have the same security problems whereby if you crack one of the Busybox applets you could then gain full access to any domain that it could transition to. This does not provide adequate security. Also it would only work for transitions between privileged domains (it would not work for transitions from unprivileged domains). I did not even bother writing a test program for this case as it is not worth considering due to a lack of security and functionality.

A better option is to split the Busybox program into smaller programs so transitions can work in the regular manner. With the current range of applets that would require one program for getty, one for login, one for klogd, one for syslogd, one for mount and umount, one for insmod, rmmod, and modprobe, one for ifconfig, one for hwclock, one for all the fsck type programs, one for su, and one for ping. Of course there would also be one final build of busybox with all the utility programs (ls, ps, etc) which run with no special privilege. To test how this would work I compiled Busybox with all the usual options apart from modutils, and I did a separate build with only support for modutils. The non-modutils build was 323236 bytes and the build with only modutils was 37764 bytes. This gave a total of 361000 bytes compared to 342628 bytes for a single image, so an extra 18372 bytes of disk space was required for doing such a split.

Splitting the binary in such a simple fashion would likely cost 18K for each of the eleven extra programs. If we changed the policy to have syslogd and klogd run in the same domain (and thus the same program) and have hwclock run with no special privs (IE the domain that runs it needs to have access to /dev/rtc) then there would only be nine extra programs for a cost of approximately 162K of disk space. This disk space use could be reduced by further optimisation of some of the applets, for example in the case of ifconfig the code to check argv[0] to determine the applet name could be removed. A simple split in this manner would also make it more difficult for an attacker to make the program perform unauthorized actions. When a single program has /bin/login functionality as well as /bin/sh then there is potential for a buffer overflow in the login code to trigger a jump to the shell code under control of the attacker! When the shell is a separate program that can only be entered through a domain transition it is much more difficult to use an attack on the login program to gain further access to the system.

Finally if we have a single Busybox program that includes applets running in different domains we need to make some significant changes to the policy. The default policy has assert rules to prevent compilation of a policy that contains mistakes which may lead to security holes. For the domains getty_t, klogd_t, and syslogd_t there are assertions to prevent them from executing other programs without a domain transition, and to prevent those domains being entered through executing files of types other than the matching executable type (this requires that each of those domains have a separate executable type, IE they are not all the same program). Adding policy which requires removing these assertions weakens the security of the base domains and also makes the policy tree different from the default tree which has been audited by many people.

Another way of doing this which uses less disk space is to have a wrapper program such as the following:

#include <unistd.h>
#include <string.h>

int main(int argc, char **argv, char **envp)
{
  /* ptr is the basename of the executable that is being run */
  char *ptr = strrchr(argv[0], '/');
  if(!ptr)
    ptr = argv[0];
  else
    ptr++;

  /* basename must match one of the allowed applets,
     otherwise it's a hacking attempt and we exit */
  if(strcmp(ptr, "insmod")
  && strcmp(ptr, "modprobe")
  && strcmp(ptr, "rmmod"))
    return 1;

  /* execve() only returns if it fails */
  return execve("/bin/busybox", argv, envp);
}

This program takes 2912 bytes of disk space. The idea would be to have a copy of it named /sbin/insmod with type insmod_exec_t, with symlinks /sbin/rmmod and /sbin/modprobe pointing to it. Then when insmod, rmmod, or modprobe is executed an automatic domain transition to the insmod_t domain will take place, and then the Busybox program will be executed in the correct context for that applet.

This option is easy to implement, and one advantage is that there is no need to change the Busybox program. The fact that the entire Busybox code base is available in privileged domains is a minor weakness. Implementing this takes about 2900 bytes of disk space for each of the eleven domains (or nine domains depending on whether you have separate domains for klogd and syslogd and whether you have a domain for hwclock). It will take less than 33K or 27K of disk space (depending on the number of domains). This saves about 130K over the option of having separate binaries for implementing the functionality.

A final option is to have a single program to act as a wrapper and change domains appropriately. Such a program would run in its own domain with an automatic domain transition rule to allow it to be run from all source domains. Then it would look at its parent domain and the type of the symlink to determine the domain of the child process. For example I want to have insmod run in domain insmod_t when run from sysadm_t. So I have an automatic transition rule to transition from sysadm_t to the domain for my wrapper (bbwrap_t). Then the wrapper determines that its parent domain is sysadm_t, determines that the type of the symlink for its argv[0] is insmod_exec_t and asks the kernel what domain should be entered when a process in sysadm_t executes a program of type insmod_exec_t, and the answer is insmod_t. So the wrapper then uses the execve_secure() system call to execute Busybox in the insmod_t domain and tell it to run the insmod applet.

I implemented a prototype program for this. For my prototype I used a configuration file to specify the domain transitions instead of asking the kernel. The resulting program was 6K in size (saving 27K of disk space over the multiple-wrapper method, and 156K of disk space over the separate programs method), although it did require some new SE Linux policy to be written which takes a small amount of disk space and kernel memory.

One problem with this method is that it allows security decisions to be made by an application instead of the kernel. It is preferable that only the minimum number of applications can make such security decisions. In a typical configuration of SE Linux the only such applications will be login, an X login program (in this case gpe-login), cron (which is not installed in Familiar), and newrole (the SE Linux utility for changing the security context which operates in a similar manner to su).

The single Busybox wrapper is more of a risk than most of these other programs. The login programs are only executed by the system and can not be run by the user with any elevated privileges, which makes them less vulnerable to attack. Newrole is well audited and the domains it can transition to are limited by the kernel to only include domains that might be used for a login process (dangerous domains such as login_t are not permitted).

Due to the risks involved with a single busybox wrapper, and the fact that the benefits of using 6K on disk instead of 33K are very small (and are further reduced by an increase in kernel memory for the larger policy) I conclude that it is a bad idea.

I conclude that the only viable methods of using Busybox on a SE Linux system are having separate wrapper programs for each domain to be entered (taking 33K of extra disk space and requiring minor policy changes), or having entirely separate programs compiled from the Busybox source for each domain (taking approximately 162K of extra disk space with no other problems). Also with some careful optimisation the 162K of overhead could be reduced for the option of splitting the Busybox program. If 162K of disk space can be spared (which should not be a problem with a 32M file system) then splitting Busybox is the right solution.

Removed Functionality

A hand-held distribution doesn’t require all the features that are needed on bigger machines such as servers, desktop workstations, and laptops. Therefore we can reduce the size of the SE Linux policy and the number of support programs to save disk space and memory.

For a full SE Linux installation there are wrappers for the commands useradd, userdel, usermod, groupadd, groupdel, groupmod, chfn, chsh, and vipw. These can possibly be removed as there is less need for adding, deleting, or modifying users or groups on a hand-held device in the field. These programs would take 27K of disk space if they were included.

A default installation of Familiar does not include support for /etc/shadow, and therefore there is no need for the wrapper programs for the administrator to modify users’ accounts. However I think that the right solution here is to add /etc/shadow support to Familiar rather than removing functionality from SE Linux. This will slightly increase the size of the login programs.

In a full install of SE Linux there are programs chsid and chcon to allow changing the security type of files. These are of less importance for a small device. There will be fewer types available, and the effort of typing in long names of security contexts will be unbearable on a touch-screen input device. A hand-held device has to be configured to not require changing the contexts of files, and therefore these programs can be removed.

In the Debian distribution there is support for installing packages on a live server and having the security contexts automatically assigned to the files. As iPaQs are used in a different environment I believe that there is less need for such upgrades and such support could optionally be removed to save disk space. I have not written the code for this yet, but I estimate it to be about 100K.

The default policy for SE Linux has separate domains for loading policy and for policy compilation. On the iPaQ we can’t compile policy due to not having tools such as m4 and make, so we can skip the compilation program and its policy. Also the policy for a special domain for loading new policy is not needed as the system administration domain sysadm_t can be used for this purpose. It is even possible to save 3500 bytes of disk space by not including the program to load the policy (a reboot will cause the new policy to take effect).

A server configuration of SE Linux (or a full workstation configuration) includes the run_init program to start daemons in the correct security context. On a typical install of Familiar there are only three daemons, a program to manage X logins, a daemon to manage bluetooth connections, and the PCMCIA cardmgr daemon. For restarting these daemons it should be acceptable to reboot the iPaQ, so run_init is not needed.

Disk Space and RAM Use

In the section on kernel resource usage I determined that the kernel was using 1364K of RAM for SE Linux with a 583771 byte policy comprising 23,386 rules loaded. Since the time that I performed those tests I reduced the policy to 455,422 bytes and 18,141 rules which would reduce the kernel memory use. I did not do any further tests as it is likely that I will add new functionality which uses the memory I have freed. So I can expect that 1.3M of kernel memory is taken by SE Linux.

The SE Linux policy that is loaded by the kernel takes 67K on disk when compressed. The file_contexts file (which specifies the security contexts of files for the initial installation and for upgrades) takes 24K. The kernel binary takes 64K more disk space for the SE Linux kernel. So the kernel code and SE Linux configuration data takes 156K of disk space (most of which is compressed data).

The program setfiles is needed to apply the file_contexts data to the file system. Setfiles takes 20K of disk space. The file_contexts file could be reduced in size to 1K if necessary to save extra disk space, but in my current implementation it can not be removed entirely. In Familiar a large number of important system directories (such as /var) are on a ramfs file system. I am using setfiles to label /mnt/ramfs. So far it has not seemed beneficial to have a small file_contexts file for booting the system and an optional larger one for use when installing new packages or upgrading, but this is an option to save 23K. Another option would be to write a separate program that hard-codes the security contexts for the ramfs. It would be smaller than setfiles and not require a file_contexts file, thus saving 30K or more of disk space. Currently this has not seemed worth implementing as I am still in a prototype phase, but it would not be a difficult task. Also if such a program was written then the next step would be to use a [jffs2] loop-back mount to label the root file system on a server before installation to the iPaQ (so that setfiles never needs to run on the iPaQ).

The patches for the gpe-login and busybox programs to provide SE Linux login support and modified ls, ps, and id programs cause the binaries to take a total of 10K extra disk space.

Splitting Busybox into separate programs for each domain will take an estimated 162K of disk space.

The total of this is approximately 348K of additional disk space for a minimal installation of SE Linux on an iPaQ. Adding support for /etc/shadow and other desirable features may increase that to as much as 450K depending on the features chosen. However if you use multiple Busybox wrappers instead of splitting Busybox then the disk space for SE Linux could be reduced to less than 213K. If you then replaced setfiles for the system boot labeling of the ramfs then it could be reduced to 190K.

Conclusion

Security Enhanced Linux on a hand-held device can consume less than 1.3M of RAM and less than 400K of disk space (or less than 200K if you really squeeze things). While the memory use is larger than I had hoped it is within a bearable range, and it could potentially be reduced by changing the kernel code to optimise for reduced memory use. The disk space usage is trivial and I don’t think it is a concern.

I believe that the benefits of reducing repair and maintenance problems with hand-held devices that are deployed in the field through better security outweigh the disadvantage of increased memory use for many applications.

All source code and security policy code related to this article will be on my web site [my-site].

References

SE Linux Magic

Here is a complete list of entries for /etc/magic related to SE Linux.

# SE Linux policy database for Fedora versions less than 5, RHEL 4, and Debian before Etch
# http://doc.coker.com.au/computers/selinux-magic
0      lelong  0xf97cff8c      SE Linux policy
>16    lelong  x              v%d
>20    lelong  1      MLS
>24    lelong  x      %d symbols
>28    lelong  x      %d ocons

# SE Linux policy modules *.pp reference policy for Fedora 5 to 9,
# RHEL5, and Debian Etch and Lenny.
# http://doc.coker.com.au/computers/selinux-magic
0      lelong  0xf97cff8f      SE Linux modular policy
>4      lelong  x      version %d,
>8      lelong  x      %d sections,
>>(12.l) lelong 0xf97cff8d
>>>(12.l+27) lelong x          mod version %d,
>>>(12.l+31) lelong 0          Not MLS,
>>>(12.l+31) lelong 1          MLS,
>>>(12.l+23) lelong 2
>>>>(12.l+47) string >\0        module name %s
>>>(12.l+23) lelong 1          base

# for SE Linux policy source for reference policy
# http://doc.coker.com.au/computers/selinux-magic
0      string  policy_module(  SE Linux policy module source
1      string  policy_module(  SE Linux policy module source
2      string  policy_module(  SE Linux policy module source

0      string ##\ <summary>    SE Linux policy interface source

0      search  gen_context(    SE Linux policy file contexts

0        search        gen_sens(        SE Linux policy MLS constraints source

Log Tools

The Logtools package contains a number of programs for managing log files (mainly for web servers).

  • clfmerge will merge a number of Common Logfile Format web log files into a single file while also re-ordering them in a sliding window to cope with web servers that generate log entries with the start-time of the request and write them in order of completion.
  • logprn operates like tail -f but will (after a specified period of inactivity) spawn a process and write the new data in the log file to its standard input.
  • clfsplit will split up a single CLF format web log into a number of files based on the client’s IP address.
  • funnel will write its standard input to a number of files or processes.
  • clfdomainsplit will split a CLF format web log containing fully qualified URLs (including the host name) into separate files, one for each host.


Polyinstantiation of directories in an SE Linux system

Notes

I presented this paper at the 2006 SAGE-AU conference.

Abstract

This paper describes the problems related to shared directories such as /tmp and /var/tmp as well as problems related to having multiple SE Linux security contexts used for accessing a single home directory. It then provides detailed information on the solution to this problem that has been implemented with polyinstantiated directories by using the pam_namespace module.

Introduction

It is a long-standing Unix tradition that the directories /tmp and /var/tmp are used for temporary storage by all programs and on behalf of all users. This used to not be considered a problem, however in recent times it has been recognised that the use of such a shared directory is vulnerable to race-condition attacks with symbolic links.

Another problem is that in some situations a file name may convey secret information. If the file in question is in a public directory such as /tmp or /var/tmp (which may be an unintended result of a command by the user) then this will represent an information leak if there are any less privileged processes running on the machine.

Past attempts to deal with these problems have included restrictions on creating sym-links and hiding file names, which have both been inadequate. The solution chosen for use with SE Linux (which is also designed to work without SE Linux) is to have polyinstantiated directories based on Unix account name and/or SE Linux context. This means that every user will see a different version of the directory in question based on their context.

In the past this feature has been implemented as part of Multi-Level Security (Dr. Rick Smith [HREF2]) systems under the name multi-level directories. I believe that the multi-level directory variant of this solution was based on file system support, while the Linux support for this type of operation that I will describe is based in the VFS layer and thus does not require modification to any of the file systems that may be used.

Summary of Attacks that can be Prevented by Poly-Instantiated Directories

In this paper I am considering the following attack scenarios:

  1. Attack by user on user (including the case of a non-PI user as attacker or victim)
  2. Attack by user on daemon (including the case of a non-PI user as attacker)
  3. Attack by non-root daemon on user
  4. Attack by root daemon on user (will always succeed without SE Linux)

Each of the above four attack scenarios may occur with one of the following three attacks:

  1. Race-condition attacks on the integrity of processes and data (sym-link attacks, race conditions on renaming objects, or pre-creating a file to take ownership of data)
  2. Leaks of confidential data via secrets in file names
  3. Denial Of Service (DOS) attacks based on race conditions and pre-allocating file/directory names

Other Solutions

One attempt at solving this problem that has been implemented in some Linux security systems is to hide file names. This can work as long as it is not possible to guess any of the file names in question. If the file name can be guessed then the hostile party can attempt to create a new file of the same name; failure to create the file in question indicates existence. But this only solves the problem of secret data in file names.

Another partial attempt at dealing with this problem is controlling the ability to create hard-links and/or sym-links to try and prevent race conditions. A well-known implementation of this is in the OpenWall kernel patch [HREF3] which prevents the user from creating hard-links to files to which they have no write access and from creating sym-links in a +t directory (a directory such as /tmp or /var/tmp) which point to a file that they don’t own. It also prevents writing to named pipes in +t directories which are owned by a different user. This deals with some of the issues related to race-condition attacks but there are potential issues that it does not address, such as a hostile user creating sym-links to their own files to divert output or creating a file with no write permissions as a denial of service against a program that uses a fixed file name.

But this only deals with the case of race conditions used to attack system integrity. It does not prevent DOS attacks or protect secret data when it is used in a file name.

SE Linux Requirements for Shared Directories

SE Linux does not attempt to hide file names in a directory. If the name of a file contains secret data then this can be a security problem on shared directories such as /tmp and /var/tmp. This is an issue that has to be solved outside of the core SE Linux code base.

The SE Linux strict and mls policies provide good protection against most race condition attacks. Most domains are not permitted to create hard links to privileged files (types such as etc_t). Daemons are all protected from sym-link attacks by each other due to being denied access to sym-links created by other daemons and by users, and users are given similar protection against attack by daemons (both root and non-root). The main benefit for PI directories in strict and mls SE Linux systems is for protection against users attacking other users. In most cases large numbers of users will have the same SE Linux domain and therefore there will not be any effective protection against such attacks in the domain-type model (the integrity protection part of SE Linux).

When a Unix account is associated with more than one SE Linux context it is necessary to have multiple instances of the home directory to match the SE Linux context. If there is only one instance of the home directory and different SE Linux contexts are used for user logins then one of the contexts may be denied access to shared files such as .bashrc and .bash_history, or they may serve as information leaks. This use creates a requirement for PI home directories in SE Linux that does not exist for non-SE systems.

The problem of multiple logins with different contexts can occur in the older version of SE Linux (known as the example policy) that was used in Red Hat Enterprise Linux 4 and Fedora Core versions 2 to 4 when running the strict policy that permits multiple roles to be allocated to a user. But this is more of an issue with the newer versions of SE Linux policy that have functional support for MLS labels and the new MCS policy that permits different sets of categories to be assigned to a user session.

Non-SE Linux Requirements for Shared Directories

Polyinstantiation of shared directories also provides benefits for non-SE Linux systems; in fact there are probably more benefits to be gained from using this on non-SE systems. The SE Linux strict policy provides protection against sym-link race condition attacks launched by users against users in different roles, attacks by users against daemons, and attacks by daemons against users. The SE Linux MLS policy provides these benefits and also protects against attack from programs running at different levels, for example a process running at sensitivity level s2 could not be tricked into leaking data to a program running at level s1, even if the two programs ran in the same domain and with the same UID. Also SE Linux prevents unprivileged processes from creating hard links to files that are important to system integrity or data confidentiality (which is almost a complete solution to hard-link based attacks).

A non-SE system has none of the above protections and only has the Unix UID to protect both system integrity and confidentiality of data.

Linux Kernel Support for Poly-Instantiated Directories

In recent versions of Linux the current list of mounted file systems is available from the /proc/mounts file, which is a sym-link to /proc/self/mounts. This permits displaying the name-space which applies to the current process. If /etc/mtab is a sym-link to /proc/mounts then programs such as df will display information on the mount points that are associated with the name-space for the process.

The initial support for PI directories was via the CLONE_NEWNS flag to the clone() system call. This flag causes the child process to be allocated a separate name space. That process and each child process that it launched would have a separate name space to the process which called clone(), and to any process that resulted from another call to clone() with the CLONE_NEWNS flag. The problem with this was the requirement that applications be modified to use clone() with this flag instead of using fork().

To solve this problem a new system call sys_unshare [HREF4] was added to the Linux kernel. The unshare system call can create a separate name-space for mounted file systems among other things (the set of kernel data structures that can be unshared has been steadily increased since the introduction of unshare).

The unshare system call requires the SYS_ADMIN capability but does not require a fork, exec, or other operation. So it can be called from a PAM module and thus work with unmodified login programs. Also it is possible for multiple PAM modules to unshare different kernel data structures.

Shared Subtrees

One obvious problem with the functionality described in the previous section is the situation where the administrator wants to mount file systems and have all users see them, or have daemons mount file systems (such as autofs).

The solution to this is a development known as Shared Subtrees [HREF5]. This gives the option of specifying that certain subtrees will not be shared. For example if the directories /tmp and /var/tmp are being instantiated then the following commands could be run from a system boot script to cause all other mount operations to propagate to all users:

mount --make-shared /
mount --bind /tmp /tmp
mount --make-private /tmp
mount --bind /var/tmp /var/tmp
mount --make-private /var/tmp

The above commands make the root of the name-space shared and then make /tmp and /var/tmp private. Note that the --make-private option to the mount command only applies to mount points. As on my test system both /tmp and /var/tmp are on the root file system I have to bind mount them to themselves to have a mount point that can be made private. Be aware that if you don’t correctly exclude the PI directories from the shared name space then each user who logs in may get PI directories under another user’s directories, and things generally won’t work.

Design Overview of PI Directories in Linux

The initial design for PI directories was based on having them only created for user sessions at login time by PAM [HREF6] or similar mechanisms. To implement this the PAM module will create a directory under the directory that is being instantiated, create an unshared name space, and then bind mount the new directory over the PI directory. For example if /tmp is to be PI for user rjc then the directory /tmp/tmp.inst-rjc-rjc would be created as the instance of /tmp for the user rjc. After the directory is created an unshared name space would be created via the unshare system call. Finally in the new name space a bind mount would be used to replace /tmp with /tmp/tmp.inst-rjc-rjc. The bind mount operation would be equivalent to the command:

mount --bind /tmp/tmp.inst-rjc-rjc /tmp

The directory that was created was given the Unix permission mode 1777 (all users can create files and directories, but it is only permitted to remove files or directories that you own). This solved many of the problems related to users attacking users and users attacking daemons. But it does not solve the problem of a daemon attacking a user as the daemon has access to the parent of the PI directories. Also there is a configuration option to have a user excluded from the PI directory system, a user who is granted such access (either deliberately or accidentally) would also be able to attack other users. As all directories were created under /tmp with mode 1777 there was no protection of secret file names from daemons and users who were outside the PI system (for most systems I expect that there will be some users who will be excluded from the PI configuration).

Another problem with the initial implementation was that the directories were all created at login time, therefore a hostile process could guess the names and pre-allocate directories to allow taking over ownership and potentially allowing other race condition attacks. For example any privileged process which relies on files not being unlinked or renamed for correct operation would operate incorrectly (and possibly be subject to attack) if run in a situation where the /tmp directory did not have the mode 1777 to prevent such rename and unlink operations.

Finally the initial implementation did not have a fall-back case for when the desired name for a PI directory had been taken by a file. This would cause the login process to abort, which could be used as a DOS attack against user login sessions.

I have identified two possible solutions to the problem of DOS attacks against the pam_namespace module. One solution is to have it check whether the PI directory already exists. If it exists but has the wrong permissions (either Unix or SE Linux) or if there is an object other than a directory using that name then it would try creating the directory under a different name (maybe the original name with “.1” appended) and keep trying different names until it finds one that is available. This solution does not solve the problem of protecting secret file names.

The other solution I have identified solves the problems of DOS attacks and race conditions as well as the leaks of secret data in file names. This requires that a directory be pre-allocated on the system to contain all PI directories. So instead of a PI directory having the name /tmp/tmp.inst-rjc-rjc it might have the name /tmp/.inst/tmp.inst-rjc-rjc. The /tmp/.inst directory would be created and/or verified at system boot time and would have Unix permission mode 000 (the capability dac_override which every login program possesses would be required to access it) and would also have a SE Linux context that permits only very restrictive access. Therefore non-root daemons will be denied access to /tmp/.inst and therefore would not be able to launch attacks on users via the /tmp directory. On SE Linux systems root daemons will also be denied such access. If a user session is launched with a shared system name space (through misconfiguration or unusual requirements) then they would also be denied access to the instance of /tmp used by other users.

In the initial design of PI directories the aim was to confine users to prevent them from attacking the rest of the system, and such a confined user was still vulnerable to attack from outside. The second of the two solutions that I propose above is the one that I believe to be the best. It will protect users who have a PI version of a shared directory from attack by non-root daemons on a non-SE system and from attack by root daemons as well on a SE Linux system. It would be a viable option for the sys-admin to give a single user a PI version of /tmp to protect the files for that user while allowing all other users access to the system shared name space.

At the time of writing we have agreement on the concept of using a naming system somewhat like /tmp/.inst/tmp.inst-something where the directory /tmp/.inst will have Unix mode 000 and restrictive SE Linux access controls. This will prevent daemons and users that are not included in the PI configuration from attacking daemons and users that have it enabled. This makes PI protect the user who has such a PI directory as well as protecting the rest of the system from that user. Note that at the time of writing there was no final agreement on the directory names, while the concept of a two-level directory is agreed the actual name of the directory in the default configuration is still to be resolved.

A feature that has been discussed and agreed in concept is to have the pam_namespace.so module check the permissions of the /tmp/.inst directory and abort the login process if the directory does not have Unix permission mode 000, root ownership, and a suitable SE Linux label (if SE Linux is enabled). There will be a configuration option to disable this functionality as not all systems will need this level of protection (and not all administrators will want a system to fail-closed on such a minor security issue).

Currently Released Code

As of the time of writing Fedora Rawhide has a shared object named pam_namespace.so that implements the basic functionality. To use it the PAM configuration files in the /etc/pam.d directory must be modified to have the following line at the end:

session    required     pam_namespace.so

The system will work if the pam_namespace object is not the last in the list, but the creation of the namespace may interfere with some other PAM modules (for example if a PAM module wanted to access files in the /tmp directory) and in general it is safest to have it last. The only situation in which you might not want to have pam_namespace as the last session module is if you are using pam_mkhomedir and also using pam_namespace to provide PI home directories. But currently pam_mkhomedir does not work correctly in situations where PI home directories are desired so this should not be an issue.

The most noteworthy parameter for the pam_namespace module is the optional parameter unmnt_remnt. This is used by programs that run from an unshared namespace and need to create another unshared namespace. The primary example of this is su. All other programs that perform actions which are similar in concept (i.e. they are run from a user session and launch a new session on behalf of another user) will have the same requirement.

The pam_namespace module uses the configuration file /etc/security/namespace.conf. Each entry in this file currently has four parameters. The first gives a directory that should be instantiated (there is an option of $HOME for instantiating the user home directory). The second is the name of the real directory to be used for the instance, which may contain the variables $USER and $HOME to represent the user-name and the home directory of the user. The third parameter may have as its value user, context, or both to indicate whether the instantiation should be based on user-name, SE Linux context, or both. The final parameter is a list of comma-separated user-names for accounts that are exempt from poly-instantiation of the directory in question. I believe that it will be standard practice to include root in this list of accounts (usually there will be no need for other users to be excluded).
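
To illustrate the four fields, here is a minimal Python sketch of how one such entry could be split up. It is not how pam_namespace itself reads the file. The names PolyDir and parse_namespace_line are made up for this example, and two details are assumptions rather than documented behaviour: that fields are separated by whitespace, and that blank lines and lines starting with # are ignored.

```python
from collections import namedtuple

# Field names are chosen for this example; they are not taken from pam_namespace.
PolyDir = namedtuple("PolyDir", "directory instance_prefix method exempt_users")

VALID_METHODS = ("user", "context", "both")


def parse_namespace_line(line):
    """Parse one entry into a PolyDir, or return None for blank/comment lines."""
    line = line.strip()
    # Assumption: blank lines and '#' comments are skipped.
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    directory, instance_prefix, method, exempt = fields
    if method not in VALID_METHODS:
        raise ValueError("unknown instantiation method: %s" % method)
    return PolyDir(directory, instance_prefix, method, exempt.split(","))


entry = parse_namespace_line("/tmp /tmp/.inst/tmp.inst-$USER- both root")
print(entry.directory, entry.method, entry.exempt_users)
```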

Preventing Daemons From Attacking Each Other

In the currently released code there is no protection against daemons attacking each other. I believe that to take advantage of the full benefits offered by PI directories most daemons that run as non-root need the same protection so that they cannot attack each other.

In Fedora there is a new program called runuser that will start a daemon as a user other than root. It is linked against PAM and can be configured to call the pam_namespace module. Once I finish debugging it, every time it launches a daemon as non-root it will be able to create a new unshared namespace. Non-root daemons that require the system shared name space will need to have their user-names specified in the namespace.conf file.

In Debian daemons are started via a program named start-stop-daemon. I plan to modify this program to have the necessary name-space functionality.

Interesting Features

There is no requirement that the PI directory be a sub-directory of the directory it replaces. In fact it can be on a different filesystem. If you have a separate filesystem for /tmp and don’t want to have a separate filesystem for /var/tmp too then you could just configure namespace.conf such that /var/tmp is instantiated under /tmp. The following is one sample configuration:

/tmp     /tmp/.inst/tmp.inst-$USER-       both      rjc,root
/var/tmp /tmp/.inst/var-tmp.inst-$USER-   both      rjc,root
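
To make the sample concrete, the Python sketch below shows how the $USER and $HOME variables in the second field could be expanded and how the exemption list might be consulted. It is written purely for illustration. The function names, the user fred, and the /home/ paths are hypothetical, and whatever the module appends after the expanded prefix (for example when the method is both) is not modelled here.

```python
def expand_instance_prefix(prefix, user, home):
    """Substitute $USER and $HOME in the second namespace.conf field."""
    return prefix.replace("$USER", user).replace("$HOME", home)


def uses_shared_directory(user, exempt_users):
    """True if the user is exempt and therefore sees the system shared directory."""
    return user in exempt_users


# The two sample entries above: (directory, instance prefix, exempt users).
SAMPLE = [
    ("/tmp", "/tmp/.inst/tmp.inst-$USER-", ["rjc", "root"]),
    ("/var/tmp", "/tmp/.inst/var-tmp.inst-$USER-", ["rjc", "root"]),
]

for directory, prefix, exempt in SAMPLE:
    for user in ("rjc", "fred"):  # "fred" is a hypothetical non-exempt user
        if uses_shared_directory(user, exempt):
            print("%s: %s is the shared directory" % (user, directory))
        else:
            expanded = expand_instance_prefix(prefix, user, "/home/" + user)
            print("%s: %s is instantiated under %s" % (user, directory, expanded))
```

Note that both expanded prefixes live under /tmp/.inst, so a single protected parent directory covers the instances of /tmp and /var/tmp alike.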

Conclusion

With a two-level directory configuration (such as /tmp/.inst/whatever) we protect against all the attack scenarios that I consider (including an attack launched by a root-owned daemon on users when running SE Linux). The protection provided by the PI shared directory works in two ways: it protects the process with the unshared namespace and it also protects all other processes on the system against attack from that process.

Most operations that are described in this paper are usable in Fedora Rawhide as of the 31st of May 2006. The only operation that is not usable at this time is PI support for daemons. I hope to have PI working in Debian and have PI support for daemons in both Debian and Fedora Rawhide by the time this paper is published. I will describe my success in these efforts when I present this paper.


Hypertext References

HREF1 http://www.coker.com.au/selinux/
HREF2 http://www.cs.stthomas.edu/faculty/resmith/r/mls/index.html
HREF3 http://www.openwall.com/linux/README.shtml
HREF4 http://marc2.theaimsgroup.com/?l=linux-kernel&m=112350785026703&w=2
HREF5 http://lwn.net/Articles/159092/
HREF6 http://www.kernel.org/pub/linux/libs/pam/


Copyright

The System Administrators Guild of Australia© 2006. The authors assign to The System Administrator’s Guild of Australia and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to The System Administrators Guild of Australia to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.