@BEGIN_HEADER Title: Linux HOWTOs Author: The Internet @END_HEADER

The 3 Button Serial Mouse mini-HOWTO
Geoff Short, geoff@kipper.york.ac.uk
v1.33, 31 May 1998

How to get a 3 button serial mouse working properly under Linux.
______________________________________________________________________

Table of Contents

1. Disclaimer
2. Introduction
3. Serial Ports
4. Switched Mice
5. Normal Mice
6. Switching a Mouse to 3-Button Mode
7. Wheeled mice
8. Using gpm to Switch Mouse Modes
9. Using two mice
10. XF86Config and Xconfig file examples
11. Cables, extensions and adaptors
12. Miscellaneous Problems and Setups
13. Models Tested
14. Further Information
15. Mouse Tail
______________________________________________________________________

1. Disclaimer

The following document is offered in good faith as comprising only safe programming and procedures. No responsibility is accepted by the author for any loss or damage caused in any way to any person or equipment, as a direct or indirect consequence of following these instructions.

2. Introduction

The most recent version of this document can always be found at http://kipper.york.ac.uk/mouse.html

There is a Japanese translation at http://jf.gee.kyoto-u.ac.jp/JF/JF-ftp/euc/3-Button-Mouse.euc and a French one at http://www.freenix.fr/linux/HOWTO/mini/3-Button-Mouse.html. Other translations may be available - check your local LDP mirrors.

Most X applications are written with the assumption that the user will be working with a 3 button mouse. Serial mice are commonly used on computers and are cheap to buy. Many of these mice have 3 buttons and claim to use the Microsoft protocol, which in theory means they are ideal for the X windows setup. (The record for the cheapest working 3 button mouse currently stands at $1.14!)

Most dual-protocol mice will work in two modes:

· 2-button Microsoft mode.

· 3-button MouseSystems mode.
This document leads you through the different steps needed to configure your mouse in these two different modes, especially the steps needed to use the more useful 3-button mode. As distributions become easier to set up, some of the problems ought to go away. For instance, RedHat have a mouseconfig program to set things up for you. However, some versions of RH5.0 had a bug in mouseconfig, so make sure you check for patches. 3. Serial Ports The first thing to do is to make sure the software can find the mouse. Work out which serial port your mouse is connected to - usually this will be /dev/ttyS0 (COM1 under DOS) or /dev/ttyS1 (COM2). (ttyS0 is usually the 9 pin socket, ttyS1 the 25 pin socket, but of course there is no hard and fast rule about these things.) There are also an equivalent number of /dev/cua devices, which are almost the same as the ttyS ones, but their use is now discouraged. For convenience make a new link /dev/mouse pointing at this port. For instance, for ttyS0: ln -s /dev/ttyS0 /dev/mouse 4. Switched Mice Some mice, not usually the cheapest ones, have a switch on the bottom marked `2/3'. Sometimes this may be `PC/MS'. In this case the `2' setting is for 2 button Microsoft mode, and the `3' for 3 button MouseSystems mode. The `PC/MS' switch is a bit more complicated. You will probably find the `MS' setting is for Microsoft, and the `PC' is for MouseSystems. You may find the `PC' setting described as ps/2 mode, but it should do MouseSystems as well. If you have such a mouse, you can switch the switch to `3' or `PC', put the MouseSystems settings in your XConfigs (see below) and the mouse should work perfectly in 3-button mode. 5. Normal Mice If you don't have any switches, and no instructions, then a little bit of experimentation is needed. The first thing to try is to assume the mouse maker is telling the truth, and the mouse is full Microsoft. Set up your Xconfigs to expect a Microsoft mouse (see the Xconfig section) and give it a try. 
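Before experimenting with protocols, it can help to confirm that /dev/mouse really points at the serial port chosen in section 3. What follows is only a minimal sketch under stated assumptions: the function name link_mouse and its optional second argument are invented here for illustration (the second argument lets you try it on a scratch path), and on a real system it would have to be run as root.

```shell
#!/bin/sh
# Sketch: (re)create the mouse link and report where it points.
# link_mouse is a hypothetical helper, not a standard command.
# The optional second argument defaults to /dev/mouse.
link_mouse() {
    port=$1
    link=${2:-/dev/mouse}
    if [ ! -e "$port" ]; then
        echo "no such port: $port" >&2
        return 1
    fi
    # -f replaces an existing link, -n stops ln following an old link
    ln -sfn "$port" "$link" || return 1
    echo "$link -> $(readlink "$link")"
}
```

Calling link_mouse /dev/ttyS0 would then do the same job as the ln -s command in section 3, but it can be repeated safely if you later move the mouse to another port.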
If the mouse didn't work at all, then you don't have a Microsoft mouse, or there is some other problem. Try the other protocols in the configs, the man page for the config file is the best place to start looking. Also look in the Miscellaneous Problems section below. What you will probably find is that when you run X, the mouse works fine but only the outer two buttons do anything. You can of course accept this, and emulate the third button (press both buttons at once to click the middle one) like you do with a two button mouse. To do this, change your Xconfig file as shown in the Xconfig example section below. This may mean you have bought a 3 button mouse for no good reason, and you are certainly no further forward. So, now you need to look at your hardware. 6. Switching a Mouse to 3-Button Mode Even cheap mice can also work under the Mouse Systems protocol, with all three buttons working. The trick is to get the mouse to think it's a Mouse Systems one, something you rarely see in your instructions. · Before you power up your computer, hold down the left mouse button (and keep it held down until it has booted to be on the safe side). When the mouse first gets power, if the left button is held down it switches into Mouse Systems mode. A simple fact, but not always publicised. Note that a soft reboot of your computer may not cut the mouse power and therefore may not work. There are a number of other ways of switching the mode, which may or may not work with your particular mouse. Some of these are less drastic than rebooting your computer, two are more so! · If your computer is get-at-able you can unplug the mouse and plug it back in with the button held down (although you shouldn't normally plug things in to a live computer, the RS232 spec says it is OK). · You may be able to reset the mouse by typing echo "*n" > /dev/mouse, which should have the same effect as unplugging it. Hold the left button down for Mouse Systems mode, not for Microsoft. 
  You could put this in whatever script you use to start X up.

· Bob Nichols (rnichols@interaccess.com) has written a small C program to do the same thing, which may work if echo "*n" does not (and vice versa). You can find a copy of his source code at http://kipper.york.ac.uk/src/fix-mouse.c

· Someone has reported that the `ClearDTR' line in the Xconfig is enough to switch their mouse into Mouse Systems mode.

· If you are brave enough, open the mouse up (remember that this will invalidate your warranty) and have a look inside. In some cases, the mouse may have a switch inside, for some strange reason known only to the manufacturer. More likely on the cheap mice is a jumper which you can move. The switch or jumper may have the same effect as the `MS/PC' switch described in the ``Switched Mice'' section above. You may find that the circuit board is designed for a switch between 2 & 3 buttons, but it hasn't been fitted. It will look something like:

        -----------
       | o | o | o |  SW1
        -----------
         1   2   3

  Try linking pins 1-2 or 2-3, and see if it changes the behaviour of the mouse. If it does, you can either fit a small switch, or solder across the contacts for a quick and permanent solution.

· Another soldering solution, which might be a last resort for mice which don't understand MouseSystems at all, comes from Peter Benie (pjb1008@chiark.chu.cam.ac.uk). If the middle button's switch is double-pole, connect one side of the switch to the left button's switch, and the other side to the right button's switch. If it's not a double-pole switch then use diodes rather than wire. Now the middle button pushes the left and right buttons down together. Select ChordMiddle in the XF86Config and you have a working middle button.

· The ultimate recourse with the soldering iron was first described to me by Brian Craft (bcboy@pyramid.bio.brandeis.edu). Two common generic mouse chips are the 16 pin Z8350, and the 18 pin HM8350A. On each of these chips, one pin controls the mode of the chip, as follows.
       Pin 3    Mode
       -----    ----
       Open     Default Microsoft. Mouse Systems if a button is held on power-up.
       GND      Always Mouse Systems.
       Vdd      Always Microsoft.

  (Pins are numbered as follows:)

               ____
        pin1 -| \/ |-
        pin2 -|    |-
        pin3 -|    |-
             -|    |-
             -|    |-
             -|    |-
             -|    |-
        pin8 -|____|-

  (This info comes courtesy of Hans-Christoph Wirth and Juergen Exner, who posted it to de.comp.os.linux.hardware)

  You can solder a link between pin 3 and Gnd, which will fix the mouse into MouseSystems mode.

  · Peter Fredriksson (peterf@lysator.liu.se) has tried the SYSGRATION SYS2005 chip, and found that linking Pin 3 to Gnd forced Mouse Systems mode.

  · Uli Drescher (ud@digi.ruhr.de) confirms it works on an HN8348A chip; Ben Ketcham (bketcham@anvilite.murkworks.net) confirms the HM8348A (Pin 9 is Gnd).

  · Urban Widmark (ubbe@ts.umu.se) says the same applies to the EC3567A1 chip, where Pin 8 is ground. I've tried it as well and it works fine.

  · Timo T Metsala (metsala@cc.helsinki.fi) has found that on the HT6510A chip pin 3 is mode select, pin 9 is Gnd. The same works for the HT6513A chip. Holtek also make HT6513B and HT6513F chips - on these, pin 8 is Gnd.

  · Robert Romanowski (robin@cs.tu-berlin.de) says pin 3 - pin 8 (Gnd) works on an EM83701BP chip too.

  · Robert Kaiser (rkaiser@sysgo.de) confirms that pin 3 - Gnd works on an EC3576A1 chip too.

  · Sean Cross (secross@whidbey.com) found it was pin 2 - pin 7 (Gnd) on an HM8370GP chip.

  · Peter Fox (fox@roestock.demon.co.uk) used pin 3 - pin 8 on an HM8348A chip.

  · Jon Klein (jbklein@mindspring.com) found pin 3 - pin 9 did the trick for a UA5212S chip.

· As an alternative to the above soldering methods, you can get the mouse to hold its own button down when booting, using this circuit from Mathias Katzer:
----- --- R ---------O------ + Supply | ----- | | C = 100nF capacitor | | E | R = 100kOhm | __ / | T = BC557 transistor | / \ O | B | #V | T / |-----|-# | / Left button switch of the mouse | | #\ | O | \__/ | --- \ C | --- C ------O----------> (to somewhere deep inside the mouse) | ### Ground The test mouse was a no-name model MUS2S - whether this works in other mice depends on the circuit of the mouse; if the switch is connected to ground and not to +Supply, an npn-transistor like the BC547 should work; R and C have to be swapped then, too. So there you have it, the choice is yours. Stick with the default Microsoft two buttons, or work out how to switch the mode and set X up to take advantage of this. 7. Wheeled mice Mice with wheels have emerged in the last few years, starting with the Microsoft Intellimouse and spreading to other manufacturers. The wheel can be clicked like a button, or rolled up and down. Far and away the best reference for information is http://www.inria.fr/koala/colas/mouse-wheel-scroll/ which describes how to get lots of X applications to recognise the scrolling action. In general, you'll need a fairly new Xserver to use the scrolling action, but some older servers will recognise the clicking actions. For instance, the Intellimouse is supported by XFree 3.3.1 and later. 8. Using gpm to Switch Mouse Modes gpm is the program that lets you use the mouse in console mode. It is usually included in linux distributions, and can be started from the command line or in the startup script /etc/rc.d/rc.local. Note that distributions don't always have the most recent version (1.13 at time of writing) which can be found on mirrors of sunsite.unc.edu. The main modes for serial mice under gpm are: gpm -t ms gpm -t msc gpm -t help for Microsoft or MouseSystems modes, or to probe the mouse for you and tell you what it found. 
To run gpm in MouseSystems mode, you may need a -3 flag, and possibly a DTR option, using the -o dtr flag:

     gpm -3 -o dtr -t msc

gpm is often able to recognise all three buttons of the mouse even in Microsoft mode. Newer versions (Version 1.0 and later (?)) can then make this information available to other programs. For this to work, you need to run gpm with the -R flag, like this:

     gpm -R -t ms

This will make gpm re-export the mouse data to a new device, called /dev/gpmdata, which looks like a mouse to any other program. Note that this device always uses the MouseSystems protocol. You can then set your Xconfig to use this instead of /dev/mouse as shown below, but of course you must ensure gpm is always running when you use X. Some people have reported that some middle-button events are not correctly interpreted by X using this technique; this may be down to an individual mouse setup.

Changing button mapping for gpm and X (gustafso@math.utah.edu)

You may find that gpm uses different default button mappings to X, so using both systems on the same machine can be confusing. To make X use the same buttons for select and paste operations as gpm, use the X command

     xmodmap -e "pointer = 1 3 2"

which causes the left button to select and the right button to paste, for either 2-button or 3-button mice. To force gpm to use the X standard button mapping, start it with the -B option, eg:

     gpm -t msc -B 132

9. Using two mice

In some cases, for instance a laptop with a built-in pointing device, you may wish to use a serial mouse as a second device. In most cases the built-in device uses the PS/2 protocol, and can be ignored if you don't wish to use it. Simply configure gpm or X to use /dev/ttyS0 (or whatever) as usual.

To use both at once, you can use gpm -M to re-export the devices. More details are in the gpm man page. Also, XFree 3.3.1 and later support multiple input devices, using the XInput mechanism. Auto-generated XF86Config files should have the necessary comments in them.
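The gpm invocations from section 8 could be gathered into a small helper for a start-up script such as /etc/rc.d/rc.local. This is a sketch only: gpm_cmd is a made-up name, and the function merely prints the command line, using the flags quoted in section 8, so you can check it before running anything.

```shell
#!/bin/sh
# Sketch: print the gpm command line for a given serial mouse mode.
# gpm_cmd is a hypothetical helper; the flags are those from section 8.
gpm_cmd() {
    case $1 in
        # Microsoft mode, re-exported to /dev/gpmdata for X
        ms)  echo "gpm -R -t ms" ;;
        # MouseSystems mode, with the -3 and DTR options that may be needed
        msc) echo "gpm -3 -o dtr -t msc" ;;
        *)   echo "usage: gpm_cmd ms|msc" >&2
             return 1 ;;
    esac
}
```

Once the printed line looks right for your mouse, the start-up script can run it with $(gpm_cmd ms). Whether the -3 and -o dtr options are really needed depends on the individual mouse, so treat the msc line as a starting point.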
10. XF86Config and Xconfig file examples

The location of your configuration file for X depends on the particular release and distribution you have. It will probably be either /etc/Xconfig, /etc/XF86Config or /usr/X11/lib/X11/XF86Config. You should see which one it is when you start X - it will be echoed to the screen before all the options are displayed. The syntax is slightly different between the XF86Config and Xconfig files, so both are given.

Microsoft Serial Mouse

· XF86Config:

     Section "Pointer"
         Protocol "microsoft"
         Device "/dev/mouse"
     EndSection

· Xconfig:

     #
     # Mouse definition and related parameters
     #
     Microsoft "/dev/mouse"

Microsoft Serial Mouse with Three Button Emulation

· XF86Config:

     Section "Pointer"
         Protocol "microsoft"
         Device "/dev/mouse"
         Emulate3Buttons
     EndSection

· Xconfig:

     #
     # Mouse definition and related parameters
     #
     Microsoft "/dev/mouse"
     Emulate3Buttons

MouseSystems Three Button Serial Mouse

· XF86Config:

     Section "Pointer"
         Protocol "mousesystems"
         Device "/dev/mouse"
         ClearDTR   # These two lines probably won't be needed,
         ClearRTS   # try without first and then just the DTR
     EndSection

· Xconfig:

     #
     # Mouse definition and related parameters
     #
     MouseSystems "/dev/mouse"
     ClearDTR   # These two lines probably won't be needed,
     ClearRTS   # try without first and then just the DTR

Microsoft Serial Mouse with gpm -R

· XF86Config:

     Section "Pointer"
         Protocol "MouseSystems"
         Device "/dev/gpmdata"
     EndSection

· Xconfig:

     #
     # Mouse definition and related parameters
     #
     MouseSystems "/dev/gpmdata"

11. Cables, extensions and adaptors

The only wires needed in a mouse cable are as follows: TxD and RxD for data transfer, RTS and/or DTR for power sources, and ground. Translated into pin numbers, they are:

               9-pin port    25-pin port
       TxD         3              2
       RxD         2              3
       RTS         7              4
       DTR         4             20
       Gnd         5              7

The above table may be of use if you wish to make adaptors between 9- and 25-pin plugs, or extension cables.

12.
Miscellaneous Problems and Setups

· If you have trouble with your mouse in X or console mode, check you are not running a getty on the serial line, or anything else such as a modem for that matter. Also check for IRQ conflicts.

· It is possible that you need to hold down the left button when booting X windows. Some systems may send some sort of signal or spike to the mouse when X starts.

· Problems with serial devices may be due to the serial port not being initialised correctly at boot. This is done by the setserial command, run from the start-up script /etc/rc.d/rc.serial. Check the man page for setserial and the Serial-HOWTO for more details. It may be worth a little experimentation with types, for instance try

     setserial /dev/mouse uart 16550

  or 16550a regardless of what port you actually have. (For instance, mice don't like the 16c550AF).

· The ClearDTR flag may not work properly on some systems, unless you disable the RTS/CTS handshaking with the command:

     stty -crtscts < /dev/mouse

  (Tested on UART 16450/Pentium by Vladimir Geogjaev, geogjaev@wave.sio.rssi.ru)

· Logitech mice may require the line ChordMiddle to enable the middle of the three buttons to work. This line replaces Emulate3Buttons or goes after the /dev/mouse line in the config file. You may well need the ClearDTR and ClearRTS lines in your Xconfig. Some Logitech mice positively do not need the ChordMiddle line - one symptom of this problem is that menus seem to move with the mouse instead of scrolling down. (From: chang@platform.com)

· Swapping buttons: use the xmodmap command to change which physical button registers as each mouse click. eg:

     xmodmap -e "pointer = 3 2 1"

  will turn round the buttons for use in the left hand. If you only have a two-button mouse then it's just numbers 1 & 2.

· Acceleration: use the xset m command to change the mouse settings. eg

     xset m 2

  will set the acceleration to 2. Look at the manpage for full details.
· Pointer offset: If the click action appears to be coming from the left or right of where the cursor is, it may be that your screen is not aligned. This is a problem with the S3 driver, which you may be able to fix using xvidtune. Try Invert_VCLK/InvertVCLK, or EarlySC. This info from Bill Lavender (lavender@MCS.COM) and Simon Hargrave. In the XF86Config, it might look like this:

     Subsection "Display"
         Modes "1024x768" "800x600" "640x480" "1280x1024"
         Invert_VCLK "*" 1
         ...

· If you are getting `bouncing' of the mouse buttons, ie two clicks when you only wanted one, there may be something wrong with the mouse. This problem has been solved for Logitech mice by Bob Nichols (rnichols@interaccess.com) and involves soldering some resistors and a chip in the mouse to debounce the microswitches.

· If some users cannot get the mouse to work but some (eg root) can, it is possible that the users are not running exactly the same thing - for instance a different version of X or a different Xconfig. Check the X start-up messages carefully to make sure.

· If you find the mouse pointer is erasing things from your screen, you have a server config problem. Try adding the option linear, or maybe nolinear to the graphics card section, or if it is a PCI board, the options tgui_pci_write_off and tgui_pci_read_off. (This seems to be a Trident Card problem.)

· If the mouse cursor doesn't show up on the screen, but otherwise seems to be working, try the option "sw_cursor" in the Device section of the config file.

· If your mouse stops working when it's sunny or when you turn a light on, it may be that the sensors are being swamped by light getting through the case. You could try painting the inside of the case black, or putting some card in the top.

· Microsoft Brand mice are often a cause of problems. The newest ``Microsoft Serial Mouse 2.1A'' has been reported not to work on many systems, although unplugging it and plugging it in again may help.
  gpm version 1.13 and higher should also support 2.1A mice, using the pnp mouse type. (See the gpm section for how to re-export this.) The ``Microsoft Intellimouse'' also causes problems, although it should now be supported by XFree version 3.3 and later.

13. Models Tested

There are a lot of different mice out there, and I cannot honestly say that you should go out and buy one rather than the other. What I can do is give a list of what I think these mice do, based on experience and hearsay. Even with this information you should be a little cautious - we had two identical mice in our office on two computers, and some things worked on one and not t'other! Any additions to this list would be welcome.

Mouse Systems optical mouse, serial version
     Works well (as you might expect from the name!) without ClearDTR or ClearRTS in the config.

WiN mouse, as sold by Office World for eight quid.
     Standard dual-mode Microsoft/MouseSystems.

Agiler Mouse 2900
     Standard dual-mode Microsoft/MouseSystems. SYSGRATION SYS2005 chip is solderable.

Sicos mouse,
     Works ok, needs ClearDTR & ClearRTS in config.

Index sell a mouse for 10 quid,
     Doesn't work in 3 button mode, but does have nice instructions :-)

Artec mouse
     Usual dual-protocol mouse, needs `ClearDTR' set in config, NOT `ClearRTS'

DynaPoint 3 button serial mouse.
     Usual dual-protocol mouse, needs `ClearDTR' AND `ClearRTS' in Xconfig.

Genius Easymouse 3 button mouse
     Works fine with Mouseman protocol without the ChordMiddle parameter set. From Roderick Johnstone (rmj@ast.cam.ac.uk)

Truemouse, made in Taiwan
     Works OK, needs `ClearDTR' in config. (From Tim MacEachern)

Champ brand mouse
     Needs to have switch in PC mode, which enables MouseSystems protocol also. (From tnugent@gucis.cit.gu.edu.au)

MicroSpeed mouse
     Usual dual-protocol mouse.

Venus brand ($7)
     Has a jumper inside to switch between 2 and 3 button mode. (From mhoward@mth.com)

Saturn
     Switched mouse, works OK as MouseSystems in 3-button position. (From grant@oj.rsmas.miami.edu.)
Manhattan mouse.
     Switch for `MS AM' / `PC AT' modes, MS mode works fine with the gpm -R method. (From komanec@umel.fee.vutbr.cz).

Inland mouse.
     Switch for `PC/MS' modes, works fine. (From http://ptsg.eecs.berkeley.edu/~venkates).

qMouse (3-button), FCC ID E6qmouse X31.
     Sells in the USA for about $10. Works with `gpm -t msc -r 20'. No jumpers or switches for MouseSystems 3-button mode. Unreliable in X. Does not respond to echo "*n" > /dev/mouse.

Mitsumi Mouse (2-button), FCC ID EW4ECM-S3101.
     Sells in the USA for about $12. Reliable in X and under gpm, smooth double-button. (These two from gustafso@math.utah.edu)

PC Accessories mouse that I got from CompUSA for under $10.
     Has PC/MS switch on bottom. Works OK. (From steveb@communique.net)

First Mouse - seriously cheap at 7.79 pounds at Tempo.
     Dual Microsoft/MouseSystems, mode set by button depress at power-up. No switches, no links. Four wire connection, echo '*n' doesn't work. `gpm -R' works a treat. (From peterk@henhouse.demon.co.uk)

Trust 3-button mouse.
     Dual-mode with switch, works OK as MouseSystems in `PC' mode. gpm doesn't like the Microsoft mode.

Chic 410
     Works perfectly when kept in ms mode and used with the gpm -R command. From Stephen M. Weiss (steve@esc.ie.lehigh.edu)

KeyMouse 3-button mouse.
     Works OK with ClearDTR and ClearRTS in Xconfig; `-o dtr' needed with gpm. (From EZ4PHIL@aol.com)

Qtronix keyboard `Scorpio 60'
     All three buttons work in MouseSystems protocol. (From hwe@uebemc.siemens.de)

Tecra 720 laptop
     The glidepoint is on /dev/cua0; the stick is on /dev/psaux. (From apollo@anl.gov)

Anubis mouse
     Works fine, need to hold down left button whenever switching to the X virtual console. (From Joel Crisp)

Yakumo No.1900 mouse
     Works with gpm -R -t ms exporting to X. (From Oliver Schwank)

Genius `Easy Trak' Trackball
     Is not Microsoft compatible, use Mouseman in the Xconfig and it will work fine. (From VTanger@aol.com.)

Highscreen Mouse Pro
     `Works fine' says alfonso@univaq.it.
Logitech CA series
     Works in X using MMseries protocol, at 2400 Baud, 150 SampleRate. (Should also apply to Logitech CC, CE, C7 & C9 mice). (From vkochend@nyx.net.)

A4-Tech mouse
     Works OK, needs DTR line under both X and gpm. (From deane@gooroos.com)

Vertech mouse
     Normal Microsoft/MouseSystems behaviour, can be soldered for a permanent fix. (From duncan@fs3.ph.man.ac.uk.)

Boeder M-7 ``Bit Star'' (and other M series apart from M13)
     Switches to MouseSystems protocol by holding any button down at power-on. (From sjt@tappin.force9.co.uk.)

Mouse Systems ``Scroll'' Mouse (four buttons and a roller/button)
     Has a 2/3 switch - in mode 3 functions as a three button MouseSystems mouse, ignoring extra button & wheel. Doesn't need ClearRTS/DTR. (From parker1@airmail.net.)

Radio Shack 3-button Serial Mouse
     Model 26-8432, available in Tandy for about 20 quid. Works as MouseSystems with ClearDTR. (From Sherilyn@sidaway.demon.co.uk.)

Dexxa serial mouse
     Works fine using Microsoft protocol in Xconfig, no ChordMiddle or anything needed. (From slevy@ncsa.uiuc.edu.)

Belkin 3 button mouse
     As purchased from Sears ($10), needs -o rts under gpm (and probably ClearRTS under X) when in PC mode. (From mmicek@csz.com.)

14. Further Information

· Mouse Systems has a web site at http://www.mousesystems.com/. They have a Windows driver if you need one.

· The Linux Serial HOWTO is available from mirrors of sunsite around the world. If you don't know where your nearest mirror is, start at http://sunsite.unc.edu/mdw/linux.html

· There is a very good explanation of how mice work at http://www.4QD.co.uk/faq/meece.html.

· Fuller details of the Xconfig and XF86Config files are found on the relevant man pages, and in the documentation about installing X windows such as the XFree86 HOWTO. Also, see the XFree86 FAQ at a mirror of http://www.XFree86.org/.
· Information about gpm can be found on the man page; also try the web page of Darin Ernst at http://www.castle.net/X-notebook/mouse.txt.

· Lots of information on mice hardware and software can be found at http://www.hut.fi/Misc/Electronics/pc/interface.html#mouse

15. Mouse Tail

Much of the information for this document has been trawled from the various linux newsgroups. I am sorry I did not keep a record of everyone who has indirectly contributed by this route; thank you all very much.

So, to sum up:

· Even cheap 3 button Microsoft mice can be made to work.

· Configure your copy of X to expect a Mouse Systems mouse.

· Hold down the left button at power-on to switch the mouse to MouseSystems mode.

· You might need to hold the left button down when starting X.

· Mice are more intelligent than you think.

The Linux 3Dfx HOWTO
Bernd Kreimeier (bk@gamers.org)
v1.16, 6 February 1998

This document describes 3Dfx graphics accelerator chip support for Linux. It lists some supported hardware, describes how to configure the drivers, and answers frequently asked questions.
______________________________________________________________________

Table of Contents

1. Introduction
   1.1 Contributors and Contacts
   1.2 Acknowledgments
   1.3 Revision History
   1.4 New versions of this document
   1.5 Feedback
   1.6 Distribution Policy

2. Graphics Accelerator Technology
   2.1 Basics
   2.2 Hardware configuration
   2.3 A bit of Voodoo Graphics (tm) architecture

3. Installation
   3.1 Installing the board
      3.1.1 Troubleshooting the hardware installation
      3.1.2 Configuring the kernel
      3.1.3 Configuring devices
   3.2 Setting up the Displays
      3.2.1 Single screen display solution
      3.2.2 Single screen dual cable setup
      3.2.3 Dual screen display solution
   3.3 Installing the Glide distribution
      3.3.1 Using the detect program
      3.3.2 Using the test programs

4. Answers To Frequently Asked Questions

5. FAQ: Requirements?
   5.1 What are the system requirements?
   5.2 Does it work with Linux-Alpha?
   5.3 Which 3Dfx chipsets are supported?
   5.4 Is the Voodoo Rush (tm) supported?
   5.5 Which boards are supported?
   5.6 How do boards differ?
   5.7 What about AGP?

6. FAQ: Voodoo Graphics (tm)? 3Dfx?
   6.1 Who is 3Dfx?
   6.2 Who is Quantum3D?
   6.3 What is the Voodoo Graphics (tm)?
   6.4 What is the Voodoo Rush (tm)?
   6.5 What is the Voodoo 2 (tm)?
   6.6 What is VGA pass-through?
   6.7 What is Texelfx or TMU?
   6.8 What is a Pixelfx unit?
   6.9 What is SLI mode?
   6.10 Is there a single board SLI setup?
   6.11 How much memory? How many buffers?
   6.12 Does the Voodoo Graphics (tm) do 24 or 32 bit color?
   6.13 Does the Voodoo Graphics (tm) store 24 or 32 bit z-buffer per pixel?
   6.14 What resolutions does the Voodoo Graphics (tm) support?
   6.15 What texture sizes are supported?
   6.16 Does the Voodoo Graphics (tm) support paletted textures?
   6.17 What about overclocking?
   6.18 Where could I get additional info on Voodoo Graphics (tm)?

7. FAQ: Glide? TexUS?
   7.1 What is Glide anyway?
   7.2 What is TexUS?
   7.3 Is Glide freeware?
   7.4 Where do I get Glide?
   7.5 Is the Glide source available?
   7.6 Is Linux Glide supported?
   7.7 Where could I post Glide questions?
   7.8 Where to send bug reports?
   7.9 Who is maintaining it?
   7.10 How can I contribute to Linux Glide?
   7.11 Do I have to use Glide?
   7.12 Should I program using the Glide API?
   7.13 What is the Glide current version?
   7.14 Does it support multiple Texelfx already?
   7.15 Is Linux Glide identical to DOS/Windows Glide?
   7.16 Where do I get information on Glide?
   7.17 Where to get some Glide demos?
   7.18 What is ATB?

8. FAQ: Glide and XFree86?
   8.1 Does it run with XFree86?
   8.2 Does it only run full screen?
   8.3 What is the problem with AT3D/Voodoo Rush (tm) boards?
   8.4 What about GLX for XFree86?
   8.5 Glide and commercial X Servers?
   8.6 Glide and SVGA?
   8.7 Glide and GGI?

9. FAQ: OpenGL/Mesa?
   9.1 What is OpenGL?
   9.2 Where to get additional information on OpenGL?
   9.3 Is Glide an OpenGL implementation?
   9.4 Is there an OpenGL driver from 3Dfx?
   9.5 Is there a commercial OpenGL for Linux and 3Dfx?
   9.6 What is Mesa?
   9.7 Does Mesa work with 3Dfx?
   9.8 How portable is Mesa with Glide?
   9.9 Where to get info on Mesa?
   9.10 Where to get information on Mesa Voodoo?
   9.11 Does Mesa support multitexturing?
   9.12 Does Mesa support single pass trilinear mipmapping?
   9.13 What is the Mesa "Window Hack"?
   9.14 How about GLUT?

10. FAQ: But Quake?
   10.1 What about that 3Dfx GL driver for Quake?
   10.2 Is there a 3Dfx based glQuake for Linux?
   10.3 Does glQuake run in an XFree86 window?
   10.4 Known Linux Quake problems?
   10.5 Known Linux Quake security problems?
   10.6 Does LinuxQuake use multitexturing?
   10.7 Where can I get current information on Linux glQuake?

11. FAQ: Troubleshooting?
   11.1 Has this hardware been tested?
   11.2 Failed to change I/O privilege?
   11.3 Does it work without root privilege?
   11.4 Displayed images look awful (single screen)?
   11.5 The last frame is still there (single or dual screen)?
   11.6 Powersave kicks in (dual screen)?
   11.7 My machine seems to lock (X11, single screen)?
   11.8 My machine locks (single or dual screen)?
   11.9 My machine locks (used with S3 VGA board)?
   11.10 No address conflict, but locks anyway?
   11.11 Mesa runs, but does not access the board?
   11.12 Resetting dual board SLI?
   11.13 Resetting single board SLI?
______________________________________________________________________

1. Introduction

This is the Linux 3Dfx HOWTO document. It is intended as a quick reference covering everything you need to know to install and configure 3Dfx support under Linux. Frequently asked questions regarding the 3Dfx support are answered, and references are given to some other sources of information on a variety of topics related to computer generated, hardware accelerated 3D graphics.

This information is only valid for Linux on the Intel platform. Some information may be applicable to other processor architectures, but I have no first hand experience or information on this.
It is only applicable to boards based on 3Dfx technology; any other graphics accelerator hardware is beyond the scope of this document.

1.1. Contributors and Contacts

This document would not have been possible without all the information contributed by other people - those involved in the Linux Glide port and the beta testing process, in the development of Mesa and the Mesa Voodoo drivers, or reviewing the document on behalf of 3Dfx and Quantum3D. Some of them contributed entire sections to this document.

Daryll Strauss (daryll@harlot.rb.ca.us) did the port, Paul J. Metzger (pjm@rbd.com) modified the Mesa Voodoo driver (written by David Bucciarelli, tech.hmw@plus.it) for Linux, and Brian Paul (brianp@RA.AVID.COM) integrated it with his famous Mesa library. With respect to Voodoo Graphics (tm) accelerated Mesa, additional thanks have to go to Henri Fousse, Gary McTaggart, and the maintainer of the 3Dfx Mesa for DOS, Charlie Wallace (Charlie.Wallace@unistudios.com). The folks at 3Dfx, notably Gary Sanders, Rod Hughes, and Marty Franz, provided valuable input, as did Ross Q. Smith of Quantum3D. The pages on the Voodoo Extreme and Operation 3Dfx websites provided useful info as well, and in some cases I relied on the 3Dfx local Newsgroups. The Linux glQuake2 port that uses Linux Glide and Mesa is maintained by Dave Kirsch (zoid@idsoftware.com). Thanks to all those who sent e-mail regarding corrections and updates, and special thanks to Mark Atkinson for reminding me of the dual cable setup.

Thanks to the SGML-Tools package (formerly known as Linuxdoc-SGML), this HOWTO is available in several formats, all generated from a common source file. For information on SGML-Tools see its homepage at pobox.com/~cg/sgmltools.

1.2. Acknowledgments

3Dfx, the 3Dfx Interactive logo, Voodoo Graphics (tm), and Voodoo Rush (tm) are registered trademarks of 3Dfx Interactive, Inc. Glide, TexUS, Pixelfx and Texelfx are trademarks of 3Dfx Interactive, Inc.
OpenGL is a registered trademark of Silicon Graphics. Obsidian is a trademark of Quantum3D. Other product names are trademarks of the respective holders, and are hereby considered properly acknowledged.

1.3. Revision History

Version 1.03
   First version for public release.

Version 1.16
   Current version v1.16 6 February 1998.

1.4. New versions of this document

You will find the most recent version of this document at www.gamers.org/dEngine/xf3D/.

New versions of this document will be periodically posted to the comp.os.linux.answers newsgroup. They will also be uploaded to various anonymous ftp sites that archive such information including ftp://sunsite.unc.edu/pub/Linux/docs/HOWTO/.

Hypertext versions of this and other Linux HOWTOs are available on many World-Wide-Web sites, including sunsite.unc.edu/LDP/. Most Linux CD-ROM distributions include the HOWTOs, often under the /usr/doc directory, and you can also buy printed copies from several vendors.

If you make a translation of this document into another language, let me know and I'll include a reference to it here.

1.5. Feedback

I rely on you, the reader, to make this HOWTO useful. If you have any suggestions, corrections, or comments, please send them to me (bk@gamers.org), and I will try to incorporate them in the next revision. Please add HOWTO 3Dfx to the Subject-line of the mail, so procmail will dump it in the appropriate folder.

Before sending bug reports or questions, please read all of the information in this HOWTO, and send detailed information about the problem.

If you publish this document on a CD-ROM or in hardcopy form, a complimentary copy would be appreciated. Mail me for my postal address. Also consider making a donation to the Linux Documentation Project to help support free documentation for Linux. Contact the Linux HOWTO coordinator, Tim Bynum (linux-howto@sunsite.unc.edu), for more information.

1.6. Distribution Policy

Copyright (c) 1997, 1998 by Bernd Kreimeier.
This document may be distributed under the terms set forth in the LDP license at sunsite.unc.edu/LDP/COPYRIGHT.html.

This HOWTO is free documentation; you can redistribute it and/or modify it under the terms of the LDP license. This document is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the LDP license for more details.

2. Graphics Accelerator Technology

2.1. Basics

This section gives a very cursory overview of computer graphics accelerator technology, in order to help you understand the concepts used later in the document. You should consult e.g. a book on OpenGL in order to learn more.

2.2. Hardware configuration

Graphics accelerators come in different flavors: either as a separate PCI board that is able to pass through the video signal of a (possibly 2D or video accelerated) VGA board, or as a PCI board that does both VGA and 3D graphics (effectively replacing older VGA controllers). The 3Dfx boards based on the Voodoo Graphics (tm) belong to the former category. We will get into this again later.

If there is no address conflict, any 3D accelerator board could be present under Linux without interfering, but in order to access the accelerator, you will need a driver. A combined 2D/3D accelerator might behave differently.

2.3. A bit of Voodoo Graphics (tm) architecture

Usually, accessing texture memory and frame/depth buffer is a major bottleneck. For each pixel on the screen, there are at least one (nearest), four (bi-linear), or eight (tri-linear mipmapped) read accesses to texture memory, plus a read/write to the depth buffer, and a read/write to frame buffer memory.

The Voodoo Graphics (tm) architecture separates texture memory from frame/depth buffer memory by introducing two separate rendering stages, with two corresponding units (Pixelfx and Texelfx), each having a separate memory interface to dedicated memory.
This gives an above-average fill rate, paid for by restrictions in memory management (e.g. unused framebuffer memory cannot be used for texture caching).

Moreover, a Voodoo Graphics (tm) could use two TMUs (texture management or Texelfx units), and finally, two Voodoo Graphics (tm) could be combined with a mechanism called Scan-Line Interleaving (SLI). SLI essentially means that each Pixelfx unit effectively provides only every other scanline, which decreases bandwidth impact on each Pixelfx's framebuffer memory.

3. Installation

Configuring Linux to support 3Dfx accelerators involves the following steps:

1. Installing the board.

2. Installing the Glide distribution.

3. Compiling, linking and/or running the application.

The next sections will cover each of these steps in detail.

3.1. Installing the board

Follow the manufacturer's instructions for installing the hardware or have your dealer perform the installation. It should not be necessary to select settings for IRQ or DMA channel; either Plug&Pray (tm) or factory defaults should work. The add-on boards described here are memory mapped devices and do not use IRQs. The only kind of conflict to avoid is memory overlap with other devices.

As 3Dfx does not develop or sell any boards, do not contact them about any problems.

3.1.1. Troubleshooting the hardware installation

To check the installation and the memory mapping, do cat /proc/pci. The output should contain something like

______________________________________________________________________
Bus  0, device 12, function  0:
  VGA compatible controller: S3 Inc. Vision 968 (rev 0).
    Medium devsel.  IRQ 11.
    Non-prefetchable 32 bit memory at 0xf4000000.

Bus  0, device  9, function  0:
  Multimedia video controller: Unknown vendor Unknown device (rev 2).
    Vendor id=121a. Device id=1.
    Fast devsel.  Fast back-to-back capable.
    Prefetchable 32 bit memory at 0xfb000000.
______________________________________________________________________

for a Diamond Monster 3D used with a Diamond Stealth-64. Additionally, a cat /proc/cpuinfo /proc/meminfo might be helpful for tracking down conflicts and/or submitting a bug report.

With current kernels, you will probably get a boot warning like

______________________________________________________________________
Jun 12 12:31:52 hal kernel: Warning : Unknown PCI device (121a:1).
Please read include/linux/pci.h
______________________________________________________________________

which can be safely ignored. If you happen to have a board that is not very common, or have encountered a new revision, you should take the time to follow the advice in /usr/include/linux/pci.h and send all necessary information to linux-pcisupport@cao-vlsi.ibp.fr.

If you experience any problems with the board, you should try to verify that DOS and/or Win95 or NT support works. You will probably not receive any useful response from a board manufacturer on a bug report or request regarding Linux. Having dealt with the Diamond support e-mail system, I would not expect useful responses for other operating systems either.

3.1.2. Configuring the kernel

There is no kernel configuration necessary, as long as PCI support is enabled. The Linux Kernel HOWTO should be consulted for the details of building a kernel.

3.1.3. Configuring devices

The current drivers do not (yet) require any special devices. This is different from other driver developments (e.g. the sound drivers, where you will find a /dev/dsp and /dev/audio). The driver uses the /dev/mem device which should always be available. In consequence, you need to use setuid or root privileges to access the accelerator board.

3.2. Setting up the Displays

There are two possible setups with add-on boards. You could either pass the video signal from your regular VGA board through the accelerator board to the display, or you could use two displays at the same time.
Refer to the manual provided by the board manufacturer for details. Both configurations have been tried with the Monster 3D board.

3.2.1. Single screen display solution

This configuration allows you to check basic operations of the accelerator board - if the video signal is not transmitted to the display, hardware failure is possible.

Beware that the video output signal might deteriorate significantly if passed through the video board. To a degree, this is inevitable. However, reviews have complained about the below-average quality of the cables provided e.g. with the Monster 3D, and judging from the one I tested, this has not changed.

There are other pitfalls in single screen configurations. Switching from the VGA display mode to the accelerated display mode will change resolution and refresh rate as well, even if you are using 640x480 e.g. with X11, too. Moreover, if you are running X11, your application is responsible for demanding all keyboard and mouse events, or you might get stuck because of changed scope and exposure on the X11 display (that is effectively invisible when the accelerated mode is used). You could use SVGA console mode instead of X11.

If you are going to use a single screen configuration and switch modes often, remember that your monitor hardware might not enjoy this kind of use.

3.2.2. Single screen dual cable setup

Some high end monitors (e.g. the EIZO F-784-T) come with two connectors, one with 5 BNC connectors for RGB, HSync, VSync, the other e.g. a regular VGA or a 13W3 Sub-D VGA. These displays usually also feature a front panel input selector to safely switch from one to the other. It is thus possible to use e.g. a VGA-to-BNC cable with your high end 2D card, and a VGA-to-13W3 Sub-D cable with your 3Dfx, and effectively run dual screen on one display.

3.2.3. Dual screen display solution

The accelerator board does not need the VGA input signal.
Instead of routing the common video output through the accelerator board, you could attach a second monitor to its output, and use both at the same time. This solution is more expensive, but gives the best results, as your main display will still be high resolution and without the signal quality losses involved in a pass-through solution. In addition, you could use X11 and the accelerated full screen display in parallel, for development and debugging.

A common problem is that the accelerator board will not provide any video signal when not used. In consequence, each time the graphics application terminates, the hardware screensave/powersave might kick in depending on your monitor's configuration. Again, your hardware might not enjoy being treated like this. You should use

______________________________________________________________________
setenv SST_DUALSCREEN 1
______________________________________________________________________

to force continued video output in this setup.

3.3. Installing the Glide distribution

The Glide driver and library are provided as a single compressed archive. Use tar and gzip to unpack, and follow the instructions in the README and INSTALL accompanying the distribution. Read the install script and run it. Installation puts everything in /usr/local/glide/{include,lib,bin} and sets up ld.so.conf to look there. Where it installs and the ld.so.conf setting are independent actions. If you skip the ld.so.conf step, then you need to set LD_LIBRARY_PATH instead.

You will need to install the header files in a location available at compile time, if you want to compile your own graphics applications. If you do not want to use the installation as above (i.e. you insist on a different location), make sure that any application can access the shared library at runtime, or you will get a response like can't load library 'libglide.so'.

3.3.1. Using the detect program

There is a bin/detect program in the distribution (the source is not available).
You have to run it as root, and you will get something like

______________________________________________________________________
slot  vendorId  devId   baseAddr0   command  description
----  --------  ------  ----------  -------  -----------
 00   0x8086    0x122d  0x00000000  0x0006   Intel:430FX (Triton)
 07   0x8086    0x122e  0x00000000  0x0007   Intel:ISA bridge
 09   0x121a    0x0001  0xfb000008  0x0002   3Dfx:video multimedia adapter
 10   0x1000    0x0001  0x0000e401  0x0007   ???:SCSI bus controller
 11   0x9004    0x8178  0x0000e001  0x0017   Adaptec:SCSI bus controller
 12   0x5333    0x88f0  0xf4000000  0x0083   S3:VGA-compatible display co
______________________________________________________________________

as a result. If you do not have root privileges, the program will bail out with

______________________________________________________________________
Permission denied: Failed to change I/O privilege. Are you root?
______________________________________________________________________

The output might come in handy for a bug report as well.

3.3.2. Using the test programs

Within the Glide distribution, you will find a folder with test programs. Note that these test programs are under 3Dfx copyright, and are legally available for use only if you have purchased a board with a 3Dfx chipset. See the LICENSE file in the distribution, or their web site www.3dfx.com for details.

It is recommended to compile and link the test programs even if there happen to be binaries in the distribution. Note that some of the programs will require some files like alpha.3df from the distribution to be available in the same folder. All test programs use the 640x480 screen resolution. Some will request a variety of single character inputs, others will just state Press A Key To Begin Test. Beware of loss of input scope if running X11 on the same screen at the same time.

See the README.test for a list of programs, and other details.

4.
Answers To Frequently Asked Questions

The following section answers some of the questions that (will) have been asked on the Usenet news groups and mailing lists. The FAQ has been subdivided into several parts for convenience, namely

   o FAQ: Requirements?

   o FAQ: Voodoo Graphics (tm)? 3Dfx?

   o FAQ: Glide?

   o FAQ: Glide and SVGA?

   o FAQ: Glide and XFree86?

   o FAQ: Glide versus OpenGL/Mesa?

   o FAQ: But Quake?

   o FAQ: Troubleshooting?

Each section lists several questions and answers, which will hopefully address most problems.

5. FAQ: Requirements?

5.1. What are the system requirements?

A Linux PC, PCI 2.1 compliant, a monitor capable of 640x480, and a 3D accelerator board based on the 3Dfx Voodoo Graphics (tm). It will work on a P5 or P6, with or without MMX. The current version does not use MMX, but it has some optimized code paths for P6.

At one point, some 3Dfx statements seemed to imply that using Linux Glide required using a RedHat distribution. Note that while Linux Glide was originally ported in a RedHat 4.1 environment, it has been used and tested with many other Linux distributions, including homebrew, Slackware, and Debian 1.3.1.

5.2. Does it work with Linux-Alpha?

There is currently no Linux Glide distribution available for any platform besides i586. As the Glide sources are not available for distribution, you will have to wait for the binary. Quantum3D has announced DEC Alpha support for 2H97. Please contact Daryll Strauss if you are interested in supporting this.

There is also the issue of porting the assembly modules. While there are alternative C paths in the code, the assembly module in Glide (essentially triangle setup) offered significant performance gains depending on the P5 CPU used.

5.3. Which 3Dfx chipsets are supported?

Currently, the 3Dfx Voodoo Graphics (tm) chipset is supported under Linux. The Voodoo Rush (tm) chipset is not yet supported.

5.4. Is the Voodoo Rush (tm) supported?
The current port of Glide to Linux does not support the Voodoo Rush (tm). An update is in the works.

The problem is that at one point the Voodoo Rush (tm) driver code in Glide depended on Direct Draw. There was an SST96 based DOS portion in the library that could theoretically be used for Linux, as soon as all portions residing in the 2D/Direct Draw/D3D combo driver are replaced.

Thus Voodoo Rush (tm) based boards like the Hercules Stingray 128/3D or Intergraph Intense Rush are not supported yet.

5.5. Which boards are supported?

There are no officially supported boards, as 3Dfx does not sell any boards. This section does not attempt to list all boards; it will just give an overview, and will list only boards that have been found to cause trouble.

It is important to recognize that Linux support for a given board does not only require a driver for the 3D accelerator component. If a board features its own VGA core as well, support by either Linux SVGA or XFree86 is required as well (see section about Voodoo Rush (tm) chipset). Currently, an add-on solution is recommended, as it allows you to choose a regular graphics board well supported for Linux. There are other aspects discussed below.

All Quantum3D Obsidian boards, independent of texture memory, frame buffer memory, number of Pixelfx and Texelfx units, and SLI, should work. The same goes for all other Voodoo Graphics (tm) based boards, like Orchid Righteous 3D, Canopus Pure 3D, Flash 3D, and Diamond Monster 3D. Voodoo Rush (tm) based boards are not yet supported.

Boards that are not based on 3Dfx chipsets (e.g. manufactured by S3, Matrox, 3Dlabs, Videologic) do not work with the 3Dfx drivers and are beyond the scope of this document.

5.6. How do boards differ?

As the board manufacturers are using the same chipset, any differences are due to board design.
Examples are quality of the pass-through cable and connectors (reportedly, Orchid provided better quality than Diamond), availability of a TV-compliant video signal output (Canopus Pure 3D), and, most notably, memory size on board.

Most common were boards for games with 2 MB texture cache and 2 MB framebuffer memory; however, the Canopus Pure3D comes with a maximal 4 MB texture cache, which is an advantage e.g. with games using dynamically changed textures, and/or illumination textures (Quake, most notably).

The memory architecture of a typical Voodoo Graphics (tm) board is described below, in a separate section.

Quantum 3D offers the widest selection of 3Dfx-based boards, and is probably the place to go if you are looking for a high end Voodoo Graphics (tm) based board configuration. Quantum 3D is addressing the visual simulation market, while most of the other vendors are only targeting the consumer-level PC-game market.

5.7. What about AGP?

There is no Voodoo Graphics (tm) or Voodoo Rush (tm) AGP board that I am aware of. I am not aware of AGP support under Linux, and I do not know whether upcoming AGP boards using 3Dfx technology might possibly be supported with Linux.

6. FAQ: Voodoo Graphics (tm)? 3Dfx?

6.1. Who is 3Dfx?

3Dfx is a San Jose based manufacturer of 3D graphics accelerator hardware for arcade games, game consoles, and PC boards. Their official website is www.3dfx.com. 3Dfx does not sell any boards, but other companies do, e.g. Quantum3D.

6.2. Who is Quantum3D?

Quantum3D started as a 3Dfx spin-off, manufacturing high end accelerator boards based on 3Dfx chip technology for consumer and business market, and supplying arcade game technology. See their home page at www.quantum3d.com for additional information. For general inquiries regarding Quantum3D, please send mail to info@quantum3d.

6.3. What is the Voodoo Graphics (tm)?

The Voodoo Graphics (tm) is a chipset manufactured by 3Dfx. It is used in hardware acceleration boards for the PC.
See the HOWTO section on supported hardware.

6.4. What is the Voodoo Rush (tm)?

The Voodoo Rush (tm) is a derivative of the Voodoo Graphics (tm) that has an interface to cooperate with a 2D VGA video accelerator, effectively supporting accelerated graphics in windows. This combo is currently not supported with Linux.

6.5. What is the Voodoo 2 (tm)?

The Voodoo 2 (tm) is the successor of the Voodoo Graphics (tm) chipset, featuring several improvements. It is announced for late March 1998, and announcements of Voodoo 2 (tm) based boards have been published e.g. by Quantum 3D, by Creative Labs, Orchid Technologies, and Diamond Multimedia.

The Voodoo 2 (tm) is supposed to be backwards compatible. However, a new version of Glide will have to be ported to Linux.

6.6. What is VGA pass-through?

The Voodoo Graphics (tm) (but not the Voodoo Rush (tm)) boards are add-on boards, meant to be used with a regular 2D VGA video accelerator board. In short, the video output of your regular VGA board is used as input for the Voodoo Graphics (tm) based add-on board, which by default passes it through to the display also connected to the Voodoo Graphics (tm) board. If the Voodoo Graphics (tm) is used (e.g. by a game), it will disconnect the VGA input signal, switch the display to a 640x480 fullscreen mode with the refresh rate configured by SST variables and the application/driver, and generate the video signal itself. The VGA doesn't need to be aware of this, and won't be.

This setup has several advantages: free choice of 2D VGA board, which is an issue with Linux, as XFree86 drivers aren't available for all chipsets and revisions, and a cost effective migration path to accelerated 3D graphics. It also has several disadvantages: an application using the Voodoo Graphics (tm) might not re-enable video output when crashing, and the regular VGA video signal deteriorates in the pass-through process.

6.7. What is Texelfx or TMU?

Voodoo Graphics (tm) chipsets have two units.
The first one interfaces the texture memory on the board, does the texture mapping, and ultimately generates the input for the second unit that interfaces the framebuffer. This one is called Texelfx, aka Texture Management Unit, aka TMU. The neat thing about this is that a board can use two Texelfx instead of only one, like some of the Quantum3D Obsidian boards did, effectively doubling the processing power in some cases, depending on the application.

As each Texelfx can address 4 MB texture memory, a dual Texelfx setup has an effective texture cache of up to 8 MB. This can be true even if only one Texelfx is actually needed by a particular application, as textures can be distributed to both Texelfx, which are used depending on the requested texture. Both Texelfx are used together to perform certain operations such as trilinear filtering and illumination texture/lightmap passes (e.g. in glQuake) in a single pass instead of the two passes that are required with only one Texelfx. To actually exploit the theoretically available speedup and cache size increase, a Glide application has to use both Texelfx properly.

The two Texelfx cannot be used separately to each draw a textured triangle at the same time. A triangle is always drawn using whatever the current setup is, which can be to use both Texelfx for a single pass operation combining two textures, or one Texelfx for only a single texture. Each Texelfx can only access its own memory.

6.8. What is a Pixelfx unit?

Voodoo Graphics (tm) chipsets have two units. The second one interfaces the framebuffer and ultimately generates the depth buffer and pixel color updates. This one is called Pixelfx. The neat thing here is that two Pixelfx units can cooperate in SLI mode, like with some of the Quantum3D Obsidian boards, effectively doubling the frame rate.

6.9. What is SLI mode?

SLI means "Scanline Interleave".
In this mode, two Pixelfx are connected and render in alternate turns, one handling odd, the other handling even scanlines of the actual output. In this mode, each Pixelfx stores only half of the image and half of the depth buffer data in its own local framebuffer, effectively doubling the number of pixels.

The Pixelfx in question can be on the same board, or on two boards properly connected. Some Quantum3D Obsidian boards support SLI with Voodoo Graphics (tm).

As two cards can decode the same PCI addresses and receive the same data, there is not necessarily additional bus bandwidth required by SLI. On the other hand, texture data will have to be replicated on both boards, thus the amount of texture memory effectively stays the same.

6.10. Is there a single board SLI setup?

There are now two types of Quantum3D SLI boards. The initial setup used two boards, two PCI slots, and an interconnect (e.g. the Obsidian 100-4440). The later revision, which performs identically, is contained on one full-length PCI board (e.g. Obsidian 100-4440SB). Thus a single board SLI solution is possible, and has been done.

6.11. How much memory? How many buffers?

The most essential difference between different boards using the Voodoo Graphics (tm) chipset is the amount and organization of memory. Quantum3D used a three digit scheme to describe boards. Here is a slightly modified one (anticipating Voodoo 2 (tm)). Note that if you use more than one Texelfx, they need the same amount of texture cache memory each, and if you combine two Pixelfx, each needs the same amount of frame buffer memory.

______________________________________________________________________
"SLI / Pixelfx / Texelfx1 / Texelfx2 "
______________________________________________________________________

It means that a common 2MB+2MB board would be a 1/2/2/0 solution, with the minimally required total 4 MB of memory. A Canopus Pure 3D would be 1/2/4/0, or 6 MB.
An Obsidian-2220 board with two Texelfx would be 1/2/2/2, and an Obsidian SLI-2440 board would be 2/2/4/4. A fully featured dual board solution (2 Pixelfx, each with 2 Texelfx and 4 MB frame buffer, each Texelfx 4 MB texture cache) would be 2/4/4/4, and the total amount of memory would be SLI*(Pixelfx+Texelfx1+Texelfx2), or 24 MB. So there.

6.12. Does the Voodoo Graphics (tm) do 24 or 32 bit color?

No. The Voodoo Graphics (tm) architecture uses 16bpp internally. This is true for Voodoo Graphics (tm), Voodoo Rush (tm) and Voodoo 2 (tm) alike. Quantum3D claims to implement 22-bpp effective color depth with an enhanced 16-bpp frame buffer, though.

6.13. Does the Voodoo Graphics (tm) store 24 or 32 bit z-buffer per pixel?

No. The Voodoo Graphics (tm) architecture uses 16bpp internally for the depth buffer, too. This again is true for Voodoo Graphics (tm), Voodoo Rush (tm) and Voodoo 2 (tm) alike. Again, Quantum3D claims that using the floating point 16-bits per pixel (bpp) depth buffering provides 22-bpp effective Z-buffer precision.

6.14. What resolutions does the Voodoo Graphics (tm) support?

The Voodoo Graphics (tm) chipset supports up to 4 MB frame buffer memory. Presuming double buffering and a depth buffer, a 2 MB framebuffer will support a resolution of 640x480. With 4 MB frame buffer, 800x600 is possible.

Unfortunately 960x720 is not supported. The Voodoo Graphics (tm) chipset requires that the memory footprint for a particular resolution have vertical and horizontal dimensions that are evenly divisible by 32. The video refresh controller, though, can output any particular resolution, but the "virtual" size required for the memory footprint must be in dimensions evenly divisible by 32. So, 960x720 actually requires 960x736 worth of memory, and 960x736x2x3 = 4.04 MBytes.

However, using two boards with SLI, or a dual Pixelfx SLI board means that each framebuffer will only have to store half of the image.
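The memory arithmetic from sections 6.11 and 6.14 up to this point can be sketched in a few lines of Python. This is only an illustration and not part of Glide: the function names are made up for this example, the 2 bytes per pixel follow from the 16bpp color and depth buffers described above, and three buffers (front, back, depth) are assumed, as in the 960x720 calculation. SLI and the overall hardware resolution limit are not modeled.

```python
# Illustrative sketch only: these helper names are hypothetical and do not
# correspond to any Glide API.

BYTES_PER_PIXEL = 2  # 16 bpp, used for color buffers and depth buffer alike
MB = 1024 * 1024


def pad_to_32(n):
    """Round n up to the next multiple of 32 (the footprint constraint)."""
    return (n + 31) // 32 * 32


def framebuffer_bytes(width, height, buffers=3):
    """Footprint of `buffers` buffers (default: front, back, depth)."""
    return pad_to_32(width) * pad_to_32(height) * BYTES_PER_PIXEL * buffers


def fits(width, height, framebuffer_mb, buffers=3):
    """True if the resolution fits into a single board's frame buffer."""
    return framebuffer_bytes(width, height, buffers) <= framebuffer_mb * MB


def total_memory_mb(sli, pixelfx_mb, texelfx1_mb, texelfx2_mb):
    """Total memory for an SLI / Pixelfx / Texelfx1 / Texelfx2 board code."""
    return sli * (pixelfx_mb + texelfx1_mb + texelfx2_mb)


if __name__ == "__main__":
    for w, h, mb in [(640, 480, 2), (800, 600, 4), (960, 720, 4)]:
        need = framebuffer_bytes(w, h) / MB
        print(f"{w}x{h}: {need:.2f} MB needed, fits in {mb} MB: {fits(w, h, mb)}")
    print("2/4/4/4 total:", total_memory_mb(2, 4, 4, 4), "MB")
```

Running it for 960x720 reproduces the 4.04 MB figure, which is just over what a 4 MB frame buffer can hold.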
Thus 2 times 4 MB in SLI mode are good up to 1024x768, which is the maximum because of the overall hardware design. You will be able to do 1024x768 triple buffered with Z, but you will not be able to do e.g. 1280x960 with double buffering.

Note that triple buffering (no VSync synchronization required by the application), stereo buffering (for interfacing LCD shutters) and other more demanding setups will severely decrease the available resolution.

6.15. What texture sizes are supported?

The maximum texture size for the Voodoo Graphics (tm) chipset is 256x256, and you have to use powers of two. Note that for really small textures (e.g. 16x16) you are better off merging them into a large texture, and adjusting your effective texture coordinates appropriately.

6.16. Does the Voodoo Graphics (tm) support paletted textures?

The Voodoo Graphics (tm) hardware and Glide support the palette extension to OpenGL. The most recent version of Mesa does support the GL_EXT_paletted_texture and GL_EXT_shared_texture_palette extensions.

6.17. What about overclocking?

If you want to put aside considerations about warranty and overheating, and want to do overclocking to boost up performance even further, there is related info out on the web. The basic mechanism is to use Glide environment variables to adjust the clock.

Note that the actual recommended clock is board dependent. While the default clock speed is 50 MHz, the Diamond Monster 3D property sheet lets you set up a clock of 57 MHz. It all comes down to the design of a specific board, and which components are used with the Voodoo Graphics (tm) chipset - most notably access speed of the RAM in question. If you exceed the limits of your hardware, rendering artifacts will occur to say the least. Reportedly, 57 MHz usually works, while 60 MHz or more is already pushing it.
Increasing the clock frequency also means increasing the waste heat dissipated in the chips, in a nonlinear dependency (a 10% increase in frequency means a much larger increase in heating). In consequence, for permanent overclocking you might want to educate yourself about ways to add cooling fans to the board in a way that does not affect warranty. A very recommendable source is the "3Dfx Voodoo Heat Report" by Eric van Ballegoie, available on the web.

6.18. Where could I get additional info on Voodoo Graphics (tm)?

There is a FAQ by 3Dfx, which should be available at their web site. You will find retail information at the following locations: www.3dfx.com and www.quantum3d.com.

Unofficial sites that have good info are "Voodoo Extreme" at www.ve3d.com, and "Operation 3Dfx" at www.ve3d.com.

7. FAQ: Glide? TexUS?

7.1. What is Glide anyway?

Glide is a proprietary API plus drivers to access 3D graphics accelerator hardware based on chipsets manufactured by 3Dfx. Glide has been developed and implemented for DOS, Windows, and Macintosh, and has been ported to Linux by Daryll Strauss.

7.2. What is TexUS?

In the distribution is a libtexus.so, which is the 3Dfx Interactive Texture Utility Software. It is an image processing library and utility program for preparing images for use with the 3Dfx Interactive Glide library. Features of TexUS include file format conversion, MIPmap creation, and support for 3Dfx Interactive Narrow Channel Compression textures.

The TexUS utility program texus reads images in several popular formats (TGA, PPM, RGT), generates MIPmaps, and writes the images as 3Dfx Interactive textures files (see e.g. alpha.3df, as found in the distribution) or as an image file for inspection. For details on the parameters for texus, and the API, see the TexUS documentation.

7.3. Is Glide freeware?

Nope. Glide is neither GPL'ed nor subject to any other public license. See LICENSE in the distribution for any details.
Effectively, by downloading and using it, you agree to the End User License Agreement (EULA) on the 3Dfx web site. Glide is provided as binary only, and you should neither use nor distribute any files but the ones released to the public, if you have not signed an NDA.

The Glide distribution including the test program sources is copyrighted by 3Dfx. The same is true for all the sources in the Glide distribution. In the words of 3Dfx: These are not public domain, but they can be freely distributed to owners of 3Dfx products only. No card, No code!

7.4. Where do I get Glide?

The entire 3Dfx SDK is available for download off their public website located at www.3dfx.com/software/download_glide.html. Anything else publicly released by 3Dfx is nearby on their website, too.

There is also an FTP site, ftp.3dfx.com. The FTP has a longer timeout, and some of the larger files have been broken into 3 files (approx. 3 MB each).

7.5. Is the Glide source available?

Nope. The Glide source is made available only based on a special agreement and NDA with 3Dfx.

7.6. Is Linux Glide supported?

Currently, Linux Glide is unsupported. Basically, it is provided under the same disclaimers as the 3Dfx GL DLL (see below).

However, 3Dfx definitely wants to provide as much support as possible, and is in the process of setting up some prerequisites. For the time being, you will have to rely on the 3Dfx newsgroup (see below).

In addition, the Quantum3D web page claims that Linux support (for Obsidian) is planned for both Intel and AXP architecture systems in 2H97.

7.7. Where could I post Glide questions?

There are newsgroups currently available only on the NNTP server news.3dfx.com run by 3Dfx. These USENET groups are dedicated to 3Dfx and Glide in general, and will mainly provide assistance for DOS, Win95, and NT.
The current list includes:

______________________________________________________________________
3dfx.events
3dfx.games.glquake
3dfx.glide
3dfx.glide.linux
3dfx.products
3dfx.test
______________________________________________________________________

and the 3dfx.oem.products.* group for specific boards, e.g. 3dfx.oem.products.quantum3d.obsidian. Please use news.3dfx.com/3dfx.glide.linux for all Linux Glide related questions.

A mailing list dedicated to Linux Glide is in preparation for 1Q98. Send mail to majordomo@gamers.org, with no subject and the message body "info linux-3dfx", to get information about the posting guidelines, the hypermail archive and how to subscribe to the list or the digest.

7.8. Where to send bug reports?

Currently, you should rely on the newsgroup (see above), that is news.3dfx.com/3dfx.glide.linux. There is no official support e-mail set up yet. For questions not specific to Linux Glide, make sure to use the other newsgroups.

7.9. Who is maintaining it?

3Dfx will appoint an official maintainer soon. Currently, the unofficial maintainer of the Linux Glide port is Daryll Strauss. Please post bug reports in the newsgroup (above). If you are confident that you found a bug not previously reported, please mail Daryll at daryll@harlot.rb.ca.us

7.10. How can I contribute to Linux Glide?

You could submit precise bug reports. Providing sample programs to be included in the distribution is another possibility. A major contribution would be adding code to the Glide based Mesa Voodoo driver source. See the section on Mesa Voodoo below.

7.11. Do I have to use Glide?

Yes. As of now, there is no other Voodoo Graphics (tm) driver available for Linux. At the lowest level, Glide is the only interface that talks directly to the hardware. However, you can write OpenGL code without knowing anything about Glide, and use Mesa with the Glide based Mesa Voodoo driver. It helps to be aware of the involvement of Glide for recognizing driver limitations and bugs, though.

7.12. Should I program using the Glide API?

That depends on the application you are heading for. Glide is a proprietary API that is partly similar to OpenGL or Mesa, partly contains features only available as EXTensions to some OpenGL implementations, and partly contains features not available anywhere but within Glide.

If you want to use the OpenGL API, you will need Mesa (see below). Mesa, namely the Mesa Voodoo driver, offers an API resembling the well documented and widely used OpenGL API. However, the Mesa Voodoo driver is in early alpha, and you will have to accept performance losses and lack of support for some features.

In summary, the decision is up to you - if you are heading for maximum performance while accepting potential problems with porting to non-3Dfx hardware, Glide is not a bad choice. If you care about maintenance, OpenGL might be the best bet in the long run.

7.13. What is the current Glide version?

The current version of Linux Glide is 2.4. The next version will probably be identical to the current version for DOS/Windows, which is 2.4.3 and comes in two distributions. Right now, various parts of Glide are different for Voodoo Rush (tm) (VR) and Voodoo Graphics (tm) (VG) boards. Thus you have to pick up separate distributions (under Windows) for VR and VG. The same will be true for Linux. There will possibly be another chunk of code and another distribution for Voodoo 2 (tm) (V2) boards.

There is also a Glide 3.0 in preparation that will extend the API for use of triangle fans and triangle strips, and provide better state change optimization. Support for fans and strips will in some situations significantly reduce the amount of data sent per triangle, and the Mesa driver will benefit from this, as the OpenGL API has separate modes for this. For a detailed explanation of this see e.g. the OpenGL documentation.

7.14. Does it support multiple Texelfx already?
Multiple Texelfx/TMU's can already be used in current Linux Glide for single pass trilinear mipmapping, which improves image quality without a performance penalty. You will need a board with two Texelfx (that is, one of the appropriate Quantum3D Obsidian boards). The application needs to specify the use of both Texelfx accordingly; it does not happen automatically.

Note that because most applications are implemented for consumer boards with a single Texelfx, they might not query the presence of a second Texelfx, and thus not use it. This is not a flaw of Glide but of the application.

7.15. Is Linux Glide identical to DOS/Windows Glide?

The publicly available version of Linux Glide should be identical to the respective DOS/Windows versions. Delays in releasing the Linux port of newer DOS/Windows releases are possible.

7.16. Where do I get information on Glide?

There is exhaustive information available from 3Dfx. You could download it from their home page at www.3dfx.com/software/download_glide.html. These documents are free, provided you bought a board based on 3Dfx hardware. Please read the licensing regulations.

Basically, you should look for some of the following:

o  Glide Release Notes

o  Glide Programming Guide

o  Glide Reference Manual

o  Glide Porting Guide

o  TexUS Texture Utility Software

o  ATB Release Notes

o  Installing and Using the Obsidian

These are available as Microsoft Word documents, and are part of the Windows Glide distribution, i.e. the self-extracting archive file. Postscript copies for separate download should be available at www.3dfx.com as well. Note that the release numbers are not always in sync with those of Glide.

7.17. Where to get some Glide demos?

You will find demo sources for Glide within the distribution (test programs), and on the 3Dfx home page. The problem with the latter is that some require ATB. To port these demos to Linux, the event handling has to be completely rewritten.
In addition, you might find some of the OpenGL demo sources accompanying Mesa and GLUT useful. While the Glide API is different from the OpenGL API, they target the same hardware rendering pipeline.

7.18. What is ATB?

Some of the 3Dfx demo programs for Glide depend not only on Glide but also on 3Dfx's proprietary Arcade Toolbox (ATB), which is available for DOS and Win32, but has not been ported to Linux. If you are a developer, the sources are available within the Total Immersion program, so porting ATB to Linux would be possible.

8. FAQ: Glide and XFree86?

8.1. Does it run with XFree86?

Basically, the Voodoo Graphics (tm) hardware does not care about X. In single screen configurations, the X server will not even notice that the video signal generated by the VGA hardware does not reach the display. If your application is not written to be X aware, Glide switching to full screen mode might cause problems (see troubleshooting section). If you do not want the overhead of writing an X11-aware application, you might want to use SVGA console mode instead.

So yes, it does run with XFree86, but no, it does not cooperate if you don't write your application accordingly. You can use the Mesa "window hack", which will be significantly slower than fullscreen, but still a lot faster than software rendering (see section below).

8.2. Does it only run full screen?

See above. The Voodoo Graphics (tm) hardware is not window environment aware, and neither is Linux Glide. Again, the experimental Mesa "window hack" covered below will allow for pasting the Voodoo Graphics (tm) board framebuffer's content into an X11 window.

8.3. What is the problem with AT3D/Voodoo Rush (tm) boards?

There is an inherent problem when using Voodoo Rush (tm) boards with Linux: Basically, these boards are meant to be VGA 2D/3D accelerator boards, either as a single board solution, or with a Voodoo Rush (tm) based daughterboard used transparently.
The VGA component tied to the Voodoo Rush (tm) is an Alliance Semiconductor ProMotion-AT3D multimedia accelerator. To use this e.g. with XFree86 at all, you need a driver for the AT3D chipset.

There is a mailing list on this, and a web site with FAQ at www.frozenwave.com/linux-stingray128. Look there for the most current info. There is a SuSE maintained driver at ftp.suse.com/suse_update/special/xat3d.tgz. Reportedly, the XFree86 SVGA server also works, supporting 8, 16 and 32 bpp. Official support will probably be in XFree86 4.0. XFree86 decided to prepare an intermediate XFree86 3.3.2 release as well, which might already address the issues.

The following XF86Config settings reportedly work.

______________________________________________________________________
# device section settings
Chipset     "AT24"
Videoram    4032

# videomodes tested by Oliver Schaertel
# 25.18 28.32  for 640 x 480 (70hz)
# 61.60        for 1024 x 768 (60hz)
# 120          for 1280 x 1024 (66hz)
______________________________________________________________________

In summary, there is nothing prohibiting this except for the fact that the drivers in XFree86 are not yet finished.

If you want a more technical explanation: Voodoo Rush (tm) support requires X server changes to support grabbing a buffer area in the video memory on the AT3D board, as the Voodoo Rush (tm) based boards need to store their back buffer and z buffer there. This memory allocation and locking requirement is not a 3Dfx specific problem; it is also needed e.g. for support of TV capture cards, and is thus under active development for XFree86. This means changes at the device dependent X level (thus XAA), which are currently implemented as an extension to XFree86 DGA (Direct Graphics Access, an X11 extension proposal implemented in different ways by Sun and XFree86, that is not part of the final X11R6.1 standard and thus not portable). It might be part of an XFree86 GLX implementation later on.
The currently distributed X servers assume they have full control of the framebuffer, and use anything that is not used by the visual region of the framebuffer as pixmap cache, e.g. for caching fonts.

8.4. What about GLX for XFree86?

There are a couple of problems.

The currently supported Voodoo Graphics (tm) hardware and the available revision of Linux Glide are full screen only, and not set up to share a framebuffer with a window environment. Thus GLX or other integration with X11 is not yet possible.

The Voodoo Rush (tm) might be capable of cooperating with XFree86 (that is, an SVGA compliant board will work with the XFree86 SVGA server), but it is not yet supported by Linux Glide, nor do S3 or other XFree86 servers support these boards yet.

In addition, GLX is tied to OpenGL or, in the Linux case, to Mesa. The XFree86 team is currently working on integrating Mesa with their X Server. GLX is in beta, and XFree86 3.3 has the hooks for GLX. See Steve Parker's GLX pages at www.cs.utah.edu/~sparker/xfree86-3d/ for the most recent information. Moreover, there is a joint effort by XFree86 and SuSE, which includes a GLX, see www.suse.de/~sim/. Currently, Mesa still uses its GLX emulation with Linux.

8.5. Glide and commercial X Servers?

I have not received any mail regarding use of Glide and/or Mesa with commercial X Servers. I would be interested to get confirmation on this, especially on Mesa and Glide with a commercial X Server that has GLX support.

8.6. Glide and SVGA?

You should have no problems running Glide based applications either single or dual screen using VGA modes. It might be a good idea to set up the 640x480 resolution in the SVGA modes, too, if you are using a single screen setup.

8.7. Glide and GGI?

A GGI driver for Glide is under development by Jon M. Taylor, but has not officially been released and was put on hold until completion of GGI 0.0.9. For information about GGI see synergy.caltech.edu/~ggi/.
If you are adventurous, you might find the combination of XGGI (a GGI based X Server for XFree86) and GGI for Glide an interesting prospect. There is also a GGI driver interfacing the OpenGL API; it has been tested with unaccelerated Mesa. Essentially, this means X11R6 running on a Voodoo Graphics (tm), using either Mesa or Glide directly.

9. FAQ: OpenGL/Mesa?

9.1. What is OpenGL?

OpenGL is an immediate mode graphics programming API originally developed by SGI based on their previous proprietary Iris GL, and it became an industry standard several years ago. It is defined and maintained by the Architecture Review Board (ARB), an organization that includes members such as SGI, IBM, DEC, and Microsoft.

OpenGL provides a complete feature set for 2D and 3D graphics operations in a pipelined hardware accelerated architecture for triangle and polygon rendering. In a broader sense, OpenGL is a powerful and generic toolset for hardware assisted computer graphics.

9.2. Where to get additional information on OpenGL?

The official site for OpenGL, maintained by the members of the ARB, is www.opengl.org.

A highly recommended site is Mark Kilgard's Gateway to OpenGL Info at reality.sgi.com/mjk_asd/opengl-links.html: it provides pointers to books, online manual pages, GLUT, GLE, Mesa, ports to several OS, and tons of demos and tools.

If you are interested in game programming using OpenGL, there is the OpenGL-GameDev-L@fatcity.com mailing list, managed through Listserv@fatcity.com. Be warned, this is a high traffic list with very technical content, and you will probably prefer to use procmail to handle the 100 messages per day coming in. You can cut down bandwidth using the SET OpenGL-GameDev-L DIGEST command. The list is also not appropriate if you are looking for introductions. The archive is handled by the ListServ software; use the INDEX OpenGL-GameDev-L and GET OpenGL-GameDev-L "filename" commands to get a preview before subscribing.

9.3. Is Glide an OpenGL implementation?
No, Glide is a proprietary 3Dfx API with several features specific to the Voodoo Graphics (tm) and Voodoo Rush (tm). A 3Dfx OpenGL is in preparation (see below). Several Glide features would require EXTensions to OpenGL, some of which are already found in other implementations (e.g. paletted textures).

The closest thing to a hardware accelerated Linux OpenGL you could currently get is Brian Paul's Mesa along with David Bucciarelli's Mesa Voodoo driver (see below).

9.4. Is there an OpenGL driver from 3Dfx?

Both the 3Dfx website and the Quantum3D website announced OpenGL for Voodoo Graphics (tm) to be available 4Q97. The driver is currently in Beta, and accessible only to registered developers under a written Beta test agreement.

A Linux port has not been announced yet.

9.5. Is there a commercial OpenGL for Linux and 3Dfx?

I am not aware of any third party commercial OpenGL that supports the Voodoo Graphics (tm). Last time I paid attention, neither MetroX nor XInside OpenGL did.

9.6. What is Mesa?

Mesa is a free implementation of the OpenGL API, designed and written by Brian Paul, with contributions from many others. Its performance is competitive, and while it is not officially certified, it is an almost fully compliant OpenGL implementation conforming to the ARB specifications - more complete than some commercial products out, actually.

9.7. Does Mesa work with 3Dfx?

The latest Mesa release works with Linux Glide 2.4. In fact, support was included in earlier versions. However, this driver is still under development, so be prepared for bugs and less than optimal performance. It is steadily improving, though, and bugs are usually fixed very fast.

You will need to get the Mesa library archive from the iris.ssec.wisc.edu FTP site. It is recommended to subscribe to the mailing list as well, especially when trying to track down bugs, hardware, or driver limitations. Make sure to get the most recent distribution. A Mesa-3.0 is in preparation.

9.8. How portable is Mesa with Glide?

It is available for Linux and Win32, and any application based on Mesa will only have the usual system specific code, which should usually mean X11 vs. Windows, or GLX vs. WGL. If you use e.g. GLUT or Qt, you should get away without any system specifics at all for most applications. There are only a few issues (like sampling relative mouse movement) that are not addressed by the available portable GUI toolkits.

Mesa/Glide is also available for DOS. The port, which is 32bit DOS, is maintained by Charlie Wallace and kept up to date with the main Mesa base. See www.geocities.com/~charlie_x/ for the most current releases.

9.9. Where to get info on Mesa?

The Mesa home page is at www.ssec.wisc.edu/~brianp/Mesa.html. There is an archive of the Mesa mailing list at www.iqm.unicamp.br/mesa/. This list is not specific to 3Dfx and Glide, but if you are interested in using 3Dfx hardware to accelerate Mesa, it is a good place to start.

9.10. Where to get information on Mesa Voodoo?

For the latest information on the Mesa Voodoo driver maintained by David Bucciarelli (tech.hmw@plus.it) see the home page at www-hmw.caribel.pisa.it/fxmesa/.

9.11. Does Mesa support multitexturing?

Not yet (as of Mesa 2.6), but it is on the list. In Mesa you will probably have to use the OpenGL EXT_multitexture extension once it is available. There is no final specification for multitextures in OpenGL, which is supposed to be part of the upcoming OpenGL 1.2 revision. There might be a Glide driver specific implementation of the extension in upcoming Mesa releases, but as long as only certain Quantum3D Obsidian boards come with multiple TMU's, it is not a top priority. This will surely change once Voodoo 2 (tm) based boards are in widespread use.

9.12. Does Mesa support single pass trilinear mipmapping?

Multiple TMU's can already be used in current Linux Glide for single pass trilinear mipmapping, which improves image quality without a performance penalty.
Mesa support is not yet done (as of Mesa 2.6), but is in preparation.

9.13. What is the Mesa "Window Hack"?

The most recent revisions of Mesa contain an experimental feature for Linux XFree86. Basically, the GLX emulation used by Mesa copies the Voodoo Graphics (tm) board's most recently finished framebuffer content into video memory on each glXSwapBuffers call. This feature is also available with Mesa for Windows.

This obviously puts some drain on the PCI bus, compounded by the fact that this uses X11 MIT SHM, not XFree86 DGA, to access the video memory. The same approach could theoretically be used with e.g. SVGA. The major benefit is that you could use a Voodoo Graphics (tm) board for accelerated rendering into a window, and that you don't have to use the VGA passthrough mode (video output of the VGA board deteriorates in passing through, which is very visible with high end monitors like e.g. EIZO F784-T).

Note that this experimental feature is NOT Voodoo Rush (tm) support by any means. It applies only to the Voodoo Graphics (tm) based boards. Moreover, you need to use a modified GLUT, as interfacing the window management system and handling the events appropriately has to be done by the application; it is not handled in the driver.

Make sure that you have set the following environment variables:

______________________________________________________________________
export SST_VGA_PASS=1          # to stop video signal switching
export SST_NOSHUTDOWN=1        # to stop video signal switching
export MESA_GLX_FX="window"    # to initiate Mesa window mode
______________________________________________________________________

If you forget one of the SST variables, your VGA board will be shut off, and you will lose the display (but not the actual X). It is pretty hard to get that back while being effectively blind.

Finally, note that the libMesaGL.a (or .so) library can contain multiple client interfaces. I.e. the GLX, OSMesa, and fxMesa (and even SVGAMesa) interfaces can all be compiled into the same libMesaGL.a. The client program can use any of them freely, even simultaneously if it's careful.

9.14. How about GLUT?

Mark Kilgard's GLUT distribution is a very good place to get sample applications plus a lot of useful utilities. You will find it at reality.sgi.com/mjk_asd/glut3/, and you should get it anyway. The current release is GLUT 3.6, and discussion on a GLUT 3.7 (aka GameGLUT) has begun. Note that Mark Kilgard has left SGI recently, so the archive might move some time this year - for the time being it will be kept at SGI.

There is also a GLUT mailing list, glut@perp.com. Send mail to majordomo@perp.com, with (one of) the following in the body of your email message:

______________________________________________________________________
help
info glut
subscribe glut
end
______________________________________________________________________

As GLUT handles double buffers, windows, events, and other operations closely tied to hardware and operating system, using GLUT with Voodoo Graphics (tm) requires support, which is currently in development within GLX for Mesa. It already works for most cases.

10. FAQ: But Quake?

10.1. What about that 3Dfx GL driver for Quake?

The 3Dfx Quake GL, aka mini-driver, aka miniport, aka Game GL, aka 3Dfx GL alpha, implemented only a Quake-specific subset of OpenGL (see http://www.cs.unc.edu/~martin/3dfx.html for an unofficial list of supported code paths). It is not supported, and not updated anymore. It was a Win32 DLL (opengl32.dll) released by 3Dfx and was available for Windows only. This DLL is not, and will not be, ported to Linux.

10.2. Is there a 3Dfx based glQuake for Linux?

Yes. A Quake linuxquake v0.97 binary has been released based on Mesa with Glide. The Quake2 q2test binary for Linux and Voodoo Graphics (tm) has been made available as well. A full Quake2 for Linux was released in January 1998, with linuxquake2-3.10.
Dave "Zoid" Kirsch is the official maintainer of all Linux ports of Quake, Quakeworld, and Quake2, including all the recent Mesa based ports. Note that none of the Linux ports, including the Mesa based ones, is officially supported by id Software.

See ftp.idsoftware.com/idstuff/quake/unix/ for the latest releases.

10.3. Does glQuake run in an XFree86 window?

A revision of Mesa and the Mesa-based Linux glQuake is in preparation. Mesa already does support this by GLX, but Linux glQuake does not use GLX.

10.4. Known Linux Quake problems?

Here is an excerpt, as of January 7th, 1998. I omitted most stuff not specific to 3Dfx hardware.

o  You really should run Quake2 as root when using the SVGALib and/or GL renderers. You don't have to run as root for the X11 refresh, but the modes on the mouse and sound devices must be read/writable by whatever user you run it as. Dedicated server requires no special permissions.

o  X11 has some garbage on the screen when 'loading'. This is normal in 16bit color mode. X11 doesn't work in 24bit (TrueColor). It would be very slow in any case.

o  Some people are experiencing crashes with the GL renderer. Make sure you install the libMesa that comes with Quake2! Older versions of libMesa don't work properly.

o  If you are experiencing video 'lag' in the GL renderer (the frame rate feels like it's lagging behind your mouse movement) type "gl_finish 1" in the console. This forces update on a per frame basis.

o  When running the GL renderer, make sure you have killed selection and/or gpm or the mouse won't work as they won't "release" it while Quake2 is running in GL mode.

10.5. Known Linux Quake security problems?

As Dave Kirsch posted on January 28th, 1998: an exploit for Quake2 under Linux has been published. Quake2 is using shared libraries. While the README so far does not specifically mention it, note that Quake2 should not be setuid.

If you want to use the ref_soft and ref_gl renderers, you should run Quake2 as root.
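
Whether a binary carries the setuid bit can be checked mechanically. The following is a minimal shell sketch and is not taken from the Quake2 distribution: the helper names is_setuid and check_binary are made up for illustration, and the demonstration runs on a scratch file rather than on a real Quake2 binary, whose path depends on your installation.

```shell
#!/bin/sh
# Sketch only: is_setuid and check_binary are hypothetical helper names.

# True when the file exists and its set-user-ID bit is set.
is_setuid() {
    [ -u "$1" ]
}

# Print a verdict for one file; returns 1 if the setuid bit is set.
check_binary() {
    if is_setuid "$1"; then
        echo "WARNING: $1 is setuid; clear the bit with: chmod u-s $1"
        return 1
    fi
    echo "OK: $1 is not setuid"
    return 0
}

# Demonstration on a scratch file (assumption: mktemp is available).
demo=$(mktemp)
chmod 755 "$demo"
check_binary "$demo"            # not setuid yet
chmod u+s "$demo"
check_binary "$demo" || true    # now flagged
chmod u-s "$demo"
check_binary "$demo"            # clean again
rm -f "$demo"
```

To check an installed game binary, call check_binary with its path instead of the scratch file.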
Do not make the binary setuid. You can run both of those renderers at the console only, so being root is not that much of an issue.

The X11 renderer does not need any root permissions (if /dev/dsp is writable by others for sound). The dedicated server mode does not need to be root either, obviously.

Problems such as root requirements for games have been something of a sore spot in Linux for a number of years now. This is one of the problems that e.g. GGI is aiming to fix. A ref_ggi might be supported in the near future.

10.6. Does LinuxQuake use multitexturing?

To my understanding, glQuake will use a multitexture EXTension if the OpenGL driver in question offers it. The current Mesa implementation and the Glide driver for Linux do not yet support this extension, so for the time being the answer is no. See the section on Mesa and multitexturing for details.

10.7. Where can I get current information on Linux glQuake?

Try some of these sites: "The Linux Quake Resource" at linuxquake.telefragged.com, or the "Linux Quake Page" at www.planetquake.com/threewave/linux/. Alternatively, you could look for Linux Quake sites in the "SlipgateCentral" database at www.slipgatecentral.com.

11. FAQ: Troubleshooting?

11.1. Has this hardware been tested?

See the hardware requirements list above. I currently do not maintain a conclusive list of vendors and boards, as no particular board specific problems have been verified. Currently, only 3Dfx and Quantum3D provide boards for testing to the developers, so Quantum3D consumer boards are a safe bet. Every other Voodoo Graphics (tm) based board should work, too. I have reports regarding the Orchid Righteous 3D, Guillemot Maxi 3D Gamer, and Diamond Monster 3D.
If you are a board manufacturer who wants to make sure that your Voodoo Graphics (tm), Voodoo Rush (tm) or Voodoo 2 (tm) boards work with upcoming releases of Linux, XFree86, Linux Glide and/or Mesa, please contact me, and I will happily forward your request to the persons maintaining the drivers in question. If you are interested in support for Linux Glide on platforms other than the PC, e.g. DEC Alpha, please contact the maintainer of Linux Glide, Daryll Strauss, at daryll@harlot.rb.ca.us

11.2. Failed to change I/O privilege?

You need to be root, or make your application setuid root, to run a Glide based application. For DMA, the driver accesses /dev/mem, which is not writeable for anybody but root, with good reasons. See the README in the Glide distribution for Linux.

11.3. Does it work without root privilege?

There are compelling cases where the setuid requirement is a problem, obviously. There are currently solutions in preparation, which require changes to the library internals.

11.4. Displayed image looks awful (single screen)?

If you are using the analog pass through configuration, the common SVGA or X11 display might look pretty bad. You could try to get a better connector cable than the one provided with the accelerator board (the ones delivered with the Diamond Monster 3D are reportedly worse than the one accompanying the Orchid Righteous 3D), but to some degree there will inevitably be signal loss with an additional transmission added.

If the 640x480 full screen image created by the accelerator board does look awful, this might indicate a real hardware problem. You will have to contact the board manufacturer, not 3Dfx, for details, as the quality of the video signal has nothing to do with the accelerator - the board manufacturer chooses the RAMDAC, output drivers, and other components responsible.

11.5. The last frame is still there (single or dual screen)?

You terminated your application with Ctrl-C, or it did not exit normally.
The accelerator board will dutifully provide the current content of the framebuffer as a video signal unless told otherwise.

11.6. Powersave kicks in (dual screen)?

When your application terminates in dual screen setups, the accelerator board does not provide video output any longer. Thus powersave kicks in each time. To avoid this, use

______________________________________________________________________
setenv SST_DUALSCREEN 1
______________________________________________________________________

11.7. My machine seems to lock (X11, single screen)?

If you are running X when calling a Glide application, you probably moved the mouse out of the window, and the keyboard inputs do not reach the application anymore.

If your application is supposed to run concurrently with X11, it is recommended to expose a full screen window, or use the XGrabPointer and XGrabServer functions to redirect all inputs to the application while the X server cannot access the display. Note that grabbing all input with XGrabPointer and XGrabServer is not the behaviour of a well-behaved application, and that your program might block the entire system.

If you experience this problem without running X, be sure that there is no hardware conflict (see below).

11.8. My machine locks (single or dual screen)?

If the system definitely does not respond to any inputs (you are running two displays and know about the loss of focus), you might be experiencing a more or less subtle hardware conflict. See the installation troubleshooting section for details.

If there is no obvious address conflict, there might still be other problems (below). If you are writing your own code, the most common reason for locking is that you didn't snap your vertices. See the section on snapping in the Glide documentation.

11.9. My machine locks (used with S3 VGA board)?

It is possible you have a problem with memory region overlap specific to S3.
There is some info and a patch for the so-called S3 problem on the 3Dfx web site, but these apply to Windows only. To my understanding, the cause of the problem is that some S3 boards (older revisions of Diamond Stealth S3 968) reserve more memory space than actually used, thus the Voodoo Graphics (tm) has to be mapped to a different location. However, this has not been reported as a problem with Linux, and might be Windows-specific.

11.10. No address conflict, but locks anyway?

If you happen to use a motherboard with non-standard or incomplete PCI support, you could try to shuffle the boards a bit. I am running an ASUS TP4XE that has that non-standard modified "Media Slot", i.e. PCI slot4 with additional connector for ASUS-manufactured SCSI/Sound combo boards, and I experienced severe problems while running a Diamond Monster 3D in that slot. The system has operated flawlessly since I put the board in one of the regular slots.

11.11. Mesa runs, but does not access the board?

Be sure that you recompiled all the libraries (including the toolkits the demo programs use - remember that GLUT does not yet support Voodoo Graphics (tm)), and that you removed the older libraries, ran ldconfig, and/or set your LD_LIBRARY_PATH properly. Mesa supports several drivers in parallel (you could use X11 SHM, off screen rendering, and Mesa Voodoo at the same time), and you might have to create and switch contexts explicitly (see MakeCurrent function) if the Voodoo Graphics (tm) isn't chosen by default.

11.12. Resetting dual board SLI?

If an application using a Quantum3D Obsidian board in an SLI setup exits abruptly (i.e., the application crashes, or is aborted by the user), the boards are left in an undefined state. With the dual-board set, you can run a program called resetsli to reset them. Until you run the resetsli program, you will not be able to re-initialize the Obsidian board.

11.13. Resetting single board SLI?
The resetsli program mentioned above does not yet work with a single board Obsidian SLI (e.g. the Obsidian 100-4440SB). You will have to reboot your system by reset in order to reset the board. 3D Graphics Modelling and Rendering mini-HOWTO Dave Jarvis v1.1, 27 March 2001 Details download and installation instructions for a graphics render­ ing and modelling development environment using RedHat Linux. ______________________________________________________________________ Table of Contents 1. Introduction 1.1 Preamble 1.2 Modelling vs. Modeling 1.3 Copyright Information 2. Background Information 2.1 The Graphics Library 2.2 The Graphics Modeller 2.3 The Graphics Renderer 3. Installation Instructions 3.1 Warning 3.2 Download the Software 3.3 Install the Graphics Library 3.4 Install the Graphics Renderer 3.5 Install the Graphics Modeller 3.6 Clean Up 4. Miscellaneous Information 4.1 Lighting 4.2 Tutorials 5. Related Links 5.1 Graphics Libraries 5.2 Graphics Renderers 5.3 Graphics Modellers 5.4 Miscellaneous Links 6. Acknowledgements ______________________________________________________________________ 1. Introduction 1.1. Preamble This document will guide you through the steps used to install and configure an environment for modelling and rendering three-dimensional graphics using Linux. In this section you will also find information in laymans terms about the required components and how they piece together. The installation section is purposely minimal; merely the quick and dirty steps needed to take to get up and running (if it doesn't work, more information is available). For those that want more information about the software components and what they do (in general), please continue reading. There are, at the minimum, three software packages you'll need in order to get up and running. These are as follows (in the order they are explained, not the order they are installed): · a graphics library; · a graphics modeller; · a graphics renderer. 1.2. Modelling vs. 
Modeling

The spelling modelling is Canadian. The spelling modeling is American. The original author of this document is Canadian. ;-)

1.3. Copyright Information

Copyright © 2000-2001 Dave Jarvis

This document may be reproduced in whole or in part, without fee, subject to the following restrictions:

· the copyright notice above and this permission notice must be preserved complete on all complete or partial copies;
· any translation or derived work must be approved by the author in writing before distribution;
· if you distribute this work in part, instructions for obtaining the complete version of this manual must be included, and a means for obtaining a complete version provided;
· small portions may be reproduced as illustrations for reviews or quotes in other works without this permission notice if proper citation is given.

2. Background Information

The content of this section exists only to describe, in general, the three main components required for three-dimensional modelling and rendering with a Linux-based system.

2.1. The Graphics Library

A graphics library consists of the most basic tools used for manipulating graphical images. Think of all the things needed to build a house: wiring, plumbing, wood, bricks, and such. The graphics library can be thought of not as these items, but rather as the tools used to create such items. After all, wire, metal tubes, planks, and bricks don't magically appear; rather they are created and formed as entities unto themselves.

On a similar note, graphics don't magically appear on the screen -- typically they consist of lower-level graphics primitives (lines, rectangles, and individual pixels, for example). So the graphics library, then, can be thought of as the low-level graphics primitives used to build more complex objects (spheres, boxes, complex polygons, etc.). Those complex objects are then used to build even more complicated shapes and figures.
The graphics library installed was the freeware implementation of OpenGL called Mesa.

2.2. The Graphics Modeller

Since the graphics renderer is, ideally, completely hidden from the end-user, we'll deal with that last (besides which, modelling is the next logical step in keeping with my house-building analogy). However, when it comes to the actual installation, a graphics modeller relies on the renderer already being installed.

If the graphics library is akin to the tools used to build the tools used to build a house (!), then graphics modellers can be thought of as the tools used to build the blueprints for the house -- sophisticated blueprints, as modellers let you dictate exactly where the wiring, plumbing, wood panels, bricks, and so forth are supposed to go. Furthermore, they let you pick the style of panelling and the colour of the bricks you desire.

The graphics modeller installed was the freeware package called The Mops, which produces RenderMan-compatible files.

2.3. The Graphics Renderer

In keeping with the house-building analogy, the graphics renderer is then the construction workers. Once you have the blueprints and materials ready to go, you need something to actually build the house so it appears how it was designed. The graphics renderer is given information (i.e., the blueprints in the form of a RenderMan-compatible file, or equivalent) from the modeller to produce the final result.

Just as the graphics modeller needs the graphics renderer before it can be installed, the renderer relies on the graphics library being installed beforehand.

The graphics renderer installed was the Blue Moon Rendering Toolkit, which uses RenderMan files.

3. Installation Instructions

Keep in mind that these are brief instructions; a quick summary of the more important details you'll find listed in README files for the corresponding software packages.
It is, by no means, a substitute for actually reading those files (as they contain copyright information and other instructions not necessarily covered by this document).

3.1. Warning

First, let it be known that this document only covers how to get up and running using RedHat v7.0. Whenever given the choice as to which software package to download, please make sure it is compatible with the flavour of Linux you happen to be running.

Second, please only send E-mail if you have information that would be helpful to other people who might read this document (such as explaining how to install other tools, pointers to other tutorials, missing steps, grammar and/or speling mistakes and/or tpyos, etc.). If software doesn't compile, or you can't figure it out, please read its accompanying documentation. Please understand that your system may be completely different, and as such debugging problems via E-mail across the Internet is not a task anyone enjoys. ;-)

Third, these are software packages that installed without any severe hitches (read: severe headaches). In the Related Links section, there are alternate software packages alongside the ones covered below. Note that just because a given software package is not covered in depth does not mean it is any worse (or better) than those chosen to install.

Good luck!

3.2. Download the Software

Before you begin, you will need a web browser and Unix shell. If you don't know how to use a shell [bash, ksh, etc.], you're on your own (although instructions are given in both English and shell commands). Unless otherwise specified, all instructions are to be carried out as root.

1. Create a new directory /usr/local/archives for the packages:

     mkdir /usr/local/archives

2.
Download the following packages (in .tar.gz form) into the newly created directory (homepages are given, as well as links to download pages, and minimum software version):

· Mesa Graphics Library v3.4.1: www.mesa3d.org/download.html
· Blue Moon Rendering Toolkit v2.6beta: www.bmrt.org/BMRTdownload/index.html
· The Mops v0.42d: www.informatik.uni-rostock.de/~rschultz/mops/download.html

3.3. Install the Graphics Library

Old versions of tar do not support the z argument. For those systems, leave out the z argument and use gunzip on the file before using tar.

1. Change to the /usr/local/archives directory:

     cd /usr/local/archives

2. Extract Mesa (substitute version number where required):

     tar zxf MesaLib-3.4.1.tar.gz
     tar zxf MesaDemos-3.4.1.tar.gz

3. Change to the MesaLib subdirectory:

     cd Mesa-3.4.1

4. Configure, make, and install Mesa with the following sequence of commands:

     ./configure; make; make install

5. Edit /etc/ld.so.conf, and ensure you have a line that reads:

     /usr/local/lib

6. Run the dynamic library configuration program:

     ldconfig

3.4. Install the Graphics Renderer

1. Return to the /usr/local/archives directory:

     cd ..

2. Extract the Blue Moon Rendering Toolkit (substitute version number where required):

     tar zxf BMRT2.6beta.linux-glibc2.tar.gz

3. Change to the BMRT subdirectory:

     cd BMRT2.6

4. Copy files to appropriate destination directories:

     cp bin/* /usr/local/bin/
     cp lib/lib* /usr/local/lib/
     cp include/* /usr/local/include/

5. Make a directory for the shaders, ensure it is world-writable, then copy the shader files into it:

     mkdir /usr/local/shaders
     chmod 777 /usr/local/shaders
     cp shaders/*.sl* /usr/local/shaders/
     cp shaders/*.h /usr/local/shaders/
     cp examples/*.sl* /usr/local/shaders/
     cp examples/*.h /usr/local/shaders/

6. Edit the system login profile (/etc/profile or equivalent), and add the line:

     export SHADERS=.:/usr/local/shaders

7. Copy the .rendribrc file to each user's home directory.
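
Step 7 above lends itself to a small loop. The following is only a sketch, assuming that user home directories sit directly under /home and that .rendribrc is in the current BMRT2.6 directory; the function name copy_rc is made up for illustration and is not part of BMRT.

```shell
#!/bin/sh
# copy_rc is a hypothetical helper, not part of BMRT.
# Usage: copy_rc FILE BASEDIR
# Copies FILE into every directory found directly under BASEDIR.
copy_rc() {
    src=$1
    base=$2
    for home in "$base"/*/; do
        # Skip the unexpanded pattern when BASEDIR has no subdirectories.
        [ -d "$home" ] || continue
        cp "$src" "$home" || return 1
    done
}

# Example, run as root from the BMRT2.6 directory
# (assumes home directories live under /home):
#   copy_rc .rendribrc /home
```

Copies made this way belong to root. They normally remain readable by their users, but you may prefer to chown each copy to its owner afterwards, and any non-user directory under /home (lost+found, for instance) receives a harmless copy as well.
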
If anything goes wrong, please consult the README file that accompanies the Blue Moon Rendering Toolkit, or visit their website.

3.5. Install the Graphics Modeller

The Mops may be installed on a per-user basis, or on a system-wide basis by root (or equivalent). In this example, it is installed using a non-administrative account, which should yield positive results. Note that the compile failed during the install (missing a C header file), so the precompiled binaries (compatible with RedHat v6.0, your system may vary) were installed, as follows:

1. Change to one directory above where you'd like The Mops to reside. For example, if /usr/local/mops was desired, then issue the following command:

     cd /usr/local

2. Extract The Mops (substitute version number where required), then change into its directory:

     tar zxf /usr/local/archives/mops-0.42d-BMRT26-linux.tar.gz
     cd mops

3. Move the following files from /usr/local/mops/src to /usr/local/mops:

     mv src/crtmopssh.sh .
     mv src/mfio.so .
     mv src/mops .

4. Copy the .mopsrc file to the home directory of each user wanting to run The Mops. For example, the user "jane" would need the following command run:

     cp src/mopsrc /home/jane/.mopsrc

5. Create /usr/local/lib/mops and move the buttons and shaders:

     mkdir /usr/local/lib/mops/
     mv buttons/* /usr/local/lib/mops/
     mv shader/*.sl* /usr/local/shaders/

If anything goes wrong, please consult the README and Setup.txt files that accompany The Mops, or visit their website.

Log out from root. Log in as a regular user, and run The Mops as follows:

     /usr/local/mops/mops

You may wish to create a subdirectory within $HOME/mops called models for saving 3D models.

3.6. Clean Up

Now that the installation is complete, you can remove from your system all files that you no longer require (substituting version numbers where required).

     cd /usr/local/archives/
     rm -rf BMRT2.6
     rm -rf Mesa-3.4.1

Note: Be cautious when using rm -rf ...
make sure you are in the correct directory, and the files and/or directories you wish to delete are present.

4. Miscellaneous Information

Instead of a frequently asked questions section, here is information about some of the (almost embarrassing) problems faced.

4.1. Lighting

The most frustrating problem, initially, was trying to figure out why everything was black -- and then how to actually light objects up. In these "virtual worlds" where you are modelling objects, the worlds are created from scratch. There is no light in the world until you actually put a light source in it! The light sources then shine in a given direction, illuminating things in their path (according to the surface properties of the objects).

Make certain that your light source is:

1. pointing (rotated and translated) in the correct direction;
2. intense enough to actually cast discernible lighting.

4.2. Tutorials

The most basic thing a person would want to do with modelling/rendering packages is position a sphere on a surface, give it some lighting, and see the result. A decent tutorial should describe that first. That said, The Mops has a wonderful first tutorial.

5. Related Links

5.1. Graphics Libraries

Mesa - An OpenGL-compliant Graphics Library.

5.2. Graphics Renderers

BMRT - The Blue Moon Rendering Toolkit.
POV-Ray - The Persistence of Vision Raytracer.

5.3. Graphics Modellers

The Mops - A 3D modelling package that uses BMRT.
Blender - Freeware modelling and rendering suite of tools.

5.4. Miscellaneous Links

Here are some links that don't really fit into any other category, yet are still worth checking out if you are seriously considering using your Linux computer as a 3D modelling and rendering machine.

3D Software for Linux - Contains most (if not all) links in this document and then some.
3D Modelling Software for Linux - Links to software packages chiefly related to modelling.
3D Modelling and Rendering using Linux - A comprehensive site with articles and software that explains what this document summarizes.

6. Acknowledgements

I would like to extend a heart-felt thanks to the developers of the software packages detailed in this document. The quality of their products is of a commercial level, yet they keep the spirit of free software alive. Well done!

4mb Laptop HOWTO
Bruce Richardson
25 March 2000

How to put a "grown-up" Linux on a small-spec (4mb RAM, <=200mb hard disk) laptop.
______________________________________________________________________

Table of Contents

1. Introduction
   1.1 Why this document was written.
   1.2 What use is a small laptop?
   1.3 Why not just upgrade the laptop?
   1.4 What about 4mb desktop machines?
   1.5 What this document doesn't do.
   1.6 Where to find this document.
   1.7 Copyright
2. The Laptops
   2.1 Basic Specifications
      2.1.1 Compaq Contura Aero
      2.1.2 Toshiba T1910
   2.2 The Problem
   2.3 The Solution
3. Choices Made
   3.1 What to use to create the initial root partition?
   3.2 The Distribution
      3.2.0.1 But I don't like Slackware!
   3.3 Which installation method to use?
   3.4 Partition Layout
      3.4.1 Basic Requirement
      3.4.2 How complex a layout?
   3.5 Which components to install?
4. The Pre-installation Procedure
   4.1 muLinux Preparation
   4.2 Prepare the installation root files.
   4.3 Create the partitions.
      4.3.1 Mini-Linuces and ext2 file-systems - an important note.
      4.3.2 Procedure
5. The Installation
   5.1 Boot the machine
   5.2 Floppy/Parport CD-ROM Install
   5.3 Network/PCMCIA Install
      5.3.1 PCMCIA install on the Aero
   5.4 Set-up
      5.4.1 AddSwap
      5.4.2 Target
      5.4.3 Select
      5.4.4 Install
      5.4.5 Configure
      5.4.6 Exit
   5.5 Pre-reboot Configuration
   5.6 Post-reboot Configuration.
      5.6.1 Re-use the temporary root.
      5.6.2 Other configuration tweaks.
6. Conclusion
7.
Appendix A:
   7.1 A - Base Linux System
      7.1.0.1 Packages considered for omission:
      7.1.0.2 Packages installed:
   7.2 AP - Non-X Applications
      7.2.0.1 Packages considered for inclusion:
      7.2.0.2 Packages installed:
   7.3 D - Development Tools
      7.3.0.1 Packages installed:
   7.4 E - Emacs
      7.4.0.1 Packages installed:
   7.5 F - FAQs and HOWTOs
      7.5.0.1 Packages installed:
   7.6 K - Kernel Source
      7.6.0.1 Packages Installed:
   7.7 N - Networking Tools and Apps
      7.7.0.1 Packages installed:
   7.8 Tetex
      7.8.0.1 Packages installed:
   7.9 Y - BSD Games Collection
      7.9.0.1 Packages installed:
   7.10 End result
8. Appendix B: Resources relevant to this HOWTO
______________________________________________________________________

1. Introduction

1.1. Why this document was written.

I got my hands on two elderly laptops, both with just 4mb RAM and small (<=200mb) hard drives. I wanted to install Linux on them. The documentation for this kind of laptop all recommends installing either a mini-Linux or an old (and therefore compact) version of one of the professional distributions. I wanted to install an up-to-date professional distribution.

1.2. What use is a small laptop?

Plenty. It isn't going to run X or be a development box (see ``Which components to install?'') but if you are happy at the console you have a machine that can do e-mail, networking, writing etc. Laptops also make excellent diagnostic/repair tools and the utilities for that will easily fit onto small laptops.

1.3. Why not just upgrade the laptop?

Upgrading old laptops is not much cheaper than upgrading new ones. That's a lot to spend on an old machine, especially considering that the manufacturer isn't supporting it any more and spare parts are hard to find.

1.4. What about 4mb desktop machines?

The procedure described in this document will work perfectly well on a desktop PC. On the other hand, upgrading a desktop machine is far easier and cheaper than upgrading a laptop. Even if you don't upgrade it, there are still simpler options.
You could take out the hard disk, put it in a more powerful machine, install Linux, trim it to fit and then put the disk back in the old machine.

1.5. What this document doesn't do.

This document is not a general HOWTO about installing Linux on laptops or even a specific HOWTO for either of the two machines mentioned here. It simply describes a way of squeezing a large Linux into a very small space, citing two specific machines as examples.

1.6. Where to find this document.

The latest copy of this document can be found in several formats at http://website.lineone.net/~brichardson/linux/4mb_laptops/.

1.7. Copyright

This document is copyright (c) Bruce Richardson 2000. It may be distributed under the terms set forth in the LDP license at sunsite.unc.edu/LDP/COPYRIGHT.html.

This HOWTO is free documentation; you can redistribute it and/or modify it under the terms of the LDP license. This document is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the LDP license for more details.

Toshiba and T1910 are trademarks of Toshiba Corporation. Compaq and Contura Aero are trademarks of Compaq Computer Corporation.

2. The Laptops

This section describes the laptops that I have used this procedure on, the problems faced when installing Linux on them and the solutions to those problems (in outline).

2.1. Basic Specifications

2.1.1. Compaq Contura Aero

· 25MHz 486SX CPU
· 4mb RAM
· 170mb Hard Disk
· 1 PCMCIA Type II slot
· External PCMCIA 3.5" Floppy drive (-- The PCMCIA floppy drive has a proprietary interface which is partly handled by the Aero's unique BIOS. The Linux PCMCIA drivers can't work with it. According to the PCMCIA-HOWTO, if the drive is connected when the laptop boots it will work as a standard drive and Card Services will ignore the socket but it is not hot-swappable.
However, I found that the drive becomes inaccessible as soon as Card Services starts unless there is a mounted disk in the drive. This has implications for the installation process - these are covered at the relevant points. --)

2.1.2. Toshiba T1910

· 33MHz 486SX CPU
· 4mb RAM
· 200mb Hard Disk
· Internal 3.5" Floppy drive
· 1 PCMCIA Type II/III slot

2.2. The Problem

The small hard disks and the lack of an internal floppy on the Aero make the installation more tricky than normal but the real problem is the RAM. None of the current distributions has an installation disk that will boot in 4mb, not even if the whole hard disk is a swap partition.

The standard installation uses a boot disk to uncompress a root-partition image (either from a second floppy or from CD-ROM) into a ram-disk. The root-image is around 4mb in size. That's all the RAM available in this scenario. Try it and it freezes while unpacking the root-image.

2.3. The Solution

The answer is to eliminate the ram-disk. If you can mount root on a physical partition you will have enough memory to do the install. Since the uncompressed ram-disk is too big to fit on a floppy, the only place left is on the hard disk of the laptop. The steps are:

1. Find something that will boot in 4mb RAM and which can also create ext2 partitions.
2. Use it to create a swap partition and a small ext2 partition on the laptop's hard disk.
3. Uncompress the installation root-image and copy it onto the ext2 partition.
4. Boot the laptop from the installation boot-disk, pointing it at the ext2 partition on the hard disk.
5. The installation should go more or less as normal from here.

The only question was whether a distribution that wouldn't install (under normal circumstances) on the laptops would run on them. The short answer is "Yes".

If you're an old Linux hand then that's all you need to know. If not, read on - some of the steps listed above aren't as simple as they look.

3.
Choices Made

This section describes the choices available, which options are practical, which ones I decided on and why.

3.1. What to use to create the initial root partition?

The best tool for this is a mini-Linux. There's a wide selection of small Linuces available on the net, but most of them won't boot in 4mb RAM. I found two that will:

SmallLinux
     http://smalllinux.netpedia.net/

     SmallLinux will boot in as little as 2mb RAM but its root disk can't be taken out of the drive, which is a shame since otherwise it has everything we need (i.e. fdisk, mkswap and mkfs.ext2). SmallLinux can create the needed partitions but can't be used to copy the root partition.

muLinux
     http://sunsite.auc.dk/mulinux/

     muLinux will boot in 4mb but only in a limited single-user mode. In this mode fdisk and mkswap are available but mkfs.ext2 and the libraries needed to run it are on the /usr partition which is not available in maintenance mode. To use muLinux to do the whole pre-installation procedure the files needed to create ext2 file-systems must be extracted from the usr disk image and copied onto a floppy.

This gives the option of either using SmallLinux to create the partitions and muLinux to copy the root partition or using muLinux to do the whole job. Since I had two laptops I tried both.

3.2. The Distribution

It didn't take much time to choose Slackware. Apart from the fact that I like it but haven't used it much and want to learn more, I considered the following points:

· Slackware has possibly the most low-tech DIY install of all the major distributions. It is also one of the most flexible, coming with a wide range of boot-disk kernels to suit many different machines. This makes it well suited to the kind of hacking about required in this scenario.
· Slackware supports all the methods listed in ``Which Installation method to use?''.
· Slackware is a distribution designed by one person.
I'm sure Patrick Volkerding won't object if I say this means its configuration tools are simpler and more streamlined. In my opinion this makes the job of trimming the installation to fit cramped conditions easier.

Version 7.0 was the latest version when I tried this so that's what I used.

3.2.0.1. But I don't like Slackware!

You don't have to use it. I can't answer for all the distributions but I know that Debian, Red Hat and SuSE offer a range of installation methods and have an "expert" installation procedure (-- Does Debian do any other kind? --) which can be used here. Most of the steps in this document would apply to any of the distributions without change.

If you haven't used the expert method with your preferred distribution before, do a trial run on a simple desktop machine to get the feel of it and to explore the options it offers.

3.3. Which installation method to use?

Floppy Install
     This means churning out 15 floppies - which only gives you an absolute minimal install and requires a second stage to get the apps you want on. It's also very slow on such low-spec machines. This is a last resort if you can't make the others work.

Parallel-port Install
     Where the parallel port has an IDE device, parallel cable or pocket ethernet adaptor (-- A pocket lan adaptor installation onto these machines will be very slow. --) attached. This would be a good choice for the Aero, leaving the PCMCIA slot free to run the floppy drive.

PCMCIA Install
     As above, this could be a CD-ROM or network install. This would be the best method for the T1910 - on the Aero it's a bit more awkward.

ISA/PCI Ethernet Install
     Not an option for the laptops, obviously, but included in case your target machine is a desktop PC.

The tools I had to hand dictated a PCMCIA network install. I will point out where steps differ for the other methods. Whichever method you choose, you need to have a higher-spec machine available - even if only to create the disks for a floppy install.

3.4.
Partition Layout

3.4.1. Basic Requirement

This procedure requires at least two Linux Native partitions in addition to a Swap partition. Since one of the ext2 partitions will be in use as temporary root during the installation it will not be available as a target partition and so should be small - though no smaller than 5mb. It makes sense to create for this a partition that you will re-use as /home after installation is complete. Another option would be to re-create it as a DOS partition to give you a dual boot laptop.

3.4.2. How complex a layout?

There isn't room to get too clever here. There is an argument for having a single ext2 partition and using a swap file to avoid wasting space but I would strongly urge creating a separate partition for /usr. If you have only one partition and something goes wrong with it you may well be faced with a complete re-installation. Separating /usr and having a small partition for / makes disaster recovery a more likely prospect.

On both machines I created 4 partitions in total:

1. A swap partition -- 16mb on the T1910, 20mb on the Aero (I'm more likely to upgrade the memory on the Aero).
2. /home (temporary root during installation) -- 10mb
3. / -- 40mb on the T1910, 30mb on the Aero.
4. /usr -- All the remainder.

In addition, the Aero uses hda3 for a 2mb DOS partition containing configuration utilities. See the Aero FAQs for details.

3.5. Which components to install?

The full glibc libraries alone would nearly fill the hard disks so there's no question of building a development machine. It looks as if a minimal X installation can be squeezed in but I'm sure it would crawl and I don't want it anyway. I decided to install the following (for a full listing see ``Appendix A''):

· The core Linux utilities
· Assorted text apps from the ap1 file set
· Info/FAQ/HOWTO documentation
· Basic networking utilities
· The BSD games

This selection matches the kind of machine described in ``What use is a small laptop?''.

4.
The Pre-installation Procedure

This section covers creating a swap partition and a temporary root partition on the laptop's hard disk. Nothing here is Slackware-specific.

4.1. muLinux Preparation

If you are going to use only muLinux for this procedure then you need to prepare a disk with mkfs.ext2 and supporting libraries on it. From the muLinux setup files uncompress USR.bz2 and mount it as a loop file-system. If you are in the same directory as the USR file and you want to mount it as /tmpusr then the sequence for this is:

______________________________________________________________________
losetup /dev/loop0 USR
mount -t ext2 /dev/loop0 /tmpusr
______________________________________________________________________

From there copy mkfs.ext2, libext2fs.so.2, libcomerr.so.2 and libuuid.so.1 onto a floppy.

4.2. Prepare the installation root files.

Select the root disk you want - I used the color one with no problems but the text one would be slightly faster in these low memory conditions. Uncompress the image and mount it as a loop device. The procedure is the same as in the above section but the root disk image is a minix file-system.

Next you need 3 1722 floppies or 4 1440 floppies with ext2 file-systems - it's better with 1722 disks as you don't need to split the /lib directory. Give one floppy twice the default number of inodes so it can take the /dev directory. That's 432 inodes for a 1722 disk or 368 for a 1440. If you specify /dev/fd0H1722 or /dev/fd0H1440 then you don't have to give any other parameters so for a 1722 disk do

______________________________________________________________________
mke2fs -N 432 /dev/fd0H1722
______________________________________________________________________

If you have mounted the root image as /tmproot and the destination floppy as /floppy then cd to /tmproot.
To copy the dev directory the command is

______________________________________________________________________
cp -dpPR dev/* /floppy/
______________________________________________________________________

For the other directories with files in (bin, etc, lib, mnt, sbin, usr, var) it's

______________________________________________________________________
cp -dpPr directoryname/* /floppy/
______________________________________________________________________

(In the cp shipped with distributions of this era, -P means --parents, so the files land under /floppy/dev, /floppy/bin and so on. In current GNU cp, -P means --no-dereference and the long option --parents has to be spelled out.)

Don't bother with the empty ones (floppy, proc, root, tag, tmp) because you can simply create them on the laptop. boot and cdrom are soft links pointing to /mnt/boot and /var/log/mount respectively - you can also create them on the laptop.

4.3. Create the partitions.

4.3.1. Mini-Linuces and ext2 file-systems - an important note.

To save space, small-Linux designers sometimes use older libc5 libraries and where they do use up-to-date libc6 they leave out many of the options compiled into full distributions, including some optional features of the ext2 file-system. This has two consequences:

· Trying to mount ext2 disks formatted using a modern Linux system can generate error messages if you mount them read-write. Be sure to use the -r option when mounting floppies on the laptops.
· It is not wise to use the mkfs.ext2 that comes with the mini-Linux to create file-systems on the partitions into which Slackware will be installed. It should only be used to create the file-system on the temporary root partition. Once installation is complete this partition can be reformatted and re-used.

4.3.2. Procedure

If installing on an Aero, make sure the floppy drive is inserted before switching on and do not remove it.

1. Boot from the mini-Linux (-- With muLinux, wait until the boot-process complains about the small memory space and offers the option of dropping into a shell - take that option and work in the limited single-user mode it gives you. --)
2. Use fdisk to create the partitions.
3.
Reboot on leaving fdisk (with muLinux you may simply have to turn off and on again at this point).
4. Use mkswap on the swap partition and then activate it (this will make muLinux much happier).
5. If using muLinux then mount the extra floppy created in ``muLinux Preparation'', copy mkfs.ext2 into /bin and the libraries into /lib.
6. Use mkfs.ext2 to create the file-system on the temporary root partition.
7. If you have been using SmallLinux, shut down and reboot using muLinux. Don't forget to activate the swap partition again.
8. muLinux will have mounted the boot floppy on /startup - unmount it to free the floppy drive.
9. Now mount the temporary root partition and copy onto it the contents of the disks you created in ``Prepare the installation root files''. Do not be alarmed by the error messages: if, for example, you copy usr from the floppy to the temporary root partition by typing "cp -dpPr usr/* /tmproot/" then you'll get the error message "cp: sr: no such file or directory". Ignore this, nothing is wrong.
10. cd to the temporary root partition and create the empty folders (floppy, proc, root, tag, tmp) and the soft links boot (pointing to mnt/boot) and cdrom (to var/log/mount).
11. Unmount the temporary root partition - this syncs the disk.
12. You can simply turn off the machine now.

5. The Installation

This section does not give much detail on the Slackware installation process. In fact, it assumes you are familiar with it. Instead, this section concentrates on those areas where special care or unusual steps are required.

5.1. Boot the machine

Make a boot-disk from one of the images. I recommend you use bareapm.i on a laptop and bare.i on a desktop - unless you have a parallel-port IDE device (pportide.i). Boot the laptop from it. When the boot: prompt appears, type "mount root=/dev/hdax" where x is the temporary root partition. Log in as root. Then activate the swap partition.

5.2.
Floppy/Parport CD-ROM Install

In both these cases, no extra work should be necessary to access the installation media. Simply run setup.

5.3. Network/PCMCIA Install

Slackware has supplementary disks with tools for these and instructions for their use greet you when you log in. Use the network disk on a desktop PC with ethernet card or a laptop with pocket ethernet adaptor. Use the PCMCIA disk for PCMCIA install. Once your network adapter/PCMCIA socket has been identified, run setup.

5.3.1. PCMCIA install on the Aero

The Slackware installation process runs the PCMCIA drivers from the supplementary floppy. Because the Aero has a PCMCIA floppy drive, this means you can't remove the floppy drive to insert the PCMCIA CD-ROM/ethernet card. The solution is simple: the Slackware PCMCIA setup routine creates /pcmcia and mounts the supplementary disk there, so

1. Create the /pcmcia directory yourself
2. Mount the supplementary disk to /mnt. Be sure to specify the type as vfat - if you don't, it'll be incorrectly identified as UMSDOS and long filenames will be mis-copied.
3. cd /mnt;cp -dpPr ./* /pcmcia/
4. Unmount the floppy.
5. Run pcmcia. When the script complains that there is no disk in the drive simply hit Enter: Card Services will start. Connect your PCMCIA device and hit Enter.
6. Run setup

5.4. Set-up

The Slackware set-up program is straightforward. Start with the Keymap section and it'll take you forward step by step.

5.4.1. AddSwap

You do need to do this step so it can put the correct entry in fstab but make sure it doesn't run mkswap - you're already using the partition.

5.4.2. Target

In this section Slackware asks which partitions will be mounted as what and then formats them if you want.

The safest bet here is to leave your temporary root partition out altogether and just edit fstab later once you know you don't need it for its temporary purpose anymore.
If you're going to reuse it as /home then it is OK to designate it as /home - obviously, don't format it now! If you intend to re-use it as a part of the directory structure that will have files placed in it during installation (/var, for example) then you absolutely must ignore it in this step: after the installation is complete you can move the files across.

5.4.3. Select

Here you choose which general categories of software to install. I chose as follows:

· A - Base Linux System

· AP - Non-X applications

· F - FAQs and HOWTOs

· N - Networking tools and apps

· Y - BSD games collection

I wouldn't recommend adding to this - if anything, prune it back to A, AP and N. That gives you a core Linux setup to which you can add according to your needs.

5.4.4. Install

Choose the Expert installation method. This allows you to select/reject for installation individual packages from the categories you chose in the Selection step. ``Appendix A'' goes through the precise choices I made. This part takes about 3 hours for a PCMCIA network install. You are prompted to select individual packages before the installation of each category, so you can't just walk away and leave it to run through.

5.4.5. Configure

Once the packages are all installed, you are prompted to do final configuration for your machine. This covers areas like networking, Lilo, selecting a kernel etc. Some points to look out for:

· If you did a PCMCIA install, don't accept the offer to configure your network with netconfig. This will ruin your pcmcia networking. Wait until you've rebooted and then edit /etc/pcmcia/network.opts

· This is the point where you should install a kernel. For a laptop the bareapm kernel is best, for a desktop simply the bare one.

5.4.6. Exit

The set-up process is finished but you are not. Do not reboot yet! There is another vital step to complete.

5.5. Pre-reboot Configuration

On a normal machine you would simply reboot once the installation is complete.
If you do that here you may have to wait 6 or 8 hours for a login prompt to appear and another half hour to get to the command prompt. Before rebooting you need to change or remove the elements that cause this slowdown. This involves editing config files so you need to be familiar with vi, ed or sed. At this stage your future root partition is still mounted as /mnt so remember to add that to the paths given here.

/etc/passwd
   Edit this to change root's login shell to ash. ash really is the only practical login shell for 4mb RAM.

/etc/rc.d/rc.modules
   Comment out the line 'depmod -a'. You only need to update module dependencies if you have changed your module configuration (recompiled or added new ones, for example). On a standard system it only takes a second or two and so it doesn't matter that it's needlessly performed each time. On a 4mb laptop it can take as much as 8 hours. When you do change your module set-up you can simply uncomment this line and reboot. Alternatively, change this part of the script so that it will only run if you pass a parameter at the boot-prompt. For example:

   ________________________________________________________________
   if [ "$NEWMODULES" = "1" ] ; then
       depmod -a
   fi
   ________________________________________________________________

/etc/rc.d/rc.inet2
   This script starts network services like nfs. You probably don't need these and certainly not at start-up. Rename this script to something like RC.inet2 - that will stop it from being run at boot and you can run it manually when you need it.

/etc/rc.d/rc.pcmcia
   On the Aero you should also rename this script, otherwise you'll lose the use of your floppy drive on start-up. It's worth considering for any other small laptop as well - you can always run it manually before inserting a card.

Once these changes have been made, you are ready to reboot.

5.6. Post-reboot Configuration.
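Once the machine is back up, it may be worth confirming that the pre-reboot changes from section 5.5 are actually in place. What follows is only a minimal sketch and not part of the Slackware tools: the function name check_prereboot is made up for illustration, and it assumes a POSIX sh with awk and grep available. It takes the root of the installed system as its argument (/ after the reboot, /mnt while still running from the temporary root).

```shell
#!/bin/sh
# check_prereboot ROOT
# Hypothetical helper (not a Slackware tool): reports whether the
# pre-reboot changes are in place under ROOT ("/" once rebooted,
# "/mnt" while still running from the temporary root).
check_prereboot() {
    root="${1:-/}"
    status=0

    # root's login shell is the last colon-separated field in passwd.
    shell=$(awk -F: '$1 == "root" { print $NF }' "$root/etc/passwd" 2>/dev/null)
    case "$shell" in
        */ash) echo "ok:   root login shell is $shell" ;;
        *)     echo "warn: root login shell is '$shell', not ash"
               status=1 ;;
    esac

    # Look for a depmod -a line that is not commented out. A line
    # guarded by an if-test still matches, so this is a hint only
    # and does not affect the return status.
    if grep -v '^[[:space:]]*#' "$root/etc/rc.d/rc.modules" 2>/dev/null |
            grep -q 'depmod -a'; then
        echo "note: rc.modules still contains an active 'depmod -a'"
    else
        echo "ok:   no active 'depmod -a' in rc.modules"
    fi

    # Scripts that were renamed no longer exist under these names.
    for script in rc.inet2 rc.pcmcia; do
        if [ -e "$root/etc/rc.d/$script" ]; then
            echo "warn: $script is still present under its original name"
            status=1
        else
            echo "ok:   $script has been renamed or removed"
        fi
    done

    return $status
}
```

Calling check_prereboot /mnt before rebooting, or check_prereboot / afterwards, prints one line per check and returns non-zero if the login shell or the script names look unchanged.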
If you made the changes recommended in section ``Pre-reboot configuration'' then the boot process will only take a few minutes, as opposed to several hours. Log in as root and check that everything is functioning properly.

5.6.1. Re-use the temporary root.

Once you are sure the installation is solid you can reclaim the partition you used as the temporary root. Don't just delete the contents, reformat the filesystem. Remember, the mke2fs that came with the mini-Linux is out of date. If you intend to re-use this partition as /home, remember not to create any user accounts until you have completed this step.

5.6.2. Other configuration tweaks.

In such a small RAM space, every little helps. Go through Slackware's BSD-style init scripts in /etc/rc.d/ and comment out anything you don't need. Have a look at Todd Burgess' Small Memory mini-HOWTO http://eddie.cis.uoguelph.ca/~tburgess/ for more ideas.

6. Conclusion

That's it, all done. You now have a laptop with the core utilities in place and 50 to 70mb spare for whichever extras you need. Don't mess it up because it's a lot easier to modify an existing installation on such cramped old machines than it is to start from scratch again.

7. Appendix A:

This appendix lists which packages (if any) from each category might be included in the installation and gives my reasons for including or omitting them. I made no attempt to install X so those categories are ignored. Although this appendix refers specifically to the Slackware distribution it can be used as a guide with any of the major distributions.

7.1. A - Base Linux System

Most of the packages in this category are essential, even those that aren't listed as required by the Slackware set-up program. Because of this, I've listed those packages that I felt could reasonably be left out rather than all the non-compulsory packages that I installed.

7.1.0.1. Packages considered for omission:

kernels (ide, scsi etc.)
   There's no need to install any of these, you get a chance to select a kernel at the very end of the installation process.

aoutlibs
   This is only needed if you intend to run executables compiled in the old a.out format. Omitting it saves a lot of space. Omitted.

bash1
   Bash2 (simply called bash in the Slackware package list) is required for the Slackware configuration scripts but there are a lot of scripts that need bash1. I included it.

getty
   agetty is Slackware's default getty, this package contains getty and uugetty as alternatives. Only include it if you need their extra functionality. Omitted.

gpm
   Personally, I find this very useful at the console (and the Aero's trackball is very handy) but it's not essential. Included.

ibcs2
   Not needed. Omitted.

isapnp
   No use here. Omitted.

loadlin
   Not needed with the setup described here - unless your old laptop has some peculiarity that requires a DOS driver to initialise some of its devices. Omitted.

lpr
   You could argue that you can do your printing from whichever desktop is nearest but I always find it useful to have printing capabilities on a laptop. Included.

minicom
   Not a compulsory include but I want the laptop to do dial-up connection. Very handy. Included.

pciutils
   Not needed on these old laptops. Omitted.

quota
   Not vital but it can be used to set limits that stop you from overflowing the limited space available in these laptops. Included.

tcsh
   I recommend using ash as your login shell. Only include this if you need it for scripts. Omitted.

umsprogs
   You can leave this out and still be able to access UMSDOS floppies. Omitted.

scsimods
   No use on these laptops. Omitted.

sysklogd
   This can interfere with apmd but it does provide essential information. Included.

7.1.0.2.
Packages installed:

aaa_base, bash, bash1, bin, bzip2, cpio, cxxlibs, devs, e2fsprog, elflibs, elvis, etc, fileutils, find, floppy, fsmods, glibcso, gpm, grep, gzip, hdsetup, infozip, kbd, ldso, less, lilo, man, modules, modutils, pcmcia, sh_utils, shadow, sudo, sysklogd, sysvinit, tar, txtutils, util, zoneinfo

Combined size: 33.4 mb

7.2. AP - Non-X Applications

None of these packages are, strictly speaking, essential - although ash is really required for sensible operation in 4mb. Leaving them all out could save the vital space for you to squeeze in your favourite app. I selected a minimal set of tools that I don't like to do without.

7.2.0.1. Packages considered for inclusion:

apsfilter
   Not much point having printing if you can only print text files. Included.

ash
   This is the shell for low-memory machines, only taking up 60k. Use it as the default login shell unless you like waiting 10 seconds for the command prompt to reappear each time. Included.

editors (jed, joe, jove, vim)
   elvis is the default Slackware editor and a required part of the installation. If, like me, you are a vi fan then that's all you need: installing vim would be wasteful duplication given the space restrictions. If you can't stand vi and need a more DOS-style editor then joe is small. Emacs fans with some self-discipline might consider jed or jove rather than pigging out on the full-size beast. Omitted.

enscript
   If you already have apsfilter you don't really need this. Omitted.

ghostscript
   Including the fonts this comes to about 7.5mb. One to leave until after the core installation, then consider if you need it. Omitted.

groff
   Needed for the man pages. Included.

ispell
   Not an essential but very useful to the overenthusiastic touch-typist. Included.

manpages
   Included!

mc
   Slackware offers a lightweight compilation of mc but I'm happier at the command prompt. Omitted.
quota
   Not necessary on what is not a multi-user machine but you may, like me, find it handy to stop you from forgetfully wasting the little space you have. Included.

rpm
   Don't bother. If you do have an rpm that you would like to squeeze in, use rpm2tgz on a desktop machine to turn it into a tgz package - then you can use the standard Slackware installation tools. Omitted.

sc
   A useful little spreadsheet packed very small. Included.

sudo
   Not essential but I find it useful here: it's a cramped environment and an awkward reinstall if you mess things up - sudo helps create user profiles with the power to do the things you need without carelessly wiping your disk. Included.

texinfo
   Info documentation. Included.

zsh
   Leave this out unless you're addicted to it or have scripts that must use it. Omitted.

7.2.0.2. Packages installed:

apsfilter, ash, diff, groff, ispell, manpages, quota, sc, sudo, texinfo

Combined size: 8.1 mb

7.3. D - Development Tools

You could fit C or C++ into this space but the glibc library package is too big, so some pruning would be needed. Do the main installation first and then try it. There is room for Perl and Python.

7.3.0.1. Packages installed:

None

7.4. E - Emacs

I don't use Emacs and so saved myself some space. On the other hand, if you are an Emacs fan then you probably use it for e-mail, news and coding so you'll claim some of that space back by omitting other packages. If you do want Emacs it might be an idea to leave this out while doing the core installation. Once the laptop is up you can try fitting in what you want/need at your leisure.

7.4.0.1. Packages installed:

None.

7.5. F - FAQs and HOWTOs

If you know it all you don't need these. I installed the lot.

7.5.0.1. Packages installed:

howto, manyfaqs, mini

Combined size: 12.4 mb

7.6. K - Kernel Source

You can just squeeze it in. If all you want to do is read the source, go ahead.

7.6.0.1. Packages Installed:

None

7.7.
N - Networking Tools and Apps

These packages were selected to provide core networking tools, dial-up capability, e-mail, web and news.

7.7.0.1. Packages installed:

dip, elm, fetchmail, mailx, lynx, netmods, netpipes, ppp, procmail, trn, tcpip1, tcpip2, uucp, wget

Combined size: 15.1 mb

7.8. Tetex

Another set that will barely squeeze in. I can't say how it would run in the space available.

7.8.0.1. Packages installed:

None

7.9. Y - BSD Games Collection

I'm addicted to several of these. If I really need that last 5mb they can go.

7.9.0.1. Packages installed:

bsdgames

Combined size: 5.4 mb

7.10. End result

In total the installed packages plus kernel took up about 75mb of disk space of which 19.5mb was in the root partition and 55.5mb in /usr. On the Aero that left 39mb in /usr, 74mb on the T1910.

8. Appendix B: Resources relevant to this HOWTO

Linux Laptop HOWTO
   http://www.snafu.de/~wehe/Laptop-HOWTO.html

Small Memory mini-HOWTO
   http://eddie.cis.uoguelph.ca/~tburgess/

Linux on Laptops
   http://www.cs.utexas.edu/users/kharker/linux-laptop/
   HOWTOs and installation FAQs for a wide range of machines.

Linux T1910 FAQ
   http://members.tripod.com/~Cyberpvnk/linux.htm

Linux Contura Aero FAQ
   http://domen.uninett.no/~hta/linux/aero-faq.html

Contura Aero FAQ
   http://www.reed.edu/~pwilk/aero/aero.faq
   Comprehensive FAQ on all aspects of the Contura Aero compiled by the moderators of the Aero mailing list. Good Linux section.

How to Develop Accessible Linux Applications

Sharon Snider

Copyright © 2002 by IBM Corporation

v1.1, 2002-05-03

Revision History
Revision v1.1 2002-05-03 Revised by: sds
Converted to DocBook XML and updated broken links.
Revision v1.0 2002-01-28 Revised by: sds
Wrote and converted to DocBook SGML.

This document provides Linux software developers with guidelines and test cases for developing accessible Linux applications.

-----------------------------------------------------------------------------
Table of Contents
1. Introduction
2.
Developing Accessible Applications
    2.1. Principles for Developing Accessible Applications
3. Guidelines for Developing Accessible Applications
    3.1. Keyboard Navigation
    3.2. Mouse Interaction
    3.3. Graphical Elements and Objects
    3.4. Fonts and Text
    3.5. Color and High Contrast Settings
    3.6. Magnification
    3.7. Audio
    3.8. Animation
    3.9. Focus
    3.10. Visual Focus Indicator
    3.11. Timing
    3.12. Documentation
4. Additional Resources:

1. Introduction

This document provides developers with the information necessary to assess their applications for accessibility. Some of these tests should be performed using various types of [http://www.ibiblio.org/pub/Linux/docs/HOWTO/Accessibility-HOWTO] adaptive technologies.

Please send any comments or contributions via e-mail to [mailto:snidersd@us.ibm.com] Sharon Snider. This document will be updated regularly with new contributions and suggestions.

-----------------------------------------------------------------------------
2. Developing Accessible Applications

Some of the most important reasons for developing accessible software are:

  * Software should be accessible to as many users as possible.

  * Accessibility to new products benefits everyone. Information technology has provided many benefits to society. However, individuals with disabilities cannot participate fully when the technology does not meet the needs of users with disabilities.

  * Compliance with worldwide regulations and standards such as [http://www.section508.gov/] Section 508 of the Rehabilitation Act, [http://www.usdoj.gov/crt/ada/adahom1.htm] Americans with Disabilities Act and [http://www.w3.org/WAI/Policy] the World Wide Web Consortium's Web Accessibility Initiative.

-----------------------------------------------------------------------------
2.1. Principles for Developing Accessible Applications

Developers need to consider the following needs of disabled users when developing an accessible application:

  * Choice of input methods.
    Support should be available for various types of input, such as keyboard, mouse and adaptive technologies. Pay close attention to keyboard navigation.

  * Choice of output methods. Support should be available for various types of output, such as visual display, audio, and print. The main focus is that text labels are provided for all user interface elements and objects, graphics, and icons.

  * Consistency and flexibility with the user's system configuration. In addition, include customization options so the user can select color, font, and layout of the work area.

-----------------------------------------------------------------------------
3. Guidelines for Developing Accessible Applications

The following sections contain guidelines and tests that developers can use to create more accessible applications. Use Pass, Fail, or Pending as a rating system for each item.

-----------------------------------------------------------------------------
3.1. Keyboard Navigation

3.1.1. Guidelines

The following keyboard navigation settings and sequences can cause accessibility problems. You should confirm that:

  * Efficient keyboard access is provided to application features.

  * A logical keyboard navigation order has been implemented.

  * The correct tab order is used for controls that are dependent on check boxes, radio buttons, or toggle state.

  * Keyboard access does not override existing accessibility features.

  * The application provides more than one method to perform keyboard tasks whenever possible.

  * There are alternate key combinations whenever possible.

  * There are no awkward reaches for frequently performed keyboard operations.

  * The application does not use repetitive, simultaneous key presses.

  * The application provides keyboard equivalents for all mouse functions.

  * The application does not use any general navigation functions to trigger operations.
  * All the keyboard invoked menus, windows, and tool tips appear near the object they relate to.

-----------------------------------------------------------------------------
3.1.2. Tests

Run the following keyboard tests without using the mouse for all actions. Using only the keyboard commands, move the focus through all menus in the application. You should verify that:

  * Context sensitive menus display correctly.

  * Any functions listed on the tool bar can be performed using the keyboard.

  * You can operate every control in the client area of the application and dialog boxes.

  * Text and objects within the application can be selected.

  * Any keyboard enhancements or shortcuts are working as designed.

-----------------------------------------------------------------------------
3.2. Mouse Interaction

3.2.1. Guidelines

The following are mouse button actions and sequences that cause accessibility problems. You should confirm that:

  * There is no input dependent on mouse button two or three.

  * All mouse operations can be canceled.

  * There is visual feedback throughout a drag and drop operation.

-----------------------------------------------------------------------------
3.3. Graphical Elements and Objects

3.3.1. Guidelines

The following are graphical element attributes, object attributes, and naming conventions that are needed for accessibility. You should confirm that:

  * There are no hard-coded graphical attributes, such as, lines, borders, or shadow thickness.

  * There are descriptive names for all application program interface (API) objects.

  * All multi-color graphical elements can be adjusted to monochrome only, whenever possible.

  * All interactive graphical user interface (GUI) elements are easily identifiable.

  * An option to hide non-essential graphics has been provided.

-----------------------------------------------------------------------------
3.3.2.
Tests

Test the application using a screen reader and verify that:

  * Labels and text are being read correctly, including menus and tool bars.

  * Object information is read correctly.

-----------------------------------------------------------------------------
3.4. Fonts and Text

3.4.1. Guidelines

The following are font and text styles, attributes, and labels that cause accessibility problems. You should confirm that:

  * All the font styles and sizes are not hard-coded.

  * An option to turn off graphical backdrops has been provided.

  * All label objects have names that make sense when taken out of context.

  * There are no label names that have been used more than once in the same window.

  * There is consistency with label positioning throughout the application.

  * When using static text as a label for a control, the label immediately precedes the control in tab order.

  * An alternative to what you see is what you get (WYSIWYG) is provided.

-----------------------------------------------------------------------------
3.4.2. Tests

Run the following tests to confirm that font size and settings are maintained.

  * Change the font in the application and confirm that the changes apply only to the application and not the desktop environment.

  * Change colors within the application and confirm that the changes apply only to the application and not the desktop environment.

  * Run a screen magnification program and test the font, color, and size of text when being viewed through a magnifier.

-----------------------------------------------------------------------------
3.5. Color and High Contrast Settings

3.5.1. Guidelines

The following are color and high contrast guidelines for the application environment. You should confirm that:

  * The application color is not hard-coded and can be changed.

  * Color is used as an enhancement and not the only way to convey information.
  * The application supports various high contrast settings (for example, black on white, or white on black).

  * The application is not dependent on a particular high contrast setting.

-----------------------------------------------------------------------------
3.5.2. Tests

Run the following tests and verify that:

  * All information is available by printing a screen shot to a black and white printer.

  * All information is conveyed correctly when settings are set to only black and white or high contrast.

  * At least three high contrast schemes are available, and they function correctly.

  * High contrast settings in the desktop environment are respected by the application (for example, the window bar and font colors that are set by the desktop environment do not change).

-----------------------------------------------------------------------------
3.6. Magnification

3.6.1. Guidelines

The following magnification functions should be built into the application. You should confirm that:

  * The application provides the ability to magnify the work area.

  * The application has the option to scale the work area.

  * The application is not adversely affected by changing the magnification settings.

-----------------------------------------------------------------------------
3.7. Audio

3.7.1. Guidelines

The following are guidelines for audio output. Using a screen reader, confirm that:

  * The user can hear all required audio output.

  * Audio is not the only means by which the information is conveyed.

  * The user can configure frequency and volume on all audio alerts and sounds.

-----------------------------------------------------------------------------
3.7.2. Tests

The application should have an option to show audio alerts and sounds visually. Test that the audio is working correctly with sound enabled. Verify that:

  * The application is working as designed when the user performs an action that generates an audio alert.
  * The application works correctly when increasing or decreasing the volume.

  * Warning messages and alerts can be heard correctly in a noisy work environment.

-----------------------------------------------------------------------------
3.8. Animation

3.8.1. Guidelines

The following are guidelines for all animation that is included in the application. You should confirm that:

  * There are no blinking elements with a frequency greater than 2 Hertz (Hz) and lower than 55 Hz.

  * There are no large areas that flash or blink.

-----------------------------------------------------------------------------
3.8.2. Tests

Run the following tests on applications that include animation. You should verify that:

  * An option is available to stop or turn off animation.

  * When the animation is turned off, the application still works correctly.

-----------------------------------------------------------------------------
3.9. Focus

3.9.1. Guidelines

Focus is determined by the location of the cursor as the user moves through the application or display panels. The following are guidelines for focus within the application. You should confirm that:

  * Focus starts at the most commonly used controls.

  * The current input focus is clearly displayed at all times.

  * The input focus is in the active display panel.

  * The appropriate feedback is provided when the user attempts to navigate past the end of a group of related objects.

  * The default audio alert is played when the user presses an inappropriate key.

-----------------------------------------------------------------------------
3.10. Visual Focus Indicator

3.10.1. Guidelines

The visual focus indicator tells the user the position of the cursor and provides enough information, so the user understands what to do next. The following are guidelines for the visual focus indicator. You should confirm that:

  * There is sufficient audio information for the visual focus indicator, so the user can figure out what to do next.
  * Screen readers and Braille devices can confirm the current cursor position within the application and read the content of the visual focus indicator.

-----------------------------------------------------------------------------
3.10.2. Tests

Test the following using a screen reader or Braille device. You should verify that:

  * When moving among objects the visual focus indicator is easy to identify.

  * Keyboard navigation through the application menus is clearly visible when the focus moves.

  * The screen reader or Braille device is tracking the visual focus indicator as you navigate using a keyboard.

  * When running a screen magnification program, the magnifier can track the visual focus indicator accurately as you navigate using the keyboard and mouse.

-----------------------------------------------------------------------------
3.11. Timing

3.11.1. Guidelines

The following guidelines apply to timing options built in to the application. You should confirm that:

  * There are no hard-coded timeouts or other time-based features.

  * There are no objects that display briefly and then hide information based on the movement of the mouse pointer.

-----------------------------------------------------------------------------
3.11.2. Tests

Test the following for timing related to your application. You should verify that:

  * The user is notified before a message times out and is given the option to indicate that more time is needed.

  * An option is available to adjust the response time, and it is working as designed.

-----------------------------------------------------------------------------
3.12. Documentation

3.12.1. Guidelines

The following are guidelines for writing accessible documentation:

  * All documentation is in an accessible format (for example, HTML or text).

  * Documentation is available on all accessibility features of the application.
  * State if the application does not support the standard keyboard access that is used by the operating system.

  * Identify if there are unique keyboard commands.

  * Identify and explain all accessibility features.

  * When documenting mouse actions, include the alternative keyboard action as well.

-----------------------------------------------------------------------------
3.12.2. Tests

Run the following test to verify that the documentation is available and accessible.

  * Open a help file while in the application using a screen reader or Braille device and confirm the information is accessible, clear, and precise.

-----------------------------------------------------------------------------
4. Additional Resources:

The following Web sites provide checklists and testing information that is more specific to the various Linux development environments:

  * American Foundation for the Blind provides information on creating accessible applications at [http://www.afb.org/] http://www.afb.org/.

  * GNOME Accessibility Project has written a guide specifically for application development in the GNOME 2.0 desktop. It includes information using their Accessibility Tool Kit (ATK). Additional information is available at [http://developer.gnome.org/projects/gap/guide/gad/index.html] http://developer.gnome.org/projects/gap/guide/gad/index.html.

  * IBM Accessibility Center provides links to a Java, Web, and Software accessibility checklist for application development. This site is located at [http://www-3.ibm.com/able/guidelines.html] http://www-3.ibm.com/able/guidelines.html.

  * Sun Accessibility provides accessibility information on designing accessible Java applications. More information is available at [http://www.sun.com/access/developers/software.guide.html] http://www.sun.com/access/developers/software.guide.html.

  * The Web Accessibility Initiative Web site includes guidelines, checklists, and techniques for developing accessible Web sites and applications.
    Additional information is located at [http://www.w3.org/WAI/] http://www.w3.org/WAI/.

Linux Accessibility HOWTO

Michael De La Rue
Sharon Snider

v3.1, June 21, 2002

Revision History
Revision v3.1 2002-06-21 Revised by: sds
Updated and converted to DocBook XML.
Revision v3.0 2001-10-25 Revised by: sds
Updated and converted to DocBook SGML.
Revision v2.11 1997-03-28 Revised by: mdlr
Last Linuxdoc revision.

The Linux Accessibility HOWTO covers the use of adaptive technologies that are available for the Linux operating system, as well as the software applications and hardware devices that can be installed to make Linux accessible to users with disabilities. The information provided targets specific groups of individuals with similar disabilities.

-----------------------------------------------------------------------------
Table of Contents
1. Introduction
    1.1. Distribution Policy
2. The Linux Operating System
    2.1. Assistive Technologies Available for Linux
    2.2. Usability
3. Visual Impairments
    3.1. Technologies for the Visually Impaired
4. Hearing Impaired
    4.1. Assistive Technologies for the Hearing Impaired
5. Physically Disabled
    5.1. Keyboard Navigation
    5.2. Assistive Technologies for the Physically Disabled
    5.3. Additional Resources
6. Cognitive, Language, and Other Impairments
    6.1. Assistive Technologies for Cognitive, Language and Other Impairments
7. Developing Accessible Applications
8. Other Helpful Information
9. Acknowledgments

1. Introduction

The purpose of this HOWTO is to introduce the tools, applications, and configuration utilities that are available to Linux users who are disabled. The information provided targets groups of individuals with the following disabilities:

  * Visually Impaired

  * Hearing Impaired

  * Physically Disabled

  * Cognitive, Language, and Other Impairments

Please send any comments or contributions via e-mail to [mailto:snidersd@us.ibm.com] Sharon Snider.
This document will be updated regularly with new contributions and suggestions.

-----------------------------------------------------------------------------
1.1. Distribution Policy

The Access-HOWTO may be distributed, at your choice, under either the terms of the GNU General Public License version 2 or later or the standard Linux Documentation Project (LDP) terms. These licenses should be available from the LDP Web site: [http://www.linuxdoc.org/docs.html] http://www.linuxdoc.org/docs.html. Please note that since the LDP terms do not allow modification (other than translation), modified versions can be assumed to be distributed under the GPL.

ViaVoice® is a registered trademark of International Business Machines Corporation.

-----------------------------------------------------------------------------
2. The Linux Operating System

The Linux operating system has many software applications and utilities that run in the non-graphical environment. The graphical user interface (GUI), which is often referred to as X Windows, is clearly separate from the underlying non-graphical, text-only environment. One major reason that a visually impaired individual can use Linux is that network connectivity is built in to the operating system and provides full access to the Internet from the non-graphical interface. All visible text on the screen can be translated using a screen reader and speech synthesizer.

Over the past few years many improvements have been made to the GUI, and many of the desktops now provide features and enhancements designed for accessibility. In the following sections you will find information on the tools, utilities, and applications that are available to assist users in configuring their desktop environment.

-----------------------------------------------------------------------------
2.1.
Assistive Technologies Available for Linux

Assistive technologies are computer hardware devices and software applications that provide individuals with impairments access to the information and applications on a computer. Although there are not many commercial applications available specifically for Linux accessibility, there are free software applications that can make the computer more accessible. Detailed information on assistive technologies that are available has been listed in this document based on the type of disability.

-----------------------------------------------------------------------------
2.2. Usability

Linux has the advantage over Windows that a large majority of Linux software has been developed for the console. Although many programs are now being developed for the GUI, programs continue to be written for the non-graphical, text-based environment.

Linux originated as a programmer's operating system and, for the physically disabled, this means that it is easy to build and customize programs to suit an individual's needs. The windowing system used by Linux (X11) includes many programming tools that enable further modification and customization of the GUI.

KDE and GNOME have included many accessibility and usability features in their latest releases and are continuing to test, upgrade, and enhance the graphical environment. The following are links to KDE and GNOME's accessibility and usability projects:

  * KDE Accessibility Project - http://accessibility.kde.org/.
  * GNOME Accessibility Project - http://developer.gnome.org/projects/gap/.
  * KDE Usability Project - http://usability.kde.org/.
  * GNOME Usability Project - http://developer.gnome.org/projects/gup/.

-----------------------------------------------------------------------------
3.
Visual Impairments

There are two categories of visual impairment: individuals who are partially sighted (for example, blurred vision, near- and far-sightedness, color blindness) and those who are totally blind. Assistive technologies are available for the Linux operating system for visually impaired users, and many of the software packages are free.

-----------------------------------------------------------------------------
3.1. Technologies for the Visually Impaired

The following is a list of assistive technologies for visually impaired users:

-----------------------------------------------------------------------------
3.1.1. Screen Readers

Screen readers are software applications that are installed on the computer to provide translation of the information on the computer screen to an audio output format. The translation is passed to the speech synthesizer and the words are spoken out loud. Currently, fully functional screen readers are only available for Linux in console mode. This section describes some of the most common screen readers.

  * Emacspeak, the complete audio desktop, is an excellent non-graphical, text-based interface for users who are visually impaired. This application can be used as a screen reader in conjunction with a hardware synthesizer or the IBM ViaVoice® Run-time text-to-speech application. More information and software packages are available at: http://www.cs.cornell.edu/home/raman/emacspeak/. The Emacspeak HOWTO at http://www.ibiblio.org/pub/Linux/docs/HOWTO/ includes a tutorial and installation guide.
  * Jupiter Speech System is a screen reader for Linux in console mode. A user guide and software packages are available at: http://www.eklhad.net/linux/jupiter/.
  * Screader is a screen reader for Linux in console mode that works with the Festival software speech synthesizer and the Accent hardware synthesizer. Information and downloads are available at: http://www.euronet.nl/~acj/eng-screader.html.
  * Speaker is a new plugin for the Konqueror file manager and Web browser. Speaker provides text-to-speech using the Festival speech system or IBM ViaVoice. Downloads are available at: http://dogma.freebsd-uk.eu.org/~grrussel/speaker.html.
  * Speakup is a screen review package for the Linux operating system. It requires a hardware speech synthesizer, such as the DecTalk Express. An installation boot disk and packages that allow a visually impaired user to install the Linux operating system are available at: http://www.linux-speakup.org/.
  * ZipSpeak is a talking mini-distribution of Linux. More information and software packages are available at: http://www.linux-speakup.org/zipspeak.html.

-----------------------------------------------------------------------------
3.1.2. Speech Synthesizers

A speech synthesizer can be either a hardware device or a text-to-speech (TTS) software application that creates the sounds necessary to provide speech output. Hardware synthesizers are available for the Linux operating system; however, they can be very expensive and must be compatible with the screen reader application in order to function properly. The alternative is to download and install a software synthesizer such as IBM's ViaVoice or Festival and configure it to work with a compatible screen reader, such as Emacspeak.

-----------------------------------------------------------------------------
3.1.2.1.
Hardware Speech Synthesizers

A hardware speech synthesizer is a device that is connected to the computer's serial or parallel port and translates the text to a spoken output. Normally there are Braille labels on all controls to indicate the off and on position, and volume control. Hardware synthesizers also have the ability to speak in different tones that can be set up to indicate various parts of a document or text. Some models will provide a connection for headphones.

The following is a list of speech synthesizers that are supported on the Linux operating system and can be used with Emacspeak:

  * Accent SA and Apollo 2 (http://polio.dyndns.org/chip/vss.html)
  * DECTalk Express (http://www.4access.com/synthesizers.asp)
  * DoubleTalk (http://www.rcsys.com)

-----------------------------------------------------------------------------
3.1.2.2. Software Speech Synthesizers

A software speech synthesizer is an application that translates the text on the screen to speech output and provides speech synthesis, so that the screen reader application can read information out loud to the user.

  * Festival is a general, multi-lingual speech synthesis system developed at the Center for Speech Technology Research (CSTR). It offers a full TTS system with various application program interfaces, as well as an environment for development and research of speech synthesis techniques. Mbrola or FestVox are needed to complete the Festival installation. Software packages and installation instructions are available at: http://www.cstr.ed.ac.uk/projects/festival/.
  * Mbrola is a speech synthesizer that can be used with a TTS application, such as Festival, to provide speech output. More information is available at: http://tcts.fpms.ac.be/synthesis/mbrola.html.
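
As a rough illustration of how a software synthesizer can be driven from a script, the following sketch pipes a string into Festival's command-line text-to-speech mode. It is only a sketch under stated assumptions: it assumes Festival is installed and that the festival program is on the PATH, and the helper names festival_command and speak are hypothetical, chosen for this example rather than taken from Festival or from any screen reader.

```python
import shutil
import subprocess


def festival_command():
    """Return the argument list that asks Festival to speak text from stdin."""
    # --tts switches Festival into text-to-speech mode.
    return ["festival", "--tts"]


def speak(text):
    """Speak `text` through Festival; return False if Festival is missing."""
    # Assumption: the executable is called "festival" and is on the PATH.
    if shutil.which("festival") is None:
        return False
    # With no file argument, Festival reads the text to speak from stdin.
    subprocess.run(festival_command(), input=text.encode("utf-8"), check=True)
    return True
```

Calling speak("Linux is ready.") should read the sentence aloud on a machine with a working Festival installation, and simply returns False where Festival is not installed. A screen reader does considerably more than this (tracking focus, reviewing the screen), so the sketch only shows the synthesizer half of the arrangement.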
-----------------------------------------------------------------------------
3.1.3. Screen Magnifiers

Screen magnifiers enable users who are partially sighted to view selected areas of the screen in a manner similar to using a magnifying glass.

  * GMag is a screen magnifier for X Windows. It provides continuous magnification while you work, as well as the option to change the contrast of images at run-time. More information and downloads are available at: http://projects.prosa.it/gmag/.
  * Puff is a screen magnifier for users who need a high magnification of text and graphics in X Windows. Puff follows the focus of the mouse or pointer and enlarges the portion of the screen under the cursor. In order for Puff to run properly on Linux the source code needs to be modified, so this application is not a good option for inexperienced users. The software packages and source code modification instructions are available at: http://trace.wisc.edu/world/computer_access/unix/unixshare.html.
  * SVGATextmode enlarges or reduces the font size for users who prefer to work in console mode. The normal text screen that Linux provides is 80 characters across and 25 vertically. After SVGATextmode is installed, the text can be displayed much larger. One example would be 50 characters across and 15 vertically. The program does not offer the ability to zoom in and out, but the user can re-size when necessary. The most current download is available at: http://freshmeat.net/projects/svgatextmode/. Do not try to run SVGATextmode from an X Windows terminal. You must be in console mode for the display to function properly.
  * UnWindows is a collection of programs that includes Dynamag, a screen magnification program that helps the user locate the mouse pointer.
    The source code is available for Dynamag as a stand-alone application, or the entire UnWindows package can be downloaded at: http://www.cs.rpi.edu/pub/unwindows/. The entire UnWindows package will not work with Linux without programming modifications. However, the Dynamag application can be installed successfully without any additional code changes.
  * Xzoom is a screen magnifier similar to Xmag that allows the user to magnify, rotate, or mirror a portion of the screen. The most current download is available at: http://filewatcher.org/sec/xzoom.html.

-----------------------------------------------------------------------------
3.1.4. Adjusting the Screen's Resolution

The X Windows server can be set up with different screen resolutions. The ability to adjust the screen's resolution allows a partially sighted user to magnify the screen with a single key sequence. The steps to set up your system are as follows:

 1. Change directories: type cd /etc
 2. Using a text editor, open the XF86Config file.
 3. Locate the line beginning with Modes and change it to

        Modes "1280x1024" "1024x768" "800x600" "640x480" "320x240"

    Note: The settings may vary based on your monitor's highest resolution mode.
 4. Save the file and exit.

To enlarge the text on the screen type Ctrl+Alt+keypad-plus, and to make the text smaller type Ctrl+Alt+keypad-minus.

-----------------------------------------------------------------------------
3.1.5. Braille Devices

Braille terminals are normally used by individuals who are totally blind and may be hearing impaired as well. A Braille display uses a series of pins to form Braille symbols that are continuously updated as the user changes focus. A Braille embosser is a hardware device for printing a hard copy of a text document in Braille. Braille translation software is required to translate the on-screen text to a Braille format.
-----------------------------------------------------------------------------
3.1.5.1. Braille Hardware Devices

The following Braille devices have been listed on the hardware compatibility list of one or more of the following Braille translation applications:

  * Braillex http://www.redhat.com/mailing-lists/blinux-announce/msg00031.html.
  * Alva B.V.: ABT3xx, Delphi (serial and parallel ports), Satellite.
  * Baum: Vario/RBT 40/80 (emulation 1/2) http://www.baum.de/English/homeeng1.htm.
  * Blazie Engineering: BrailleLite 18/40 http://www.freedomscientific.com/index.html.
  * Handialog: VisioBraille 2040 http://www.handialog.com/indexuk.htm.
  * Handy Tech Elektronik GmbH: BrailleWave, mod20, mod40, mod80 http://www.handytech.de/.
  * MDV: MB208/MB408L/MB408S (protocol 5) http://www.cavazza.it/cnt/schede/scheda-mb408l-eng.html.
  * Pulse Data International: BrailleNote 18/32 http://www.pulsedata.co.nz/graphics.htm.
  * Telesensory Systems Inc.: Navigator 20/40/80 (latest firmware version only), PowerBraille 40/65/80 http://www.telesensory.com/.
  * Tieman B.V.: CombiBraille 25/45/85, MiniBraille 20, MultiBraille MB125CR/MB145CR/MB185CR http://www.braillevoyager.nl/uk/index.html.
  * Tiflosoft: VideoBraille 40 http://www.tinlecco.it/tiflosoft/.

-----------------------------------------------------------------------------
3.1.5.2. Braille Translation Software

The following Braille translation applications are available for download:

  * Brass is a new program that combines speech and Braille output.
    The current version is still in testing and can be downloaded at: http://www.butenuth.onlinehome.de/blinux/.
  * BRLTTY supports parallel port and USB Braille displays and provides access to the Linux console. It drives the terminal and provides complete screen review capabilities. It is available at: http://dave.mielke.cc/brltty/.
  * NFBTrans is a freeware Braille translator written by the National Federation of the Blind (NFB). Software packages are available for download at: http://www.nfb.org/nfbtrans.htm.

-----------------------------------------------------------------------------
3.1.6. Cursors for X Windows

Changing the shape and size of the mouse cursor can help users who have a problem following or seeing the cursor. The X Big Cursor mini HOWTO explains how to configure enlarged mouse cursors with the X Windows system. This HOWTO is available at: http://www.icewalk.com/doclib/howtos/mini/X-Big-Cursor.html.

There is also a large selection of cursors that can be downloaded at: http://themes.tucows.com/cursors.html.

-----------------------------------------------------------------------------
3.1.7. Audio

Audio can be very useful to users who are visually impaired. In most X Windows desktop environments audio alerts and sound events can be set up within the desktop control center by enabling sound and verifying that the option to show sound is activated. You will need to check the desktop user's manual for setup and configuration of sound events.

Locktones is an excellent application for providing toggle keys that sound an audio alert to warn the user that a keystroke has created a locking state such as Caps Lock or Num Lock. The application can be downloaded at: http://leb.net/pub/blinux/.

Linux can also be configured to beep at the login prompt so the user knows when to type in the password. A configuration utility that provides this function can be downloaded at: http://leb.net/pub/blinux/bootmeup/.

-----------------------------------------------------------------------------
3.1.8. Additional Resources

  * Access Mozilla has a goal to build an accessible Web suite (browser, e-mail, news, composer and chat) that conforms to the W3C accessibility standards. More information is available at: http://access-mozilla.sourceforge.net/.
  * Blind + Linux = BLINUX provides documentation, downloads and a mailing list that focus on users who are blind. Information and software packages are available at: http://leb.net/blinux.
  * LaTeX/TeX is an extremely powerful document preparation system and it can be used to produce large print documents. More information is available at: http://www.emerson.emory.edu/services/latex/latex_toc.html.
  * The purpose of the National Federation of the Blind (NFB) is to help blind persons achieve self-confidence and self-respect and to act as a vehicle for collective self expression by the blind. Information for blind users, as well as software, is available at: http://www.nfb.org/.
  * Project Ocularis is run by volunteers, and the project's aim is to improve Linux accessibility through the creation of new free software and the modification of pre-existing free software. More information is available at: http://ocularis.sourceforge.net.
  * Screen is a standard piece of software that allows many different applications to run at the same time on a single terminal in console mode. Screen has been enhanced to support some Braille terminals directly.
    It is available for download at: http://www.icewalk.com/softlib/app/app_01508.html.
  * SuSE Linux is the first Linux distribution to support installation of the Linux operating system and applications that run on Linux in Braille. The Blinux screen reader runs in the background to enable visually impaired users to work in a Linux console environment. More information is available at: http://www.suse.de/us/products/susesoft/70news/new_in_70.html.
  * xocr is an optical character recognition program that scans written text, such as a book, and translates it to audio output, so the information is available to visually impaired users. More information is available at: http://sal.unimedya.net.tr/Z/3/XOCR.html.

-----------------------------------------------------------------------------
4. Hearing Impaired

For users who have hearing impairments the audio output must be conveyed visually on the screen. Most desktops provide visual audio alerts and warnings. In console mode the system can also be configured to provide visual bells. The "Visual Bells mini-HOWTO", written by Alessandro Rubini, provides the configuration details and is available at: http://www.ibiblio.org/pub/Linux/docs/HOWTO/mini/.

-----------------------------------------------------------------------------
4.1. Assistive Technologies for the Hearing Impaired

The following is a list of assistive technologies for the hearing impaired:

-----------------------------------------------------------------------------
4.1.1. Telecommunications Devices for the Deaf (TDD)

TDD allows the user to communicate over the telephone using the computer as a text terminal.

  * Zapata is a computer-based, high-density telephony project.
    The current version is available for download as source code at: http://www.zapatatelephony.org/project.html.

-----------------------------------------------------------------------------
4.1.2. Closed Captioning

Closed captioning provides text translation of spoken words to video display. Closed captioning can be used for distance learning, video-teleconferencing, audio from a CD-ROM, and other types of interactive technology.

  * Ccdecoder is a closed captioned, extended data services decoder for the bttv and video4linux based tv video cards: http://sourceforge.net/projects/ccdecoder/.

-----------------------------------------------------------------------------
5. Physically Disabled

There is a wide range of physical disabilities that can impair a user's mobility, and many of these impairments need to be addressed on an individual basis. This section addresses impairments that apply to users who have difficulty using a mouse, pointing device, or keyboard.

-----------------------------------------------------------------------------
5.1. Keyboard Navigation

There are features built into the Linux operating system that allow for additional keyboard configuration. In some of the X Windows desktops these settings can be changed from the control center. An X Windows application called AccessX provides a graphical user interface for configuring all of the AccessX keyboard settings. These settings are:

  * StickyKeys enable the user to lock modifier keys (for example, control and shift), allowing single finger operations in place of multiple key combinations.
  * MouseKeys provide alternative keyboard sequences for cursor movement and mouse button operations.
  * SlowKeys require the user to hold the key down for a specified period of time before the keystroke is accepted.
    This prevents keystrokes that are pressed by accident from being sent.
  * ToggleKeys sound an audio alert that warns the user that a keystroke created a locking state for keys such as Caps Lock and Num Lock.
  * RepeatKeys allow a user with limited coordination additional time to release keys before multiple key sequences are sent to the application.
  * BounceKeys or Delay Keys enforce a delay between keystrokes. This function can help prevent the system from accepting unintentional keystrokes.

-----------------------------------------------------------------------------
5.2. Assistive Technologies for the Physically Disabled

The following is a list of assistive technologies for the physically disabled:

-----------------------------------------------------------------------------
5.2.1. On-Screen Keyboard

On-screen keyboards enable a user to select keys using a pointing device, such as a mouse, trackball, or touch pad. This application can be used in place of a standard keyboard.

  * GTkeyboard is an on-screen, graphical keyboard and can be downloaded at: http://opop.nols.com/gtkeyboard.html.
  * GNOME Onscreen Keyboard (GOK) is an on-screen, graphical keyboard that enables users to control their computer without having to rely on a standard keyboard or mouse. More information is available at http://www.gok.ca.

-----------------------------------------------------------------------------
5.2.2. Speech Recognition

Speech recognition utilities are used by people with mobility impairments, so they can operate the computer using voice control.

  * Open Mind Speech is a development project for speech recognition tools and applications. Information for the project and a mailing list are available at: http://freespeech.sourceforge.net/.
  * ViaVoice Dictation for Linux allows you to write documents using your voice rather than a keyboard.
    Information and downloads are available at: http://www-4.ibm.com/software/speech/dev/.
  * This site has information and links related to several different speech recognition utilities: http://www.trace.wisc.edu/world/computer_access/unix/unixshar.html.

-----------------------------------------------------------------------------
5.3. Additional Resources

The following is a list of additional Web sites that may be of interest to users with mobility impairments:

  * This site provides a kernel patch that can be downloaded to enable a one-handed keyboard. The download is available at: http://www.fourtytwo.de.
  * Configuration and information on adapting the Linux keyboard for a one-handed user is available at: http://www.eklhad.net/linux/app/onehand.html.
  * Morseall allows the user to control a Linux shell by tapping Morse code on the left mouse button: http://sourceforge.net/projects/morseall.
  * The Keyboard and Console HOWTO provides additional keyboard configuration information: http://www.ibiblio.org/pub/Linux/docs/HOWTO/Keyboard-and-Console-HOWTO.
  * There is a Speech Recognition HOWTO, written by Stephen Cook, that provides complete details for anyone interested in learning more about speech recognition applications: http://www.linuxdoc.org/HOWTO/.

-----------------------------------------------------------------------------
6. Cognitive, Language, and Other Impairments

Cognitive and language impairments include dyslexia and problems with memory, comprehension, problem solving, and written language.
For many individuals with cognitive and language disabilities, complex graphical displays and inconsistent use of words can make using the computer more difficult.

A user with epilepsy can have a seizure from an application with blinking lights and animation. Most desktops now allow users to disable animation. Web browsers such as Mozilla and Netscape allow users to disable graphics. It is important to check the documentation for preferences that are available in the desktop environment you are using, as well as in any applications that are used.

This section discusses the tools that are available to aid users with these impairments:

-----------------------------------------------------------------------------
6.1. Assistive Technologies for Cognitive, Language and Other Impairments

The following is a list of assistive technologies that can be helpful to users with cognitive, language, and other impairments:

-----------------------------------------------------------------------------
6.1.1. Screen Readers and Speech Synthesis

Screen readers with speech synthesis enable the system to read on-screen information and text out loud to the user. This type of assistive technology can be particularly helpful to individuals who have dyslexia and other learning disabilities. Although there are no screen readers available for the GNOME desktop, screen reader applications are available for Linux in console mode that provide this functionality.

  * Emacspeak is a speech interface that will provide audio output for all text. The program works in terminal and console mode and requires a software or hardware speech synthesizer. The downloads and user manuals are available at: http://www.cs.cornell.edu/home/raman/emacspeak/.
  * The Trace Center provides information and downloads for various screen readers and speech synthesizers.
    More information is available at: http://www.trace.wisc.edu/world/computer_access/unix/unixshar.html.

-----------------------------------------------------------------------------
6.1.2. Keyboard Filters and Word Processing

Keyboard filters and word processing applications that have word prediction and spell checking utilities can be an excellent aid for users with learning and language impairments.

-----------------------------------------------------------------------------
6.1.3. Speech Recognition

Speech recognition applications enable you to control the computer with your voice rather than having to type or write out the information.

  * CVoice Control is a speech recognition system that enables a user to connect spoken commands to UNIX commands. More information is available at: http://www.kiecza.de/daniel/linux/.
  * IBM ViaVoice Dictation for Linux allows the user to write documents using their voice rather than a keyboard and can read the information back to the user. More information is available at: http://www-4.ibm.com/software/speech/dev/.
  * Open Mind Speech is a development project for speech recognition tools and applications. The developers have established a mailing list for asking questions and obtaining information at: http://freespeech.sourceforge.net/.
  * XVoice enables continuous speech to text dictation for many applications. More information is available at: http://www.compapp.dcu.ie/~tdoris/Xvoice/.

-----------------------------------------------------------------------------
7. Developing Accessible Applications

It is important to consider accessibility when developing new applications for the Linux operating system.
The American Foundation for the Blind, the GNOME Accessibility Project, IBM, Sun, and W3C have written guidelines that are excellent road maps for developing and testing new Linux software. The following Web sites provide the tools, checklists and testing information to help developers write accessible programs for impaired users.

  * American Foundation for the Blind provides information on creating accessible computer applications at: http://www.afb.org/info_document_view.asp?documentid=198.
  * GNOME Accessibility Project has written a guide specifically for application development for the GNOME 2.0 desktop. More information is available at: http://developer.gnome.org/projects/gap/guide/gad/index.html.
  * IBM Accessibility Center provides links to a software accessibility checklist, testing information, and the Section 508 Rehabilitation Act. This site is located at: http://www-3.ibm.com/able/guidelines.html.
  * Sun Accessibility provides information on designing applications for accessibility at: http://www.sun.com/access/developers/software.guides.html and an Accessibility Quick Reference Guide is available at: http://www.sun.com/access/developers/access.quick.ref.html.
  * W3C User Agent Accessibility Guidelines 1.0 provides guidelines on accessible Web browser development including multimedia players and Web related software: http://www.w3.org/TR/UAAGIO/.

-----------------------------------------------------------------------------
8.
Other Helpful Information

The following is a list of additional information that may be helpful, but is not necessarily targeting a specific disability:

  * The CMU Sphinx Group Source has released a set of reasonably mature speech components that provide a basic level of technology to anyone interested in creating speech enabled applications. More information is available at: http://fife.speech.cs.cmu.edu/sphinx/.
  * Access to Linux documentation is critical to learning and using Linux. The Linux Documentation Project has links to many Linux HOWTOs, mini HOWTOs, and guides, as well as information on becoming involved in authoring new HOWTOs. More information and downloads are available at: http://www.linuxdoc.org.
  * RPMFind.net provides rpm downloads for Linux applications on most Linux operating systems. The site is located at: http://www.rpmfind.net.
  * Sourceforge provides updated information, documentation, and software for Linux. Some of the applications available are under development. More information and downloads are available at: http://www.sourceforge.net.
  * The Trace Center provides accessibility information and software for the Linux operating system. More information is available at: http://trace.wisc.edu/worl/computer_access/ and the Linux Accessibility Resource Site (LARS) http://trace.wisc.edu/linux/.
  * W3C Web Accessibility Initiative provides information and links on Web site accessibility. More information is available at: http://www.w3.org/WAI/.

-----------------------------------------------------------------------------
9. Acknowledgments

These are the original acknowledgments documented by Michael De La Rue. They have been included in their entirety to ensure that each person's efforts to make Linux more accessible are acknowledged.
Much of this document was created from various information sources on the Internet, many found from Yahoo and DEC's Alta Vista Search engine. Included in this was the documentation of most of the software packages mentioned in the text. Some information was also gleaned from the Royal National Institute for the Blind's help sheets.

T.V. Raman, the author of Emacspeak, has reliably contributed comments, information and text as well as putting me in touch with other people who he knew on the Internet.

Kenneth Albanowski [mailto:kjahds@kjahds.com] kjahds@kjahds.com provided the patch needed for the Brailloterm and information about it.

Roland Dyroff of [http://www.suse.de/] S.u.S.E. GmbH (Linux distributors and makers of S.u.S.E. Linux (English/German)) looked up KTS Stolper GmbH at my request and got some hardware details and information on the Brailloterm.

The most thorough and careful checks of this document were done by James Bowden, [mailto:jrbowden@bcs.org.uk] jrbowden@bcs.org and Nikhil Nair [mailto:nn201@cus.cam.ac.uk] nn201@cus.cam.ac.uk, the BRLTTY authors, who suggested a large number of corrections as well as extra information for some topics.

The contributors to the blinux and linux-access mailing lists have contributed to this document by providing information for me to read.

Mark E. Novak of the Trace R and D centre http://trace.wisc.edu/ pointed me in the direction of several packages of software and information which I had not seen before. He also made some comments on the structure of the document which I have partially taken into account and should probably do more about.

Other contributors include Nicolas Pitrie and Stephane Doyon.

A number of other people have contributed comments and information. Specific contributions are acknowledged within the document.

This version was specifically produced for [http://www.redhat.com/] RedHat's Dr. Linux book.
This is because they provided warning of its impending release to myself and other LDP authors. Their doing this is strongly appreciated since wrong or old information sits around much longer in a book than on the Internet.

No doubt you made a contribution and I haven't mentioned it. Don't worry, it was an accident. I'm sorry. Just tell me and I will add you to the next version.

ACPI: Advanced Configuration and Power Interface

Emma Jane Hogbin
[http://www.xtrinsic.com] xtrinsic

Erich Schubert - Author of the section on DSDT.

Revision History
Revision v1.2   2003-07-08   Added the abstract.
Revision v1.1   2003-07-03   Added updates for the 2.4.21 kernel, the latest stable kernel at the time.
Revision v1.0   2003-07-01   Initial release, reviewed by LDP
Revision v0.2   2003-06-12   Outlines how to patch a kernel for ACPI support.

-----------------------------------------------------------------------------

Table of Contents
1. About this document
2. Copyright and License
3. About ACPI
4. Why switch?
5. DSDT: Differentiated System Description Table
6. Installing from scratch
   6.1. Choosing a kernel
7. Backups
8. Required packages
9. Download and patch
   9.1. Unpack
   9.2. Patch
10. Configure the new kernel
11. Compile the new kernel
12. Install the new kernel
13. Reboot and test
14. Load related modules
15. Switching from APM to ACPI
16. Using ACPI
17. References and Resources
18. Thanks
A. ACPI the Non-Debian Way
   A.1. Compile the kernel
   A.2. Install the new kernel
   A.3. Software packages
B. GNU Free Documentation License
   B.1. PREAMBLE
   B.2. APPLICABILITY AND DEFINITIONS
   B.3. VERBATIM COPYING
   B.4. COPYING IN QUANTITY
   B.5. MODIFICATIONS
   B.6. COMBINING DOCUMENTS
   B.7. COLLECTIONS OF DOCUMENTS
   B.8. AGGREGATION WITH INDEPENDENT WORKS
   B.9. TRANSLATION
   B.10. TERMINATION
   B.11. FUTURE REVISIONS OF THIS LICENSE
   B.12. How to use this License for your documents

1. About this document

When I first started the switch from APM to ACPI I didn't realize the kernel needed to be patched.
My problem (insanely loud fan) was fixed just by upgrading to 2.4.20 (Debian packaged kernel with an earlier patch from [http://acpi.sourceforge.net] acpi.sourceforge.net). Unfortunately, after the first upgrade I could no longer halt my computer without using the power switch. It wasn't until later that I realized I had an old, ineffectual ACPI patch. This HOWTO was written to summarize the install process for myself, and hopefully help others who are also having a hard time finding information about ACPI.

Please note: the main article outlines [http://www.debian.org] The Debian Way of doing things. There is also generic information in Appendix A for those of you who prefer ... the generic way.

-----------------------------------------------------------------------------

2. Copyright and License

Copyright (c) 2003 Emma Jane Hogbin.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in Appendix B.

-----------------------------------------------------------------------------

3. About ACPI

In the world of power management ACPI is relatively new to the game. It was first released in 1996 by Compaq/Hewlett-Packard, Intel, Microsoft, Phoenix and Toshiba. These developers aimed to replace the previous industry standard for power management. Their [http://www.acpi.info] ACPI.info site contains the official specifications, a list of companies that support ACPI and a number of other goodies. This is definitely not required reading, but may be of some interest to the insanely curious.

ACPI allows control of power management from within the operating system. The previous industry standard for power management, Advanced Power Management (APM), is controlled at the BIOS level.
APM is activated when the system becomes idle--the longer the system idles, the less power it consumes (e.g. screen saver vs. sleep vs. suspend). In APM, the operating system has no knowledge of when the system will change power states. ACPI can typically be configured from within the operating system. This is unlike APM where configuration often involves rebooting and entering the BIOS configuration screens to set parameters. ACPI has several different software components:   * a subsystem which controls hardware states and functions that may have previously been in the BIOS configuration These states include:   + thermal control   + motherboard configuration   + power states (sleep, suspend)   * a policy manager, which is software that sits on top of the operating system and allows user input on the system policies   * the ACPI also has device drivers that control/monitor devices such as a laptop battery, SMBus (communication/transmission path) and EC (embedded controller). If you would like more information on power management in laptops, check out the resources on [http://www.tuxmobil.org] tuxmobil.org. Specifically: Power Management with Linux - APM, ACPI, PMU and the [http://tuxmobil.org/ Mobile-Guide.db/mobile-guide-p2c1-hardware-in-detail.html] Hardware in Detail section of the Linux Mobile Guide. ----------------------------------------------------------------------------- 4. Why switch? Not all systems support both APM and ACPI. I switched because my system only supported ACPI. Pretty easy decision really. If you're switching to get [http://acpi.sourceforge.net/documentation/sleep.html] S3 (suspend to RAM) support and you're using a 2.4.x kernel, don't bother. It is [http:// lists.debian.org/debian-laptop/2003/debian-laptop-200304/msg00418.html] not supported. Period. Not sure if your system is supported? ACPI4Linux has a list of supported machines/BIOSes started on their Wiki. Please contribute to the list if you've installed ACPI! 
They also have a list of machines that are [http://acpi.sourceforge.net/documentation/blacklist.html] not supported.

-----------------------------------------------------------------------------

5. DSDT: Differentiated System Description Table

Thanks to [http://www.vitavonni.de/] Erich for writing this section.

You might need to override the DSDT when certain features like battery status are incorrectly reported (usually causing error messages to syslog). DELL laptops usually need this kind of override. Fixed DSDTs for many systems are available on the [http://acpi.sourceforge.net/dsdt/index.php] DSDT page, along with a patch that tells the kernel to ignore the BIOS-supplied table and use the compiled-in fixed DSDT instead.

Basically you need to copy the fixed table into your kernel source with a special filename (or modify the filename in the patch supplied at the [http://acpi.sourceforge.net/dsdt/index.php] DSDT page).

This override is quite easy: instead of loading the DSDT table from the BIOS, the kernel uses the compiled-in DSDT table. That's all.

-----------------------------------------------------------------------------

6. Installing from scratch

ACPI is constantly being revised. It is currently not available in the 2.4.x series kernels but will be released into the 2.5.x version kernels (or possibly not until 2.6). This means all kernels released before 2.5.x must be patched. The patches are available from [http://acpi.sourceforge.net] acpi.sourceforge.net. You need to get the patch that exactly matches the version of the kernel that you are running. Since this is the "install from scratch" section I will assume you know exactly which kernel you will be installing.

-----------------------------------------------------------------------------

6.1. Choosing a kernel

This document was written for the 2.4.20 kernel. Since that time the 2.4.21 kernel has been released as the latest stable kernel. There have been mixed levels of success with 2.4.21 and the latest ACPI patch.
For now I recommend sticking to the 2.4.20 kernel and its latest patch: 2002.12.12. Others recommend doing other things. A search through the debian-user, debian-laptop and acpi-support email lists will be of help to you if you're not sure what you should do for your specific system.

Note
For sanity's sake this document will use the 2.4.20 kernel as an example; substitute your own kernel version as appropriate.

It is important to use the latest version of the ACPI patch. Some distributions have already patched their kernels. This is the case for Debian, and may be the case for others. For more information on the patches that have been applied to the Debian kernel source package scan through: /usr/src/kernel-source-/README.Debian. If you are not using Debian you will probably still be able to find an equivalent file for your distribution.

A user on acpi-support confirmed that I shouldn't need any of the additional patches that have been applied to the kernel to run my laptop. If you are running a production-level server and/or are serving web pages to the internet, you should really apply any additional security patches.

Warning
If a kernel has had other patches applied to it, you may have problems applying the ACPI patch. Of course, an ACPI patch should not be applied to a kernel that is already patched for ACPI. As long as there has not been an ACPI patch applied to the kernel it should be possible to apply one now. Depending on the patches applied, you may need to modify some of the Makefiles for your patch to be successful. This is beyond my current grasp of reality so it is not covered in this document.

-----------------------------------------------------------------------------

6.1.1. Debian-ized pre-patched kernel

If you would prefer to use a Debian-ized kernel instead of a fresh one, [http://people.debian.org/~maxx] maxx has provided a pre-patched kernel-source package with the latest patch for the 2.4.20 kernel.
This would be instead of downloading a fresh (non-patched) kernel from [http://www.kernel.org] www.kernel.org. He sent me an email with the following details:

    I took the kernel-source 2.4.20-8 from unstable, removed the ACPI changes [i.e. the old patch] and applied acpi-20021212-2.4.20.diff.gz from acpi.sf.net since the vanilla 2.4.20 HAS several security leaks (ptrace, hash table, ...). You can find the package at http://people.debian.org/~maxx/kernel-source-2.4.20/ (I didn't upload the .orig.tar.gz since you can get it from any debian mirror and the .deb is already big enough)

    --[http://people.debian.org/~maxx] maxx

Warning
I have not tested these packages. You may or may not have any luck with them. Please don't email me asking about them, ask maxx instead.

-----------------------------------------------------------------------------

7. Backups

If you are already running a kernel that is the same version as the one you are about to patch I recommend creating a fresh directory for the newly patched kernel. Remember that backups are never a bad thing. These are the files that I back up:

  * /etc/lilo.conf
  * /usr/src/*.deb (Debian-specific)
  * /etc/modules
  * /etc/modutils/aliases
  * /usr/src/linux/.config
  * If you are not doing things The Debian Way you should also back up the /lib/modules directory, /boot/vmlinuz, /usr/src/linux/arch/i386/boot/bzImage and /usr/src/System.map.

It's possible the location of these files differs on your system. Do a locate if they're not where I've stated they should be.

-----------------------------------------------------------------------------

8. Required packages

Since I was starting on a brand new machine, I'm pretty sure I have the full list of required packages to make this whole patch go smoothly.
Here's the list all in one go:

  * kernel source files
  * ACPI patch that exactly matches the kernel version
  * debian packages: make, bzip2, gcc, libc6-dev, tk8.3, libncurses5-dev, kernel-package
  * after you've patched the kernel add the debian packages: acpid, acpi (Debian testing and unstable only)

-----------------------------------------------------------------------------

9. Download and patch

Download a fresh kernel from [http://www.kernel.org] www.kernel.org. You need to make sure you get a full kernel. Find the "latest stable version of the Linux kernel" and click on F for FULL. Wait patiently. A bzipped kernel is about 26M. If you're feeling particularly geeky you could also wget http://kernel.org/pub/linux/kernel/v2.4/linux-.tar.bz2.

Tip
You may or may not want the latest stable version. For more information read Section 6.1 of this document. If you decide to use a version of the kernel that is not published on the front page, use the [http://www.kernel.org/pub/linux/kernel/] /pub/linux/kernel directory on the [http://www.kernel.org] kernel.org site to find the kernel you'd like.

While you're waiting, grab a copy of the patch as well. For the 2.4.20 kernel use the 2.4.20 patch. It's dated 2002.12.12. You'll need to know that number later when we check to make sure the patch worked. If you are using a different kernel version make sure you take note of the date of your patch. Your numbers will differ slightly from the one I use later on.

Once you've got those two files (the kernel and the patch) unpack them and patch the kernel.

-----------------------------------------------------------------------------

9.1. Unpack

First we're going to set the stage to patch the kernel. We need to unpack the bz2 file (bzip2) and shuffle the directories around a bit. /usr/src/linux probably points to your current kernel. We need it to point to the new kernel, so we'll do that as well.
  * cd /usr/src
  * mkdir kernel-source- (use an alternate name if you already have a version of this kernel installed)
  * cp linux..tar.bz2 /usr/src/kernel-source-
  * cd /usr/src/kernel-source-
  * tar xjfv linux..tar.bz2
  * mv linux. /usr/src/linux-
  * rm linux (assuming that's a link to your old kernel)
  * ln -s /usr/src/linux- linux

-----------------------------------------------------------------------------

9.2. Patch

Now we're going to actually patch the kernel. I take one extra step from [http://acpi.sourceforge.net/download.html] the instructions at ACPI4Linux. Instead of gunzipping and patching in the same line, I use two lines. This is purely a matter of preference. When you patch the kernel you want to make sure there are no error messages. (There is no "yay" line, instead look for the absence of errors.)

  * cd /usr/src/linux
  * cp acpi-20021212-2.4.20.diff.gz /usr/src/linux/. (Your patch filename will be different if you're not using the 2.4.20 kernel.)
  * gunzip acpi-20021212-2.4.20.diff.gz
  * patch -p1 < acpi-20021212-2.4.20.diff (this is the actual patching part)

-----------------------------------------------------------------------------

10. Configure the new kernel

Now instead of using make menuconfig, I have a godsend of an option. Check this out: copy your current .config file into /usr/src/linux. Now use "make oldconfig". It will run through your old config file and see what's been updated so that you don't have to find all the new options. For everything to do with ACPI (about the first 5 questions for me, but possibly more for you if you've never configured a pre-patched kernel) say M for module. There are an extra 3 or so things after that which I said "no" to.
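Once make oldconfig has finished, it can be reassuring to look at what actually ended up in the file. The following is only a minimal sketch: it assumes the power management options in your .config start with CONFIG_ACPI and CONFIG_APM (check your own file if the names differ), and show_power_options is a made-up helper name, not part of any package.

```shell
#!/bin/sh
# Sketch: print the ACPI and APM choices recorded in a kernel .config.
# Assumption: the relevant option names begin with CONFIG_ACPI or CONFIG_APM.
# show_power_options is a hypothetical helper name used only in this example.

show_power_options() {
    # Default to the config file location used in this HOWTO.
    config=${1:-/usr/src/linux/.config}
    # Matches lines such as "CONFIG_ACPI=m" and "# CONFIG_APM is not set".
    grep -E '^(# )?CONFIG_(ACPI|APM)' "$config"
}
```

Running show_power_options with no argument reads /usr/src/linux/.config; pass another path to inspect a backed-up config instead.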
In point form, this is how the kernel should be configured:

  * cd /usr/src/linux
  * cp /usr/src//.config .config
  * make oldconfig (say M to all new options for ACPI--you can also say "Y" if you prefer to compile it directly into your kernel)

Now go in to the config file with make menuconfig. I want you to check and make sure you have your APM (the old stuff) turned off. Under "General Setup", make sure that:

  * Power Management Support is ON
  * APM (Advanced Power Management) is OFF (this is the old one--you don't even want it as a module unless you really know what you're doing. And if you really know what you're doing you're probably not reading this.)
  * everything to do with ACPI should be M (modules) or * (compiled directly into the kernel)

Exit and save the new configuration.

-----------------------------------------------------------------------------

11. Compile the new kernel

If you have additional modules that are not part of the main source tree, you will need to add modules_image when you make your Debian packages. This is almost inevitable if you're using a laptop. I have three things that are not part of the stock kernel that I install separately: my graphics card (nvidia); sound (ALSA); and my wireless card (PCMCIA).

  * cd /usr/src/linux
  * make-kpkg clean
  * make-kpkg --append-to-version=. kernel_image modules_image

Note
Naming kernel builds

I no longer use .date to distinguish kernel builds. It was too frustrating to have 030627a, 030627b (etc) as I tried to figure things out. I now use names, in alphabetical order, starting with the kernel build "alien". I'm going to leave the date option in though as I still think it's a good way to do things.

Note
Kernel compile help

For non-Debian instructions see the Appendix "ACPI the Non-Debian Way".
For more information on how to compile the kernel The Debian Way please read Creating custom kernels with Debian's kernel-package system ----------------------------------------------------------------------------- 12. Install the new kernel I like to configure lilo on my own, but do whatever tickles your fancy.   *  cd /usr/src   *  dpkg -i kernel-image-._10.00.Custom_i386.deb At this point I decline all the lilo updates and configure it myself by hand.   * configure lilo by hand: vi /etc/lilo.conf   *  load the new kernel into lilo: lilo   *  If you have any other deb files for your modules you should install them now as well. If you're not sure check /usr/src for additional .deb files. Note Kernel compile help   For non-Debian instructions see the Appendix "ACPI the Non-Debian Way". For more information on how to compile the kernel The Debian Way please read Creating custom kernels with Debian's kernel-package system ----------------------------------------------------------------------------- 13. Reboot and test At this point you should reboot your machine. When your system comes back up (assuming of course that everything went well and you still have a system), check to see what kernel you're running with uname -a. It should show you the one you just built. You also need to make sure the correct patch was installed. You can do that with dmesg | grep ACPI.*Subsystem\ revision . It should give the output: ACPI: Subsystem revision 20021212. The revision is the date the patch was released. This number will be different than mine if you are not using the 2.4.20 kernel. To look at all ACPI-related bits that were loaded/started when your system rebooted, do this: dmesg | grep ACPI . dmesg prints your boot messages and grep ACPI makes sure that only ACPI-related messages are printed. You can also check to see what version you're using with cat /proc/acpi/info. Don't believe everything you read though. 
My output says that S3 is a supported state, but we already know it's not. It does give the correct version though, which is useful.

-----------------------------------------------------------------------------

14. Load related modules

If you compiled ACPI support in as "M"odules you'll probably need to load the modules by hand. You'll need to hunt around a bit to see what modules are there. Mine are in /lib/modules/. /kernel/drivers/acpi/, and are as follows:

    -rw-r--r--  1 root root 4.1k Jun  3 23:57 ac.o
    -rw-r--r--  1 root root 9.5k Jun  3 23:57 battery.o
    -rw-r--r--  1 root root 5.2k Jun  3 23:57 button.o
    -rw-r--r--  1 root root 3.7k Jun  3 23:57 fan.o
    -rw-r--r--  1 root root  14k Jun  3 23:57 processor.o
    -rw-r--r--  1 root root  11k Jun  3 23:57 thermal.o
    -rw-r--r--  1 root root 6.2k Jun  3 23:57 toshiba_acpi.o

The first time I rebooted I loaded them all by hand, typing insmod <modulename>. I personally load processor first, although there are mixed feelings on whether or not the order matters.

Note
Kernel modules

The module name is the bit before the .o extension on a module filename. processor.o is the file, and processor is the module name. To install a loadable kernel module use: insmod processor.

You can check to see which modules are loaded with lsmod. My output of lsmod (with most of the extras removed) looks like this:

    Module        Size  Used by    Tainted: P
    button        2420   0  (unused)
    battery       5960   0  (unused)
    ac            1832   0  (unused)
    fan           1608   0  (unused)
    thermal       6664   0  (unused)
    processor     8664   0  [thermal]
    NVdriver    945408  11

The last one is my graphics card, which uses proprietary drivers. This is why I have a "P" next to Tainted on the top line.

Note
Operating System Power Management (OSPM)

The first time I tried this the modules were all in separate directories and were ospm_. This was probably because I was using an old patch, but it is something to be aware of. The OSPM modules are now deprecated so hopefully you won't see them.
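Typing insmod once per module gets tedious, so the by-hand loading described above can be wrapped in a small script. This is a rough sketch rather than a tested recipe: list_acpi_modules, load_acpi_modules and the DRY_RUN variable are names invented for this example, and putting processor first simply mirrors my own habit. By default it only prints the insmod commands it would run.

```shell
#!/bin/sh
# Sketch: turn the .o files in a module directory into module names and
# pass each one to insmod, with "processor" first.
# list_acpi_modules, load_acpi_modules and DRY_RUN are hypothetical names.

list_acpi_modules() {
    dir=$1
    # processor goes first if it is present (the order may not matter).
    [ -f "$dir/processor.o" ] && echo processor
    for f in "$dir"/*.o; do
        [ -f "$f" ] || continue
        # The module name is the filename without the .o extension.
        name=$(basename "$f" .o)
        [ "$name" = processor ] || echo "$name"
    done
}

load_acpi_modules() {
    for m in $(list_acpi_modules "$1"); do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Default: only show what would be done.
            echo "insmod $m"
        else
            insmod "$m"
        fi
    done
}
```

Call load_acpi_modules with the directory that holds your ACPI modules to preview the commands, then run it as root with DRY_RUN=0 to actually load them.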
To prevent having to load the modules each time you reboot you can do one of two things: compile them directly into the kernel (bit late for that though, eh?), or add them to your /etc/modules file. If you don't already have a copy of the file just create a new one and add each module name (remember, no dot-o) on a separate line.

-----------------------------------------------------------------------------

15. Switching from APM to ACPI

Don't let apmd and acpid run at the same time unless you REALLY know what you're doing. Debian will not make sure only one is running at a time. You will have to check.

APM will try to put your system into S3. On the 2.4.x (and before) series kernels this will quite probably hang your machine. S3 is not supported until 2.5.x. Even the patch won't provide support for S3. I personally did an apt-get remove apmd to solve the hanging problem.

You should also be aware of another little glitch I discovered. The XFree86 server has an option for DPMS (Energy Star) features. The DPMS states can be one of standby, suspend, off or on. Since the 2.4.x kernels cannot suspend to disk, this can cause problems. I fixed my system by doing two things:

  * xset -dpms (disables DPMS features)
  * In /etc/X11/XF86Config-4 I commented out the line Option "DPMS" under Section "Monitor".

-----------------------------------------------------------------------------

16. Using ACPI

There are a few different applications/daemons you will want to install on your system: acpid (the daemon that will control your hardware states), and acpi (the interface to monitor events and states) are the base install. The acpi Debian package is only available in testing and unstable. If you're running stable you won't be able to install it without playing around with apt and your sources.list file. You can probably also compile from source.

If you do get acpi installed you can use it to monitor your system like this: acpi -V.
The output will tell you about your system. Mine looks like this:

    Thermal 1: ok, 47.1 degrees C
    Thermal 2: ok, 45.1 degrees C
    AC Adapter 1: off-line   <-- running off battery
    AC Adapter 1: on-line    <-- running off AC power

Unfortunately, the -V "full version" doesn't work for me. Fortunately I can still look in each of the acpi files individually for information about my system. Check in the /proc/acpi directory for various things of importance. If I want to check my battery I read the following file like this: cat /proc/acpi/battery/BAT0/state. The output is as follows:

    present:                 yes
    capacity state:          ok
    charging state:          discharging   <-- running off battery
    present rate:            unknown
    remaining capacity:      3920 mAh      <-- watch this number
    present voltage:         14800 mV

    present:                 yes
    capacity state:          ok
    charging state:          discharging
    present rate:            unknown
    remaining capacity:      3840 mAh      <-- capacity getting smaller
    present voltage:         14800 mV

    present:                 yes
    capacity state:          ok
    charging state:          charging      <-- AC adapter plugged in
    present rate:            unknown
    remaining capacity:      3840 mAh
    present voltage:         14800 mV

If I want information about my battery in general I check it out like this: cat /proc/acpi/battery/BAT0/info

    present:                 yes
    design capacity:         3920 mAh
    last full capacity:      3920 mAh
    battery technology:      rechargeable
    design voltage:          14800 mV
    design capacity warning: 30 mAh
    design capacity low:     20 mAh
    capacity granularity 1:  10 mAh
    capacity granularity 2:  3470 mAh
    model number:            Bat0
    serial number:
    battery type:            Lion
    OEM info:                Acer

You're smart people. You can probably figure it out from here. :)

-----------------------------------------------------------------------------

17. References and Resources

The following URLs were incredibly useful in writing this HOWTO and generally getting ACPI up and running.
HOWTOs HOWTO install ACPI under Linux http://sylvestre.ledru.info/howto/howto_acpi.php Linux ACPI-HOWTO http://www.columbia.edu/~ariel/acpi/acpi_howto.txt Linux on the road, formerly: Linux Laptop HOWTO http://tuxmobil.org/howtos.html You'll need to scroll a bit, or use the HTML version: http://tuxmobil.org/Mobile-Guide.db/Mobile-Guide.html Hardware in Detail (part of Linux on the road) http://tuxmobil.org/Mobile-Guide.db/ mobile-guide-p2c1-hardware-in-detail.html Power Management with Linux - APM, ACPI, PMU http://tuxmobil.org/apm_linux.html Creating custom kernels with Debian's Kernel-Package system http://newbiedoc.sourceforge.net/system/kernel-pkg.html Hardware-specific Install Reports and Info Installation Reports http://acpi.sourceforge.net/wiki/index.php/InstallationReports Blacklist http://acpi.sourceforge.net/documentation/blacklist.html DSDT: Overview http://acpi.sourceforge.net/dsdt/index.php Includes links to patched DSDTs and HOWTOs about patching your own DSDT. BIOS Settings for the AcerTM (Phoenix BIOS) http://help.nec-computers.com/au/pri/item_instr_bios_7521N.asp Software Development Groups ACPI4Linux http://acpi.sf.net ACPI Special Interest Group http://www.acpi.info/ Intel http://developer.intel.com/technology/iapc/acpi/ Mailing List Threads debian-laptop thread: can't restore from suspend http://lists.debian.org/debian-laptop/2003/debian-laptop-200304/ msg00367.html acpi-support thread: newbie HOWTO and debian patching http://sourceforge.net/mailarchive/forum.php?forum_id=7803&max_rows= 25&style=flat&viewmonth=200304&viewday=17 debian-laptop thread: acer 634 acpi & apm http://lists.debian.org/debian-laptop/2002/debian-laptop-200212/ msg00242.html ACPI packages and related software The Kernel Remember to choose "F" for full when you download your kernel source. http://www.kernel.org Debian-ized kernel maxx's pre-patched 2.4.20-8 kernel source package. For more information see maxx's notes. 
http://people.debian.org/~maxx/kernel-source-2.4.20/ ACPI kernel patch You'll need to pick the version that exactly matches the kernel you're using. http://sourceforge.net/project/showfiles.php?group_id=36832 acpid the daemon http://sourceforge.net/projects/acpid acpi text interface http://grahame.angrygoats.net/acpi.shtml Kacpi graphical interface for KDE http://www.elektronikschule.de/~genannt/ kacpi/download.html aKpi another KDE interface http://akpi.scmd.at/ wmacpi WindowMaker DockApp (another GUI) http://www.ne.jp/asahi/linux/timecop/ wmacpi+clecourt WindowMaker DockApp (another graphical interface). Handles two battery slots. http://open.iliad.fr/~clecourt/wmacpi/index.html ----------------------------------------------------------------------------- 18. Thanks Much thanks goes out to the following:   *  [http://acpi.sourceforge.net/mailinglists.html] acpi-support   *  [http://lists.debian.org/debian-laptop/] debian-laptop   *  [http://lists.debian.org/debian-user/] debian-user   *  [http://linuxchix.org/] techtalk   *  TLDP mailing lists (discuss and docbook)   * Sebastian Henschel for reminding me I'd promised to write it all down   * Erich Schubert for writing the section on DSDTs   * Werner Heuser for suggesting I submit the document to The LDP   * Tabatha Marshall for editing and generally being very enthusiastic about learning DocBook ----------------------------------------------------------------------------- A. ACPI the Non-Debian Way There is very little difference between The Debian Way and the generic way. In fact it's probably only 10 or so lines of difference. ----------------------------------------------------------------------------- A.1. Compile the kernel The "normal" way of compiling a kernel does not use make-kpkg. 
Instead, it uses the following steps:

  * cd /usr/src/linux which should point to the 2.4.20 kernel (unzipped) files
  * make dep
  * make clean
  * make bzImage
  * make modules (remember to unpack your modules first)

-----------------------------------------------------------------------------

A.2. Install the new kernel

In The Debian Way, you create a deb file which contains information about where the kernel is (and makes the kernel and yada-yada). In the "normal" way, you put things where they need to be right away. You need to install your modules and then configure lilo to point to the new kernel and then run lilo. If you are not doing things The Debian Way your "install" will look like this:

  * cd /usr/src/linux
  * make modules_install
  * cp arch/i386/boot/bzImage /boot/vmlinuz.
  * vi /etc/lilo.conf and copy the structure of your existing kernel. Do NOT delete the reference to your existing kernel! You need to point lilo to the "vmlinuz" file that was created when you compiled the kernel above
  * lilo (yup, just exactly like that.) Lilo will let you know if it's going to have major problems loading the new kernel.

Warning
Do NOT forget to run lilo before rebooting. Type lilo. It's that easy (and that easy to forget).

-----------------------------------------------------------------------------

A.3. Software packages

You can still use all of the software mentioned in this HOWTO even if you're not using Debian. Unfortunately it will take a little more effort on your part to download and install everything. Fortunately it's really not that difficult. Most software packages include a README file when you gunzip them which will explain what you need to do to get things working on your system.

Tip
Software downloads

For more information about software for ACPI, please use the ACPI packages and related software.

-----------------------------------------------------------------------------

B.
GNU Free Documentation License Version 1.1, March 2000 Copyright (C) 2000 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. ----------------------------------------------------------------------------- B.1. PREAMBLE The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software. We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference. ----------------------------------------------------------------------------- B.2. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". 
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language. A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque". 
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only. The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text. ----------------------------------------------------------------------------- B.3. VERBATIM COPYING You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies. ----------------------------------------------------------------------------- B.4. 
COPYING IN QUANTITY If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects. If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document. 
----------------------------------------------------------------------------- B.5. MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version: A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission. B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five). C. State on the Title page the name of the publisher of the Modified Version, as the publisher. D. Preserve all the copyright notices of the Document. E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below. G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. H. Include an unaltered copy of this License. I. Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. 
If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission. K. In any section entitled "Acknowledgements" or "Dedications", preserve the section's title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles. M. Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version. N. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section. If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles. You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. 
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one. The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. ----------------------------------------------------------------------------- B.6. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice. The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. 
In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements." ----------------------------------------------------------------------------- B.7. COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document. ----------------------------------------------------------------------------- B.8. AGGREGATION WITH INDEPENDENT WORKS A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. 
Otherwise they must appear on covers around the whole aggregate. ----------------------------------------------------------------------------- B.9. TRANSLATION Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail. ----------------------------------------------------------------------------- B.10. TERMINATION You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. ----------------------------------------------------------------------------- B.11. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See [http://www.gnu.org/copyleft/] http:// www.gnu.org/copyleft/. Each version of the License is given a distinguishing version number. 
If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

-----------------------------------------------------------------------------
B.12. How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

    Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. A copy of the license is included in the section entitled "GNU Free Documentation License".

If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being LIST"; likewise for Back-Cover Texts.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Linux ACP Modem (Mwave) mini-HOWTO

Mike Sullivan sullivam@us.ibm.com
Paul Schroeder paulsch@us.ibm.com
Joy Yokley - Edited document and converted to DocBook v4.1 (SGML).
2001-01-12

Revision History
Revision .11 2002-07-18 Revised by: PBS
Revision .10 2001-07-18 Revised by: PBS
Revision .09 2001-05-21 Revised by: PBS
Revision .08 2001-05-09 Revised by: JEY
Revision .07 2001-04-30 Revised by: JEY

This document describes how to build, set up, and use the driver and user space application necessary for using the ACP (Mwave) Modem in the IBM Thinkpad 600, 600E, and 770 models which include the on-board ACP modem hardware. The latest version of this document can always be found at http://www.ibm.com/linux/ltc/

-----------------------------------------------------------------------------
Table of Contents

1. General Information and Hardware Requirements
    1.1. Introduction
    1.2. Credits
    1.3. Where Can I Get the Latest Version of this Driver?
    1.4. Are There Any Other Resources?
    1.5. Copyright Notice and Disclaimer
    1.6. Which Systems are Supported
    1.7. Features of the Modem
2. Compilation, Installation, and Startup
    2.1. Prerequisites
    2.2. Building and Installing Source
    2.3. Setting Things Up
    2.4. Runtime
3. Resolving Installation and Configuration Problems
    3.1. DSP Does Not Start
    3.2. Resource Conflicts
    3.3. Not Connecting at Specified Starting Speed
    3.4. Dialer Application Cannot Detect Serial Port
    3.5. PPP Errors Using 2.4.0 Version of the Kernel
4. Debugging Tips
    4.1. Error Logs
    4.2. Tracing
5. Test Claims
6. List of Supported Countries
7. Trademarks

1. General Information and Hardware Requirements

1.1. Introduction

The ACP Modem for Linux is a software-based modem. Support software for the ACP modem is composed of a loadable kernel module and a user level application. Together these components support direct connection to public switched telephone networks (PSTNs) in selected countries worldwide. Refer to Section 6 of this document for the supported country list. The modem also supports the standard communications port interface (ttySx) and is compatible with the Hayes AT Command Set.
ACP Modem software is continually under development. If you encounter bugs or usability issues, please contact us and we'll work to correct them.

-----------------------------------------------------------------------------
1.2. Credits

This Linux ACP Modem driver was ported from the Windows NT version of the driver available from IBM. Many thanks to Keith Frechette, Charles Ball, and Frank Novak for their technical and support efforts in making this project possible.

-----------------------------------------------------------------------------
1.3. Where Can I Get the Latest Version of this Driver?

The latest version of this driver is available from http://www.ibm.com/linux/ltc/

-----------------------------------------------------------------------------
1.4. Are There Any Other Resources?

Thomas Hood's [http://panopticon.csustan.edu/thood/tp600lnx.htm] Debian GNU/Linux on IBM ThinkPad 600 and 600x page contains lots of useful information.

-----------------------------------------------------------------------------
1.5. Copyright Notice and Disclaimer

Copyright (c) 2002 IBM Corporation

This document may be reproduced or distributed in any form without prior permission. Modified versions of this document may be freely distributed, provided that they are clearly identified as such, and this copyright is included intact.

This document is provided "AS IS", with no express or implied warranties. Use the information in this document at your own risk.

-----------------------------------------------------------------------------
1.6. Which Systems are Supported

This version of the ACP Modem driver supports the IBM Thinkpad 600E, 600, and 770 that include on-board ACP modem hardware.

-----------------------------------------------------------------------------
1.7.
Features of the Modem

The ACP Modem provides the following features:

  * Standard asynchronous COM port interface (NS16550A UART compatible) operation
  * Bell-103/212A, CCITT-V.21/V.22, V.22bis protocols with data rates from 300 to 2400 bps
  * CCITT-V.32 protocols with data rates of 4800, 9600 uncoded, and 9600 bps Trellis coded (optional)
  * CCITT-V.32bis protocols with data rates of 4800, 9600, 12000, and 14400 bps (optional)
  * ITU-T V.34 protocols with data rates from 2400 to 33600 bps
  * 56K capable modem
  * Hayes AT Command Set compatibility
  * DTMF and pulse dialing
  * Asynchronous error recovery protocol
  * Error correction via Microcom Network Protocol (MNP) classes 1-4
  * Error correction via the V.42 error correction standard
  * MNP class 5 for up to 2x data compression
  * V.42bis for up to 4x data compression
  * "Adaptive Rate Negotiation" which provides for "Fallback / Fallforward" as line quality deteriorates or improves

Your modem contains 56K technology. To take advantage of this technology, you must first make sure that your Internet Service Provider (ISP) supports a 56K modem protocol. Significantly higher modem connection speeds, up to 56 kbps, require all-digital transmission connections from your ISP to the line card in the central office to which your phone line is connected. The actual connection rate may be limited by the quality of your telephone lines, which varies from location to location. Current regulations limit maximum transfer rates to 53K. While your modem contains 56K technology, typical maximum connection rates in the receive direction may be significantly less than 56K. Currently, 56K capability is for the receive direction only. The transmit direction uses V.34 technology.

-----------------------------------------------------------------------------
2. Compilation, Installation, and Startup

2.1.
Prerequisites

  * A 2.2.16 series (or later) Linux kernel source tree
  * An appropriate set of module utilities
  * gcc version 2.7.x or later

If you are building the ACP Modem driver along with the user space application, you need to have a complete Linux source tree for your kernel, not just an up-to-date kernel image.

-----------------------------------------------------------------------------
2.2. Building and Installing Source

 1. Use tar xzvf mwavem-yyyymmdd.tar.gz to unpack the distribution.

 2. Change directories with cd mwavem-yyyymmdd

 3. Use the ./configure command to configure the build options. Issue ./configure --help to view all of the options. The defaults are probably okay though.

    Note: As of mwavem-1.0.3 you must give ./configure the --enable-mwavedd argument in order to build the driver with the user space application.

 4. Use the make command to build all of the ACP Modem binaries.

    Note: Your gcc package should be at least at the 2.7.x level. Check your /usr/src/linux/Documentation/Changes file for the minimum version information.

 5. Use make install to install the mwavem binary, the mwavem.conf configuration file, the extra binary (mostly .dsp) files, and the module device driver (if you specified that it must be built), and to create the /dev/modems/mwave device node.

-----------------------------------------------------------------------------
2.3. Setting Things Up

In the [WORLDTRADE] section of your mwavem.conf file, set the Country= parameter to your country access code.

Note: The mwavem.conf file is installed in the /usr/local/etc directory unless you specified otherwise during the build process.

Country information (including access codes) is listed in the mwavem.conf file.
For example, for France the following section is present:

    [Telephony\Country List\33]
    CountryCode=00000021
    Name=France
    SameAreaRule=0FG
    LongDistanceRule=0FG
    InternationsalRule=00EFG

To make France your configured country, set the following in the [WORLDTRADE] section of mwavem.conf:

+----------+
|Country=33|
+----------+

-----------------------------------------------------------------------------
2.4. Runtime

An initialization script has been provided which may be used to start, stop, or check the status of the ACP Modem driver and application. It has been successfully run on the Debian, Slackware, SuSE, and Red Hat distributions and should run on any of their derivatives.

If you are using the runtime script, it will load the mwave device driver module, configure the serial port, and start the mwave manager for you. All of the options which can be passed to the device driver module, along with some options for the script itself, can be configured by uncommenting and editing the appropriate variables at the beginning of the script.

The mwaved startup script can be found in the src/mwavem directory of the source distribution. If you are running the Red Hat distribution, you can copy the script to your /etc/rc.d/init.d directory and issue the ntsysv command in order to enable it at boot time. If you are not using Red Hat, see the documentation for your distribution for information on how to set this up to run at boot time.

It is recommended that you use the provided mwaved script. If you are not using the script, however, the following sections describe how to start the device driver and application manually.

-----------------------------------------------------------------------------
2.4.1.
Loading the ACP device driver

To load the mwave device driver use

+------------+
|insmod mwave|
+------------+

or

+--------------+
|modprobe mwave|
+--------------+

The following arguments may be supplied with the insmod command:

Note: The following arguments are not persistent from boot to boot (i.e., they are not saved in the BIOS).

  * mwave_3780i_irq=5/7/10/11/15
    This parameter allows you to configure the IRQ used by the DSP if the DSP IRQ was not set and stored in BIOS by the Thinkpad configuration utility.

  * mwave_3780i_io=0x130/0x350/0x0070/0xDB0
    This parameter allows you to configure the I/O range used by the DSP if the DSP I/O range was not set and stored in the BIOS by the Thinkpad configuration utility.

  * mwave_uart_irq=3/4
    This parameter allows you to configure the IRQ used by the ACP UART if the Mwave's UART IRQ was not set and stored in BIOS by the Thinkpad configuration utility.

  * mwave_uart_io=0x3f8/0x2f8/0x3E8/0x2E8
    This parameter allows you to configure the I/O range used by the ACP UART if the UART I/O range was not set and stored in BIOS by the Thinkpad configuration utility.

The following is an example of how to load the driver so that the DSP uses the ttyS1 resources:

+------------------------------------------------------------------------------------------+
|insmod mwave mwave_3780i_irq=10 mwave_3780i_io=0x0130 mwave_uart_irq=3 mwave_uart_io=0x2f8|
+------------------------------------------------------------------------------------------+

Note: The mwave driver is unable to check for resource conflicts. It is your responsibility to ensure that none of the resources specified conflict with other (commonly PCMCIA) devices. You can use the tpctl package on Linux or the Thinkpad
configuration utility on Windows NT or DOS to manage the configuration of Thinkpad related resources.

-----------------------------------------------------------------------------
2.4.2. Running ACP Modem Application

 1. Once the ACP device driver is loaded successfully, use the mwavem command to execute the application.

    Note: The location of the mwavem.conf file can be specified as an argument to the mwavem application. If it is not specified, the default location is assumed to be /usr/local/etc/mwavem.conf unless this was changed during the build process.

 2. Set up the serial driver to recognize the UART provided by the ACP driver.

+-------------------------------+
|setserial /dev/ttyS0 autoconfig|
+-------------------------------+

    Note: Replace /dev/ttyS0 with the serial port you have configured the DSP to use.

    Note: You may wish to create a symbolic link from your modem device to your serial device for convenience. Example: ln -s /dev/ttyS0 /dev/modem

The ACP Modem is now available for use by your favorite dialing application.

-----------------------------------------------------------------------------
3. Resolving Installation and Configuration Problems

The following sections list solutions to possible problems you may experience.

-----------------------------------------------------------------------------
3.1. DSP Does Not Start

In order to recognize memory above 64 Meg, it may be necessary to append the "mem=" option to the kernel command line. If you are using LILO for your boot loader, you would do this in the lilo.conf file. For example, if you had a machine with 128 Meg you would add:

+--------------------+
|append="mem=130496K"|
+--------------------+

Note: Your statement must reflect 576K less than you actually have.
Specifying the full amount of memory will prevent the DSP from starting. In
the above example, the formula used to arrive at the proper number was
1024 * nMB - 576 = nK (for 128 MB: 1024 * 128 - 576 = 130496).

If you forget to run the ThinkPad utility to enable the ACP Modem and you
didn't specify any command line arguments when inserting the mwave module (or
it didn't work), you will receive a message in the syslog, similar to the one
below:

ACP Modem, UART settings IRQ 0x3 IO 0x2f8
tp3780::EnableDSP, pSettings->bDSPEnabled 0 failed
Mwave Modem, ERROR cannot Enable DSP error fffffffb
Mwave Modem, ERROR cannot perform Mwave Initialization retval fffffffb

If you receive a message like the one above, check the command line arguments
you provided to insmod.

-----------------------------------------------------------------------------

3.2. Resource Conflicts

The ACP Modem requires the use of system resources for both the DSP and the
UART provided by the ACP chip. For Linux systems, you will specify parameters
to use for the duration of the boot with the insmod mwave command line
parameters listed in Section 2.4.1.

Typically the configured resources are:

For the DSP:
    IRQ 10, I/O address 0x130-0x13f

For the UART:
    IRQ 3, I/O address 0x2f8 (if using ttyS1)
    IRQ 4, I/O address 0x3f8 (if using ttyS0)

For dual boot systems we recommend that you use the ThinkPad Configuration
Utility on Windows NT or DOS to configure these system resources.

Windows NT ThinkPad Configuration Utility Notes:
(Under the Internal Modem --> Advanced selection)

 1. Set IRQ sharing to disabled
 2. Set 1st IRQ to your DSP IRQ (10 is recommended)
 3. Set 2nd IRQ to your UART IRQ (i.e. ttyS1 is equivalent to COM2)
 4. Set the DSP I/O address (0x130 is recommended)
 5. Set the internal modem I/O address to the UART I/O address (i.e. 0x2f8
    for COM2)
 6. The DMA address is unused and can be set to anything.
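As a rough illustration of the typical resource assignments above, the
following Python sketch assembles an insmod command line for a chosen serial
port. The function name and the lookup table are hypothetical helpers made up
for this example; only the mwave_* parameter names and the resource values
come from this document. The sketch only builds a string. It does not load
the module or check for conflicts.

```python
# Typical UART resources listed above, keyed by serial device name.
UART_RESOURCES = {
    "ttyS0": {"irq": 4, "io": 0x3F8},
    "ttyS1": {"irq": 3, "io": 0x2F8},
}

# Typical DSP resources listed above.
DSP_IRQ = 10
DSP_IO = 0x130


def mwave_insmod_command(port):
    """Return an insmod command line for the given port (hypothetical helper)."""
    uart = UART_RESOURCES[port]
    return (
        "insmod mwave"
        f" mwave_3780i_irq={DSP_IRQ}"
        f" mwave_3780i_io={DSP_IO:#06x}"   # zero-padded, e.g. 0x0130
        f" mwave_uart_irq={uart['irq']}"
        f" mwave_uart_io={uart['io']:#x}"
    )


if __name__ == "__main__":
    print(mwave_insmod_command("ttyS1"))
```

For ttyS1 this reproduces the example command shown in Section 2.4.1. Before
running the resulting command, it is still up to you to confirm that the
chosen IRQs and I/O ranges are free.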
Note: You may also specify parameters to use for the duration of the boot by
using the insmod mwave command line parameters listed in Section 2.4.1.

-----------------------------------------------------------------------------

3.3. Not Connecting at Specified Starting Speed

The configured initial connection speed is set to 64000. The modem should
start there and negotiate down to a connection speed based on target modem
and line capabilities. If the modem is unable to connect, it may be having
difficulty negotiating with the target modem. Try setting the SPEED parameter
in mwavem.conf to a lower initial starting speed. Supported speeds include:

  * 64000
  * 33600
  * 14400
  * 9600
  * 2400

-----------------------------------------------------------------------------

3.4. Dialer Application Cannot Detect Serial Port

The startup script that executes the serial port setup works well with Red
Hat, Debian, Slackware, and SuSE. If you are not running one of these
distributions, you may need to perform the following steps to set up the
serial port.

After inserting the mwave.o module and starting the mwavem application, you
must run the setserial command in order for the serial port configuration to
discover the UART on the mwave hardware:

+---------------------------------------------------------------------------+
|setserial /dev/ttySx autoconfig                                            |
+---------------------------------------------------------------------------+

Replace ttySx with the serial port you have configured the ACP driver to use.
To test whether the serial port is set up correctly, run:

+---------------------------------------------------------------------------+
|setserial /dev/ttySx                                                       |
+---------------------------------------------------------------------------+

The above command should return the following for serial port 1:

+---------------------------------------------------------------------------+
|/dev/ttyS1, UART: 16550A, Port: 0x2f8, IRQ: 3                              |
+---------------------------------------------------------------------------+

The port and IRQ numbers should match the information placed in the syslog by
the ACP module when it was loaded:

kernel: Mwave Modem, UART settings IRQ 0x3 IO 0x2f8

If the information returned by setserial indicates that the UART is 'unknown'
or if the IRQ and I/O resources do not match what you have in the syslog, you
will need to reconfigure. Check the setserial man pages to learn how to set
up the resources on your ttySx to match what appears in the syslog output.

If you have problems running setserial, you may have a resource conflict.
Before using insmod mwave, check /proc/ioports and /proc/interrupts to make
sure the resources you intend to claim are not already in use.

-----------------------------------------------------------------------------

3.5. PPP Errors Using 2.4.0 Version of the Kernel

When upgrading to the 2.4.0 version of the kernel be sure to read the
./Documentation/Changes file. Kernel 2.4.0 requires an upgraded version of
pppd, gcc, and modutils (among other things). Follow the instructions for
setting up the new pppd daemon carefully.

You may experience some initial problems getting ppp running with 2.4.0. One
of the most prevalent errors we received was, "Can't locate module
tty-ldisc-3."
However, we had no problems once we rebuilt the kernel with the following
options:

CONFIG_PPP=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m

-----------------------------------------------------------------------------

4. Debugging Tips

4.1. Error Logs

Errors encountered by the ACP Modem device driver or application are logged
using the syslog utility.

-----------------------------------------------------------------------------

4.2. Tracing

The ACP device driver supports a debug argument to enable the generation of
trace information. The command for this debug is listed below. You can also
access several of the variables listed below in the mwaved script.

+---------------------------------------------------------------------------+
|insmod mwave mwave_debug=0x0f                                              |
+---------------------------------------------------------------------------+

Where the following debug trace information is selectable:

0x01   ACP Modem Device driver entry points
0x02   Systems Management API (SMAPI)
0x04   Hardware Interface (3780I)
0x08   ThinkPad Interface (tp3780i)

Trace information is logged using the syslog utility.

The ACP application supports tracing through the use of flags configured in
the [STARTUP] section of the mwavem.conf file.

Mwave Manager API trace points:

MANAGER_API_TRACE=1
MANAGER_API_DATA_TRACE=1
MANAGER_CORE_TRACE=1
MANAGER_SPECIFIC_TRACE=1

MEIO Manager trace points:

MEIO_API_TRACE=1
MEIO_CORE_TRACE=1
MEIO_SPECIFIC_TRACE=1

Mwave Modem application trace points:

MWMLW32_TRACE=1
MWMPW32_TRACE=1
MWMUTIL_TRACE=1
MWWTT32_TRACE=1

Trace information is logged using the syslog utility.

-----------------------------------------------------------------------------

5. Test Claims

This driver has been tested using the ThinkPad 600E. The same chipset is
integrated on the 600 and 770 models and should work.

-----------------------------------------------------------------------------

6. List of Supported Countries

The following countries are supported by the ACP Modem driver.

Table 1. List of Supported Countries

+--------------------+-------------------+
|Country Name        |Country Access Code|
+--------------------+-------------------+
|ALGERIA             |213                |
|ANTIGUA_BARBUDA     |102                |
|ARGENTINA           |54                 |
|ARMENIA             |374                |
|ARUBA               |297                |
|AUSTRALIA           |61                 |
|AUSTRIA             |43                 |
|AZERBAIJAN          |994                |
|BAHAMAS             |103                |
|BARBADOS            |104                |
|BELARUS             |375                |
|BELGIUM             |32                 |
|BERMUDA             |105                |
|BOLIVIA             |591                |
|BRAZIL              |55                 |
|BRUNEI              |673                |
|BULGARIA            |359                |
|CANADA              |107                |
|CAYMAN_ISLANDS      |108                |
|CHILE               |38                 |
|COLOMBIA            |57                 |
|COSTA_RICA          |506                |
|CUBA                |53                 |
|CYPRUS              |357                |
|CZECHREPUBLIC       |420                |
|DENMARK             |45                 |
|ECUADOR             |593                |
|EGYPT               |20                 |
|EL_SALVADOR         |503                |
|FINLAND             |358                |
|FRANCE              |33                 |
|GERMANY             |49                 |
|GREECE              |30                 |
|GRENADA             |111                |
|GUATEMALA           |502                |
|GUYANA              |592                |
|HONDURAS            |504                |
|HONG_KONG           |852                |
|HUNGARY             |36                 |
|INDIA               |91                 |
|INDONESIA           |62                 |
|IRELAND             |353                |
|ISRAEL              |972                |
|ITALY               |39                 |
|JAMAICA             |112                |
|JAPAN               |81                 |
|JORDAN              |962                |
|KOREA               |850                |
|KOREA_SOUTH         |82                 |
|KUWAIT              |965                |
|LUXEMBOURG          |352                |
|MALAYSIA            |60                 |
|MEXICO              |52                 |
|NETH_ANTILLES       |599                |
|NETHERLANDS         |31                 |
|NEW_ZEALAND         |64                 |
|NICARAGUA           |505                |
|NORWAY              |47                 |
|OMAN                |968                |
|PAKISTAN            |92                 |
|PANAMA              |507                |
|PARAGUAY            |595                |
|PERU                |51                 |
|PHILIPPINES         |63                 |
|POLAND              |48                 |
|PORTUGAL            |351                |
|PRC                 |852                |
|ROMANIA             |40                 |
|RUSSIA              |7                  |
|SAUDI_ARABIA        |966                |
|SINGAPORE           |65                 |
|SLOVAKIA            |421                |
|SLOVENIA            |386                |
|SOUTH_AFRICA        |27                 |
|SPAIN               |34                 |
|ST_KITTS_NEVIS      |115                |
|ST_LUCIA            |122                |
|ST_VINCENT          |116                |
|SURINAME            |597                |
|SWEDEN              |46                 |
|SWITZERLAND         |41                 |
|TAIWAN              |866                |
|THAILAND            |66                 |
|TRINIDAD_TOBAGO     |117                |
|TURKEY              |90                 |
|TURKS_CAICOS        |118                |
|U_K                 |44                 |
|UKRAINE             |380                |
|UNITED_ARAB_EMIRATES|971                |
|URUGUAY             |598                |
|USA                 |1                  |
|VENEZUELA           |58                 |
|VIETNAM             |84                 |
|VIRGIN_IS_BRITISH   |106                |
|VIRGIN_IS_USA       |123                |
|YEMAN               |967                |
|YUGOSLAVIA          |381                |
+--------------------+-------------------+

-----------------------------------------------------------------------------

7. Trademarks

Hayes is a trademark of Hayes Microcomputer Products, Inc.

MNP (Microcom Network Protocol) is a trademark of Microcom, Inc.

IBM is a trademark of International Business Machines, Inc.

ADSL Bandwidth Management HOWTO

Dan Singletary

Revision History
Revision 1.3   2003-04-07   Revised by: ds
  Added links section.
Revision 1.2   2002-09-26   Revised by: ds
  Added link to new Email Discussion List. Added small teaser to caveat
  section regarding new and improved QoS for Linux designed specifically for
  ADSL to be released soon.
Revision 1.1   2002-08-26   Revised by: ds
  A few corrections (Thanks to the many that pointed them out!). Added
  informational caveat to implementation section.
Revision 1.0   2002-08-21   Revised by: ds
  Better control over bandwidth, more theory, updated for 2.4 kernels
Revision 0.1   2001-08-06   Revised by: ds
  Initial publication

This document describes how to configure a Linux router to more effectively
manage outbound traffic on an ADSL modem or other device with similar
bandwidth properties (cable modem, ISDN, etc). Emphasis is placed on lowering
the latency for interactive traffic even when the upstream and/or downstream
bandwidth is fully saturated.

-----------------------------------------------------------------------------

Table of Contents
1. Introduction
    1.1. New Versions of This Document
    1.2. Email Discussion List
    1.3. Disclaimer
    1.4. Copyright and License
    1.5. Feedback and corrections
2. Background
    2.1. Prerequisites
    2.2. Layout
    2.3. Packet Queues
3. How it Works
    3.1. Throttling Outbound Traffic with Linux HTB
    3.2. Priority Queuing with HTB
    3.3. Classifying Outbound Packets with iptables
    3.4. A few more tweaks...
    3.5. Attempting to Throttle Inbound Traffic
4. Implementation
    4.1. Caveats
    4.2. Script: myshaper
5. Testing the New Queue
6. OK It Works!! Now What?
7. Related Links

1. Introduction

The purpose of this document is to suggest a way to manage outbound traffic
on an ADSL (or cable modem) connection to the Internet. The problem is that
many ADSL lines are limited to the neighborhood of 128kbps for upstream data
transfer. Aggravating this problem is the packet queue in the ADSL modem,
which can take 2 to 3 seconds to empty when full. Together this means that
when the upstream bandwidth is fully saturated it can take up to 3 seconds
for any other packets to get out to the Internet. This can cripple
interactive applications such as telnet and multi-player games.

-----------------------------------------------------------------------------

1.1. New Versions of This Document

You can always view the latest version of this document on the World Wide Web
at the URL: http://www.tldp.org.

New versions of this document will also be uploaded to various Linux WWW and
FTP sites, including the LDP home page at http://www.tldp.org.

-----------------------------------------------------------------------------

1.2. Email Discussion List

For questions and update information regarding ADSL Bandwidth Management
please subscribe to the ADSL Bandwidth Management email list at
http://jared.sonicspike.net/mailman/listinfo/adsl-qos.

-----------------------------------------------------------------------------

1.3. Disclaimer

Neither the author, the distributors, nor any other contributor to this HOWTO
is in any way responsible for physical, financial, moral or any other type of
damage incurred by following the suggestions in this text.

-----------------------------------------------------------------------------

1.4. Copyright and License

This document is copyright 2002 by Dan Singletary, and is released under the
terms of the GNU Free Documentation License, which is hereby incorporated by
reference.
-----------------------------------------------------------------------------

1.5. Feedback and corrections

If you have questions or comments about this document, please feel free to
contact the author at dvsing@sonicspike.net.

-----------------------------------------------------------------------------

2. Background

2.1. Prerequisites

The method outlined in this document should work in other Linux
configurations; however, it remains untested in any configuration but the
following:

  * Red Hat Linux 7.3

  * 2.4.18-5 Kernel with QoS Support fully enabled (modules OK) and including
    the following kernel patches (which may eventually be included in later
    kernels):

      + HTB queue - http://luxik.cdi.cz/~devik/qos/htb/

        Note: it has been reported that kernels since version 2.4.18-3
        shipped with Mandrake (8.1, 8.2) have already been patched for HTB.

      + IMQ device - http://luxik.cdi.cz/~patrick/imq/

  * iptables v1.2.6a or later (version of iptables distributed with Red Hat
    7.3 is missing the length module)

+---------------------------------------------------------------------------+
| Note: Previous versions of this document specified a method of bandwidth  |
| control that involved patching the existing sch_prio queue. It was found  |
| later that this patch was entirely unnecessary. Regardless, the newer     |
| methods outlined in this document will give you better results (although  |
| at the writing of this document 2 kernel patches are now necessary. :)    |
| Happy patching.)                                                          |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

2.2. Layout

In order to keep things simple, all references to network devices and
configuration in this document will be with respect to the following network
layout diagram:

+----------------------------------------------------------------------------+
|              <-- 128kbit/s       --------------  <-- 10Mbit -->            |
|  Internet <--------------------> | ADSL Modem | <---------------           |
|                  1.5Mbit/s -->   --------------                |           |
|                                                                | eth0      |
|                                                                V           |
|                                                        -----------------   |
|                                                        |               |   |
|                                                        | Linux Router  |   |
|                                                        |               |   |
|                                                        -----------------   |
|                                                           | .. | eth1..ethN|
|                                                           |    |           |
|                                                           V    V           |
|                                                                            |
|                                                       Local Network        |
+----------------------------------------------------------------------------+

-----------------------------------------------------------------------------

2.3. Packet Queues

Packet queues are buckets that hold data for a network device when it can't
be immediately sent. Most packet queues use a FIFO (first in, first out)
discipline unless they've been specially configured to do otherwise. What
this means is that when the packet queue for a device is completely full, the
packet most recently placed in the queue will be sent over the device only
after all the other packets in the queue at that time have been sent.

-----------------------------------------------------------------------------

2.3.1. The Upstream

With an ADSL modem, bandwidth is asymmetric with 1.5Mbit/s typical downstream
and 128kbit/s typical upstream. Although this is the line speed, the
interface between the Linux Router PC and the ADSL modem is typically at or
above 10Mbit/s. If the interface to the Local Network is also 10Mbit/s, there
will typically be NO QUEUING at the router when packets are sent from the
Local Network to the Internet. Packets are sent out eth0 as fast as they are
received from the Local Network. Instead, packets are queued at the ADSL
modem since they are arriving at 10Mbit/s and only being sent at 128kbit/s.
Eventually the packet queue at the ADSL modem will become full and any more
packets sent to it will be silently dropped. TCP is designed to handle this
and will adjust its transmit window size accordingly to take full advantage
of the available bandwidth.

While packet queues combined with TCP result in the most effective use of
bandwidth, large FIFO queues can increase the latency for interactive
traffic.

Another type of queue that is somewhat like FIFO is an n-band priority queue.
However, instead of having just one queue that packets line up in, the n-band
priority queue has n FIFO queues which packets are placed in by their
classification. Each queue has a priority and packets are always dequeued
from the highest priority queue that contains packets. Using this discipline,
FTP packets can be placed in a lower priority queue than telnet packets so
that even during an FTP upload, a single telnet packet will jump the queue
and be sent immediately.

This document has been revised to use a new queue in Linux called the
Hierarchical Token Bucket (HTB). The HTB queue is much like the n-band queue
described above, but it has the capability to limit the rate of traffic in
each class. In addition to this, it has the ability to set up classes of
traffic beneath other classes, creating a hierarchy of classes. Fully
describing HTB is beyond the scope of this document, but more information can
be found at http://www.lartc.org.

-----------------------------------------------------------------------------

2.3.2. The Downstream

Traffic coming inbound on your ADSL modem is queued in much the same way as
outbound traffic; however, the queue resides at your ISP. Because of this,
you probably don't have direct control of how packets are queued or which
types of traffic get preferential treatment. The only way to keep your
latency low here is to make sure that people don't send you data too fast.
Unfortunately, there's no way to directly control the speed at which packets
arrive, but since a majority of your traffic is most likely TCP, there are
some ways to slow down the senders:

  * Intentionally drop inbound packets - TCP is designed to take full
    advantage of the available bandwidth while also avoiding congestion of
    the link. This means that during a bulk data transfer TCP will send more
    and more data until eventually a packet is dropped. TCP detects this and
    reduces its transmission window. This cycle continues throughout the
    transfer and assures data is moved as quickly as possible.

  * Manipulate the advertised receive window - During a TCP transfer, the
    receiver sends back a continuous stream of acknowledgment (ACK) packets.
    Included in the ACK packets is a window size advertisement which states
    the maximum amount of unacknowledged data the sender may send. By
    manipulating the window size of outbound ACK packets we can intentionally
    slow down the sender. At the moment there is no (free) implementation for
    this type of flow-control on Linux (however I may be working on one!).

-----------------------------------------------------------------------------

3. How it Works

There are two basic steps to optimize upstream bandwidth. First we have to
find a way to prevent the ADSL modem from queuing packets since we have no
control over how it handles the queue. In order to do this we will throttle
the amount of data the router sends out eth0 to be slightly less than the
total upstream bandwidth of the ADSL modem. This will result in the router
having to queue packets that arrive from the Local Network faster than it is
allowed to send them.

The second step is to set up a priority queuing discipline on the router.
We'll investigate a queue that can be configured to give priority to
interactive traffic such as telnet and multi-player games.
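To make the motivation for throttling concrete, the Python sketch below
estimates how long a full FIFO queue takes to drain at a given line rate. The
function name and the example queue sizes are assumptions chosen purely for
illustration, and 128kbit/s is taken to mean 128,000 bits per second; the
upstream rate and the 1500-byte packet size are the figures used elsewhere in
this document.

```python
def drain_time_seconds(packets, packet_bytes, rate_bits_per_s):
    """Time needed to send every queued packet at the given line rate."""
    return packets * packet_bytes * 8 / rate_bits_per_s


# Assumption: 128kbit/s upstream means 128,000 bits per second.
UPSTREAM_BITS_PER_S = 128_000

if __name__ == "__main__":
    # Hypothetical queue depths, all with full-size 1500-byte packets.
    for queued in (1, 10, 32):
        secs = drain_time_seconds(queued, 1500, UPSTREAM_BITS_PER_S)
        print(f"{queued:3d} full-size packets queued: {secs:.2f} s to drain")
```

Under these assumptions a single full-size packet occupies the line for a
little under a tenth of a second, so a few dozen packets waiting in a queue we
cannot control are enough to explain multi-second delays.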
+---------------------------------------------------------------------------+
| By using the HTB queue we can accomplish bandwidth shaping and priority   |
| queuing at the same time while also assuring that no priority class is    |
| starved by another. Avoiding starvation wasn't possible using the method  |
| outlined in the 0.1 revision of this document.                            |
+---------------------------------------------------------------------------+

The final step is to configure the firewall to prioritize packets by using
fwmark.

-----------------------------------------------------------------------------

3.1. Throttling Outbound Traffic with Linux HTB

Although the connection between the router and the modem is at 10Mbit/s, the
modem is only able to send data at 128kbit/s. Any data sent in excess of that
rate will be queued at the modem. Thus, a ping packet sent from the router
may go to the modem immediately, but may take a few seconds to actually get
sent out to the Internet if the queue in the modem has any packets in it.
Unfortunately most ADSL modems provide no mechanism to specify how packets
are dequeued or how large the queue is, so our first objective is to move the
place where the outbound packets are queued to somewhere where we have more
control over the queue.

We'll do this by using the HTB queue to limit the rate at which we send
packets to the ADSL modem. Even though our upstream bandwidth may be
128kbit/s, we'll have to limit the rate at which we send packets to be
slightly below that. If we want to lower the latency we have to be SURE that
not a single packet is ever queued at the modem. Through experimentation I
have found that limiting the outbound traffic to about 90kbit/s gives me
almost 95% of the bandwidth I could achieve without HTB rate control. With
HTB enabled at this rate, we've prevented the ADSL modem from queuing
packets.

-----------------------------------------------------------------------------

3.2. Priority Queuing with HTB

+---------------------------------------------------------------------------+
| Note: previous claims in this section (originally named N-band priority   |
| queuing) were later found to be incorrect. It actually WAS possible to    |
| classify packets into the individual bands of the priority queue by only  |
| using the fwmark field, however it was poorly documented at the writing   |
| of version 0.1 of this document.                                          |
+---------------------------------------------------------------------------+

At this point we still haven't realized any change in the performance. We've
merely moved the FIFO queue from the ADSL modem to the router. In fact, with
Linux configured to a default queue size of 100 packets we've probably made
our problem worse at this point! But not for long...

Each neighbor class in an HTB queue can be assigned a priority. By placing
different types of traffic in different classes and then assigning these
classes different priorities, we can control the order in which packets are
dequeued and sent. HTB makes it possible to do this while still avoiding
starvation of any one class, since we're able to specify a minimum guaranteed
rate for each class. In addition to this, HTB allows us to tell a class that
it may use any unused bandwidth from other classes up to a certain ceiling.

Once we have our classes set up, we set up filters to place traffic in
classes. There are several ways to do this, but the method described in this
document uses the familiar iptables/ipchains to mark packets with an fwmark.
The filters place traffic into the classes of the HTB queue based on their
fwmark. This way, we're able to set up matching rules in iptables to send
certain types of traffic to certain classes.

-----------------------------------------------------------------------------

3.3. Classifying Outbound Packets with iptables

+---------------------------------------------------------------------------+
| Note: originally this document used ipchains to classify packets. The     |
| newer iptables is now used.                                               |
+---------------------------------------------------------------------------+

The final step in configuring your router to give priority to interactive
traffic is to set up the firewall to define how traffic should be classified.
This is done by setting the packet's fwmark field.

Without getting into too much detail, here is a simplified description of how
outbound packets might be classified into 4 classes with the highest priority
class being 0x00:

 1. Mark ALL packets as 0x03. This places all packets, by default, into the
    lowest priority queue.

 2. Mark ICMP packets as 0x00. We want ping to show the latency for the
    highest priority packets.

 3. Mark all packets that have a destination port 1024 or less as 0x01. This
    gives priority to system services such as Telnet and SSH. FTP's control
    port will also fall into this range; however, FTP data transfer takes
    place on high ports and will remain in the 0x03 band.

 4. Mark all packets that have a destination port of 25 (SMTP) as 0x03. If
    someone sends an email with a large attachment we don't want it to swamp
    interactive traffic.

 5. Mark all packets that are going to a multi-player game server as 0x02.
    This will give gamers low latency but will keep them from swamping out
    the system applications that require low latency.

 6. Mark any "small" packets as 0x02. Outbound ACK packets from inbound
    downloads should be sent promptly to assure efficient downloads. This is
    possible using the iptables length module.

Obviously, this can be customized to fit your needs.

-----------------------------------------------------------------------------

3.4. A few more tweaks...

There are two more things that you can do to improve your latency.
First, you can set the Maximum Transmission Unit (MTU) to be lower than the
default of 1500 bytes. Lowering this number will lower the average time you
have to wait to send a priority packet if there is already a full-sized
low-priority packet being sent. Lowering this number will also slightly
decrease your throughput because each packet contains at least 40 bytes worth
of IP and TCP header information.

The other thing you can do to improve latency even on your low-priority
traffic is to lower your queue length from the default of 100, which on an
ADSL line could take as much as 10 seconds to empty with a 1500 byte MTU.

-----------------------------------------------------------------------------

3.5. Attempting to Throttle Inbound Traffic

By using the Intermediate Queuing Device (IMQ), we can run all incoming
packets through a queue in the same way that we queue outbound packets.
Packet priority is much simpler in this case. Since we can only (attempt to)
control inbound TCP traffic, we'll put all non-TCP traffic in the 0x00 class,
and all TCP traffic in the 0x01 class. We'll also place "small" TCP packets
in the 0x00 class since these are most likely ACK packets for outbound data
that has already been sent. We'll set up a standard FIFO queue on the 0x00
class, and we'll set up a Random Early Drop (RED) queue on the 0x01 class.
RED is better than a FIFO (tail-drop) queue at controlling TCP because it
will drop packets before the queue overflows in an attempt to slow down
transfers that look like they're about to get out of control. We'll also
rate-limit both classes to some maximum inbound rate which is less than your
true inbound speed over the ADSL modem.

-----------------------------------------------------------------------------

3.5.1. Why Inbound Traffic Limiting isn't all That Good

We want to limit our inbound traffic to avoid filling up the queue at the
ISP, which can sometimes buffer as much as 5 seconds worth of data.
The problem is that currently the only way to limit inbound TCP traffic is to drop perfectly good packets. These packets have already taken up some share of bandwidth on the ADSL modem, only to be dropped by the Linux box in an effort to slow down future packets. These dropped packets will eventually be retransmitted, consuming more bandwidth. When we limit traffic, we are limiting the rate of packets which we will accept into our network. Since the actual inbound data rate is somewhere above this because of the packets we drop, we'll actually have to limit our downstream to much lower than the actual rate of the ADSL modem in order to ensure low latency. In practice I had to limit my 1.5 Mbit/s downstream ADSL to 700 kbit/s in order to keep the latency acceptable with 5 concurrent downloads. The more TCP sessions you have, the more bandwidth you'll waste with dropped packets, and the lower you'll have to set your limit rate.

A much better way to control inbound TCP traffic would be TCP window manipulation, but as of this writing there exists no (free) implementation of it for Linux (that I know of...).

-----------------------------------------------------------------------------

4. Implementation

Now with all of the explanation out of the way it's time to implement bandwidth management with Linux.

-----------------------------------------------------------------------------

4.1. Caveats

Limiting the actual rate of data sent to the DSL modem is not as simple as it may seem. Most DSL modems are really just Ethernet bridges that bridge data back and forth between your Linux box and the gateway at your ISP. Most DSL modems use ATM as a link layer to send data. ATM sends data in cells that are always 53 bytes long. 5 of these bytes are header information, leaving 48 bytes available for data. Even if you are sending 1 byte of data, an entire 53 bytes of bandwidth are consumed, since ATM cells are always 53 bytes long.
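As a rough illustration of this cell arithmetic, the following shell sketch computes how many bytes of ATM bandwidth a frame of a given size consumes. The function name atm_wire_bytes and the variable names are made up for this example. The calculation assumes that the frame is simply split into 48-byte cell payloads; any additional encapsulation overhead that a particular modem adds is ignored.

```shell
#!/bin/sh
# Sketch only: estimate the ATM bandwidth consumed by one frame.
# Assumption: the frame is cut directly into 48-byte cell payloads and
# every cell costs 53 bytes on the wire. Extra encapsulation overhead
# added by a particular modem is NOT accounted for.

ATM_CELL=53      # total size of one ATM cell in bytes
ATM_PAYLOAD=48   # data bytes carried by one ATM cell

# atm_wire_bytes FRAME_BYTES
# Prints the number of bytes of ATM bandwidth used by the frame.
atm_wire_bytes() {
    frame=$1
    # Round up to a whole number of cells.
    cells=$(( (frame + ATM_PAYLOAD - 1) / ATM_PAYLOAD ))
    echo $(( cells * ATM_CELL ))
}

atm_wire_bytes 1     # a single byte still costs one full cell
atm_wire_bytes 64    # a minimum-size Ethernet frame needs two cells
```

Run as written, the sketch prints 53 and then 106: one byte occupies a whole cell, and a 64-byte frame does not fit into a single 48-byte payload, so it costs two cells.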
Consider a typical TCP ACK packet, which consists of 0 bytes data + 20 bytes TCP header + 20 bytes IP header + 18 bytes Ethernet header. Even though the Ethernet packet you are sending has only 40 bytes of payload (TCP and IP header), the minimum payload for an Ethernet packet is 46 bytes of data, so the remaining 6 bytes are padded with nulls. This means that the actual length of the Ethernet packet plus header is 18 + 46 = 64 bytes. In order to send 64 bytes over ATM, you have to send two ATM cells, which consume 106 bytes of bandwidth. This means that for every TCP ACK packet, you're wasting 42 bytes of bandwidth. This would be okay if Linux accounted for the encapsulation that the DSL modem uses, but instead, Linux only accounts for the TCP header, the IP header, and the 14 bytes of Ethernet header that carry the MAC addresses and type field (Linux doesn't count the 4 bytes of CRC since this is handled at the hardware level). Linux doesn't count the minimum Ethernet payload size of 46 bytes, nor does it take into account the fixed ATM cell size.

What all of this means is that you'll have to limit your outbound bandwidth to somewhat less than your true capacity (until we can figure out a packet scheduler that can account for the various types of encapsulation being used). You may find that you've figured out a good number to limit your bandwidth to, but then you download a big file and the latency starts to shoot up over 3 seconds. This is most likely because the bandwidth those small ACK packets consume is being miscalculated by Linux.

I have been working on a solution to this problem for a few months and have almost settled on a solution that I will soon release to the public for further testing. The solution involves using a user-space queue instead of Linux's QoS to rate-limit packets. I've basically implemented a simple HTB queue using Linux user-space queues.
This solution (so far) has been able to regulate outbound traffic SO WELL that even during a massive bulk download (several streams) and bulk upload (gnutella, several streams) the latency PEAKS at 400ms over my nominal no-traffic latency of about 15ms. For more information on this QoS method, subscribe to the email list for updates or check back on updates to this HOWTO.

-----------------------------------------------------------------------------

4.2. Script: myshaper

The following is a listing of the script which I use to control bandwidth on my Linux router. It uses several of the concepts covered in the document. Outbound traffic is placed into one of 7 queues depending on type. Inbound traffic is placed into two queues with TCP packets being dropped first (lowest priority) if the inbound data is over-rate. The rates given in this script seem to work OK for my setup but your results may vary.

+---------------------------------------------------------------------------+
|                                                                           |
| This script was originally based on the ADSL WonderShaper as seen at the  |
| [http://www.lartc.org] LARTC website.                                     |
|                                                                           |
+---------------------------------------------------------------------------+

#!/bin/bash
#
# myshaper - DSL/Cable modem outbound traffic shaper and prioritizer.
#            Based on the ADSL/Cable wondershaper (www.lartc.org)
#
# Written by Dan Singletary (8/7/02)
#
# NOTE!! - This script assumes your kernel has been patched with the
#          appropriate HTB queue and IMQ patches available here:
#          (subnote: future kernels may not require patching)
#
#       http://luxik.cdi.cz/~devik/qos/htb/
#       http://luxik.cdi.cz/~patrick/imq/
#
# Configuration options for myshaper:
#  DEV    - set to ethX that connects to DSL/Cable Modem
#  RATEUP - set this to slightly lower than your
#           outbound bandwidth on the DSL/Cable Modem.
#           I have a 1500/128 DSL line and setting
#           RATEUP=90 works well for my 128kbps upstream.
#           However, your mileage may vary.
#  RATEDN - set this to slightly lower than your
#           inbound bandwidth on the DSL/Cable Modem.
#
#
#  Theory on using imq to "shape" inbound traffic:
#
#     It's impossible to directly limit the rate of data that will
#     be sent to you by other hosts on the internet.  In order to shape
#     the inbound traffic rate, we have to rely on the congestion avoidance
#     algorithms in TCP.  Because of this, WE CAN ONLY ATTEMPT TO SHAPE
#     INBOUND TRAFFIC ON TCP CONNECTIONS.  This means that any traffic that
#     is not tcp should be placed in the high-prio class, since dropping
#     a non-tcp packet will most likely result in a retransmit which will
#     do nothing but unnecessarily consume bandwidth.
#     We attempt to shape inbound TCP traffic by dropping tcp packets
#     when they overflow the HTB queue which will only pass them on at
#     a certain rate (RATEDN) which is slightly lower than the actual
#     capability of the inbound device.  By dropping TCP packets that
#     are over-rate, we are simulating the same packets getting dropped
#     due to a queue-overflow on our ISP's side.  The advantage of this
#     is that our ISP's queue will never fill because TCP will slow its
#     transmission rate in response to the dropped packets in the assumption
#     that it has filled the ISP's queue, when in reality it has not.
#     The advantage of using a priority-based queuing discipline is
#     that we can specifically choose NOT to drop certain types of packets
#     that we place in the higher priority buckets (ssh, telnet, etc).  This
#     is because packets will always be dequeued from the lowest-numbered
#     (highest-priority) class first, with the stipulation that packets will
#     still be dequeued from every class fairly at a minimum rate (in this
#     script, each bucket will deliver at least its fair share of 1/7 of
#     the bandwidth).
#
#  Reiterating main points:
#   * Dropping a tcp packet on a connection will lead to a slower rate
#     of reception for that connection due to the congestion avoidance algorithm.
#   * We gain nothing from dropping non-TCP packets.
#     In fact, if they were important they would probably be retransmitted
#     anyways so we want to try to never drop these packets.  This means that
#     saturated TCP connections will not negatively affect protocols that
#     don't have a built-in retransmit like TCP.
#   * Slowing down incoming TCP connections such that the total inbound rate is less
#     than the true capability of the device (ADSL/Cable Modem) SHOULD result in little
#     to no packets being queued on the ISP's side (DSLAM, cable concentrator, etc).  Since
#     these ISP queues have been observed to queue 4 seconds of data at 1500Kbps or 6 megabits
#     of data, having no packets queued there will mean lower latency.
#
#  Caveats (questions posed before testing):
#   * Will limiting inbound traffic in this fashion result in poor bulk TCP performance?
#     - Preliminary answer is no!  Seems that by prioritizing ACK packets (small <64b)
#       we maximize throughput by not wasting bandwidth on retransmitted packets
#       that we already have.
#

# NOTE: The following configuration works well for my
# setup: 1.5M/128K ADSL via Pacific Bell Internet (SBC Global Services)

DEV=eth0
RATEUP=90
RATEDN=700  # Note that this is significantly lower than the capacity of 1500.
            # Because of this, you may not want to bother limiting inbound traffic
            # until a better implementation such as TCP window manipulation can be used.
#
# End Configuration Options
#

if [ "$1" = "status" ]
then
        echo "[qdisc]"
        tc -s qdisc show dev $DEV
        tc -s qdisc show dev imq0
        echo "[class]"
        tc -s class show dev $DEV
        tc -s class show dev imq0
        echo "[filter]"
        tc -s filter show dev $DEV
        tc -s filter show dev imq0
        echo "[iptables]"
        iptables -t mangle -L MYSHAPER-OUT -v -x 2> /dev/null
        iptables -t mangle -L MYSHAPER-IN -v -x 2> /dev/null
        exit
fi

# Reset everything to a known state (cleared)
tc qdisc del dev $DEV root    2> /dev/null > /dev/null
tc qdisc del dev imq0 root 2> /dev/null > /dev/null
iptables -t mangle -D POSTROUTING -o $DEV -j MYSHAPER-OUT 2> /dev/null > /dev/null
iptables -t mangle -F MYSHAPER-OUT 2> /dev/null > /dev/null
iptables -t mangle -X MYSHAPER-OUT 2> /dev/null > /dev/null
iptables -t mangle -D PREROUTING -i $DEV -j MYSHAPER-IN 2> /dev/null > /dev/null
iptables -t mangle -F MYSHAPER-IN 2> /dev/null > /dev/null
iptables -t mangle -X MYSHAPER-IN 2> /dev/null > /dev/null
ip link set imq0 down 2> /dev/null > /dev/null
rmmod imq 2> /dev/null > /dev/null

if [ "$1" = "stop" ]
then
        echo "Shaping removed on $DEV."
        exit
fi

###########################################################
#
# Outbound Shaping (limits total bandwidth to RATEUP)

# set queue size to give latency of about 2 seconds on low-prio packets
ip link set dev $DEV qlen 30

# changes mtu on the outbound device.  Lowering the mtu will result
# in lower latency but will also cause slightly lower throughput due
# to IP and TCP protocol overhead.
ip link set dev $DEV mtu 1000

# add HTB root qdisc
tc qdisc add dev $DEV root handle 1: htb default 26

# add main rate limit classes
tc class add dev $DEV parent 1: classid 1:1 htb rate ${RATEUP}kbit

# add leaf classes - We grant each class at LEAST its "fair share" of bandwidth.
#                    this way no class will ever be starved by another class.  Each
#                    class is also permitted to consume all of the available bandwidth
#                    if no other classes are in use.
tc class add dev $DEV parent 1:1 classid 1:20 htb rate $((RATEUP/7))kbit ceil ${RATEUP}kbit prio 0
tc class add dev $DEV parent 1:1 classid 1:21 htb rate $((RATEUP/7))kbit ceil ${RATEUP}kbit prio 1
tc class add dev $DEV parent 1:1 classid 1:22 htb rate $((RATEUP/7))kbit ceil ${RATEUP}kbit prio 2
tc class add dev $DEV parent 1:1 classid 1:23 htb rate $((RATEUP/7))kbit ceil ${RATEUP}kbit prio 3
tc class add dev $DEV parent 1:1 classid 1:24 htb rate $((RATEUP/7))kbit ceil ${RATEUP}kbit prio 4
tc class add dev $DEV parent 1:1 classid 1:25 htb rate $((RATEUP/7))kbit ceil ${RATEUP}kbit prio 5
tc class add dev $DEV parent 1:1 classid 1:26 htb rate $((RATEUP/7))kbit ceil ${RATEUP}kbit prio 6

# attach qdisc to leaf classes - here we attach SFQ to each priority class.  SFQ ensures that
#                                within each class connections will be treated (almost) fairly.
tc qdisc add dev $DEV parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev $DEV parent 1:21 handle 21: sfq perturb 10
tc qdisc add dev $DEV parent 1:22 handle 22: sfq perturb 10
tc qdisc add dev $DEV parent 1:23 handle 23: sfq perturb 10
tc qdisc add dev $DEV parent 1:24 handle 24: sfq perturb 10
tc qdisc add dev $DEV parent 1:25 handle 25: sfq perturb 10
tc qdisc add dev $DEV parent 1:26 handle 26: sfq perturb 10

# filter traffic into classes by fwmark - here we direct traffic into priority class according to
#                                         the fwmark set on the packet (we set fwmark with iptables
#                                         later).  Note that above we've set the default priority
#                                         class to 1:26 so unmarked packets (or packets marked with
#                                         unfamiliar IDs) will be defaulted to the lowest priority
#                                         class.
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 20 fw flowid 1:20
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 21 fw flowid 1:21
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 22 fw flowid 1:22
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 23 fw flowid 1:23
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 24 fw flowid 1:24
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 25 fw flowid 1:25
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 26 fw flowid 1:26

# add MYSHAPER-OUT chain to the mangle table in iptables - this sets up the table we'll use
#                                                          to filter and mark packets.
iptables -t mangle -N MYSHAPER-OUT
iptables -t mangle -I POSTROUTING -o $DEV -j MYSHAPER-OUT

# add fwmark entries to classify different types of traffic - Set fwmark from 20-26 according to
#                                                             desired class.  20 is highest prio.
iptables -t mangle -A MYSHAPER-OUT -p tcp --sport 0:1024 -j MARK --set-mark 23 # Default for low port traffic
iptables -t mangle -A MYSHAPER-OUT -p tcp --dport 0:1024 -j MARK --set-mark 23 # ""
iptables -t mangle -A MYSHAPER-OUT -p tcp --dport 20 -j MARK --set-mark 26     # ftp-data port, low prio
iptables -t mangle -A MYSHAPER-OUT -p tcp --dport 5190 -j MARK --set-mark 23   # aol instant messenger
iptables -t mangle -A MYSHAPER-OUT -p icmp -j MARK --set-mark 20               # ICMP (ping) - high prio, impress friends
iptables -t mangle -A MYSHAPER-OUT -p udp -j MARK --set-mark 21                # all UDP, e.g. DNS name resolution (small packets)
iptables -t mangle -A MYSHAPER-OUT -p tcp --dport ssh -j MARK --set-mark 22    # secure shell
iptables -t mangle -A MYSHAPER-OUT -p tcp --sport ssh -j MARK --set-mark 22    # secure shell
iptables -t mangle -A MYSHAPER-OUT -p tcp --dport telnet -j MARK --set-mark 22 # telnet (ew...)
iptables -t mangle -A MYSHAPER-OUT -p tcp --sport telnet -j MARK --set-mark 22 # telnet (ew...)
iptables -t mangle -A MYSHAPER-OUT -p ipv6-crypt -j MARK --set-mark 24         # IPSec - we don't know what the payload is though...
iptables -t mangle -A MYSHAPER-OUT -p tcp --sport http -j MARK --set-mark 25   # Local web server
iptables -t mangle -A MYSHAPER-OUT -p tcp -m length --length :64 -j MARK --set-mark 21 # small packets (probably just ACKs)
iptables -t mangle -A MYSHAPER-OUT -m mark --mark 0 -j MARK --set-mark 26      # redundant- mark any unmarked packets as 26 (low prio)

# Done with outbound shaping
#
####################################################

echo "Outbound shaping added to $DEV.  Rate: ${RATEUP}Kbit/sec."

# uncomment following line if you only want upstream shaping.
# exit

####################################################
#
# Inbound Shaping (limits total bandwidth to RATEDN)

# make sure imq module is loaded
modprobe imq numdevs=1

ip link set imq0 up

# add qdisc - default low-prio class 1:21
tc qdisc add dev imq0 handle 1: root htb default 21

# add main rate limit classes
tc class add dev imq0 parent 1: classid 1:1 htb rate ${RATEDN}kbit

# add leaf classes - TCP traffic in 21, non TCP traffic in 20
#
tc class add dev imq0 parent 1:1 classid 1:20 htb rate $((RATEDN/2))kbit ceil ${RATEDN}kbit prio 0
tc class add dev imq0 parent 1:1 classid 1:21 htb rate $((RATEDN/2))kbit ceil ${RATEDN}kbit prio 1

# attach qdisc to leaf classes - here we attach SFQ to the high-priority class and RED to
#                                the TCP class.  SFQ ensures that within its class
#                                connections will be treated (almost) fairly.
tc qdisc add dev imq0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev imq0 parent 1:21 handle 21: red limit 1000000 min 5000 max 100000 avpkt 1000 burst 50

# filter traffic into classes by fwmark - here we direct traffic into priority class according to
#                                         the fwmark set on the packet (we set fwmark with iptables
#                                         later).  Note that above we've set the default priority
#                                         class to 1:21 so unmarked packets (or packets marked with
#                                         unfamiliar IDs) will be defaulted to the lowest priority
#                                         class.
tc filter add dev imq0 parent 1:0 prio 0 protocol ip handle 20 fw flowid 1:20
tc filter add dev imq0 parent 1:0 prio 0 protocol ip handle 21 fw flowid 1:21

# add MYSHAPER-IN chain to the mangle table in iptables - this sets up the table we'll use
#                                                         to filter and mark packets.
iptables -t mangle -N MYSHAPER-IN
iptables -t mangle -I PREROUTING -i $DEV -j MYSHAPER-IN

# add fwmark entries to classify different types of traffic - Set fwmark to 20 or 21 according to
#                                                             desired class.  20 is highest prio.
iptables -t mangle -A MYSHAPER-IN -p ! tcp -j MARK --set-mark 20              # Set non-tcp packets to highest priority
iptables -t mangle -A MYSHAPER-IN -p tcp -m length --length :64 -j MARK --set-mark 20 # short TCP packets are probably ACKs
iptables -t mangle -A MYSHAPER-IN -p tcp --dport ssh -j MARK --set-mark 20    # secure shell
iptables -t mangle -A MYSHAPER-IN -p tcp --sport ssh -j MARK --set-mark 20    # secure shell
iptables -t mangle -A MYSHAPER-IN -p tcp --dport telnet -j MARK --set-mark 20 # telnet (ew...)
iptables -t mangle -A MYSHAPER-IN -p tcp --sport telnet -j MARK --set-mark 20 # telnet (ew...)
iptables -t mangle -A MYSHAPER-IN -m mark --mark 0 -j MARK --set-mark 21      # redundant- mark any unmarked packets as 21 (low prio)

# finally, instruct these packets to go through the imq0 we set up above
iptables -t mangle -A MYSHAPER-IN -j IMQ

# Done with inbound shaping
#
####################################################

echo "Inbound shaping added to $DEV.  Rate: ${RATEDN}Kbit/sec."

-----------------------------------------------------------------------------

5. Testing the New Queue

The easiest way to test your new setup is to saturate the upstream with low-priority traffic. How you do this depends on how you have your priorities set up. For the sake of example, let's say you've placed telnet traffic and ping traffic at a higher priority (lower fwmark) than other high ports (that are used for FTP transfers, etc).
If you initiate an FTP upload to saturate upstream bandwidth, your ping times to the gateway (on the other side of the DSL line) should increase by only a small amount compared to how much they would increase with no priority queuing. Ping times under 100ms are typical depending on how you've got things set up. Ping times greater than one or two seconds probably mean that things aren't working right.

-----------------------------------------------------------------------------

6. OK It Works!! Now What?

Now that you've successfully started to manage your bandwidth, you should start thinking of ways to use it. After all, you're probably paying for it!

  * Use a Gnutella client and SHARE YOUR FILES without adversely affecting your network performance

  * Run a web server without having web page hits slow you down in Quake

-----------------------------------------------------------------------------

7. Related Links

  * Bandwidth Controller for Windows - [http://www.bandwidthcontroller.com] http://www.bandwidthcontroller.com

  * [http://www.sonicspike.net/software#dsl_qos_queue] dsl_qos_queue - (beta) for Linux. No kernel patching, and better performance

Linux ADSM Mini-Howto
by Thomas König, Thomas.Koenig@ciw.uni-karlsruhe.de
v, 15 January 1997

This document describes how to install and use a client for the commercial ADSM backup system for Linux/i386.
______________________________________________________________________

Table of Contents

1. Introduction
2. Installing the iBCS module
3. Installing the ADSM client
4. Running the client
5. Known Problems
______________________________________________________________________

1. Introduction

ADSM is a network-based backup system, sold by IBM, in use at many organizations. There are clients for a large variety of systems (different UNIX brands, Windows, Novell, Mac, Windows NT). Unfortunately, at the time of this writing, there is no native Linux version.
You will have to use the SCO binary, and install the iBCS2 emulator for running ADSM. This description is for ADSM v2r1.

At the time of this writing, I am only aware of a version which works with the i386 version of Linux.

2. Installing the iBCS module

The iBCS2 module is available from ftp://tsx-11.mit.edu/pub/linux/BETA/ibcs2. If you are running kernel version 1.2.13, get ibcs-1.2-950721.tar.gz, unpack it and apply the patches ibcs-1.2-950808.patch1 and ibcs-1.2-950828.patch2. You can then type "make" and install the iBCS module with "insmod".

For a 2.0 kernel version, get ibcs-2.0-960610.tar.gz, unpack it in a suitable place, chdir into that directory, and apply the following patch:

--- iBCSemul/ipc.c.orig Wed Jan 15 21:32:15 1997
+++ iBCSemul/ipc.c      Wed Jan 15 21:32:31 1997
@@ -212,7 +212,7 @@
        switch (command) {
                case U_SEMCTL:
                        cmd = ibcs_sem_trans(arg3);
-                       arg4 = (union semun *)get_syscall_parameter (regs, 4);
+                       arg4 = (union semun *)(((unsigned long *) regs->esp) + (5));
                        is_p = (struct ibcs_semid_ds *)get_fs_long(arg4->buf);
 #ifdef IBCS_TRACE
                        if ((ibcs_trace & TRACE_API) || ibcs_func_p->trace)

Then, copy CONFIG.i386 to CONFIG, and type make.

If you don't have them already, create the needed device files by executing

# cd /dev
# ln -s null XOR
# ln -s null X0R
# mknod socksys c 30 0
# mknod spx c 30 1

3. Installing the ADSM client

The SCO binary is supplied as three tar files, or disks. Change to the root directory, set your umask according to your policies, and unpack them from there (as root). In your /tmp directory, you will find an installation script; execute that.

You will then have to hand-edit /usr/adsm/dsm.sys and /usr/adsm/dsm.opt.
In dsm.sys, important lines to specify are:

     Servername
        The name of the server

     TCPServeraddress
        The fully qualified host name of the server

     NODename
        Your own hostname

In dsm.opt, you will have to specify

     Server
        As before

     Followsymbolic
        Whether or not to follow symbolic links (not a good idea, in general)

     SUbdir
        Whether to back up subdirectories (you usually want that)

     domain
        The file systems to back up

You will then have to create a SCO-compatible /etc/mnttab from your /etc/fstab. You can use the following Perl script, fstab2mnttab, for this (as written, it reads the currently mounted file systems from /etc/mtab).

______________________________________________________________________
#!/usr/bin/perl

$mnttab_struct = "a32 a32 I L";

open(MTAB, "/etc/mtab") || die "Cannot open /etc/mtab: $!\n";
open(MNTTAB, ">/etc/mnttab") || die "Cannot open /etc/mnttab: $!\n";

while(<MTAB>) {
    next if /pid/;
    chop;
    /^(\S*)\s(\S*)\s(\S*)\s.*$/;
    $device = $1;
    $mountpt = $2;
    $fstype = $3;
    if($fstype ne "nfs" && $fstype ne "proc") {
        $mnttab_rec =
            pack($mnttab_struct, $device, $mountpt, 0x9d2f, time());
        syswrite(MNTTAB, $mnttab_rec, 72);
        print "Made entry for: $device $mountpt $fstype\n";
    }
}

close(MNTTAB);
exit 0;
______________________________________________________________________

You do not need to install any shared libraries for these clients; everything is linked statically.

4. Running the client

There are two clients, dsm, which is an X11 interface, and dsmc, a command-line interface. Your computer centre will tell you how to run it. Some startup script at boot, for example

     dsmc schedule -quiet >/dev/null 2>&1 &

will probably be required.

5. Known Problems

Unfortunately, SCO can only deal with hostnames no longer than eight characters. If your hostname is longer, or fully qualified, you may need to specify your hostname on the NODename line in /usr/adsm/dsm.sys.

If you use the DISPLAY variable, you will have to supply the fully qualified host name (i.e. DISPLAY=host.full.do.main:0 instead of DISPLAY=host:0).

Linux Advocacy mini-HOWTO
Paul L.
Rogers, Paul.L.Rogers@li.org
v0.5c, 3 May 2000

This document provides suggestions for how the Linux community can effectively advocate the use of Linux.
______________________________________________________________________

Table of Contents

1. About this document
2. Copyright Information
3. Introduction
4. Related Information
5. Advocating Linux
6. Canons of Conduct
7. User Groups
8. Vendor Relations
9. Media Relations
10. Acknowledgements
______________________________________________________________________

1. About this document

This is the Linux Advocacy mini-HOWTO and is intended to provide guidelines and ideas to assist with your Linux advocacy efforts.

This mini-HOWTO was inspired by Jon ``maddog'' Hall when he responded to a request for feedback on guidelines for advocating Linux during NetDay activities. He responded positively to the guidelines and observed that they were the basis of a list of ``canons of conduct'' that would benefit the Linux community.

This document is available in HTML form at http://www.datasync.com/~rogerspl/Advocacy-HOWTO.html.

Nat Makarevitch has translated this document into French <http://www.linux-france.org/article/these/advocacy/Advocacy-HOWTO-fr.html>.

Chie Nakatani has translated this document into Japanese.

Janusz Batko has translated this document into Polish.

Bruno H. Collovini has translated this document into Portuguese.

Mauricio Rivera Pineda has translated this document into Spanish.

The author and maintainer of the Linux Advocacy mini-HOWTO is Paul L. Rogers.

Comments and proposed additions are welcome.

If you need to know more about the Linux Documentation Project or about Linux HOWTO's, feel free to contact the supervisor Tim Bynum.

Tim Bynum will post this document to several national and international newsgroups on a monthly basis.
A personal note: Due to various circumstances, I have not been able to dedicate as much time to maintaining this mini-HOWTO and interacting with the Linux community as I would have desired. I apologize for this and if you have attempted to contact me and I was slow in responding, please forgive me being so inconsiderate. While I still have many other commitments, I am anticipating that they will start requiring less time to meet and allow me to catch up on other parts of my life. I appreciate your patience and would like to extend a special thanks to all who have taken the time to suggest additions and corrections. 2. Copyright Information This mini-HOWTO is Copyright © 1996-2000 by Paul L. Rogers. All rights reserved. A verbatim copy may be reproduced or distributed in any medium physical or electronic without permission of the author. Translations are similarly permitted without express permission if it includes a notice on who translated it. Short quotes may be used without prior consent by the author. Derivative work and partial distributions of the Advocacy mini-HOWTO must be accompanied with either a verbatim copy of this file or a pointer to the verbatim copy. Commercial redistribution is allowed and encouraged; however, the author would like to be notified of any such distributions. In short, we wish to promote dissemination of this information through as many channels as possible. However, we do wish to retain copyright on the HOWTO documents, and would like to be notified of any plans to redistribute the HOWTOs. We further want that all information provided in the HOWTOs is disseminated. If you have questions, please contact Tim Bynum, the Linux HOWTO coordinator, at linux-howto@sunsite.unc.edu. 3. Introduction The Linux community has known for some time that for many applications, Linux is a stable, reliable, robust (although not perfect) product. 
Unfortunately, there are still many people, including key decision-makers, that are not aware of the existence of Linux and its capabilities. If Linux and the many other components that make up a Linux distribution are to reach their full potential, it is critical that we reach out to prospective ``customers'' and advocate (being careful not to promise too much) the use of Linux for appropriate applications.

The reason that many a company's products have done well in the marketplace is not so much due to the product's technical superiority but the company's marketing abilities.

If you enjoy using Linux and would like to contribute something to the Linux community, please consider acting on one or more of the ideas in this mini-HOWTO and help others learn more about Linux.

4. Related Information

Lars Wirzenius, former comp.os.linux.announce moderator and long-time Linux activist, also has some thoughts about Linux advocacy.

Eric S. Raymond provides an analysis of why the development model used by the Linux community has been so successful.

The free software community has recognized that the terms "free software" and "freely available software" are not appropriate in all contexts. For more information about using the term "open-source software" when marketing "free software", please visit the Open Source site.

If you need to brush up on your Linux sales techniques, take a look at the Linuxmanship essay by Donald B. Marti, Jr.

The Linux PR site discusses the importance of press releases to the Linux community. Another way to gain valuable experience in this area is to organize a NetDay at a local school using the guidelines presented in the NetDay How-To Guide.

Linux International's goal is to promote the development and use of Linux.

The Linux Documentation Project is an invaluable resource for Linux advocates.

The Linux Center Project provides a thematic index of resources about Linux and free software.
The Linux Business Applications site provides a forum for organizations that depend on Linux for day-to-day business operations to share their experiences.

Linux Enterprise Computing and Freely Redistributable Software in Business cover resources and topics of interest to those deploying Linux in a business/commercial/enterprise setting.

The Linux Advocacy Project's goal is to encourage commercial application developers to provide native Linux versions of their software.

The Linux CD and Support Giveaway program is helping make Linux more widely available by encouraging the reuse of Linux CD-ROMs.

Specialized Systems Consultants, Inc. (SSC) hosts the Linux Resources site and publishes the Linux Journal.

The linux-biz mailing list is a forum created to discuss the use of Linux in a business environment.

The Linux Mission Critical Systems survey documents successful existing systems which have a large load and are up 24 hours per day.

A number of online publications are now devoted to covering Linux. These include:

· LinuxFocus

· Linuxove noviny

· Linux Gazette

· PLUTO Journal.

Additional links to online publications can be found at the Linux Documentation Project and the Linux Center Project.

5. Advocating Linux

· Share your personal experiences (good and bad) with Linux. Everyone knows that software has bugs and limitations and if we only have glowing comments about Linux, we aren't being honest. I love to tell people about having to reboot four times (three scheduled) in three years.

· If someone has a problem that Linux may be able to solve, offer to provide pointers to appropriate information (Web pages, magazine articles, books, consultants, ...). If you haven't actually used the proposed solution, say so.

· If you are available for making presentations about Linux, register with the Linux Speakers Bureau.

· Offer to help someone start using Linux. Follow up to make sure that they are able to use their system effectively.
· Some people still believe that Linux and similar systems operate only in text-mode. Make sure that they are aware of the availability of graphical applications, such as the Gimp.

· Try to respond to one ``newbie'' posting each week. Seek out the tough questions; you may be the only one to respond and you may learn something in the process. However, if you aren't confident that you can respond with the correct answer, find someone that can.

· Seek out small software development firms and offer to make a presentation about Linux.

· If the opportunity arises, make a presentation to your employer's Information Technology group.

· Participate in community events such as NetDay. While your first priority must be to contribute to the success of the event, use the opportunity to let others know what Linux can do for them.

· Always consider the viewpoints of the person to whom you are ``selling'' Linux. Support, reliability, interoperability and cost are all factors that a decision-maker must consider. Of the above, cost is often the least important portion of the equation.

· Availability of support is often mentioned as a concern when considering the adoption of Linux. Companies such as Caldera, Cygnus Solutions, Red Hat, and S.u.S.E. offer support for some or all components of a typical Linux distribution. In addition, the Linux Consultants HOWTO provides a listing of companies providing commercial Linux related support. Of course, some of the best support is found in the comp.os.linux and linux newsgroup hierarchies.

· Point out that the production of open-source software takes place in an environment of open collaboration between system architects, programmers, writers, alpha/beta testers and end users which often results in well documented, robust products such as Apache, GNU Emacs, Perl and the Linux kernel.

· Stand up and be counted! Register with the Linux Counter.
· Report successful efforts of promoting Linux to Linux International (li@li.org) and similar organizations.

· Find a new home for Linux CD-ROMs and books that you no longer need. Give them to someone interested in Linux, a public library or a school computer club. A book and its CD-ROM would be most appropriate for a library. However, please be sure that making the CD-ROM publicly available does not violate a licensing agreement or copyright. Also, inform the library staff that the material on the CD-ROM is freely distributable. Follow up to make sure it is available on the shelves.

· When purchasing books about software distributed with Linux, give preference to books written by the author of the software. The royalties that authors receive from book sales may be the only monetary compensation received for their efforts.

· Encourage Linux-based sites to submit their entry for the Powered by Linux page and suggest that banners promoting Linux, Apache, GNU, Perl ... be displayed on their site.

· Participate! If you have benefited from open-source software, please consider assisting the free software community by:

   · submitting detailed bug reports
   · writing documentation
   · creating artwork
   · supplying management skills
   · suggesting enhancements
   · providing technical support
   · contributing software
   · donating equipment
   · furnishing financial support.

  The Linux Documentation Project provides a list of Linux and Linux-related projects.

· Finally, keep in mind that we all have infinitely more important issues to deal with than the selection of a computing environment.

6. Canons of Conduct

· As a representative of the Linux community, participate in mailing list and newsgroup discussions in a professional manner. Refrain from name-calling and use of vulgar language. Consider yourself a member of a virtual corporation with Mr. Torvalds as your Chief Executive Officer.
Your words will either enhance or degrade the image the reader has of the Linux community. · Avoid hyperbole and unsubstantiated claims at all costs. It's unprofessional and will result in unproductive discussions. · A thoughtful, well-reasoned response to a posting will not only provide insight for your readers, but will also increase their respect for your knowledge and abilities. · Don't bite if offered flame-bait. Too many threads degenerate into a ``My O/S is better than your O/S'' argument. Let's accurately describe the capabilities of Linux and leave it at that. · Always remember that if you insult or are disrespectful to someone, their negative experience may be shared with many others. If you do offend someone, please try to make amends. · Focus on what Linux has to offer. There is no need to bash the competition. Linux is a good, solid product that stands on its own. · Respect the use of other operating systems. While Linux is a wonderful platform, it does not meet everyone's needs. · Refer to another product by its proper name. There's nothing to be gained by attempting to ridicule a company or its products by using ``creative spelling''. If we expect respect for Linux, we must respect other products. · Give credit where credit is due. Linux is just the kernel. Without the efforts of people involved with the GNU project , MIT, Berkeley and others too numerous to mention, the Linux kernel would not be very useful to most people. · Don't insist that Linux is the only answer for a particular application. Just as the Linux community cherishes the freedom that Linux provides them, Linux only solutions would deprive others of their freedom. · There will be cases where Linux is not the answer. Be the first to recognize this and offer another solution. 7. User Groups · Participate in a local user group. An index of Linux User Group registries is part of the Linux Documentation Project . If a user group does not exist in your area, start one. 
· The Linux User Group HOWTO covers many of the issues involved with starting a user group and discusses the importance of Linux advocacy as one of the goals of a user group.

· Make speakers available to organizations interested in Linux.

· Issue press releases about your activities to your local media.

· Volunteer to configure a Linux system to meet the needs of local community organizations. Of course, the installation process must include training the user community to use the system and adequate documentation for ongoing maintenance.

· Discuss the Linux Advocacy mini-HOWTO at a meeting. Brainstorm and submit new ideas.

8. Vendor Relations

· When contemplating a hardware purchase, ask the vendor about Linux support and other users' experiences with the product in a Linux environment.

· Consider supporting vendors that sell Linux-based products and services. Encourage them to have their product listed in the Linux Commercial HOWTO.

· Support vendors that donate a portion of their income to organizations such as the Free Software Foundation, the Linux Development Grant Fund, the XFree86 Project or Software in the Public Interest. If possible, make a personal donation to these or other organizations that support open-source software. Don't forget that some employers offer a matching gift program.

· If you need an application that is not supported on Linux, contact the vendor and request a native Linux version.

9. Media Relations

· Linux International is collecting press clippings that mention Linux, GNU and other freely redistributable software.
  When you see such an article, please send the following information to clippings@li.org:

   · Name of publication
   · Publisher's contact address
   · Name of author
   · Author's contact address
   · Title of article
   · Page number where the article starts
   · The URL if available online
   · A summary of the article, including your opinion

· If you believe that Linux was not given fair treatment in an article, review or news story, send the details, including the above information, to li@li.org so that an appropriate response can be sent to the publisher. If you contact the publisher directly, be professional and sure of your facts.

· If you are involved with a Linux-related project, issue press releases to appropriate news services on a regular basis.

10. Acknowledgements

Grateful acknowledgement is made to all contributors, including:

  Kendall G. Clark
  Wendell Cochran
  Bruno H. Collovini
  Allan "Norm" Crain
  Jon "maddog" Hall
  Greg Hankins
  Eric Ladner
  Chie Nakatani
  Daniel P. Kionka
  Nat Makarevitch
  Martin Michlmayr
  Rafael Caetano dos Santos
  Idan Shoham
  Adam Spiers
  C. J. Suire
  Juhapekka Tolvanen
  Lars Wirzenius
  Sean Woolcock

Linux Advanced Routing & Traffic Control HOWTO

Bert Hubert
  Netherlabs BV
Gregory Maxwell
Remco van Mook
Martijn van Oosterhout
Paul B Schroeder
Jasper Spaans

Revision History
Revision 1.1    2002-07-22
DocBook Edition

A very hands-on approach to iproute2, traffic shaping and a bit of netfilter.

-----------------------------------------------------------------------------
Table of Contents
1. Dedication
2. Introduction
     2.1. Disclaimer & License
     2.2. Prior knowledge
     2.3. What Linux can do for you
     2.4. Housekeeping notes
     2.5. Access, CVS & submitting updates
     2.6. Mailing list
     2.7. Layout of this document
3. Introduction to iproute2
     3.1. Why iproute2?
     3.2. iproute2 tour
     3.3. Prerequisites
     3.4. Exploring your current configuration
     3.5. ARP
4. Rules - routing policy database
     4.1. Simple source policy routing
     4.2. Routing for multiple uplinks/providers
5. GRE and other tunnels
     5.1.
A few general remarks about tunnels:
     5.2. IP in IP tunneling
     5.3. GRE tunneling
     5.4. Userland tunnels
6. IPv6 tunneling with Cisco and/or 6bone
     6.1. IPv6 Tunneling
7. IPsec: secure IP over the Internet
8. Multicast routing
9. Queueing Disciplines for Bandwidth Management
     9.1. Queues and Queueing Disciplines explained
     9.2. Simple, classless Queueing Disciplines
     9.3. Advice for when to use which queue
     9.4. Terminology
     9.5. Classful Queueing Disciplines
     9.6. Classifying packets with filters
     9.7. The Intermediate queueing device (IMQ)
10. Load sharing over multiple interfaces
     10.1. Caveats
     10.2. Other possibilities
11. Netfilter & iproute - marking packets
12. Advanced filters for (re-)classifying packets
     12.1. The u32 classifier
     12.2. The route classifier
     12.3. Policing filters
     12.4. Hashing filters for very fast massive filtering
13. Kernel network parameters
     13.1. Reverse Path Filtering
     13.2. Obscure settings
14. Advanced & less common queueing disciplines
     14.1. bfifo/pfifo
     14.2. Clark-Shenker-Zhang algorithm (CSZ)
     14.3. DSMARK
     14.4. Ingress qdisc
     14.5. Random Early Detection (RED)
     14.6. Generic Random Early Detection
     14.7. VC/ATM emulation
     14.8. Weighted Round Robin (WRR)
15. Cookbook
     15.1. Running multiple sites with different SLAs
     15.2. Protecting your host from SYN floods
     15.3. Rate limit ICMP to prevent dDoS
     15.4. Prioritizing interactive traffic
     15.5. Transparent web-caching using netfilter, iproute2, ipchains and squid
     15.6. Circumventing Path MTU Discovery issues with per route MTU settings
     15.7. Circumventing Path MTU Discovery issues with MSS Clamping (for ADSL, cable, PPPoE & PPtP users)
     15.8. The Ultimate Traffic Conditioner: Low Latency, Fast Up & Downloads
     15.9. Rate limiting a single host or netmask
16. Building bridges, and pseudo-bridges with Proxy ARP
     16.1. State of bridging and iptables
     16.2. Bridging and shaping
     16.3. Pseudo-bridges with Proxy-ARP
17. Dynamic routing - OSPF and BGP
18. Other possibilities
19. Further reading
20.
Acknowledgements ----------------------------------------------------------------------------- Chapter 1. Dedication This document is dedicated to lots of people, and is my attempt to do something back. To list but a few:   * Rusty Russell   * Alexey N. Kuznetsov   * The good folks from Google   * The staff of Casema Internet ----------------------------------------------------------------------------- Chapter 2. Introduction Welcome, gentle reader. This document hopes to enlighten you on how to do more with Linux 2.2/2.4 routing. Unbeknownst to most users, you already run tools which allow you to do spectacular things. Commands like route and ifconfig are actually very thin wrappers for the very powerful iproute2 infrastructure. I hope that this HOWTO will become as readable as the ones by Rusty Russell of (amongst other things) netfilter fame. You can always reach us by writing to the [mailto:HOWTO@ds9a.nl] HOWTO team. However, please consider posting to the mailing list (see the relevant section) if you have questions which are not directly related to this HOWTO. We are no free helpdesk, but we often will answer questions asked on the list. Before losing your way in this HOWTO, if all you want to do is simple traffic shaping, skip everything and head to the Other possibilities chapter, and read about CBQ.init. ----------------------------------------------------------------------------- 2.1. Disclaimer & License This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. In short, if your STM-64 backbone breaks down and distributes pornography to your most esteemed customers - it's never our fault. Sorry. Copyright (c) 2002 by bert hubert, Gregory Maxwell, Martijn van Oosterhout, Remco van Mook, Paul B. Schroeder and others. 
This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

Please freely copy and distribute (sell or give away) this document in any format. It's requested that corrections and/or comments be forwarded to the document maintainer.

It is also requested that if you publish this HOWTO in hardcopy, you send the authors some samples for "review purposes" :-)

-----------------------------------------------------------------------------
2.2. Prior knowledge

As the title implies, this is the "Advanced" HOWTO. While by no means rocket science, some prior knowledge is assumed.

Here are some other references which might help teach you more:

[http://netfilter.samba.org/unreliable-guides/networking-concepts-HOWTO/index.html] Rusty Russell's networking-concepts-HOWTO
    Very nice introduction, explaining what a network is, and how it is connected to other networks.

Linux Networking-HOWTO (Previously the Net-3 HOWTO)
    Great stuff, although very verbose. It teaches you a lot of stuff that's already configured if you are able to connect to the Internet. Should be located in /usr/doc/HOWTO/NET3-4-HOWTO.txt but can also be found [http://www.linuxports.com/howto/networking] online.

-----------------------------------------------------------------------------
2.3.
What Linux can do for you

A small list of things that are possible:

  * Throttle bandwidth for certain computers
  * Throttle bandwidth TO certain computers
  * Help you to fairly share your bandwidth
  * Protect your network from DoS attacks
  * Protect the Internet from your customers
  * Multiplex several servers as one, for load balancing or enhanced availability
  * Restrict access to your computers
  * Limit access of your users to other hosts
  * Do routing based on user id (yes!), MAC address, source IP address, port, type of service, time of day or content

Currently, not many people are using these advanced features. This is for several reasons. While the provided documentation is verbose, it is not very hands-on. Traffic control is almost undocumented.

-----------------------------------------------------------------------------
2.4. Housekeeping notes

There are several things which should be noted about this document. While I wrote most of it, I really don't want it to stay that way. I am a strong believer in Open Source, so I encourage you to send feedback, updates, patches etcetera. Do not hesitate to inform me of typos or plain old errors. If my English sounds somewhat wooden, please realize that I'm not a native speaker. Feel free to send suggestions.

If you feel that you are better qualified to maintain a section, or think that you can author and maintain new sections, you are welcome to do so. The SGML of this HOWTO is available via CVS; I very much envision more people working on it.

In aid of this, you will find lots of FIXME notices. Patches are always welcome! Wherever you find a FIXME, you should know that you are treading in unknown territory. This is not to say that there are no errors elsewhere, but be extra careful. If you have validated something, please let us know so we can remove the FIXME notice.

In this HOWTO, I will take some liberties along the way.
For example, I postulate a 10Mbit Internet connection, while I know full well that those are not very common.

-----------------------------------------------------------------------------
2.5. Access, CVS & submitting updates

The canonical location for the HOWTO is [http://www.ds9a.nl/lartc] here.

We now have anonymous CVS access available to the world at large. This is good in a number of ways. You can easily upgrade to newer versions of this HOWTO and submitting patches is no work at all.

Furthermore, it allows the authors to work on the source independently, which is good too.

+---------------------------------------------------------------------------+
|$ export CVSROOT=:pserver:anon@outpost.ds9a.nl:/var/cvsroot                |
|$ cvs login                                                                |
|CVS password: [enter 'cvs' (without 's)]                                   |
|$ cvs co 2.4routing                                                        |
|cvs server: Updating 2.4routing                                            |
|U 2.4routing/2.4routing.sgml                                               |
+---------------------------------------------------------------------------+

If you spot an error, or want to add something, just fix it locally, run cvs diff -u, and send the result off to us.

A Makefile is supplied which should help you create postscript, dvi, pdf, html and plain text. You may need to install docbook, docbook-utils, ghostscript and tetex to get all formats.

-----------------------------------------------------------------------------
2.6. Mailing list

The authors receive an increasing amount of mail about this HOWTO. Because of the clear interest of the community, it has been decided to start a mailing list where people can talk to each other about Advanced Routing and Traffic Control. You can subscribe to the list [http://mailman.ds9a.nl/mailman/listinfo/lartc] here.

It should be pointed out that the authors are very hesitant about answering questions not asked on the list. We would like the archive of the list to become some kind of knowledge base. If you have a question, please search the archive, and then post to the mailing list.
-----------------------------------------------------------------------------
2.7. Layout of this document

We will be doing interesting stuff almost immediately, which also means that there will initially be parts that are explained incompletely or are not perfect. Please gloss over these parts and assume that all will become clear.

Routing and filtering are two distinct things. Filtering is documented very well by Rusty's HOWTOs, available here:

  * [http://netfilter.samba.org/unreliable-guides/] Rusty's Remarkably Unreliable Guides

We will be focusing mostly on what is possible by combining netfilter and iproute2.

-----------------------------------------------------------------------------
Chapter 3. Introduction to iproute2

3.1. Why iproute2?

Most Linux distributions, and most UNIXes, currently use the venerable arp, ifconfig and route commands. While these tools work, they show some unexpected behaviour under Linux 2.2 and up. For example, GRE tunnels are an integral part of routing these days, but require completely different tools.

With iproute2, tunnels are an integral part of the tool set.

The 2.2 and above Linux kernels include a completely redesigned network subsystem. This new networking code brings Linux performance and a feature set with little competition in the general OS arena. In fact, the new routing, filtering, and classifying code is more featureful than that provided by many dedicated routers, firewalls and traffic shaping products.

As new networking concepts have been invented, people have found ways to plaster them on top of the existing framework in existing OSes. This constant layering of cruft has led to networking code that is filled with strange behaviour, much like most human languages. In the past, Linux emulated SunOS's handling of many of these things, which was not ideal.

This new framework makes it possible to clearly express features previously beyond Linux's reach.
-----------------------------------------------------------------------------
3.2. iproute2 tour

Linux has a sophisticated system for bandwidth provisioning called Traffic Control. This system supports various methods for classifying, prioritizing, sharing, and limiting both inbound and outbound traffic.

We'll start off with a tiny tour of iproute2 possibilities.

-----------------------------------------------------------------------------
3.3. Prerequisites

You should make sure that you have the userland tools installed. This package is called 'iproute' on both RedHat and Debian, and may otherwise be found at ftp://ftp.inr.ac.ru/ip-routing/iproute2-2.2.4-now-ss??????.tar.gz.

You can also try [ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz] here for the latest version.

Some parts of iproute require you to have certain kernel options enabled. It should also be noted that all releases of RedHat up to and including 6.2 come without most of the traffic control features in the default kernel.

RedHat 7.2 has everything in by default.

Also make sure that you have netlink support, should you choose to roll your own kernel. Iproute2 needs it.

-----------------------------------------------------------------------------
3.4. Exploring your current configuration

This may come as a surprise, but iproute2 is already configured! The current commands ifconfig and route are already using the advanced syscalls, but mostly with very default (i.e. boring) settings.

The ip tool is central, and we'll ask it to display our interfaces for us.

-----------------------------------------------------------------------------
3.4.1.
ip shows us our links +-------------------------------------------------------------------------------+ |[ahu@home ahu]$ ip link list | |1: lo: mtu 3924 qdisc noqueue | | link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 | |2: dummy: mtu 1500 qdisc noop | | link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff | |3: eth0: mtu 1400 qdisc pfifo_fast qlen 100 | | link/ether 48:54:e8:2a:47:16 brd ff:ff:ff:ff:ff:ff | |4: eth1: mtu 1500 qdisc pfifo_fast qlen 100 | | link/ether 00:e0:4c:39:24:78 brd ff:ff:ff:ff:ff:ff | |3764: ppp0: mtu 1492 qdisc pfifo_fast qlen 10 | | link/ppp | +-------------------------------------------------------------------------------+ Your mileage may vary, but this is what it shows on my NAT router at home. I'll only explain part of the output as not everything is directly relevant. We first see the loopback interface. While your computer may function somewhat without one, I'd advise against it. The MTU size (Maximum Transfer Unit) is 3924 octets, and it is not supposed to queue. Which makes sense because the loopback interface is a figment of your kernel's imagination. I'll skip the dummy interface for now, and it may not be present on your computer. Then there are my two physical network interfaces, one at the side of my cable modem, the other one serves my home ethernet segment. Furthermore, we see a ppp0 interface. Note the absence of IP addresses. iproute disconnects the concept of 'links' and 'IP addresses'. With IP aliasing, the concept of 'the' IP address had become quite irrelevant anyhow. It does show us the MAC addresses though, the hardware identifier of our ethernet interfaces. ----------------------------------------------------------------------------- 3.4.2. 
ip shows us our IP addresses +-------------------------------------------------------------------------------+ |[ahu@home ahu]$ ip address show | |1: lo: mtu 3924 qdisc noqueue | | link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 | | inet 127.0.0.1/8 brd 127.255.255.255 scope host lo | |2: dummy: mtu 1500 qdisc noop | | link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff | |3: eth0: mtu 1400 qdisc pfifo_fast qlen 100 | | link/ether 48:54:e8:2a:47:16 brd ff:ff:ff:ff:ff:ff | | inet 10.0.0.1/8 brd 10.255.255.255 scope global eth0 | |4: eth1: mtu 1500 qdisc pfifo_fast qlen 100 | | link/ether 00:e0:4c:39:24:78 brd ff:ff:ff:ff:ff:ff | |3764: ppp0: mtu 1492 qdisc pfifo_fast qlen 10 | | link/ppp | | inet 212.64.94.251 peer 212.64.94.1/32 scope global ppp0 | +-------------------------------------------------------------------------------+ This contains more information. It shows all our addresses, and to which cards they belong. 'inet' stands for Internet (IPv4). There are lots of other address families, but these don't concern us right now. Let's examine eth0 somewhat closer. It says that it is related to the inet address '10.0.0.1/8'. What does this mean? The /8 stands for the number of bits that are in the Network Address. There are 32 bits, so we have 24 bits left that are part of our network. The first 8 bits of 10.0.0.1 correspond to 10.0.0.0, our Network Address, and our netmask is 255.0.0.0. The other bits are connected to this interface, so 10.250.3.13 is directly available on eth0, as is 10.0.0.1 for example. With ppp0, the same concept goes, though the numbers are different. Its address is 212.64.94.251, without a subnet mask. This means that we have a point-to-point connection and that every address, with the exception of 212.64.94.251, is remote. There is more information, however. It tells us that on the other side of the link there is, yet again, only one address, 212.64.94.1. The /32 tells us that there are no 'network bits'. 
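The prefix arithmetic above can be checked with a few lines of Python. This is only a sketch using the standard ipaddress module, which is not part of iproute2; the addresses are the ones from the eth0 and ppp0 entries in the listing, and the helper name describe is made up for this illustration.

```python
import ipaddress

def describe(cidr):
    """Return (network address, netmask, host bits) for an address in CIDR form."""
    iface = ipaddress.ip_interface(cidr)
    net = iface.network
    host_bits = net.max_prefixlen - net.prefixlen
    return str(net.network_address), str(net.netmask), host_bits

# eth0 from the listing: 8 network bits, so 24 bits are left for hosts.
print(describe("10.0.0.1/8"))

# Both addresses mentioned in the text fall inside the eth0 network.
eth0_net = ipaddress.ip_network("10.0.0.0/8")
print(ipaddress.ip_address("10.250.3.13") in eth0_net)
print(ipaddress.ip_address("10.0.0.1") in eth0_net)

# The ppp0 peer is a /32: no host bits, so it describes exactly one address.
print(describe("212.64.94.1/32"))
```

A host outside 10.0.0.0/8 fails the membership test, which is the script's way of saying that it is not directly reachable on eth0.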
It is absolutely vital that you grasp these concepts. Refer to the documentation mentioned at the beginning of this HOWTO if you have trouble.

You may also note 'qdisc', which stands for Queueing Discipline. This will become vital later on.

-----------------------------------------------------------------------------
3.4.3. ip shows us our routes

Well, we now know how to find 10.x.y.z addresses, and we are able to reach 212.64.94.1. This is not enough however, so we need instructions on how to reach the world. The Internet is available via our ppp connection, and it appears that 212.64.94.1 is willing to spread our packets around the world, and deliver results back to us.

+---------------------------------------------------------------------------+
|[ahu@home ahu]$ ip route show                                              |
|212.64.94.1 dev ppp0 proto kernel scope link src 212.64.94.251             |
|10.0.0.0/8 dev eth0 proto kernel scope link src 10.0.0.1                   |
|127.0.0.0/8 dev lo scope link                                              |
|default via 212.64.94.1 dev ppp0                                           |
+---------------------------------------------------------------------------+

This is pretty much self-explanatory. The first 3 lines of output explicitly state what was already implied by ip address show; the last line tells us that the rest of the world can be found via 212.64.94.1, our default gateway. We can see that it is a gateway because of the word via, which tells us that we need to send packets to 212.64.94.1, and that it will take care of things.
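To make the table a little more concrete, here is a rough model of how a destination is matched against it: among all routes whose prefix contains the destination, the most specific one (longest prefix) wins, and 'default' is simply the zero-length prefix that matches everything. This is a sketch in Python and not how the kernel implements lookups; the ROUTES list is transcribed from the output above and the function name lookup is invented for the example.

```python
import ipaddress

# The four routes from 'ip route show', as (prefix, device, gateway).
ROUTES = [
    ("212.64.94.1/32", "ppp0", None),
    ("10.0.0.0/8", "eth0", None),
    ("127.0.0.0/8", "lo", None),
    ("0.0.0.0/0", "ppp0", "212.64.94.1"),   # 'default via 212.64.94.1'
]

def lookup(dest):
    """Return (device, gateway) of the matching route with the longest prefix."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, dev, gateway in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, dev, gateway))
    # The default route always matches, so 'matches' is never empty here.
    _, dev, gateway = max(matches, key=lambda m: m[0])
    return dev, gateway

print(lookup("10.250.3.13"))   # on-link via eth0, no gateway needed
print(lookup("212.64.94.1"))   # the ppp peer itself
print(lookup("192.0.2.7"))     # anything else goes to the default gateway
```

Only the last lookup returns a gateway, which corresponds to the 'via' keyword on the default route.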
For reference, this is what the old route utility shows us:

+-----------------------------------------------------------------------------+
|[ahu@home ahu]$ route -n                                                     |
|Kernel IP routing table                                                      |
|Destination     Gateway         Genmask         Flags Metric Ref    Use Iface|
|212.64.94.1     0.0.0.0         255.255.255.255 UH    0      0        0 ppp0 |
|10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0 |
|127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo   |
|0.0.0.0         212.64.94.1     0.0.0.0         UG    0      0        0 ppp0 |
+-----------------------------------------------------------------------------+

-----------------------------------------------------------------------------
3.5. ARP

ARP is the Address Resolution Protocol as described in [http://www.faqs.org/rfcs/rfc826.html] RFC 826. ARP is used by a networked machine to resolve the hardware location/address of another machine on the same local network. Machines on the Internet are generally known by their names which resolve to IP addresses. This is how a machine on the foo.com network is able to communicate with another machine which is on the bar.net network. An IP address, though, cannot tell you the physical location of a machine. This is where ARP comes into the picture.

Let's take a very simple example. Suppose I have a network composed of several machines. Two of the machines which are currently on my network are foo with an IP address of 10.0.0.1 and bar with an IP address of 10.0.0.2. Now foo wants to ping bar to see that he is alive, but alas, foo has no idea where bar is. So when foo decides to ping bar he will need to send out an ARP request. This ARP request is akin to foo shouting out on the network "Bar (10.0.0.2)! Where are you?" As a result of this every machine on the network will hear foo shouting, but only bar (10.0.0.2) will respond. Bar will then send an ARP reply directly back to foo which is akin to bar saying, "Foo (10.0.0.1) I am here at 00:60:94:E9:08:12."
After this simple transaction that's used to locate his friend on the network, foo is able to communicate with bar until he (his arp cache) forgets where bar is (typically after 15 minutes on Unix).

Now let's see how this works. You can view your machine's current arp/neighbor cache/table like so:

+---------------------------------------------------------------------------+
|[root@espa041 /home/src/iputils]# ip neigh show                            |
|9.3.76.42 dev eth0 lladdr 00:60:08:3f:e9:f9 nud reachable                  |
|9.3.76.1 dev eth0 lladdr 00:06:29:21:73:c8 nud reachable                   |
+---------------------------------------------------------------------------+

As you can see my machine espa041 (9.3.76.41) knows where to find espa042 (9.3.76.42) and espagate (9.3.76.1). Now let's add another machine to the arp cache.

+-------------------------------------------------------------------------------+
|[root@espa041 /home/paulsch/.gnome-desktop]# ping -c 1 espa043                 |
|PING espa043.austin.ibm.com (9.3.76.43) from 9.3.76.41 : 56(84) bytes of data. |
|64 bytes from 9.3.76.43: icmp_seq=0 ttl=255 time=0.9 ms                        |
|                                                                               |
|--- espa043.austin.ibm.com ping statistics ---                                 |
|1 packets transmitted, 1 packets received, 0% packet loss                      |
|round-trip min/avg/max = 0.9/0.9/0.9 ms                                        |
|                                                                               |
|[root@espa041 /home/src/iputils]# ip neigh show                                |
|9.3.76.43 dev eth0 lladdr 00:06:29:21:80:20 nud reachable                      |
|9.3.76.42 dev eth0 lladdr 00:60:08:3f:e9:f9 nud reachable                      |
|9.3.76.1 dev eth0 lladdr 00:06:29:21:73:c8 nud reachable                       |
+-------------------------------------------------------------------------------+

As a result of espa041 trying to contact espa043, espa043's hardware address/location has now been added to the arp/neighbor cache. So until the entry for espa043 times out (as a result of no communication between the two) espa041 knows where to find espa043 and has no need to send an ARP request.
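The caching behaviour just described can be mimicked with a toy model. The Python sketch below is not how the kernel's neighbor table works (the real one tracks states such as reachable and stale); the class name NeighborCache, the resolver callback that stands in for the ARP request and reply, and the injectable clock are all assumptions made for illustration. The 900 second timeout is the "15 minutes" mentioned above, and the hardware addresses come from the ip neigh show output.

```python
import time

class NeighborCache:
    """Toy ARP/neighbor cache: remember answers, ask again once they expire."""

    def __init__(self, resolver, timeout=900.0, clock=time.monotonic):
        self.resolver = resolver      # stands in for the ARP request/reply
        self.timeout = timeout        # seconds until an entry is forgotten
        self.clock = clock
        self.entries = {}             # ip -> (lladdr, time the entry was learned)
        self.requests_sent = 0

    def lookup(self, ip):
        entry = self.entries.get(ip)
        if entry is not None and self.clock() - entry[1] < self.timeout:
            return entry[0]           # cache hit: no ARP request needed
        self.requests_sent += 1       # cache miss: "shout" on the network
        lladdr = self.resolver(ip)
        self.entries[ip] = (lladdr, self.clock())
        return lladdr

# Who answers for which address on this pretend network segment.
network = {
    "9.3.76.42": "00:60:08:3f:e9:f9",
    "9.3.76.43": "00:06:29:21:80:20",
}

cache = NeighborCache(network.__getitem__)
cache.lookup("9.3.76.43")   # first contact triggers a request
cache.lookup("9.3.76.43")   # answered from the cache
print(cache.requests_sent)
```

Passing a fake clock makes it possible to see an entry expire without waiting a quarter of an hour.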
Now let's delete espa043 from our arp cache: +---------------------------------------------------------------------------+ |[root@espa041 /home/src/iputils]# ip neigh delete 9.3.76.43 dev eth0 | |[root@espa041 /home/src/iputils]# ip neigh show | |9.3.76.43 dev eth0 nud failed | |9.3.76.42 dev eth0 lladdr 00:60:08:3f:e9:f9 nud reachable | |9.3.76.1 dev eth0 lladdr 00:06:29:21:73:c8 nud stale | +---------------------------------------------------------------------------+ Now espa041 has again forgotten where to find espa043 and will need to send another ARP request the next time he needs to communicate with espa043. You can also see from the above output that espagate (9.3.76.1) has been changed to the "stale" state. This means that the location shown is still valid, but it will have to be confirmed at the first transaction to that machine. ----------------------------------------------------------------------------- Chapter 4. Rules - routing policy database If you have a large router, you may well cater for the needs of different people, who should be served differently. The routing policy database allows you to do this by having multiple sets of routing tables. If you want to use this feature, make sure that your kernel is compiled with the "IP: advanced router" and "IP: policy routing" features. When the kernel needs to make a routing decision, it finds out which table needs to be consulted. By default, there are three tables. The old 'route' tool modifies the main and local tables, as does the ip tool (by default). The default rules: +---------------------------------------------------------------------------+ |[ahu@home ahu]$ ip rule list | |0: from all lookup local | |32766: from all lookup main | |32767: from all lookup default | +---------------------------------------------------------------------------+ This lists the priority of all rules. We see that all rules apply to all packets ('from all'). 
We've seen the 'main' table before; it is the output of ip route ls. The 'local' and 'default' tables are new.

If we want to do fancy things, we generate rules which point to different tables which allow us to override system wide routing rules.

For the exact semantics on what the kernel does when there are more matching rules, see Alexey's ip-cref documentation.

-----------------------------------------------------------------------------
4.1. Simple source policy routing

Let's take a real example once again. I have 2 (actually 3, about time I returned them) cable modems, connected to a Linux NAT ('masquerading') router. People living here pay me to use the Internet. Suppose one of my house mates only visits hotmail and wants to pay less. This is fine with me, but they'll end up using the low-end cable modem.

The 'fast' cable modem is known as 212.64.94.251 and is a PPP link to 212.64.94.1. The 'slow' cable modem is known by various IP addresses, 212.64.78.148 in this example, and is a link to 195.96.98.253.

The local table:

+---------------------------------------------------------------------------+
|[ahu@home ahu]$ ip route list table local                                  |
|broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1     |
|local 10.0.0.1 dev eth0 proto kernel scope host src 10.0.0.1               |
|broadcast 10.0.0.0 dev eth0 proto kernel scope link src 10.0.0.1           |
|local 212.64.94.251 dev ppp0 proto kernel scope host src 212.64.94.251     |
|broadcast 10.255.255.255 dev eth0 proto kernel scope link src 10.0.0.1     |
|broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1           |
|local 212.64.78.148 dev ppp2 proto kernel scope host src 212.64.78.148     |
|local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1               |
|local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1             |
+---------------------------------------------------------------------------+

Lots of obvious things, but things that need to be specified somewhere. Well, here they are. The default table is empty.
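How the three default rules and their tables interact can be sketched as follows. The rules are tried in order of priority, and the first table that contains a matching route decides; if a table has no match, the next rule is consulted. This is a simplified Python model and not the kernel's implementation: the tables hold only a handful of entries, the action strings are invented, and every rule is assumed to be 'from all' as in the ip rule list output shown earlier.

```python
import ipaddress

# (priority, table), mirroring the default 'ip rule list' output.
RULES = [(0, "local"), (32766, "main"), (32767, "default")]

# Heavily simplified tables: prefix -> what happens to the packet.
TABLES = {
    "local": {
        "10.0.0.1/32": "deliver locally",
        "127.0.0.0/8": "deliver locally",
    },
    "main": {
        "10.0.0.0/8": "send out eth0",
        "0.0.0.0/0": "send via 212.64.94.1",
    },
    "default": {},   # empty on this router
}

def route(dest):
    """Walk the rules by priority; the first table with a match decides."""
    addr = ipaddress.ip_address(dest)
    for _priority, table in sorted(RULES):
        hits = [p for p in TABLES[table] if addr in ipaddress.ip_network(p)]
        if hits:
            # Within one table the most specific prefix wins.
            best = max(hits, key=lambda p: ipaddress.ip_network(p).prefixlen)
            return table, TABLES[table][best]
    return None, "unreachable"

print(route("10.0.0.1"))     # our own address: answered by the local table
print(route("10.0.0.10"))    # a host on the LAN: found in main
print(route("192.0.2.7"))    # the outside world: main's default route
```

Adding a rule with a priority below 32766 that points at another table is what lets some packets bypass 'main' altogether.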
Let's view the 'main' table:

+---------------------------------------------------------------------------+
|[ahu@home ahu]$ ip route list table main                                   |
|195.96.98.253 dev ppp2 proto kernel scope link src 212.64.78.148           |
|212.64.94.1 dev ppp0 proto kernel scope link src 212.64.94.251             |
|10.0.0.0/8 dev eth0 proto kernel scope link src 10.0.0.1                   |
|127.0.0.0/8 dev lo scope link                                              |
|default via 212.64.94.1 dev ppp0                                           |
+---------------------------------------------------------------------------+

We now generate a new rule which we call 'John', for our hypothetical house mate. Although we can work with pure numbers, it's far easier if we add our tables to /etc/iproute2/rt_tables.

+---------------------------------------------------------------------------+
|# echo 200 John >> /etc/iproute2/rt_tables                                 |
|# ip rule add from 10.0.0.10 table John                                    |
|# ip rule ls                                                               |
|0:      from all lookup local                                              |
|32765:  from 10.0.0.10 lookup John                                         |
|32766:  from all lookup main                                               |
|32767:  from all lookup default                                            |
+---------------------------------------------------------------------------+

Now all that is left is to generate John's table, and flush the route cache:

+---------------------------------------------------------------------------+
|# ip route add default via 195.96.98.253 dev ppp2 table John               |
|# ip route flush cache                                                     |
+---------------------------------------------------------------------------+

And we are done. It is left as an exercise for the reader to implement this in ip-up.

-----------------------------------------------------------------------------

4.2. Routing for multiple uplinks/providers

A common configuration is the following, in which there are two providers that connect a local network (or even a single machine) to the big Internet.

+---------------------------------------------------------------------------+
|                                                                 ________  |
|                                          +------------+        /          |
|                                          |            |       |           |
|                            +-------------+ Provider 1 +-------            |
|        __                  |             |            |     /             |
|    ___/  \_         +------+-------+     +------------+    |              |
|  _/        \__      |     if1      |                      /               |
| /             \     |              |                      |               |
|| Local network -----+ Linux router |                      |     Internet  |
| \_           __/    |              |                      |               |
|   \__     __/       |     if2      |                      \               |
|      \___/          +------+-------+     +------------+    |              |
|                            |             |            |     \             |
|                            +-------------+ Provider 2 +-------            |
|                                          |            |       |           |
|                                          +------------+        \________  |
+---------------------------------------------------------------------------+

There are usually two questions given this setup.

-----------------------------------------------------------------------------

4.2.1. Split access

The first is how to route answers to packets coming in over a particular provider, say Provider 1, back out again over that same provider.

Let us first set some symbolical names. Let $IF1 be the name of the first interface (if1 in the picture above) and $IF2 the name of the second interface. Then let $IP1 be the IP address associated with $IF1 and $IP2 the IP address associated with $IF2. Next, let $P1 be the IP address of the gateway at Provider 1, and $P2 the IP address of the gateway at Provider 2. Finally, let $P1_NET be the IP network $P1 is in, and $P2_NET the IP network $P2 is in.

One creates two additional routing tables, say T1 and T2. These are added in /etc/iproute2/rt_tables.
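
As a minimal sketch of that step, the helper below appends a table to an rt_tables-style file only if the name is not listed there yet. The function name add_table and the table numbers 201 and 202 are assumptions made for this example; any number that is not already in use should do.

```shell
# add_table FILE ID NAME  (hypothetical helper, not part of iproute2)
# Appends "ID NAME" to FILE unless a table called NAME is already listed.
add_table() {
    file="$1"
    id="$2"
    name="$3"
    # An existing entry looks like "<number><whitespace><name>".
    if grep -Eq "^[0-9]+[[:space:]]+${name}[[:space:]]*\$" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$id" "$name" >> "$file"
}
```

On a real router you would run this as root, for instance `add_table /etc/iproute2/rt_tables 201 T1` followed by `add_table /etc/iproute2/rt_tables 202 T2`. Running it twice is harmless because existing names are skipped.
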
Then you set up routing in these tables as follows:

+---------------------------------------------------------------------------+
|    ip route add $P1_NET dev $IF1 src $IP1 table T1                        |
|    ip route add default via $P1 table T1                                  |
|    ip route add $P2_NET dev $IF2 src $IP2 table T2                        |
|    ip route add default via $P2 table T2                                  |
|                                                                           |
+---------------------------------------------------------------------------+

Nothing spectacular, just build a route to the gateway and build a default route via that gateway, as you would do in the case of a single upstream provider, but put the routes in a separate table per provider. Note that the network route suffices, as it tells you how to find any host in that network, which includes the gateway, as specified above.

Next you set up the main routing table. It is a good idea to route things to the direct neighbour through the interface connected to that neighbour. Note the `src' arguments, they make sure the right outgoing IP address is chosen.

+---------------------------------------------------------------------------+
|    ip route add $P1_NET dev $IF1 src $IP1                                 |
|    ip route add $P2_NET dev $IF2 src $IP2                                 |
|                                                                           |
+---------------------------------------------------------------------------+

Then, your preference for default route:

+---------------------------------------------------------------------------+
|    ip route add default via $P1                                           |
|                                                                           |
+---------------------------------------------------------------------------+

Next, you set up the routing rules. These actually choose what routing table to route with.
You want to make sure that you route out a given interface if you already have the corresponding source address:

+---------------------------------------------------------------------------+
|    ip rule add from $IP1 table T1                                         |
|    ip rule add from $IP2 table T2                                         |
|                                                                           |
+---------------------------------------------------------------------------+

This set of commands makes sure all answers to traffic coming in on a particular interface get answered from that interface.

Now, this is just the very basic setup. It will work for all processes running on the router itself, and for the local network, if it is masqueraded. If it is not, then you either have IP space from both providers or you are going to want to masquerade to one of the two providers. In both cases you will want to add rules selecting which provider to route out from based on the IP address of the machine in the local network.

-----------------------------------------------------------------------------

4.2.2. Load balancing

The second question is how to balance traffic going out over the two providers. This is actually not hard if you already have set up split access as above.

Instead of choosing one of the two providers as your default route, you now set up the default route to be a multipath route. In the default kernel this will balance routes over the two providers. It is done as follows (once more building on the example in the section on split-access):

+----------------------------------------------------------------------------------+
|    ip route add default scope global nexthop via $P1 dev $IF1 weight 1 \         |
|            nexthop via $P2 dev $IF2 weight 1                                     |
|                                                                                  |
+----------------------------------------------------------------------------------+

This will balance the routes over both providers. The weight parameters can be tweaked to favor one provider over the other.

Note that balancing will not be perfect, as it is route based, and routes are cached.
This means that routes to often-used sites will always be over the same provider.

Furthermore, if you really want to do this, you probably also want to look at Julian Anastasov's patches at http://www.linuxvirtualserver.org/~julian/#routes, Julian's route patch page. They will make things nicer to work with.

-----------------------------------------------------------------------------

Chapter 5. GRE and other tunnels

There are 3 kinds of tunnels in Linux. There's IP in IP tunneling, GRE tunneling and tunnels that live outside the kernel (like, for example, PPTP).

-----------------------------------------------------------------------------

5.1. A few general remarks about tunnels:

Tunnels can be used to do some very unusual and very cool stuff. They can also make things go horribly wrong when you don't configure them right. Don't point your default route to a tunnel device unless you know EXACTLY what you are doing :-). Furthermore, tunneling increases overhead, because it needs an extra set of IP headers. Typically this is 20 bytes per packet, so if the normal packet size (MTU) on a network is 1500 bytes, a packet that is sent through a tunnel can only be 1480 bytes big. This is not necessarily a problem, but be sure to read up on IP packet fragmentation/reassembly when you plan to connect large networks with tunnels. Oh, and of course, the fastest way to dig a tunnel is to dig at both sides.

-----------------------------------------------------------------------------

5.2. IP in IP tunneling

This kind of tunneling has been available in Linux for a long time. It requires 2 kernel modules, ipip.o and new_tunnel.o.

Let's say you have 3 networks: Internal networks A and B, and intermediate network C (or let's say, Internet).
So we have network A:

+---------------------------------------------------------------------------+
|network 10.0.1.0                                                           |
|netmask 255.255.255.0                                                      |
|router  10.0.1.1                                                           |
+---------------------------------------------------------------------------+

The router has address 172.16.17.18 on network C.

and network B:

+---------------------------------------------------------------------------+
|network 10.0.2.0                                                           |
|netmask 255.255.255.0                                                      |
|router  10.0.2.1                                                           |
+---------------------------------------------------------------------------+

The router has address 172.19.20.21 on network C.

As far as network C is concerned, we assume that it will pass any packet sent from A to B and vice versa. You might even use the Internet for this.

Here's what you do:

First, make sure the modules are installed:

+---------------------------------------------------------------------------+
|insmod ipip.o                                                              |
|insmod new_tunnel.o                                                        |
+---------------------------------------------------------------------------+

Then, on the router of network A, you do the following:

+---------------------------------------------------------------------------+
|ifconfig tunl0 10.0.1.1 pointopoint 172.19.20.21                           |
|route add -net 10.0.2.0 netmask 255.255.255.0 dev tunl0                    |
+---------------------------------------------------------------------------+

And on the router of network B:

+---------------------------------------------------------------------------+
|ifconfig tunl0 10.0.2.1 pointopoint 172.16.17.18                           |
|route add -net 10.0.1.0 netmask 255.255.255.0 dev tunl0                    |
+---------------------------------------------------------------------------+

And if you're finished with your tunnel:

+---------------------------------------------------------------------------+
|ifconfig tunl0 down                                                        |
+---------------------------------------------------------------------------+

Presto, you're done. You can't forward broadcast or IPv6 traffic through an IP-in-IP tunnel, though.
You just connect 2 IPv4 networks that normally wouldn't be able to talk to each other, that's all. As far as compatibility goes, this code has been around a long time, so it's compatible all the way back to 1.3 kernels. Linux IP-in-IP tunneling doesn't work with other Operating Systems or routers, as far as I know. It's simple, it works. Use it if you have to, otherwise use GRE.

-----------------------------------------------------------------------------

5.3. GRE tunneling

GRE is a tunneling protocol that was originally developed by Cisco, and it can do a few more things than IP-in-IP tunneling. For example, you can also transport multicast traffic and IPv6 through a GRE tunnel.

In Linux, you'll need the ip_gre.o module.

-----------------------------------------------------------------------------

5.3.1. IPv4 Tunneling

Let's do IPv4 tunneling first:

Let's say you have 3 networks: Internal networks A and B, and intermediate network C (or let's say, Internet).

So we have network A:

+---------------------------------------------------------------------------+
|network 10.0.1.0                                                           |
|netmask 255.255.255.0                                                      |
|router  10.0.1.1                                                           |
+---------------------------------------------------------------------------+

The router has address 172.16.17.18 on network C. Let's call this network neta (ok, hardly original).

and network B:

+---------------------------------------------------------------------------+
|network 10.0.2.0                                                           |
|netmask 255.255.255.0                                                      |
|router  10.0.2.1                                                           |
+---------------------------------------------------------------------------+

The router has address 172.19.20.21 on network C. Let's call this network netb (still not original).

As far as network C is concerned, we assume that it will pass any packet sent from A to B and vice versa. How and why, we do not care.
On the router of network A, you do the following:

+---------------------------------------------------------------------------+
|ip tunnel add netb mode gre remote 172.19.20.21 local 172.16.17.18 ttl 255 |
|ip link set netb up                                                        |
|ip addr add 10.0.1.1 dev netb                                              |
|ip route add 10.0.2.0/24 dev netb                                          |
+---------------------------------------------------------------------------+

Let's discuss this for a bit. In line 1, we added a tunnel device, and called it netb (which is kind of obvious because that's where we want it to go). Furthermore we told it to use the GRE protocol (mode gre), that the remote address is 172.19.20.21 (the router at the other end), that our tunneling packets should originate from 172.16.17.18 (which allows your router to have several IP addresses on network C and let you decide which one to use for tunneling) and that the TTL field of the packet should be set to 255 (ttl 255).

The second line enables the device.

In the third line we gave the newly born interface netb the address 10.0.1.1. This is OK for smaller networks, but when you're starting up a mining expedition (LOTS of tunnels), you might want to consider using another IP range for tunneling interfaces (in this example, you could use 10.0.3.0).

In the fourth line we set the route for network B. Note the different notation for the netmask. If you're not familiar with this notation, here's how it works: you write out the netmask in binary form, and you count all the ones. If you don't know how to do that, just remember that 255.0.0.0 is /8, 255.255.0.0 is /16 and 255.255.255.0 is /24. Oh, and 255.255.254.0 is /23, in case you were wondering.

But enough about this, let's go on with the router of network B.

+---------------------------------------------------------------------------+
|ip tunnel add neta mode gre remote 172.16.17.18 local 172.19.20.21 ttl 255 |
|ip link set neta up                                                        |
|ip addr add 10.0.2.1 dev neta                                              |
|ip route add 10.0.1.0/24 dev neta                                          |
+---------------------------------------------------------------------------+

And when you want to remove the tunnel on router A:

+---------------------------------------------------------------------------+
|ip link set netb down                                                      |
|ip tunnel del netb                                                         |
+---------------------------------------------------------------------------+

Of course, you can replace netb with neta for router B.

-----------------------------------------------------------------------------

5.3.2. IPv6 Tunneling

See Section 6 for a short bit about IPv6 Addresses.

On with the tunnels.

Let's assume that you have the following IPv6 network, and you want to connect it to 6bone, or a friend.

+---------------------------------------------------------------------------+
|Network 3ffe:406:5:1:5:a:2:1/96                                            |
+---------------------------------------------------------------------------+

Your IPv4 address is 172.16.17.18, and the 6bone router has IPv4 address 172.22.23.24.

+------------------------------------------------------------------------------+
|ip tunnel add sixbone mode sit remote 172.22.23.24 local 172.16.17.18 ttl 255 |
|ip link set sixbone up                                                        |
|ip addr add 3ffe:406:5:1:5:a:2:1/96 dev sixbone                               |
|ip route add 3ffe::/15 dev sixbone                                            |
+------------------------------------------------------------------------------+

Let's discuss this. In the first line, we created a tunnel device called sixbone. We gave it mode sit (which is IPv6 in IPv4 tunneling) and told it where to go to (remote) and where to come from (local). TTL is set to maximum, 255. Next, we made the device active (up). After that, we added our own network address, and set a route for 3ffe::/15 (which is currently all of 6bone) through the tunnel.
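
As an aside on the prefix-length notation used in the GRE example of section 5.3.1 (write the netmask in binary and count the ones), the rule can be sketched in plain shell. The function name mask_to_prefix is made up for this illustration, and the sketch does not check that the ones are contiguous, so it should only be fed valid netmasks.

```shell
# mask_to_prefix NETMASK  (hypothetical helper)
# Prints the number of one-bits in a dotted-quad netmask, e.g. the "24" in /24.
mask_to_prefix() {
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the netmask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        # Count the one-bits of this octet, lowest bit first.
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
}
```

For example, `mask_to_prefix 255.255.254.0` counts 8 + 8 + 7 ones, which gives the /23 mentioned in that section.
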

GRE tunnels are currently the preferred type of tunneling. It's a standard that is also widely adopted outside the Linux community and therefore a Good Thing.

-----------------------------------------------------------------------------

5.4. Userland tunnels

There are literally dozens of implementations of tunneling outside the kernel. Best known are of course PPP and PPTP, but there are lots more (some proprietary, some secure, some that don't even use IP) and that is really beyond the scope of this HOWTO.

-----------------------------------------------------------------------------

Chapter 6. IPv6 tunneling with Cisco and/or 6bone

By Marco Davids

NOTE to maintainer:

As far as I am concerned, this IPv6-IPv4 tunneling is not per definition GRE tunneling. You could tunnel IPv6 over IPv4 by means of GRE tunnel devices (GRE tunnels ANY to IPv4), but the device used here ("sit") only tunnels IPv6 over IPv4 and is therefore something different.

-----------------------------------------------------------------------------

6.1. IPv6 Tunneling

This is another application of the tunneling capabilities of Linux. It is popular among the IPv6 early adopters, or pioneers if you like. The 'hands-on' example described below is certainly not the only way to do IPv6 tunneling. However, it is the method that is often used to tunnel between Linux and a Cisco IPv6 capable router and experience tells us that this is just the thing many people are after. Ten to one this applies to you too ;-)

A short bit about IPv6 addresses:

IPv6 addresses are, compared to IPv4 addresses, really big: 128 bits against 32 bits. And this provides us just with the thing we need: many, many IP-addresses: 340,282,366,920,938,463,463,374,607,431,768,211,456 to be precise. Apart from this, IPv6 (or IPng, for IP Next Generation) is supposed to provide for smaller routing tables on the Internet's backbone routers, simpler configuration of equipment, better security at the IP level and better support for QoS.

An example: 2002:836b:9820:0000:0000:0000:836b:9886

Writing down IPv6 addresses can be quite a burden. Therefore, to make life easier there are some rules:

  * Don't use leading zeroes. Same as in IPv4.

  * Use colons to separate every 16 bits or two bytes.

  * When you have lots of consecutive zeroes, you can write this down as ::. You can only do this once in an address and only for quantities of 16 bits, though.

The address 2002:836b:9820:0000:0000:0000:836b:9886 can be written down as 2002:836b:9820::836b:9886, which is somewhat friendlier.

Another example, the address 3ffe:0000:0000:0000:0000:0020:34A1:F32C can be written down as 3ffe::20:34A1:F32C, which is a lot shorter.

IPv6 is intended to be the successor of the current IPv4. Because it is relatively new technology, there is no worldwide native IPv6 network yet. To be able to move forward swiftly, the 6bone was introduced.

Native IPv6 networks are connected to each other by encapsulating the IPv6 protocol in IPv4 packets and sending them over the existing IPv4 infrastructure from one IPv6 site to another.

That is precisely where the tunnel steps in.

To be able to use IPv6, we should have a kernel that supports it. There are many good documents on how to achieve this. But it all comes down to a few steps:

  * Get yourself a recent Linux distribution, with suitable glibc.

  * Then get yourself an up-to-date kernel source.

If you are all set, then you can go ahead and compile an IPv6 capable kernel:

  * Go to /usr/src/linux and type:

  * make menuconfig

  * Choose "Networking Options"

  * Select "The IPv6 protocol", "IPv6: enable EUI-64 token format", "IPv6: disable provider based addresses"

HINT: Don't go for the 'module' option. Often this won't work well.

In other words, compile IPv6 as 'built-in' in your kernel. You can then save your config like usual and go ahead with compiling the kernel.

HINT: Before doing so, consider editing the Makefile: EXTRAVERSION = -x --> EXTRAVERSION = -x-IPv6

There is a lot of good documentation about compiling and installing a kernel, however this document is about something else. If you run into problems at this stage, go and look for documentation about compiling a Linux kernel according to your own specifications.

The file /usr/src/linux/README might be a good start. After you have accomplished all this, and rebooted with your brand new kernel, you might want to issue an '/sbin/ifconfig -a' and notice the brand new 'sit0-device'. SIT stands for Simple Internet Transition. You may give yourself a compliment; you are now one major step closer to IP, the Next Generation ;-)

Now on to the next step. You want to connect your host, or maybe even your entire LAN to another IPv6 capable network. This might be the "6bone" that is set up especially for this particular purpose.

Let's assume that you have the following IPv6 network: 3ffe:604:6:8::/64 and you want to connect it to 6bone, or a friend. Please note that the /64 subnet notation works just like with regular IP addresses.

Your IPv4 address is 145.100.24.181 and the 6bone router has IPv4 address 145.100.1.5

+-----------------------------------------------------------------------------------+
|# ip tunnel add sixbone mode sit remote 145.100.1.5 [local 145.100.24.181 ttl 255] |
|# ip link set sixbone up                                                           |
|# ip addr add 3FFE:604:6:7::2/126 dev sixbone                                      |
|# ip route add 3ffe::0/16 dev sixbone                                              |
+-----------------------------------------------------------------------------------+

Let's discuss this. In the first line, we created a tunnel device called sixbone. We gave it mode sit (which is IPv6 in IPv4 tunneling) and told it where to go to (remote) and, optionally, where to come from (local). TTL is set to maximum, 255. Next, we made the device active (up). After that, we added our own network address, and set a route for 3ffe::/16 (which is currently all of 6bone) through the tunnel.
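
If you set up such tunnels more than once, the four commands can be collected in a small helper. The sketch below is only an illustration: the function name sit_tunnel_cmds and its argument order are made up here, and it merely prints the commands so that you can inspect them before running anything as root.

```shell
# sit_tunnel_cmds NAME REMOTE LOCAL ADDR  (hypothetical helper)
# Prints the four commands from the example above without executing them.
sit_tunnel_cmds() {
    name="$1"
    remote="$2"
    local_ip="$3"
    addr="$4"
    echo "ip tunnel add $name mode sit remote $remote local $local_ip ttl 255"
    echo "ip link set $name up"
    echo "ip addr add $addr dev $name"
    # The 6bone route is taken from the example; adjust it for other networks.
    echo "ip route add 3ffe::0/16 dev $name"
}

sit_tunnel_cmds sixbone 145.100.1.5 145.100.24.181 3FFE:604:6:7::2/126
```

Once the printed commands look right, the output can be piped to a root shell to actually create the tunnel.
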

If the particular machine you run this on is your IPv6 gateway, then consider adding the following lines:

+---------------------------------------------------------------------------+
|# echo 1 >/proc/sys/net/ipv6/conf/all/forwarding                           |
|# /usr/local/sbin/radvd                                                    |
+---------------------------------------------------------------------------+

The latter, radvd, is (like zebra) a router advertisement daemon, to support IPv6's autoconfiguration features. Search for it with your favourite search-engine if you like. You can check things like this:

+---------------------------------------------------------------------------+
|# /sbin/ip -f inet6 addr                                                   |
+---------------------------------------------------------------------------+

If you happen to have radvd running on your IPv6 gateway and boot your IPv6 capable Linux on a machine on your local LAN, you would be able to enjoy the benefits of IPv6 autoconfiguration:

+------------------------------------------------------------------------------+
|# /sbin/ip -f inet6 addr                                                      |
|1: lo: mtu 3924 qdisc noqueue inet6 ::1/128 scope host                        |
|                                                                              |
|3: eth0: mtu 1500 qdisc pfifo_fast qlen 100                                   |
|inet6 3ffe:604:6:8:5054:4cff:fe01:e3d6/64 scope global dynamic                |
|valid_lft forever preferred_lft 604646sec inet6 fe80::5054:4cff:fe01:e3d6/10  |
|scope link                                                                    |
+------------------------------------------------------------------------------+

You could go ahead and configure your bind for IPv6 addresses. The A type has an equivalent for IPv6: AAAA. The in-addr.arpa's equivalent is: ip6.int. There's a lot of information available on this topic.

There is an increasing number of IPv6-aware applications available, including secure shell, telnet, inetd, Mozilla the browser, Apache the webserver and a lot of others. But this is all outside the scope of this Routing document ;-)

On the Cisco side the configuration would be something like this:

+---------------------------------------------------------------------------+
|!                                                                          |
|interface Tunnel1                                                          |
|description IPv6 tunnel                                                    |
|no ip address                                                              |
|no ip directed-broadcast                                                   |
|ipv6 enable                                                                |
|ipv6 address 3FFE:604:6:7::1/126                                           |
|tunnel source Serial0                                                      |
|tunnel destination 145.100.24.181                                          |
|tunnel mode ipv6ip                                                         |
|!                                                                          |
|ipv6 route 3FFE:604:6:8::/64 Tunnel1                                       |
+---------------------------------------------------------------------------+

But if you don't have a Cisco at your disposal, try one of the many IPv6 tunnel brokers available on the Internet. They are willing to configure their Cisco with an extra tunnel for you, mostly by means of a friendly web interface. Search for "ipv6 tunnel broker" on your favourite search engine.

-----------------------------------------------------------------------------

Chapter 7. IPsec: secure IP over the Internet

FIXME: editor vacancy. In the meantime, see: [http://www.freeswan.org/] The FreeS/WAN project. Another IPSec implementation for Linux is Cerberus, by NIST. However, their web pages have not been updated in over a year, and their version tended to trail well behind the current Linux kernel. USAGI, an alternative IPv6 implementation for Linux, also includes an IPSec implementation, but that might only be for IPv6.

-----------------------------------------------------------------------------

Chapter 8. Multicast routing

FIXME: Editor Vacancy!

The Multicast-HOWTO is ancient (relatively-speaking) and may be inaccurate or misleading in places, for that reason.

Before you can do any multicast routing, you need to configure the Linux kernel to support the type of multicast routing you want to do. This, in turn, requires you to decide what type of multicast routing you expect to be using.
There are essentially four "common" types - DVMRP (the Multicast version of the RIP unicast protocol), MOSPF (the same, but for OSPF), PIM-SM ("Protocol Independent Multicasting - Sparse Mode", which assumes that users of any multicast group are spread out, rather than clumped) and PIM-DM (the same, but "Dense Mode", which assumes that there will be significant clumps of users of the same multicast group).

In the Linux kernel, you will notice that these options don't appear. This is because the protocol itself is handled by a routing application, such as Zebra, mrouted, or pimd. However, you still have to have a good idea of which you're going to use, to select the right options in the kernel.

For all multicast routing, you will definitely need to enable "multicasting" and "multicast routing". For DVMRP and MOSPF, this is sufficient. If you are going to use PIM, you must also enable PIMv1 or PIMv2, depending on whether the network you are connecting to uses version 1 or 2 of the PIM protocol.

Once you have all that sorted out, and your new Linux kernel compiled, you will see that the IP protocols listed, at boot time, now include IGMP. This is a protocol for managing multicast groups. At the time of writing, Linux supports IGMP versions 1 and 2 only, although version 3 does exist and has been documented. This doesn't really affect us that much, as IGMPv3 is still new enough that the extra capabilities of IGMPv3 aren't going to be that much use. Because IGMP deals with groups, only the features present in the simplest version of IGMP over the entire group are going to be used. For the most part, that will be IGMPv2, although IGMPv1 is still going to be encountered.

So far, so good. We've enabled multicasting. Now, we have to tell the Linux kernel to actually do something with it, so we can start routing. This means adding the Multicast virtual network to the router table:

ip route add 224.0.0.0/4 dev eth0

(Assuming, of course, that you're multicasting over eth0!
Substitute the device of your choice, for this.)

Now, tell Linux to forward packets...

echo 1 > /proc/sys/net/ipv4/ip_forward

At this point, you may be wondering if this is ever going to do anything. So, to test our connection, we ping the default group, 224.0.0.1, to see if anyone is alive. All machines on your LAN with multicasting enabled should respond, but nothing else. You'll notice that none of the machines that respond have an IP address of 224.0.0.1. What a surprise! :) This is a group address (a "broadcast" to subscribers), and all members of the group will respond with their own address, not the group address.

ping -c 2 224.0.0.1

At this point, you're ready to do actual multicast routing. Well, assuming that you have two networks to route between.

(To Be Continued!)

-----------------------------------------------------------------------------

Chapter 9. Queueing Disciplines for Bandwidth Management

Now, when I discovered this, it really blew me away. Linux 2.2/2.4 comes with everything to manage bandwidth in ways comparable to high-end dedicated bandwidth management systems.

Linux even goes far beyond what Frame and ATM provide.

Just to prevent confusion, tc uses the following rules for bandwidth specification:

mbps = 1024 kbps = 1024 * 1024 bps => byte/s
mbit = 1024 kbit => kilo bit/s.
mb = 1024 kb = 1024 * 1024 b => byte
mbit = 1024 kbit => kilo bit.

Internally, the number is stored in bps and b.

But when tc prints the rate, it uses the following:

1Mbit = 1024 Kbit = 1024 * 1024 bps => bit/s

-----------------------------------------------------------------------------

9.1. Queues and Queueing Disciplines explained

With queueing we determine the way in which data is SENT. It is important to realise that we can only shape data that we transmit.

With the way the Internet works, we have no direct control of what people send us. It's a bit like your (physical!) mailbox at home.
There is no way you can influence the world to modify the amount of mail they send you, short of contacting everybody.

However, the Internet is mostly based on TCP/IP which has a few features that help us. TCP/IP has no way of knowing the capacity of the network between two hosts, so it just starts sending data faster and faster ('slow start') and when packets start getting lost, because there is no room to send them, it will slow down. In fact it is a bit smarter than this, but more about that later.

This is the equivalent of not reading half of your mail, and hoping that people will stop sending it to you. With the difference that it works for the Internet :-)

If you have a router and wish to prevent certain hosts within your network from downloading too fast, you need to do your shaping on the *inner* interface of your router, the one that sends data to your own computers.

You also have to be sure you are controlling the bottleneck of the link. If you have a 100Mbit NIC and you have a router that has a 256kbit link, you have to make sure you are not sending more data than your router can handle. Otherwise, it will be the router that is controlling the link and shaping the available bandwidth. We need to 'own the queue' so to speak, and be the slowest link in the chain. Luckily this is easily possible.

-----------------------------------------------------------------------------

9.2. Simple, classless Queueing Disciplines

As said, with queueing disciplines, we change the way data is sent. Classless queueing disciplines are those that, by and large, accept data and only reschedule, delay or drop it.

These can be used to shape traffic for an entire interface, without any subdivisions. It is vital that you understand this part of queueing before we go on to the classful qdisc-containing-qdiscs!

By far the most widely used discipline is the pfifo_fast qdisc - this is the default. This also explains why these advanced features are so robust.
They are nothing more than 'just another queue'.

Each of these queues has specific strengths and weaknesses. Not all of them may be as well tested.

-----------------------------------------------------------------------------

9.2.1. pfifo_fast

This queue is, as the name says, First In, First Out, which means that no packet receives special treatment. At least, not quite. This queue has 3 so called 'bands'. Within each band, FIFO rules apply. However, as long as there are packets waiting in band 0, band 1 won't be processed. Same goes for band 1 and band 2.

The kernel honors the so called Type of Service flag of packets, and takes care to insert 'minimum delay' packets in band 0.

Do not confuse this classless simple qdisc with the classful PRIO one! Although they behave similarly, pfifo_fast is classless and you cannot add other qdiscs to it with the tc command.

-----------------------------------------------------------------------------

9.2.1.1. Parameters & usage

You can't configure the pfifo_fast qdisc as it is the hardwired default. This is how it is configured by default:

priomap
    Determines how packet priorities, as assigned by the kernel, map to bands.
    Mapping occurs based on the TOS octet of the packet, which looks like this:

    +---------------------------------------------------------------+
    |   0     1     2     3     4     5     6     7                 |
    |+-----+-----+-----+-----+-----+-----+-----+-----+              |
    ||                 |                       |     |              |
    ||   PRECEDENCE    |          TOS          | MBZ |              |
    ||                 |                       |     |              |
    |+-----+-----+-----+-----+-----+-----+-----+-----+              |
    +---------------------------------------------------------------+

    The four TOS bits (the 'TOS field') are defined as:

    +---------------------------------------------------------------+
    |Binary Decimal  Meaning                                        |
    |-----------------------------------------                      |
    |1000   8        Minimize delay (md)                            |
    |0100   4        Maximize throughput (mt)                       |
    |0010   2        Maximize reliability (mr)                      |
    |0001   1        Minimize monetary cost (mmc)                   |
    |0000   0        Normal Service                                 |
    +---------------------------------------------------------------+

    As there is 1 bit to the right of these four bits, the actual value of the TOS field is double the value of the TOS bits. tcpdump -v -v shows you the value of the entire TOS field, not just the four bits. It is the value you see in the first column of this table:

    +---------------------------------------------------------------+
    |TOS     Bits  Means                    Linux Priority    Band  |
    |------------------------------------------------------------   |
    |0x0     0     Normal Service           0 Best Effort     1     |
    |0x2     1     Minimize Monetary Cost   1 Filler          2     |
    |0x4     2     Maximize Reliability     0 Best Effort     1     |
    |0x6     3     mmc+mr                   0 Best Effort     1     |
    |0x8     4     Maximize Throughput      2 Bulk            2     |
    |0xa     5     mmc+mt                   2 Bulk            2     |
    |0xc     6     mr+mt                    2 Bulk            2     |
    |0xe     7     mmc+mr+mt                2 Bulk            2     |
    |0x10    8     Minimize Delay           6 Interactive     0     |
    |0x12    9     mmc+md                   6 Interactive     0     |
    |0x14    10    mr+md                    6 Interactive     0     |
    |0x16    11    mmc+mr+md                6 Interactive     0     |
    |0x18    12    mt+md                    4 Int. Bulk       1     |
    |0x1a    13    mmc+mt+md                4 Int. Bulk       1     |
    |0x1c    14    mr+mt+md                 4 Int. Bulk       1     |
    |0x1e    15    mmc+mr+mt+md             4 Int. Bulk       1     |
    +---------------------------------------------------------------+

    Lots of numbers. The second column contains the value of the relevant four TOS bits, followed by their translated meaning.
    For example, 15 stands for a packet wanting Minimal Monetary Cost, Maximum Reliability, Maximum Throughput AND Minimum Delay. I would call this a 'Dutch Packet'.

    The fourth column lists the way the Linux kernel interprets the TOS bits, by showing to which Priority they are mapped.

    The last column shows the result of the default priomap. On the command line, the default priomap looks like this:

    +---------------------------------------------------------------+
    |1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1                 |
    +---------------------------------------------------------------+

    This means that priority 4, for example, gets mapped to band number 1. The priomap also allows you to list higher priorities (> 7) which do not correspond to TOS mappings, but which are set by other means.

    This table from RFC 1349 (read it for more details) tells you how applications might very well set their TOS bits:

    +---------------------------------------------------------------+
    |TELNET                   1000        (minimize delay)          |
    |FTP                                                            |
    |        Control          1000        (minimize delay)          |
    |        Data             0100        (maximize throughput)     |
    |                                                               |
    |TFTP                     1000        (minimize delay)          |
    |                                                               |
    |SMTP                                                           |
    |        Command phase    1000        (minimize delay)          |
    |        DATA phase       0100        (maximize throughput)     |
    |                                                               |
    |Domain Name Service                                            |
    |        UDP Query        1000        (minimize delay)          |
    |        TCP Query        0000                                  |
    |        Zone Transfer    0100        (maximize throughput)     |
    |                                                               |
    |NNTP                     0001        (minimize monetary cost)  |
    |                                                               |
    |ICMP                                                           |
    |        Errors           0000                                  |
    |        Requests         0000        (mostly)                  |
    |        Responses                    (mostly)                  |
    +---------------------------------------------------------------+

txqueuelen
    The length of this queue is gleaned from the interface configuration, which you can see and set with ifconfig and ip. To set the queue length to 10, execute: ifconfig eth0 txqueuelen 10

    You can't set this parameter with tc!

-----------------------------------------------------------------------------
9.2.2. Token Bucket Filter

The Token Bucket Filter (TBF) is a simple qdisc that only passes packets arriving at a rate which does not exceed some administratively set rate, but with the possibility to allow short bursts in excess of this rate.

TBF is very precise, and network- and processor-friendly. It should be your first choice if you simply want to slow an interface down!

The TBF implementation consists of a buffer (bucket), constantly filled by some virtual pieces of information called tokens, at a specific rate (token rate). The most important parameter of the bucket is its size, that is the number of tokens it can store.

Each arriving token collects one incoming data packet from the data queue and is then deleted from the bucket. Associating this algorithm with the two flows -- token and data -- gives us three possible scenarios:

  * The data arrives in TBF at a rate that's equal to the rate of incoming tokens. In this case each incoming packet has its matching token and passes the queue without delay.

  * The data arrives in TBF at a rate that's smaller than the token rate. Only a part of the tokens are deleted at output of each data packet that's sent out the queue, so the tokens accumulate, up to the bucket size. The unused tokens can then be used to send data at a speed that exceeds the standard token rate, in case short data bursts occur.

  * The data arrives in TBF at a rate bigger than the token rate. This means that the bucket will soon be devoid of tokens, which causes the TBF to throttle itself for a while. This is called an 'overlimit situation'. If packets keep coming in, packets will start to get dropped.

The last scenario is very important, because it allows you to administratively shape the bandwidth available to data that's passing the filter.

The accumulation of tokens allows a short burst of overlimit data to be still passed without loss, but any lasting overload will cause packets to be constantly delayed, and then dropped.
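
The three scenarios can be tried out with a toy model. What follows is only a minimal sketch in Python, not the kernel's TBF code: the class TokenBucket, the helper run() and all the numbers are made up for illustration. It assumes that tokens are counted in bytes, that time advances in discrete ticks and that the bucket starts out full.

```python
from collections import deque


class TokenBucket:
    """Toy token bucket. Tokens are bytes; time advances in discrete ticks."""

    def __init__(self, rate_per_tick, burst, limit):
        self.rate_per_tick = rate_per_tick  # tokens (bytes) added every tick
        self.burst = burst                  # bucket size: most tokens we can hoard
        self.limit = limit                  # most bytes allowed to wait for tokens
        self.tokens = burst                 # assumption: start with a full bucket
        self.queue = deque()                # sizes of the packets waiting
        self.backlog = 0                    # bytes currently waiting
        self.dropped = 0                    # packets refused because of 'limit'

    def tick(self):
        # New tokens arrive; anything beyond the bucket size is lost.
        self.tokens = min(self.burst, self.tokens + self.rate_per_tick)

    def enqueue(self, size):
        # Lasting overload: the waiting room is full, so the packet is dropped.
        if self.backlog + size > self.limit:
            self.dropped += 1
            return False
        self.queue.append(size)
        self.backlog += size
        return True

    def dequeue(self):
        # Send packets from the head of the queue while tokens cover them.
        sent = []
        while self.queue and self.queue[0] <= self.tokens:
            size = self.queue.popleft()
            self.tokens -= size
            self.backlog -= size
            sent.append(size)
        return sent


def run(bucket, arrivals_per_tick, ticks, size=1000):
    """Feed a steady stream of packets in and count how many get out."""
    sent = 0
    for _ in range(ticks):
        bucket.tick()
        for _ in range(arrivals_per_tick):
            bucket.enqueue(size)
        sent += len(bucket.dequeue())
    return sent


if __name__ == "__main__":
    for arrivals in (1, 2):
        tb = TokenBucket(rate_per_tick=1000, burst=3000, limit=5000)
        sent = run(tb, arrivals_per_tick=arrivals, ticks=100)
        print(arrivals, "packet(s) per tick:", sent, "sent,", tb.dropped, "dropped")
```

With one 1000-byte packet per tick the data rate equals the token rate and nothing waits. With two packets per tick the first few ride on the hoarded tokens, after which the backlog fills up to 'limit' and the surplus is dropped.
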
Please note that in the actual implementation, tokens correspond to bytes, not packets.

-----------------------------------------------------------------------------
9.2.2.1. Parameters & usage

Even though you will probably not need to change them, tbf has some knobs available. First the parameters that are always available:

limit or latency
    Limit is the number of bytes that can be queued waiting for tokens to become available. You can also specify this the other way around by setting the latency parameter, which specifies the maximum amount of time a packet can sit in the TBF. The latter calculation takes into account the size of the bucket, the rate and possibly the peakrate (if set).

burst/buffer/maxburst
    Size of the bucket, in bytes. This is the maximum amount of bytes that tokens can be available for instantaneously. In general, larger shaping rates require a larger buffer. For 10mbit/s on Intel, you need at least 10kbyte buffer if you want to reach your configured rate!

    If your buffer is too small, packets may be dropped because more tokens arrive per timer tick than fit in your bucket.

mpu
    A zero-sized packet does not use zero bandwidth. For ethernet, no packet uses less than 64 bytes. The Minimum Packet Unit determines the minimal token usage for a packet.

rate
    The speedknob. See remarks above about limits!

If the bucket contains tokens and is allowed to empty, by default it does so at infinite speed. If this is unacceptable, use the following parameters:

peakrate
    If tokens are available, and packets arrive, they are sent out immediately by default, at 'lightspeed' so to speak. That may not be what you want, especially if you have a large bucket.

    The peakrate can be used to specify how quickly the bucket is allowed to be depleted. If doing everything by the book, this is achieved by releasing a packet, waiting just long enough, and then releasing the next. The waits are calculated so that we send at exactly the peakrate.
    However, due to the default 10ms timer resolution of Unix, with average packets of 10,000 bits, we are limited to 1mbit/s of peakrate!

mtu/minburst
    The 1mbit/s peakrate is not very useful if your regular rate is more than that. A higher peakrate is possible by sending out more packets per timertick, which effectively means that we create a second bucket!

    This second bucket defaults to a single packet, which is not a bucket at all.

    To calculate the maximum possible peakrate, multiply the configured mtu by 100 (or more correctly, HZ, which is 100 on Intel, 1024 on Alpha).

-----------------------------------------------------------------------------
9.2.2.2. Sample configuration

A simple but *very* useful configuration is this:

+---------------------------------------------------------------------------+
|# tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540      |
+---------------------------------------------------------------------------+

Ok, why is this useful? If you have a networking device with a large queue, like a DSL modem or a cable modem, and you talk to it over a fast device, like over an ethernet interface, you will find that uploading absolutely destroys interactivity.

This is because uploading will fill the queue in the modem, which is probably *huge* because this helps to actually achieve good upload throughput. But this is not what you want; you want the queue to be small enough that interactivity remains and you can still do other stuff while sending data.

The line above slows down sending to a rate that does not lead to a queue in the modem - the queue will be in Linux, where we can control it to a limited size.

Change 220kbit to your uplink's *actual* speed, minus a few percent. If you have a really fast modem, raise 'burst' a bit.

-----------------------------------------------------------------------------
9.2.3. Stochastic Fairness Queueing

Stochastic Fairness Queueing (SFQ) is a simple implementation of the fair queueing algorithms family. It's less accurate than others, but it also requires fewer calculations while being almost perfectly fair.

The key word in SFQ is conversation (or flow), which mostly corresponds to a TCP session or a UDP stream. Traffic is divided into a pretty large number of FIFO queues, one for each conversation. Traffic is then sent in a round robin fashion, giving each session the chance to send data in turn.

This leads to very fair behaviour and prevents any single conversation from drowning out the rest. SFQ is called 'Stochastic' because it doesn't really allocate a queue for each session; instead it divides traffic over a limited number of queues using a hashing algorithm.

Because of the hash, multiple sessions might end up in the same bucket, which would halve each session's chance of sending a packet, thus halving the effective speed available. To prevent this situation from becoming noticeable, SFQ changes its hashing algorithm quite often so that any two colliding sessions will only do so for a small number of seconds.

It is important to note that SFQ is only useful in case your actual outgoing interface is really full! If it isn't then there will be no queue on your linux machine and hence no effect. Later on we will describe how to combine SFQ with other qdiscs to get a best-of-both worlds situation.

Specifically, setting SFQ on the ethernet interface heading to your cable modem or DSL router is pointless without further shaping!

-----------------------------------------------------------------------------
9.2.3.1. Parameters & usage

The SFQ is pretty much self tuning:

perturb
    Reconfigure hashing once every this many seconds. If unset, the hash will never be reconfigured. Not recommended. 10 seconds is probably a good value.

quantum
    Amount of bytes a stream is allowed to dequeue before the next queue gets a turn.
    Defaults to 1 maximum sized packet (MTU-sized). Do not set below the MTU!

-----------------------------------------------------------------------------
9.2.3.2. Sample configuration

If you have a device which has identical link speed and actual available rate, like a phone modem, this configuration will help promote fairness:

+--------------------------------------------------------------------------------+
|# tc qdisc add dev ppp0 root sfq perturb 10                                     |
|# tc -s -d qdisc ls                                                             |
|qdisc sfq 800c: dev ppp0 quantum 1514b limit 128p flows 128/1024 perturb 10sec  |
| Sent 4812 bytes 62 pkts (dropped 0, overlimits 0)                              |
+--------------------------------------------------------------------------------+

The number 800c: is the automatically assigned handle number; limit means that 128 packets can wait in this queue. There are 1024 hashbuckets available for accounting, of which 128 can be active at a time (no more packets fit in the queue!). Once every 10 seconds, the hashes are reconfigured.

-----------------------------------------------------------------------------
9.3. Advice for when to use which queue

Summarizing, these are the simple queues that actually manage traffic by reordering, slowing or dropping packets.

The following tips may help in choosing which queue to use. They mention some qdiscs described in Chapter 14.

  * To purely slow down outgoing traffic, use the Token Bucket Filter. Works up to huge bandwidths, if you scale the bucket.

  * If your link is truly full and you want to make sure that no single session can dominate your outgoing bandwidth, use Stochastic Fairness Queueing.

  * If you have a big backbone and know what you are doing, consider Random Early Drop (see Advanced chapter).

  * To 'shape' incoming traffic which you are not forwarding, use the Ingress Policer. Incoming shaping is called 'policing', by the way, not 'shaping'.

  * If you *are* forwarding it, use a TBF on the interface you are forwarding the data to.
    Unless you want to shape traffic that may go out over several interfaces, in which case the only common factor is the incoming interface. In that case use the Ingress Policer.

  * If you don't want to shape, but only want to see if your interface is so loaded that it has to queue, use the pfifo queue (not pfifo_fast). It lacks internal bands but does account the size of its backlog.

  * Finally - you can also do "social shaping". You may not always be able to use technology to achieve what you want. Users experience technical constraints as hostile. A kind word may also help with getting your bandwidth to be divided right!

-----------------------------------------------------------------------------
9.4. Terminology

To properly understand more complicated configurations it is necessary to explain a few concepts first. Because of the complexity and the relative youth of the subject, a lot of different words are used when people in fact mean the same thing.

The following is loosely based on draft-ietf-diffserv-model-06.txt, An Informal Management Model for Diffserv Routers. It can currently be found at http://www.ietf.org/internet-drafts/draft-ietf-diffserv-model-06.txt.

Read it for the strict definitions of the terms used.

Queueing Discipline
    An algorithm that manages the queue of a device, either incoming (ingress) or outgoing (egress).

Classless qdisc
    A qdisc with no configurable internal subdivisions.

Classful qdisc
    A classful qdisc contains multiple classes. Each of these classes contains a further qdisc, which may again be classful, but need not be. According to the strict definition, pfifo_fast *is* classful, because it contains three bands which are, in fact, classes. However, from the user's configuration perspective, it is classless as the classes can't be touched with the tc tool.

Classes
    A classful qdisc may have many classes, each of which is internal to the qdisc.
    Each of these classes may contain a real qdisc.

Classifier
    Each classful qdisc needs to determine to which class it needs to send a packet. This is done using the classifier.

Filter
    Classification can be performed using filters. A filter contains a number of conditions which, if matched, make the filter match.

Scheduling
    A qdisc may, with the help of a classifier, decide that some packets need to go out earlier than others. This process is called Scheduling, and is performed for example by the pfifo_fast qdisc mentioned earlier. Scheduling is also called 'reordering', but this is confusing.

Shaping
    The process of delaying packets before they go out to make traffic conform to a configured maximum rate. Shaping is performed on egress. Colloquially, dropping packets to slow traffic down is also often called Shaping.

Policing
    Delaying or dropping packets in order to make traffic stay below a configured bandwidth. In Linux, policing can only drop a packet and not delay it - there is no 'ingress queue'.

Work-Conserving
    A work-conserving qdisc always delivers a packet if one is available. In other words, it never delays a packet if the network adaptor is ready to send one (in the case of an egress qdisc).

non-Work-Conserving
    Some queues, like for example the Token Bucket Filter, may need to hold on to a packet for a certain time in order to limit the bandwidth. This means that they sometimes refuse to give up a packet, even though they have one available.

Now that we have our terminology straight, let's see where all these things are.

+---------------------------------------------------------------------------+
|                Userspace programs                                         |
|                     ^                                                     |
|                     |                                                     |
|     +---------------+-----------------------------------------+           |
|     |               Y                                         |           |
|     |    -------> IP Stack                                    |           |
|     |   |              |                                      |           |
|     |   |              Y                                      |           |
|     |   |              Y                                      |           |
|     |   ^              |                                      |           |
|     |   |  / ----------> Forwarding ->                        |           |
|     |   ^ /                           |                       |           |
|     |   |/                            Y                       |           |
|     |   |                             |                       |           |
|     |   ^                             Y          /-qdisc1-\   |           |
|     |   |                            Egress     /--qdisc2--\  |           |
|  --->->Ingress                       Classifier ---qdisc3---- | ->        |
|     |   Qdisc                                   \__qdisc4__/  |           |
|     |                                            \-qdiscN_/   |           |
|     |                                                         |           |
|     +---------------------------------------------------------+           |
+---------------------------------------------------------------------------+

Thanks to Jamal Hadi Salim for this ASCII representation.

The big block represents the kernel. The leftmost arrow represents traffic entering your machine from the network. It is then fed to the Ingress Qdisc which may apply Filters to a packet, and decide to drop it. This is called 'Policing'.

This happens at a very early stage, before it has seen a lot of the kernel. It is therefore a very good place to drop traffic very early, without consuming a lot of CPU power.

If the packet is allowed to continue, it may be destined for a local application, in which case it enters the IP stack in order to be processed, and handed over to a userspace program. The packet may also be forwarded without entering an application, in which case it is destined for egress. Userspace programs may also deliver data, which is then examined and forwarded to the Egress Classifier.

There it is investigated and enqueued to any of a number of qdiscs. In the unconfigured default case, there is only one egress qdisc installed, the pfifo_fast, which always receives the packet. This is called 'enqueueing'.

The packet now sits in the qdisc, waiting for the kernel to ask for it for transmission over the network interface. This is called 'dequeueing'.
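
To make enqueueing and dequeueing a little more concrete for the default case, here is a rough Python model of pfifo_fast. It is a sketch under assumptions, not kernel code: PfifoFastModel and band_for_tos() are hypothetical names, and the two lookup tables are simply copied from the default priomap and from the 'Linux Priority' column of the TOS table in section 9.2.1.1.

```python
from collections import deque

# Default priomap from section 9.2.1.1: index = Linux priority, value = band.
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

# Linux priority for each value (0..15) of the four TOS bits, copied from
# the 'Linux Priority' column of the table in section 9.2.1.1.
TOS_BITS_TO_PRIORITY = [0, 1, 0, 0, 2, 2, 2, 2, 6, 6, 6, 6, 4, 4, 4, 4]


def band_for_tos(tos_field):
    """Map a whole TOS octet (as shown by tcpdump) to a pfifo_fast band."""
    tos_bits = (tos_field >> 1) & 0x0F  # drop the MBZ bit, mask off precedence
    return PRIOMAP[TOS_BITS_TO_PRIORITY[tos_bits]]


class PfifoFastModel:
    """Three FIFO bands; a band is only served when all lower ones are empty."""

    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, packet, tos_field):
        # 'Enqueueing': the packet is filed in the band its TOS value maps to.
        self.bands[band_for_tos(tos_field)].append(packet)

    def dequeue(self):
        # 'Dequeueing': the kernel asks for a packet; band 0 is tried first.
        for band in self.bands:
            if band:
                return band.popleft()
        return None  # nothing is waiting


if __name__ == "__main__":
    q = PfifoFastModel()
    q.enqueue("bulk transfer", 0x08)  # maximize throughput
    q.enqueue("plain packet", 0x00)   # normal service
    q.enqueue("keystroke", 0x10)      # minimize delay
    packet = q.dequeue()
    while packet is not None:
        print(packet)
        packet = q.dequeue()
```

Although the bulk packet is enqueued first, the minimize-delay packet leaves first, because it lands in band 0. The packet and traffic names are placeholders.
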

The picture above also holds in case there is only one network adaptor - the arrows entering and leaving the kernel should not be taken too literally. Each network adaptor has both ingress and egress hooks.

-----------------------------------------------------------------------------
9.5. Classful Queueing Disciplines

Classful qdiscs are very useful if you have different kinds of traffic which should have differing treatment. One of the classful qdiscs is called 'CBQ', 'Class Based Queueing', and it is so widely mentioned that people identify queueing with classes solely with CBQ, but this is not the case.

CBQ is merely the oldest kid on the block - and also the most complex one. It may not always do what you want. This may come as something of a shock to many who fell for the 'sendmail effect', which teaches us that any complex technology which doesn't come with documentation must be the best available.

More about CBQ and its alternatives shortly.

-----------------------------------------------------------------------------
9.5.1. Flow within classful qdiscs & classes

When traffic enters a classful qdisc, it needs to be sent to any of the classes within - it needs to be 'classified'. To determine what to do with a packet, the so-called 'filters' are consulted. It is important to know that the filters are called from within a qdisc, and not the other way around!

The filters attached to that qdisc then return with a decision, and the qdisc uses this to enqueue the packet into one of the classes. Each subclass may try other filters to see if further instructions apply. If not, the class enqueues the packet to the qdisc it contains.

Besides containing other qdiscs, most classful qdiscs also perform shaping. This is useful to perform both packet scheduling (with SFQ, for example) and rate control. You need this in cases where you have a high speed interface (for example, ethernet) to a slower device (a cable modem).

If you were only to run SFQ, nothing would happen, as packets enter & leave your router without delay: the output interface is far faster than your actual link speed. There is no queue to schedule then.

-----------------------------------------------------------------------------
9.5.2. The qdisc family: roots, handles, siblings and parents

Each interface has one egress 'root qdisc', by default the earlier mentioned classless pfifo_fast queueing discipline. Each qdisc can be assigned a handle, which can be used by later configuration statements to refer to that qdisc. Besides an egress qdisc, an interface may also have an ingress qdisc, which polices traffic coming in.

The handles of these qdiscs consist of two parts, a major number and a minor number. It is customary to name the root qdisc '1:', which is equal to '1:0'. The minor number of a qdisc is always 0.

Classes need to have the same major number as their parent.

-----------------------------------------------------------------------------
9.5.2.1. How filters are used to classify traffic

Recapping, a typical hierarchy might look like this:

+---------------------------------------------------------------------------+
|                     root 1:                                               |
|                       |                                                   |
|                     _1:1_                                                 |
|                    /  |  \                                                |
|                   /   |   \                                               |
|                  /    |    \                                              |
|                10:   11:   12:                                            |
|               /   \       /   \                                           |
|            10:1  10:2   12:1  12:2                                        |
+---------------------------------------------------------------------------+

But don't let this tree fool you! You should *not* imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to.

A packet might get classified in a chain like this:

    1: -> 1:1 -> 12: -> 12:2

The packet now resides in a queue in a qdisc attached to class 12:2. In this example, a filter was attached to each 'node' in the tree, each choosing a branch to take next. This can make sense.
However, this is also possible:

    1: -> 12:2

In this case, a filter attached to the root decided to send the packet directly to 12:2.

-----------------------------------------------------------------------------
9.5.2.2. How packets are dequeued to the hardware

When the kernel decides that it needs to extract packets to send to the interface, the root qdisc 1: gets a dequeue request, which is passed to 1:1, which is in turn passed to 10:, 11: and 12:, each of which queries its children and tries to dequeue() from them. In this case, the kernel needs to walk the entire tree, because only 12:2 contains a packet.

In short, nested classes ONLY talk to their parent qdiscs, never to an interface. Only the root qdisc gets dequeued by the kernel!

The upshot of this is that classes never get dequeued faster than their parents allow. And this is exactly what we want: this way we can have SFQ in an inner class, which doesn't do any shaping, only scheduling, and have a shaping outer qdisc, which does the shaping.

-----------------------------------------------------------------------------
9.5.3. The PRIO qdisc

The PRIO qdisc doesn't actually shape, it only subdivides traffic based on how you configured your filters. You can consider the PRIO qdisc a kind of pfifo_fast on steroids, whereby each band is a separate class instead of a simple FIFO.

When a packet is enqueued to the PRIO qdisc, a class is chosen based on the filter commands you gave. By default, three classes are created. These classes by default contain pure FIFO qdiscs with no internal structure, but you can replace these by any qdisc you have available.

Whenever a packet needs to be dequeued, class :1 is tried first. Higher classes are only used if none of the lower bands gave up a packet.

This qdisc is very useful in case you want to prioritize certain kinds of traffic without relying only on TOS flags, using all the power of the tc filters.
It can also contain any kind of qdisc, whereas pfifo_fast is limited to simple fifo qdiscs.

Because it doesn't actually shape, the same warning as for SFQ holds: either use it only if your physical link is really full or wrap it inside a classful qdisc that does shape. The latter holds for almost all cable modems and DSL devices.

In formal words, the PRIO qdisc is a Work-Conserving scheduler.

-----------------------------------------------------------------------------
9.5.3.1. PRIO parameters & usage

The following parameters are recognized by tc:

bands
    Number of bands to create. Each band is in fact a class. If you change this number, you must also change:

priomap
    If you do not provide tc filters to classify traffic, the PRIO qdisc looks at the TC_PRIO priority to decide how to enqueue traffic.

    This works just like with the pfifo_fast qdisc mentioned earlier, see there for lots of detail.

The bands are classes, and are called major:1 to major:3 by default, so if your PRIO qdisc is called 12:, tc filter traffic to 12:1 to grant it more priority.

Reiterating, band 0 goes to minor number 1! Band 1 to minor number 2, etc.

-----------------------------------------------------------------------------
9.5.3.2. Sample configuration

We will create this tree:

+---------------------------------------------------------------------------+
|          root 1: prio                                                     |
|         /   |   \                                                         |
|       1:1  1:2  1:3                                                       |
|        |    |    |                                                        |
|       10:  20:  30:                                                       |
|       sfq  tbf  sfq                                                       |
|band    0    1    2                                                        |
+---------------------------------------------------------------------------+

Bulk traffic will go to 30:, interactive traffic to 20: or 10:.

Command lines:

+-------------------------------------------------------------------------------------+
|# tc qdisc add dev eth0 root handle 1: prio                                          |
|## This *instantly* creates classes 1:1, 1:2, 1:3                                    |
|                                                                                     |
|# tc qdisc add dev eth0 parent 1:1 handle 10: sfq                                    |
|# tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000 |
|# tc qdisc add dev eth0 parent 1:3 handle 30: sfq                                    |
+-------------------------------------------------------------------------------------+

Now let's see what we created:

+---------------------------------------------------------------------------+
|# tc -s qdisc ls dev eth0                                                  |
|qdisc sfq 30: quantum 1514b                                                |
| Sent 0 bytes 0 pkts (dropped 0, overlimits 0)                             |
|                                                                           |
| qdisc tbf 20: rate 20Kbit burst 1599b lat 667.6ms                         |
| Sent 0 bytes 0 pkts (dropped 0, overlimits 0)                             |
|                                                                           |
| qdisc sfq 10: quantum 1514b                                               |
| Sent 132 bytes 2 pkts (dropped 0, overlimits 0)                           |
|                                                                           |
| qdisc prio 1: bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1             |
| Sent 174 bytes 3 pkts (dropped 0, overlimits 0)                           |
+---------------------------------------------------------------------------+

As you can see, band 0 has already had some traffic, and one packet was sent while running this command!

We now do some bulk data transfer with a tool that properly sets TOS flags, and take another look:

+--------------------------------------------------------------------------------+
|# scp tc ahu@10.0.0.11:./                                                       |
|ahu@10.0.0.11's password:                                                       |
|tc                   100% |*****************************|   353 KB    00:00     |
|# tc -s qdisc ls dev eth0                                                       |
|qdisc sfq 30: quantum 1514b                                                     |
| Sent 384228 bytes 274 pkts (dropped 0, overlimits 0)                           |
|                                                                                |
| qdisc tbf 20: rate 20Kbit burst 1599b lat 667.6ms                              |
| Sent 2640 bytes 20 pkts (dropped 0, overlimits 0)                              |
|                                                                                |
| qdisc sfq 10: quantum 1514b                                                    |
| Sent 2230 bytes 31 pkts (dropped 0, overlimits 0)                              |
|                                                                                |
| qdisc prio 1: bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1                  |
| Sent 389140 bytes 326 pkts (dropped 0, overlimits 0)                           |
+--------------------------------------------------------------------------------+

As you can see, practically all of the traffic went to handle 30:, which is the lowest priority band, just as intended. Now to verify that interactive traffic goes to higher bands, we create some interactive traffic:

+---------------------------------------------------------------------------+
|# tc -s qdisc ls dev eth0                                                  |
|qdisc sfq 30: quantum 1514b                                                |
| Sent 384228 bytes 274 pkts (dropped 0, overlimits 0)                      |
|                                                                           |
| qdisc tbf 20: rate 20Kbit burst 1599b lat 667.6ms                         |
| Sent 2640 bytes 20 pkts (dropped 0, overlimits 0)                         |
|                                                                           |
| qdisc sfq 10: quantum 1514b                                               |
| Sent 14926 bytes 193 pkts (dropped 0, overlimits 0)                       |
|                                                                           |
| qdisc prio 1: bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1             |
| Sent 401836 bytes 488 pkts (dropped 0, overlimits 0)                      |
+---------------------------------------------------------------------------+

It worked - all additional traffic has gone to 10:, which is our highest priority qdisc. No traffic was sent to the lowest priority, which previously received our entire scp.

-----------------------------------------------------------------------------
9.5.4. The famous CBQ qdisc

As said before, CBQ is the most complex qdisc available, the most hyped, the least understood, and probably the trickiest one to get right. This is not because the authors are evil or incompetent, far from it, it's just that the CBQ algorithm isn't all that precise and doesn't really match the way Linux works.

Besides being classful, CBQ is also a shaper and it is in that aspect that it really doesn't work very well. It should work like this. If you try to shape a 10mbit/s connection to 1mbit/s, the link should be idle 90% of the time. If it isn't, we need to throttle so that it IS idle 90% of the time.

This is pretty hard to measure, so CBQ instead derives the idle time from the number of microseconds that elapse between requests from the hardware layer for more data. Combined, this can be used to approximate how full or empty the link is.

This is rather roundabout and doesn't always arrive at proper results. For example, what if an interface is not really able to transmit its full nominal 100mbit/s of data, perhaps because of a badly implemented driver? A PCMCIA network card will also never achieve 100mbit/s because of the way the bus is designed - again, how do we calculate the idle time?

It gets even worse if we consider not-quite-real network devices like PPP over Ethernet or PPTP over TCP/IP. The effective bandwidth in that case is probably determined by the efficiency of pipes to userspace - which is huge.

People who have done measurements discover that CBQ is not always very accurate and sometimes completely misses the mark.

In many circumstances however it works well. With the documentation provided here, you should be able to configure it to work well in most cases.

-----------------------------------------------------------------------------
9.5.4.1. CBQ shaping in detail

As said before, CBQ works by making sure that the link is idle just long enough to bring down the real bandwidth to the configured rate.
To do so, it calculates the time that should pass between average packets.

During operations, the effective idletime is measured using an exponentially weighted moving average (EWMA), which considers recent packets to be exponentially more important than past ones. The UNIX loadaverage is calculated in the same way.

The calculated idle time is subtracted from the EWMA measured one; the resulting number is called 'avgidle'. A perfectly loaded link has an avgidle of zero: packets arrive exactly once every calculated interval.

An overloaded link has a negative avgidle and if it gets too negative, CBQ shuts down for a while and is then 'overlimit'.

Conversely, an idle link might amass a huge avgidle, which would then allow infinite bandwidths after a few hours of silence. To prevent this, avgidle is capped at maxidle.

If overlimit, in theory, the CBQ could throttle itself for exactly the amount of time that was calculated to pass between packets, and then pass one packet, and throttle again. But see the 'minburst' parameter below.

These are parameters you can specify in order to configure shaping:

avpkt
    Average size of a packet, measured in bytes. Needed for calculating maxidle, which is derived from maxburst, which is specified in packets.

bandwidth
    The physical bandwidth of your device, needed for idle time calculations.

cell
    The time a packet takes to be transmitted over a device may grow in steps, based on the packet size. An 800 and an 806 size packet may take just as long to send, for example - this sets the granularity. Most often set to '8'. Must be an integral power of two.

maxburst
    This number of packets is used to calculate maxidle so that when avgidle is at maxidle, this number of average packets can be burst before avgidle drops to 0. Set it higher to be more tolerant of bursts. You can't set maxidle directly, only via this parameter.

minburst
    As mentioned before, CBQ needs to throttle in case of overlimit.
The ideal solution is to do so for exactly the calculated idle time, and pass 1 packet. However, Unix kernels generally have a hard time scheduling events shorter than 10ms, so it is better to throttle for a longer period, and then pass minburst packets in one go, and then sleep minburst times longer. The time to wait is called the offtime. Higher values of minburst lead to more accurate shaping in the long term, but to bigger bursts at millisecond timescales. minidle If avgidle is below 0, we are overlimits and need to wait until avgidle will be big enough to send one packet. To prevent a sudden burst from shutting down the link for a prolonged period of time, avgidle is reset to minidle if it gets too low. Minidle is specified in negative microseconds, so 10 means that avgidle is capped at -10us. mpu Minimum packet size - needed because even a zero size packet is padded to 64 bytes on ethernet, and so takes a certain time to transmit. CBQ needs to know this to accurately calculate the idle time. rate Desired rate of traffic leaving this qdisc - this is the 'speed knob'! Internally, CBQ has a lot of fine tuning. For example, classes which are known not to have data enqueued to them aren't queried. Overlimit classes are penalized by lowering their effective priority. All very smart & complicated. ----------------------------------------------------------------------------- 9.5.4.2. CBQ classful behaviour Besides shaping, using the aforementioned idletime approximations, CBQ also acts like the PRIO queue in the sense that classes can have differing priorities and that lower priority numbers will be polled before the higher priority ones. Each time a packet is requested by the hardware layer to be sent out to the network, a weighted round robin process ('WRR') starts, beginning with the lower priority classes. These are then grouped and queried if they have data available. If so, it is returned. 
After a class has been allowed to dequeue a number of bytes, the next class within that priority is tried. The following parameters control the WRR process: allot When the outer CBQ is asked for a packet to send out on the interface, it will try all inner qdiscs (in the classes) in turn, in order of the 'priority' parameter. Each time a class gets its turn, it can only send out a limited amount of data. 'Allot' is the base unit of this amount. See the 'weight' parameter for more information. prio The CBQ can also act like the PRIO device. Inner classes with lower priority numbers are tried first and as long as they have traffic, other classes are not polled for traffic. weight Weight helps in the Weighted Round Robin process. Each class gets a chance to send in turn. If you have classes with significantly more bandwidth than other classes, it makes sense to allow them to send more data in one round than the others. A CBQ adds up all weights under a class, and normalizes them, so you can use arbitrary numbers: only the ratios are important. People have been using 'rate/10' as a rule of thumb and it appears to work well. The renormalized weight is multiplied by the 'allot' parameter to determine how much data can be sent in one round. Please note that all classes within a CBQ hierarchy need to share the same major number! ----------------------------------------------------------------------------- 9.5.4.3. CBQ parameters that determine link sharing & borrowing Besides purely limiting certain kinds of traffic, it is also possible to specify which classes can borrow capacity from other classes or, conversely, lend out bandwidth. Isolated/sharing A class that is configured with 'isolated' will not lend out bandwidth to sibling classes. Use this if you have competing or mutually-unfriendly agencies on your link who do not want to give each other freebies. The control program tc also knows about 'sharing', which is the reverse of 'isolated'.
bounded/borrow A class can also be 'bounded', which means that it will not try to borrow bandwidth from sibling classes. tc also knows about 'borrow', which is the reverse of 'bounded'. A typical situation might be where you have two agencies on your link which are both 'isolated' and 'bounded', which means that they are really limited to their assigned rate, and also won't allow each other to borrow. Within such an agency class, there might be other classes which are allowed to swap bandwidth. ----------------------------------------------------------------------------- 9.5.4.4. Sample configuration This configuration limits webserver traffic to 5mbit and SMTP traffic to 3 mbit. Together, they may not get more than 6mbit. We have a 100mbit NIC and the classes may borrow bandwidth from each other. +---------------------------------------------------------------------------+ |# tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \ | | avpkt 1000 cell 8 | |# tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \ | | rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \ | | avpkt 1000 bounded | +---------------------------------------------------------------------------+ This part installs the root and the customary 1:0 class. The 1:1 class is bounded, so the total bandwidth can't exceed 6mbit. As said before, CBQ requires a *lot* of knobs. All parameters are explained above, however. The corresponding HTB configuration is lots simpler. +---------------------------------------------------------------------------+ |# tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \ | | rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \ | | avpkt 1000 | |# tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \ | | rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \ | | avpkt 1000 | +---------------------------------------------------------------------------+ These are our two classes. 
Note how we scale the weight with the configured rate. Neither class is bounded, but both are connected to class 1:1, which is bounded. So the sum of the bandwidth of the 2 classes will never be more than 6mbit. The classids need to be within the same major number as the parent CBQ, by the way! +---------------------------------------------------------------------------+ |# tc qdisc add dev eth0 parent 1:3 handle 30: sfq | |# tc qdisc add dev eth0 parent 1:4 handle 40: sfq | +---------------------------------------------------------------------------+ Both classes have a FIFO qdisc by default. But we replaced these with an SFQ queue so each flow of data is treated equally. +---------------------------------------------------------------------------+ |# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ | | sport 80 0xffff flowid 1:3 | |# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ | | sport 25 0xffff flowid 1:4 | +---------------------------------------------------------------------------+ These commands, attached directly to the root, send traffic to the right qdiscs. Note that we use 'tc class add' to CREATE classes within a qdisc, but that we use 'tc qdisc add' to actually add qdiscs to these classes. You may wonder what happens to traffic that is not classified by either of the two rules. It appears that in this case, data will then be processed within 1:0, and be unlimited. If SMTP+web together try to exceed the set limit of 6mbit/s, bandwidth will be divided according to the weight parameter, giving 5/8 of traffic to the webserver and 3/8 to the mail server. With this configuration you can also say that webserver traffic will always get at minimum 5/8 * 6 mbit = 3.75 mbit. ----------------------------------------------------------------------------- 9.5.4.5. Other CBQ parameters: split & defmap As said before, a classful qdisc needs to call filters to determine which class a packet will be enqueued to.
Besides calling the filter, CBQ offers other options, defmap & split. This is pretty complicated to understand, and it is not vital. But as this is the only known place where defmap & split are properly explained, I'm doing my best. As you will often want to filter on the Type of Service field only, a special syntax is provided. Whenever the CBQ needs to figure out where a packet needs to be enqueued, it checks if this node is a 'split node'. If so, one of the sub-qdiscs has indicated that it wishes to receive all packets with a certain configured priority, as might be derived from the TOS field, or socket options set by applications. The packet's priority bit is and-ed with the defmap field to see if a match exists. In other words, this is a short-hand way of creating a very fast filter, which only matches certain priorities. A defmap of ff (hex) will match everything, a map of 0 nothing. A sample configuration may help make things clearer: +---------------------------------------------------------------------------+ |# tc qdisc add dev eth1 root handle 1: cbq bandwidth 10Mbit allot 1514 \ | | cell 8 avpkt 1000 mpu 64 | | | |# tc class add dev eth1 parent 1:0 classid 1:1 cbq bandwidth 10Mbit \ | | rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 \ | | avpkt 1000 | +---------------------------------------------------------------------------+ Standard CBQ preamble. I never get used to the sheer amount of numbers required! Defmap refers to TC_PRIO bits, which are defined as follows: +---------------------------------------------------------------------------+ |TC_PRIO.. Num Corresponds to TOS | |------------------------------------------------- | |BESTEFFORT 0 Maximize Reliability | |FILLER 1 Minimize Cost | |BULK 2 Maximize Throughput (0x8) | |INTERACTIVE_BULK 4 | |INTERACTIVE 6 Minimize Delay (0x10) | |CONTROL 7 | +---------------------------------------------------------------------------+ The TC_PRIO..
number corresponds to bits, counted from the right. See the pfifo_fast section for more details on how TOS bits are converted to priorities. Now the interactive and the bulk classes: +---------------------------------------------------------------------------+ |# tc class add dev eth1 parent 1:1 classid 1:2 cbq bandwidth 10Mbit \ | | rate 1Mbit allot 1514 cell 8 weight 100Kbit prio 3 maxburst 20 \ | | avpkt 1000 split 1:0 defmap c0 | | | |# tc class add dev eth1 parent 1:1 classid 1:3 cbq bandwidth 10Mbit \ | | rate 8Mbit allot 1514 cell 8 weight 800Kbit prio 7 maxburst 20 \ | | avpkt 1000 split 1:0 defmap 3f | +---------------------------------------------------------------------------+ The 'split qdisc' is 1:0, which is where the choice will be made. C0 is hexadecimal for binary 11000000, 3F for 00111111, so these two together will match everything. The first class matches bits 7 & 6, and thus corresponds to 'interactive' and 'control' traffic. The second class matches the rest. Node 1:0 now has a table like this: +---------------------------------------------------------------------------+ |priority send to | |0 1:3 | |1 1:3 | |2 1:3 | |3 1:3 | |4 1:3 | |5 1:3 | |6 1:2 | |7 1:2 | +---------------------------------------------------------------------------+ For additional fun, you can also pass a 'change mask', which indicates exactly which priorities you wish to change. You only need to use this if you are running 'tc class change'.
For example, to add best effort traffic to 1:2, we could run this: +---------------------------------------------------------------------------+ |# tc class change dev eth1 classid 1:2 cbq defmap 01/01 | +---------------------------------------------------------------------------+ The priority map over at 1:0 now looks like this: +---------------------------------------------------------------------------+ |priority send to | |0 1:2 | |1 1:3 | |2 1:3 | |3 1:3 | |4 1:3 | |5 1:3 | |6 1:2 | |7 1:2 | +---------------------------------------------------------------------------+ FIXME: did not test 'tc class change', only looked at the source. ----------------------------------------------------------------------------- 9.5.5. Hierarchical Token Bucket Martin Devera rightly realised that CBQ is complex and does not seem optimized for many typical situations. His Hierarchical approach is well suited for setups where you have a fixed amount of bandwidth which you want to divide for different purposes, giving each purpose a guaranteed bandwidth, with the possibility of specifying how much bandwidth can be borrowed. HTB works just like CBQ but does not resort to idle time calculations to shape. Instead, it is a classful Token Bucket Filter - hence the name. It has only a few parameters, which are well documented on his [http://luxik.cdi.cz/~devik/qos/htb/] site. As your HTB configuration gets more complex, it still scales well. With CBQ, the configuration is already complex even in simple cases! HTB is not yet a part of the standard kernel, but it should soon be! If you are in a position to patch your kernel, by all means consider HTB. ----------------------------------------------------------------------------- 9.5.5.1.
Sample configuration Functionally almost identical to the CBQ sample configuration above: +------------------------------------------------------------------------------------+ |# tc qdisc add dev eth0 root handle 1: htb default 30 | | | |# tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k | | | |# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k | |# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k | |# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k | +------------------------------------------------------------------------------------+ The author then recommends SFQ for beneath these classes: +---------------------------------------------------------------------------+ |# tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10 | |# tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10 | |# tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10 | +---------------------------------------------------------------------------+ Add the filters which direct traffic to the right classes: +---------------------------------------------------------------------------+ |# U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32" | |# $U32 match ip dport 80 0xffff flowid 1:10 | |# $U32 match ip sport 25 0xffff flowid 1:20 | +---------------------------------------------------------------------------+ And that's it - no unsightly unexplained numbers, no undocumented parameters. HTB certainly looks wonderful - if 10: and 20: both have their guaranteed bandwidth, and more is left to divide, they borrow in a 5:3 ratio, just as you would expect. Unclassified traffic gets routed to 30:, which has little bandwidth of its own but can borrow everything that is left over. Because we chose SFQ internally, we get fairness thrown in for free! ----------------------------------------------------------------------------- 9.6. 
Classifying packets with filters To determine which class shall process a packet, the so-called 'classifier chain' is called each time a choice needs to be made. This chain consists of all filters attached to the classful qdisc that needs to decide. To reiterate the tree, which is not a tree: +---------------------------------------------------------------------------+ | root 1: | | | | | _1:1_ | | / | \ | | / | \ | | / | \ | | 10: 11: 12: | | / \ / \ | | 10:1 10:2 12:1 12:2 | +---------------------------------------------------------------------------+ When enqueueing a packet, at each branch the filter chain is consulted for a relevant instruction. A typical setup might be to have a filter in 1:1 that directs a packet to 12: and a filter on 12: that sends the packet to 12:2. You might also attach this latter rule to 1:1, but you can make efficiency gains by having more specific tests lower in the chain. You can't filter a packet 'upwards', by the way. Also, with HTB, you should attach all filters to the root! And again - packets are only enqueued downwards! When they are dequeued, they go up again, where the interface lives. They do NOT fall off the end of the tree to the network adaptor! ----------------------------------------------------------------------------- 9.6.1. Some simple filtering examples As explained in the Classifier chapter, you can match on literally anything, using a very complicated syntax. To start, we will show how to do the obvious things, which luckily are quite easy. 
Let's say we have a PRIO qdisc called '10:' which contains three classes, and we want to assign all traffic to port 22 and from port 80 to the highest priority band. The filters would be: +---------------------------------------------------------------------------+ |# tc filter add dev eth0 protocol ip parent 10: prio 1 u32 match \ | | ip dport 22 0xffff flowid 10:1 | |# tc filter add dev eth0 protocol ip parent 10: prio 1 u32 match \ | | ip sport 80 0xffff flowid 10:1 | |# tc filter add dev eth0 protocol ip parent 10: prio 2 u32 \ | | match u32 0 0 at 0 flowid 10:2 | +---------------------------------------------------------------------------+ What does this say? It says: attach to eth0, node 10: a priority 1 u32 filter that matches on IP destination port 22 *exactly* and send it to band 10:1. And it then repeats the same for source port 80. The last command says that anything unmatched so far should go to band 10:2, the next-highest priority; it uses a u32 match with an all-zero mask, which matches every packet, because tc needs a classifier type to be named. You need to add 'eth0', or whatever your interface is called, because each interface has a unique namespace of handles. To select on an IP address, use this: +---------------------------------------------------------------------------+ |# tc filter add dev eth0 parent 10:0 protocol ip prio 1 u32 \ | | match ip dst 4.3.2.1/32 flowid 10:1 | |# tc filter add dev eth0 parent 10:0 protocol ip prio 1 u32 \ | | match ip src 1.2.3.4/32 flowid 10:1 | |# tc filter add dev eth0 protocol ip parent 10: prio 2 u32 \ | | match u32 0 0 at 0 flowid 10:2 | +---------------------------------------------------------------------------+ This assigns traffic to 4.3.2.1 and traffic from 1.2.3.4 to the highest priority queue, and the rest to the next-highest one.
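To make the selection rule concrete, here is a minimal Python sketch of how the address filters above pick a band. It is only a toy model and not how the kernel implements u32: the function name `classify`, the `RULES` table and `DEFAULT_FLOWID` are made up for illustration, and the priority 2 catch-all is modelled as a plain fallback.

```python
import ipaddress

# Toy rule table mirroring the address filters above:
# (prio, header field, network, flowid). Illustrative only, not a tc structure.
RULES = [
    (1, "dst", ipaddress.ip_network("4.3.2.1/32"), "10:1"),
    (1, "src", ipaddress.ip_network("1.2.3.4/32"), "10:1"),
]
DEFAULT_FLOWID = "10:2"  # stands in for the prio 2 catch-all filter


def classify(src: str, dst: str) -> str:
    """Return the flowid of the first matching rule, lowest prio number first."""
    packet = {
        "src": ipaddress.ip_address(src),
        "dst": ipaddress.ip_address(dst),
    }
    for _prio, field, network, flowid in sorted(RULES, key=lambda r: r[0]):
        if packet[field] in network:
            return flowid
    return DEFAULT_FLOWID
```

With these assumptions, `classify("9.9.9.9", "4.3.2.1")` and `classify("1.2.3.4", "8.8.8.8")` both land in band 10:1, while a packet matching neither address falls through to 10:2.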
You can concatenate matches. To match on traffic from 4.3.2.1 and from port 80, do this: +------------------------------------------------------------------------------------+ |# tc filter add dev eth0 parent 10:0 protocol ip prio 1 u32 match ip src 4.3.2.1/32 | | match ip sport 80 0xffff flowid 10:1 | +------------------------------------------------------------------------------------+ ----------------------------------------------------------------------------- 9.6.2. All the filtering commands you will normally need Most shaping commands presented here start with this preamble: +---------------------------------------------------------------------------+ |# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 .. | +---------------------------------------------------------------------------+ These are the so called 'u32' matches, which can match on ANY part of a packet. On source/destination address Source mask 'match ip src 1.2.3.0/24', destination mask 'match ip dst 4.3.2.0/24'. To match a single host, use /32, or omit the mask. On source/destination port, all IP protocols Source: 'match ip sport 80 0xffff', destination: 'match ip dport 80 0xffff' On ip protocol (tcp, udp, icmp, gre, ipsec) Use the numbers from /etc/protocols, for example, icmp is 1: 'match ip protocol 1 0xff'. On fwmark You can mark packets with either ipchains or iptables and have that mark survive routing across interfaces. This is really useful, for example, to shape only the traffic on eth1 that came in on eth0. Syntax: # tc filter add dev eth1 protocol ip parent 1:0 prio 1 handle 6 fw flowid 1:1 Note that this is not a u32 match! You can place a mark like this: +---------------------------------------------------------------+ |# iptables -A PREROUTING -t mangle -i eth0 -j MARK --set-mark 6| +---------------------------------------------------------------+ The number 6 is arbitrary. If you don't want to understand the full tc filter syntax, just use iptables, and only learn to select on fwmark.
On the TOS field To select interactive, minimum delay traffic: +---------------------------------------------------------------+ |# tc filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \ | | match ip tos 0x10 0xff \ | | flowid 1:4 | +---------------------------------------------------------------+ Use 0x08 0xff for bulk traffic. For more filtering commands, see the Advanced Filters chapter. ----------------------------------------------------------------------------- 9.7. The Intermediate queueing device (IMQ) The Intermediate queueing device is not a qdisc but its usage is tightly bound to qdiscs. Within Linux, qdiscs are attached to network devices and everything that is queued to the device is first queued to the qdisc. From this concept, two limitations arise: 1. Only egress shaping is possible (an ingress qdisc exists, but its possibilities are very limited compared to classful qdiscs). 2. A qdisc can only see traffic of one interface, so global limits can't be set. IMQ is there to help solve those two limitations. In short, you can put everything you choose in a qdisc. Specially marked packets get intercepted in netfilter NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING hooks and pass through the qdisc attached to an imq device. An iptables target is used for marking the packets. This enables you to do ingress shaping as you can just mark packets coming in from somewhere and/or treat interfaces as classes to set global limits. You can also do lots of other stuff like just putting your http traffic in a qdisc, put new connection requests in a qdisc, ... ----------------------------------------------------------------------------- 9.7.1. Sample configuration The first thing that might come to mind is to use ingress shaping to give yourself a high guaranteed bandwidth.
;) Configuration is just like with any other interface: +---------------------------------------------------------------------------+ |tc qdisc add dev imq0 root handle 1: htb default 20 | | | |tc class add dev imq0 parent 1: classid 1:1 htb rate 2mbit burst 15k | | | |tc class add dev imq0 parent 1:1 classid 1:10 htb rate 1mbit | |tc class add dev imq0 parent 1:1 classid 1:20 htb rate 1mbit | | | |tc qdisc add dev imq0 parent 1:10 handle 10: pfifo | |tc qdisc add dev imq0 parent 1:20 handle 20: sfq | | | |tc filter add dev imq0 parent 1:0 protocol ip prio 1 u32 match \ | | ip dst 10.0.0.230/32 flowid 1:10 | +---------------------------------------------------------------------------+ Note that the filter is attached to the classful HTB root 1:0, not to the classless pfifo at 10:, since only a classful qdisc can direct packets into class 1:10. In this example u32 is used for classification. Other classifiers should work as expected. Next, traffic has to be selected and marked to be enqueued to imq0. +---------------------------------------------------------------------------+ |iptables -t mangle -A PREROUTING -i eth0 -j IMQ --todev 0 | | | |ip link set imq0 up | +---------------------------------------------------------------------------+ The IMQ iptables target is valid in the PREROUTING and POSTROUTING chains of the mangle table. Its syntax is: +---------------------------------------------------------------------------+ |IMQ [ --todev n ] n : number of imq device | +---------------------------------------------------------------------------+ An ip6tables target is also provided. Please note traffic is not enqueued when the target is hit but afterwards. The exact location where traffic enters the imq device depends on the direction of the traffic (in/out).
These are the predefined netfilter hook priorities used by iptables: +---------------------------------------------------------------------------+ |enum nf_ip_hook_priorities { | | NF_IP_PRI_FIRST = INT_MIN, | | NF_IP_PRI_CONNTRACK = -200, | | NF_IP_PRI_MANGLE = -150, | | NF_IP_PRI_NAT_DST = -100, | | NF_IP_PRI_FILTER = 0, | | NF_IP_PRI_NAT_SRC = 100, | | NF_IP_PRI_LAST = INT_MAX, | |}; | +---------------------------------------------------------------------------+ For ingress traffic, imq registers itself with NF_IP_PRI_MANGLE + 1 priority which means packets enter the imq device directly after the mangle PREROUTING chain has been passed. For egress imq uses NF_IP_PRI_LAST which honours the fact that packets dropped by the filter table won't occupy bandwidth. The patches and some more information can be found at the [http://luxik.cdi.cz/~patrick/imq/] imq site. ----------------------------------------------------------------------------- Chapter 10. Load sharing over multiple interfaces There are several ways of doing this. One of the easiest and most straightforward ways is 'TEQL' - "True" (or "trivial") link equalizer. Like most things having to do with queueing, load sharing goes both ways. Both ends of a link may need to participate for full effect. Imagine this situation: +---------------------------------------------------------------------------+ | +-------+ eth1 +-------+ | | | |==========| | | | 'network 1' ----| A | | B |---- 'network 2' | | | |==========| | | | +-------+ eth2 +-------+ | +---------------------------------------------------------------------------+ A and B are routers, and for the moment we'll assume both run Linux. If traffic is going from network 1 to network 2, router A needs to distribute the packets over both links to B. Router B needs to be configured to accept this. The same goes the other way around: when packets go from network 2 to network 1, router B needs to send the packets over both eth1 and eth2.
The distributing part is done by a 'TEQL' device, like this (it couldn't be easier): +---------------------------------------------------------------------------+ |# tc qdisc add dev eth1 root teql0 | |# tc qdisc add dev eth2 root teql0 | |# ip link set dev teql0 up | +---------------------------------------------------------------------------+ Don't forget the 'ip link set up' command! This needs to be done on both hosts. The device teql0 is basically a round-robin distributor over eth1 and eth2, for sending packets. No data ever comes in over a teql device; it just appears on the 'raw' eth1 and eth2. But now we just have devices, we also need proper routing. One way to do this is to assign a /31 network to both links, and a /31 to the teql0 device as well: FIXME: does this need something like 'nobroadcast'? A /31 is too small to house a network address and a broadcast address - if this doesn't work as planned, try a /30, and adjust the ip addresses accordingly. You might even try to make eth1 and eth2 do without an IP address! On router A: +---------------------------------------------------------------------------+ |# ip addr add dev eth1 10.0.0.0/31 | |# ip addr add dev eth2 10.0.0.2/31 | |# ip addr add dev teql0 10.0.0.4/31 | +---------------------------------------------------------------------------+ On router B: +---------------------------------------------------------------------------+ |# ip addr add dev eth1 10.0.0.1/31 | |# ip addr add dev eth2 10.0.0.3/31 | |# ip addr add dev teql0 10.0.0.5/31 | +---------------------------------------------------------------------------+ Router A should now be able to ping 10.0.0.1, 10.0.0.3 and 10.0.0.5 over the 2 real links and the 1 equalized device. Router B should be able to ping 10.0.0.0, 10.0.0.2 and 10.0.0.4 over the links. If this works, Router A should make 10.0.0.5 its route for reaching network 2, and Router B should make 10.0.0.4 its route for reaching network 1.
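Before typing in the addresses, it can help to check that each pair really shares a subnet. The following is a small Python sketch using the standard `ipaddress` module; the `LINKS` table simply restates the example addresses, and the helper names `same_subnet` and `check_links` are invented for this illustration.

```python
import ipaddress

# Addresses from the example: (device, address on router A, address on router B).
LINKS = [
    ("eth1", "10.0.0.0/31", "10.0.0.1/31"),
    ("eth2", "10.0.0.2/31", "10.0.0.3/31"),
    ("teql0", "10.0.0.4/31", "10.0.0.5/31"),
]


def same_subnet(a: str, b: str) -> bool:
    """True if both interface addresses fall in the same network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


def check_links(links):
    """Return the devices whose two ends do NOT share a subnet."""
    return [dev for dev, a, b in links if not same_subnet(a, b)]
```

An empty list from `check_links(LINKS)` means every device has both ends in one /31. If you switch to /30 networks as suggested in the FIXME, adjust the table and run the check again.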
For the special case where network 1 is your network at home, and network 2 is the Internet, Router A should make 10.0.0.5 its default gateway. ----------------------------------------------------------------------------- 10.1. Caveats Nothing is as easy as it seems. eth1 and eth2 on both router A and B need to have return path filtering turned off, because they will otherwise drop packets destined for ip addresses other than their own: +---------------------------------------------------------------------------+ |# echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter | |# echo 0 > /proc/sys/net/ipv4/conf/eth2/rp_filter | +---------------------------------------------------------------------------+ Then there is the nasty problem of packet reordering. Let's say 6 packets need to be sent from A to B - eth1 might get 1, 3 and 5. eth2 would then do 2, 4 and 6. In an ideal world, router B would receive this in order, 1, 2, 3, 4, 5, 6. But the possibility is very real that the kernel gets it like this: 2, 1, 4, 3, 6, 5. The problem is that this confuses TCP/IP. While this is not a problem for links carrying many different TCP/IP sessions, you won't be able to bundle multiple links and ftp a single file much faster, except when your receiving or sending OS is Linux, which is not easily shaken by some simple reordering. However, for lots of applications, link load balancing is a great idea. ----------------------------------------------------------------------------- 10.2. Other possibilities William Stearns has used an advanced tunneling setup to achieve good use of multiple, unrelated, internet connections together. It can be found on [http://www.stearns.org/tunnel/] his tunneling page. The HOWTO may feature more about this in the future. ----------------------------------------------------------------------------- Chapter 11. Netfilter & iproute - marking packets So far we've seen how iproute works, and netfilter was mentioned a few times.
This would be a good time to browse through [http://netfilter.samba.org/unreliable-guides/] Rusty's Remarkably Unreliable Guides. Netfilter itself can be found [http://netfilter.filewatcher.org/] here. Netfilter allows us to filter packets, or mangle their headers. One special feature is that we can mark a packet with a number. This is done with the --set-mark facility. As an example, this command marks all packets destined for port 25, outgoing mail: +---------------------------------------------------------------------------+ |# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 \ | | -j MARK --set-mark 1 | +---------------------------------------------------------------------------+ Let's say that we have multiple connections, one that is fast (and expensive, per megabyte) and one that is slower, but flat fee. We would most certainly like outgoing mail to go via the cheap route. We've already marked the packets with a '1', we now instruct the routing policy database to act on this: +---------------------------------------------------------------------------+ |# echo 201 mail.out >> /etc/iproute2/rt_tables | |# ip rule add fwmark 1 table mail.out | |# ip rule ls | |0: from all lookup local | |32764: from all fwmark 1 lookup mail.out | |32766: from all lookup main | |32767: from all lookup default | +---------------------------------------------------------------------------+ Now we generate the mail.out table with a route to the slow but cheap link: +---------------------------------------------------------------------------+ |# /sbin/ip route add default via 195.96.98.253 dev ppp0 table mail.out | +---------------------------------------------------------------------------+ And we are done. Should we want to make exceptions, there are lots of ways to achieve this. We can modify the netfilter statement to exclude certain hosts, or we can insert a rule with a lower priority that points to the main table for our excepted hosts.
We can also use this feature to honour TOS bits by marking packets with a different type of service with different numbers, and creating rules to act on that. This way you can even dedicate, say, an ISDN line to interactive sessions. Needless to say, this also works fine on a host that's doing NAT ('masquerading'). IMPORTANT: We received a report that MASQ and SNAT at least collide with marking packets. Rusty Russell explains it in [http://lists.samba.org/pipermail/netfilter/2000-November/006089.html] this posting. Turn off the reverse path filter to make it work properly. Note: to mark packets, you need to have some options enabled in your kernel: +----------------------------------------------------------------------------+ |IP: advanced router (CONFIG_IP_ADVANCED_ROUTER) [Y/n/?] | |IP: policy routing (CONFIG_IP_MULTIPLE_TABLES) [Y/n/?] | |IP: use netfilter MARK value as routing key (CONFIG_IP_ROUTE_FWMARK) [Y/n/?]| +----------------------------------------------------------------------------+ See also Section 15.5 in the Cookbook. ----------------------------------------------------------------------------- Chapter 12. Advanced filters for (re-)classifying packets As explained in the section on classful queueing disciplines, filters are needed to classify packets into any of the sub-queues. These filters are called from within the classful qdisc. Here is an incomplete list of classifiers available: fw Bases the decision on how the firewall has marked the packet. This can be the easy way out if you don't want to learn tc filter syntax. See the Queueing chapter for details. u32 Bases the decision on fields within the packet (e.g. source IP address) route Bases the decision on which route the packet will be routed by rsvp, rsvp6 Routes packets based on [http://www.isi.edu/div7/rsvp/overview.html] RSVP. Only useful on networks you control - the Internet does not respect RSVP. tcindex Used in the DSMARK qdisc, see the relevant section.
Note that there are many ways in which you can classify packets, and that it generally comes down to preference as to which system you wish to use.

Classifiers in general accept a few arguments in common. They are listed here for convenience:

protocol
    The protocol this classifier will accept. Generally you will only be accepting IP traffic. Required.

parent
    The handle this classifier is to be attached to. This handle must be an already existing class. Required.

prio
    The priority of this classifier. Lower numbers get tested first.

handle
    This handle means different things to different filters.

All the following sections will assume you are trying to shape the traffic going to HostA. They will assume that the root class has been configured on 1: and that the class you want to send the selected traffic to is 1:1.

-----------------------------------------------------------------------------

12.1. The u32 classifier

The U32 filter is the most advanced filter available in the current implementation. It is entirely based on hash tables, which makes it robust when there are many filter rules.

In its simplest form the U32 filter is a list of records, each consisting of two fields: a selector and an action. The selectors, described below, are compared with the currently processed IP packet until the first match occurs, and then the associated action is performed. The simplest type of action would be directing the packet into a defined CBQ class.

The command line of the tc filter program, used to configure the filter, consists of three parts: a filter specification, a selector and an action. The filter specification can be defined as:

+---------------------------------------------------------------------------+
|tc filter add dev IF [ protocol PROTO ]                                    |
|                     [ (preference|priority) PRIO ]                        |
|                     [ parent CBQ ]                                        |
+---------------------------------------------------------------------------+

The protocol field describes the protocol that the filter will be applied to.
We will only discuss the case of the ip protocol. The preference field (priority can be used alternatively) sets the priority of the currently defined filter. This is important, since you can have several filters (lists of rules) with different priorities. Each list is traversed in the order in which its rules were added; then the list with the next lower priority (higher preference number) is processed. The parent field defines the top of the CBQ tree (e.g. 1:0) that the filter should be attached to.

The options described above apply to all filters, not only U32.

-----------------------------------------------------------------------------

12.1.1. U32 selector

The U32 selector contains the definition of the pattern that will be matched against the currently processed packet. More precisely, it defines which bits are to be matched in the packet header and nothing more, but this simple method is very powerful. Let's take a look at the following examples, taken directly from a pretty complex, real-world filter:

+---------------------------------------------------------------------------+
|# tc filter add dev eth0 protocol ip parent 1:0 pref 10 u32 \              |
|  match u32 00100000 00ff0000 at 0 flowid 1:10                             |
+---------------------------------------------------------------------------+

For now, leave the first line alone - all these parameters describe the filter's hash tables. Focus on the selector line, containing the match keyword. This selector will match IP headers whose second byte is 0x10 (0010). As you can guess, the 00ff number is the match mask, telling the filter exactly which bits to match. Here it's 0xff, so the byte will match only if it's exactly 0x10. The at keyword means that the match starts at the specified offset (in bytes) -- in this case the beginning of the packet. Translating all that to human language, the packet will match if its Type of Service field has the `low delay' bits set.
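The mask-and-compare test behind this selector can be tried out with plain shell arithmetic. The following is only a sketch: the helper name u32_match and the sample header words (0x45100054 for a packet with TOS 0x10, 0x45000054 for one with TOS 0x00) are assumptions made up for illustration, and nothing here calls tc or touches the kernel.

```shell
# Sketch of the u32 test: a 32-bit word matches when
# (word AND mask) equals the pattern.
# u32_match is a hypothetical helper, not part of tc.
u32_match() {
    word=$1
    pattern=$2
    mask=$3
    if [ $(( $word & $mask )) -eq $(( $pattern )) ]; then
        echo "match"
    else
        echo "no match"
    fi
}

# First 32-bit word of an IP header: version/IHL 0x45, TOS 0x10,
# total length 0x0054 (assumed sample value).
u32_match 0x45100054 0x00100000 0x00ff0000

# The same header with TOS 0x00 fails the test.
u32_match 0x45000054 0x00100000 0x00ff0000
```

Only the bits selected by the mask take part in the comparison, which is why the version and length fields in the sample word do not affect the result.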
Let's analyze another rule:

+---------------------------------------------------------------------------+
|# tc filter add dev eth0 protocol ip parent 1:0 pref 10 u32 \              |
|  match u32 00000016 0000ffff at nexthdr+0 flowid 1:10                     |
+---------------------------------------------------------------------------+

The nexthdr option means the next header encapsulated in the IP packet, i.e. the header of the upper-layer protocol. The match will also start here, at the beginning of the next header. The mask selects the second 16-bit word of that header. In the TCP and UDP protocols this field contains the packet's destination port. The number is given in big-endian format, i.e. most significant bits first, so we simply read 0x0016 as 22 decimal, which stands for the SSH service if this was TCP. As you can guess, this match is ambiguous without a context, and we will discuss this later.

Having understood all the above, we will find the following selector quite easy to read: match c0a80100 ffffff00 at 16. What we have here is a three byte match at the 17-th byte, counting from the IP header start. This will match packets with a destination address anywhere in the 192.168.1/24 network. After analyzing the examples, we can summarize what we have learned.

-----------------------------------------------------------------------------

12.1.2. General selectors

General selectors define the pattern, the mask and the offset at which the pattern will be matched against the packet contents. Using the general selectors you can match virtually any single bit in the IP (or upper layer) header. They are more difficult to write and read, though, than the specific selectors described below. The general selector syntax is:

+---------------------------------------------------------------------------+
|match [ u32 | u16 | u8 ] PATTERN MASK [ at OFFSET | nexthdr+OFFSET]        |
+---------------------------------------------------------------------------+

One of the keywords u32, u16 or u8 specifies the length of the pattern in bits.
PATTERN and MASK should follow, of length defined by the previous keyword. The OFFSET parameter is the offset, in bytes, to start matching. If nexthdr+ keyword is given, the offset is relative to start of the upper layer header. Some examples: +---------------------------------------------------------------------------+ |# tc filter add dev ppp14 parent 1:0 prio 10 u32 \ | | match u8 64 0xff at 8 \ | | flowid 1:4 | +---------------------------------------------------------------------------+ Packet will match to this rule, if its time to live (TTL) is 64. TTL is the field starting just after 8-th byte of the IP header. +---------------------------------------------------------------------------+ |# tc filter add dev ppp14 parent 1:0 prio 10 u32 \ | | match u8 0x10 0xff at nexthdr+13 \ | | protocol tcp \ | | flowid 1:3 | +---------------------------------------------------------------------------+ FIXME: it has been pointed out that this syntax does not work currently. Use this to match ACKs on packets smaller than 64 bytes: +---------------------------------------------------------------------------+ |## match acks the hard way, | |## IP protocol 6, | |## IP header length 0x5(32 bit words), | |## IP Total length 0x34 (ACK + 12 bytes of TCP options) | |## TCP ack set (bit 5, offset 33) | |# tc filter add dev ppp14 parent 1:0 protocol ip prio 10 u32 \ | | match ip protocol 6 0xff \ | | match u8 0x05 0x0f at 0 \ | | match u16 0x0000 0xffc0 at 2 \ | | match u8 0x10 0xff at 33 \ | | flowid 1:3 | +---------------------------------------------------------------------------+ This rule will only match TCP packets with ACK bit set, and no further payload. Here we can see an example of using two selectors, the final result will be logical AND of their results. If we take a look at TCP header diagram, we can see that the ACK bit is second older bit (0x10) in the 14-th byte of the TCP header (at nexthdr+13). 
As for the second selector, if we'd like to make our life harder, we could write match u8 0x06 0xff at 9 instead of using the specific selector protocol tcp, because 6 is the number of TCP protocol, present in 10-th byte of the IP header. On the other hand, in this example we couldn't use any specific selector for the first match - simply because there's no specific selector to match TCP ACK bits. ----------------------------------------------------------------------------- 12.1.3. Specific selectors The following table contains a list of all specific selectors the author of this section has found in the tc program source code. They simply make your life easier and increase readability of your filter's configuration. FIXME: table placeholder - the table is in separate file ,,selector.html'' FIXME: it's also still in Polish :-( FIXME: must be sgml'ized Some examples: +---------------------------------------------------------------------------+ |# tc filter add dev ppp0 parent 1:0 prio 10 u32 \ | | match ip tos 0x10 0xff \ | | flowid 1:4 | +---------------------------------------------------------------------------+ FIXME: tcp dst match does not work as described below: The above rule will match packets which have the TOS field set to 0x10. The TOS field starts at second byte of the packet and is one byte big, so we could write an equivalent general selector: match u8 0x10 0xff at 1. This gives us hint to the internals of U32 filter -- the specific rules are always translated to general ones, and in this form they are stored in the kernel memory. This leads to another conclusion -- the tcp and udp selectors are exactly the same and this is why you can't use single match tcp dst 53 0xffff selector to match TCP packets sent to given port -- they will also match UDP packets sent to this port. 
You must remember to also specify the protocol and end up with the following rule: +---------------------------------------------------------------------------+ |# tc filter add dev ppp0 parent 1:0 prio 10 u32 \ | | match tcp dst 53 0xffff \ | | match ip protocol 0x6 0xff \ | | flowid 1:2 | +---------------------------------------------------------------------------+ ----------------------------------------------------------------------------- 12.2. The route classifier This classifier filters based on the results of the routing tables. When a packet that is traversing through the classes reaches one that is marked with the "route" filter, it splits the packets up based on information in the routing table. +---------------------------------------------------------------------------+ |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 route | +---------------------------------------------------------------------------+ Here we add a route classifier onto the parent node 1:0 with priority 100. When a packet reaches this node (which, since it is the root, will happen immediately) it will consult the routing table and if one matches will send it to the given class and give it a priority of 100. Then, to finally kick it into action, you add the appropriate routing entry: The trick here is to define 'realm' based on either destination or source. 
The way to do it is like this: +---------------------------------------------------------------------------+ |# ip route add Host/Network via Gateway dev Device realm RealmNumber | +---------------------------------------------------------------------------+ For instance, we can define our destination network 192.168.10.0 with a realm number 10: +---------------------------------------------------------------------------+ |# ip route add 192.168.10.0/24 via 192.168.10.1 dev eth1 realm 10 | +---------------------------------------------------------------------------+ When adding route filters, we can use realm numbers to represent the networks or hosts and specify how the routes match the filters. +---------------------------------------------------------------------------+ |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 \ | | route to 10 classid 1:10 | +---------------------------------------------------------------------------+ The above rule says packets going to the network 192.168.10.0 match class id 1:10. Route filter can also be used to match source routes. For example, there is a subnetwork attached to the Linux router on eth2. +---------------------------------------------------------------------------+ |# ip route add 192.168.2.0/24 dev eth2 realm 2 | |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 \ | | route from 2 classid 1:2 | +---------------------------------------------------------------------------+ Here the filter specifies that packets from the subnetwork 192.168.2.0 (realm 2) will match class id 1:2. ----------------------------------------------------------------------------- 12.3. Policing filters To make even more complicated setups possible, you can have filters that only match up to a certain bandwidth. You can declare a filter to entirely cease matching above a certain rate, or only to not match only the bandwidth exceeding a certain rate. 
So if you decided to police at 4mbit/s, but 5mbit/s of traffic is present, you can stop matching either the entire 5mbit/s, or only not match 1mbit/s, and do send 4mbit/s to the configured class.

If bandwidth exceeds the configured rate, you can drop a packet, reclassify it, or see if another filter will match it.

-----------------------------------------------------------------------------

12.3.1. Ways to police

There are basically two ways to police. If you compiled the kernel with 'Estimators', the kernel can measure for each filter how much traffic it is passing, more or less. These estimators are very easy on the CPU, as they simply count 25 times per second how much data has been passed, and calculate the bitrate from that.

The other way works again via a Token Bucket Filter, this time living within your filter. The TBF only matches traffic UP TO your configured bandwidth; if more is offered, only the excess is subject to the configured overlimit action.

-----------------------------------------------------------------------------

12.3.1.1. With the kernel estimator

This is very simple and has only one parameter: avrate. Either the flow remains below avrate, and the filter classifies the traffic to the classid configured, or your rate exceeds it, in which case the specified action is taken, which is 'reclassify' by default.

The kernel uses an Exponential Weighted Moving Average for your bandwidth, which makes it less sensitive to short bursts.

-----------------------------------------------------------------------------

12.3.1.2. With Token Bucket Filter

Uses the following parameters:

  * buffer/maxburst
  * mtu/minburst
  * mpu
  * rate

These behave mostly identically to those described in the Token Bucket Filter section. Please note however that if you set the mtu of a TBF policer too low, *no* packets will pass, whereas the egress TBF qdisc will just pass them slower.

Another difference is that a policer can only let a packet pass, or drop it.
It cannot hold on to it in order to delay it.

-----------------------------------------------------------------------------

12.3.2. Overlimit actions

If your filter decides that it is overlimit, it can take 'actions'. Currently, four actions are available:

continue
    Causes this filter not to match, but perhaps other filters will.

drop
    This is a very fierce option which simply discards traffic exceeding a certain rate. It is often used in the ingress policer and has limited uses. For example, you may have a name server that falls over if offered more than 5mbit/s of packets, in which case an ingress filter could be used to make sure no more is ever offered.

Pass/OK
    Pass on traffic ok. Might be used to disable a complicated filter, but leave it in place.

reclassify
    Most often comes down to reclassification to Best Effort. This is the default action.

-----------------------------------------------------------------------------

12.3.3. Examples

The only real example known is mentioned in the 'Protecting your host from SYN floods' section.

FIXME: if you have used this, please share your experience with us

-----------------------------------------------------------------------------

12.4. Hashing filters for very fast massive filtering

If you have a need for thousands of rules, for example if you have a lot of clients or computers, all with different QoS specifications, you may find that the kernel spends a lot of time matching all those rules.

By default, all filters reside in one big chain which is matched in descending order of priority. If you have 1000 rules, 1000 checks may be needed to determine what to do with a packet.

Matching would go much quicker if you had 256 chains with four rules each - if you could divide packets over those 256 chains, so that the right rule will be there.

Hashing makes this possible.
Let's say you have 1024 cable modem customers in your network, with IP addresses ranging from 1.2.0.0 to 1.2.3.255, and each has to go in another bin, for example 'lite', 'regular' and 'premium'. You would then have 1024 rules like this: +---------------------------------------------------------------------------+ |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.0.0 classid 1:1 | |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.0.1 classid 1:1 | |... | |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.3.254 classid 1:3 | |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.3.255 classid 1:2 | +---------------------------------------------------------------------------+ To speed this up, we can use the last part of the IP address as a 'hash key'. We then get 256 tables, the first of which looks like this: +---------------------------------------------------------------------------+ |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.0.0 classid 1:1 | |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.1.0 classid 1:1 | |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.2.0 classid 1:3 | |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.3.0 classid 1:2 | +---------------------------------------------------------------------------+ The next one starts like this: +---------------------------------------------------------------------------+ |# tc filter add dev eth1 parent 1:0 protocol ip prio 100 match ip src \ | | 1.2.0.1 classid 1:1 | |... | +---------------------------------------------------------------------------+ This way, only four checks are needed at most, two on average. Configuration is pretty complicated, but very worth it by the time you have this many rules. 
First we make a filter root, then we create a table with 256 entries:

+--------------------------------------------------------------------------------+
|# tc filter add dev eth1 parent 1:0 prio 5 protocol ip u32                      |
|# tc filter add dev eth1 parent 1:0 prio 5 handle 2: protocol ip u32 divisor 256|
+--------------------------------------------------------------------------------+

Now we add some rules to entries in the created table:

+---------------------------------------------------------------------------+
|# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \      |
|        match ip src 1.2.0.123 flowid 1:1                                  |
|# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \      |
|        match ip src 1.2.1.123 flowid 1:2                                  |
|# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \      |
|        match ip src 1.2.2.123 flowid 1:3                                  |
|# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \      |
|        match ip src 1.2.3.123 flowid 1:2                                  |
+---------------------------------------------------------------------------+

This is entry 123, which contains matches for 1.2.0.123, 1.2.1.123, 1.2.2.123 and 1.2.3.123, and sends them to 1:1, 1:2, 1:3 and 1:2 respectively. Note that we need to specify our hash bucket in hex; 0x7b is 123.

Next create a 'hashing filter' that directs traffic to the right entry in the hashing table:

+---------------------------------------------------------------------------+
|# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 800:: \      |
|        match ip src 1.2.0.0/16 \                                          |
|        hashkey mask 0x000000ff at 12 \                                    |
|        link 2:                                                            |
+---------------------------------------------------------------------------+

Ok, some numbers need explaining. The default hash table is called 800:: and all filtering starts there. Then we select the source address, which lives at positions 12, 13, 14 and 15 in the IP header, and indicate that we are only interested in the last part. This we send to hash table 2:, which we created earlier.
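To see how the hash key picks a bucket, the arithmetic can be reproduced in plain shell. This is a sketch under stated assumptions: the helper name bucket_for is made up for illustration, it assumes hash table 2: with divisor 256 keyed on the last octet of the source address (what the mask 0x000000ff at offset 12 selects), and it only prints text without calling tc.

```shell
# Hypothetical helper: print the u32 bucket handle that a given IPv4
# source address falls into, assuming table 2: is keyed on the last octet.
bucket_for() {
    last_octet=${1##*.}             # strip everything up to the final dot
    printf '2:%x:\n' "$last_octet"  # bucket numbers are written in hex
}

bucket_for 1.2.0.123
bucket_for 1.2.3.123
bucket_for 1.2.3.255
```

Every address ending in .123 lands in the same bucket, so with four such addresses in the range at most four rules have to be checked for any packet.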
It is quite complicated, but it does work in practice and performance will be staggering. Note that this example could be improved to the ideal case where each chain contains 1 filter! ----------------------------------------------------------------------------- Chapter 13. Kernel network parameters The kernel has lots of parameters which can be tuned for different circumstances. While, as usual, the default parameters serve 99% of installations very well, we don't call this the Advanced HOWTO for the fun of it! The interesting bits are in /proc/sys/net, take a look there. Not everything will be documented here initially, but we're working on it. In the meantime you may want to have a look at the Linux-Kernel sources; read the file Documentation/filesystems/proc.txt. Most of the features are explained there. (FIXME) ----------------------------------------------------------------------------- 13.1. Reverse Path Filtering By default, routers route everything, even packets which 'obviously' don't belong on your network. A common example is private IP space escaping onto the Internet. If you have an interface with a route of 195.96.96.0/24 to it, you do not expect packets from 212.64.94.1 to arrive there. Lots of people will want to turn this feature off, so the kernel hackers have made it easy. There are files in /proc where you can tell the kernel to do this for you. The method is called "Reverse Path Filtering". Basically, if the reply to this packet wouldn't go out the interface this packet came in, then this is a bogus packet and should be ignored. The following fragment will turn this on for all current and future interfaces. 
+---------------------------------------------------------------------------+ |# for i in /proc/sys/net/ipv4/conf/*/rp_filter ; do | |> echo 2 > $i | |> done | +---------------------------------------------------------------------------+ Going by the example above, if a packet arrived on the Linux router on eth1 claiming to come from the Office+ISP subnet, it would be dropped. Similarly, if a packet came from the Office subnet, claiming to be from somewhere outside your firewall, it would be dropped also. The above is full reverse path filtering. The default is to only filter based on IPs that are on directly connected networks. This is because the full filtering breaks in the case of asymmetric routing (where packets come in one way and go out another, like satellite traffic, or if you have dynamic (bgp, ospf, rip) routes in your network. The data comes down through the satellite dish and replies go back through normal land-lines). If this exception applies to you (and you'll probably know if it does) you can simply turn off the rp_filter on the interface where the satellite data comes in. If you want to see if any packets are being dropped, the log_martians file in the same directory will tell the kernel to log them to your syslog. +---------------------------------------------------------------------------+ |# echo 1 >/proc/sys/net/ipv4/conf//log_martians | +---------------------------------------------------------------------------+ FIXME: is setting the conf/{default,all}/* files enough? - martijn ----------------------------------------------------------------------------- 13.2. Obscure settings Ok, there are a lot of parameters which can be modified. We try to list them all. Also documented (partly) in Documentation/ip-sysctl.txt. Some of these settings have different defaults based on whether you answered 'Yes' to 'Configure as router and not host' while compiling your kernel. 
-----------------------------------------------------------------------------

13.2.1. Generic ipv4

As a generic note, most rate limiting features don't work on loopback, so don't test them locally. The limits are supplied in 'jiffies', and are enforced using the earlier mentioned token bucket filter.

The kernel has an internal clock which runs at 'HZ' ticks (or 'jiffies') per second. On Intel, 'HZ' is mostly 100. So setting a *_rate file to, say 50, would allow for 2 packets per second. The token bucket filter is also configured to allow for a burst of at most 6 packets, if enough tokens have been earned.

Several entries in the following list have been copied from /usr/src/linux/Documentation/networking/ip-sysctl.txt, written by Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> and Andi Kleen

/proc/sys/net/ipv4/icmp_destunreach_rate
    If the kernel decides that it can't deliver a packet, it will drop it, and send the source of the packet an ICMP notice to this effect.

/proc/sys/net/ipv4/icmp_echo_ignore_all
    Don't act on echo packets at all. Please don't set this by default, but if you are used as a relay in a DoS attack, it may be useful.

/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts [Useful]
    If you ping the broadcast address of a network, all hosts are supposed to respond. This makes for a dandy denial-of-service tool. Set this to 1 to ignore these broadcast messages.

/proc/sys/net/ipv4/icmp_echoreply_rate
    The rate at which echo replies are sent to any one destination.

/proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
    Set this to ignore ICMP errors caused by hosts in the network reacting badly to frames sent to what they perceive to be the broadcast address.

/proc/sys/net/ipv4/icmp_paramprob_rate
    A relatively unknown ICMP message, which is sent in response to incorrect packets with broken IP or TCP headers. With this file you can control the rate at which it is sent.
/proc/sys/net/ipv4/icmp_timeexceed_rate
    This is the famous cause of the 'Solaris middle star' in traceroutes. Limits the number of ICMP Time Exceeded messages sent.

/proc/sys/net/ipv4/igmp_max_memberships
    Maximum number of listening igmp (multicast) sockets on the host. FIXME: Is this true?

/proc/sys/net/ipv4/inet_peer_gc_maxtime
    FIXME: Add a little explanation about the inet peer storage? Minimum interval between garbage collection passes. This interval is in effect under low (or absent) memory pressure on the pool. Measured in jiffies.

/proc/sys/net/ipv4/inet_peer_gc_mintime
    Minimum interval between garbage collection passes. This interval is in effect under high memory pressure on the pool. Measured in jiffies.

/proc/sys/net/ipv4/inet_peer_maxttl
    Maximum time-to-live of entries. Unused entries will expire after this period of time if there is no memory pressure on the pool (i.e. when the number of entries in the pool is very small). Measured in jiffies.

/proc/sys/net/ipv4/inet_peer_minttl
    Minimum time-to-live of entries. Should be enough to cover fragment time-to-live on the reassembling side. This minimum time-to-live is guaranteed if the pool size is less than inet_peer_threshold. Measured in jiffies.

/proc/sys/net/ipv4/inet_peer_threshold
    The approximate size of the INET peer storage. Starting from this threshold entries will be thrown away aggressively. This threshold also determines entries' time-to-live and time intervals between garbage collection passes. More entries, less time-to-live, less GC interval.

/proc/sys/net/ipv4/ip_autoconfig
    This file contains the number one if the host received its IP configuration by RARP, BOOTP, DHCP or a similar mechanism. Otherwise it is zero.

/proc/sys/net/ipv4/ip_default_ttl
    Time To Live of packets. Set to a safe 64. Raise it if you have a huge network. Don't do so for fun - routing loops cause much more damage that way. You might even consider lowering it in some circumstances.
/proc/sys/net/ipv4/ip_dynaddr You need to set this if you use dial-on-demand with a dynamic interface address. Once your demand interface comes up, any local TCP sockets which haven't seen replies will be rebound to have the right address. This solves the problem that the connection that brings up your interface itself does not work, but the second try does. /proc/sys/net/ipv4/ip_forward If the kernel should attempt to forward packets. Off by default. /proc/sys/net/ipv4/ip_local_port_range Range of local ports for outgoing connections. Actually quite small by default, 1024 to 4999. /proc/sys/net/ipv4/ip_no_pmtu_disc Set this if you want to disable Path MTU discovery - a technique to determine the largest Maximum Transfer Unit possible on your path. See also the section on Path MTU discovery in the Cookbook chapter. /proc/sys/net/ipv4/ipfrag_high_thresh Maximum memory used to reassemble IP fragments. When ipfrag_high_thresh bytes of memory is allocated for this purpose, the fragment handler will toss packets until ipfrag_low_thresh is reached. /proc/sys/net/ipv4/ip_nonlocal_bind Set this if you want your applications to be able to bind to an address which doesn't belong to a device on your system. This can be useful when your machine is on a non-permanent (or even dynamic) link, so your services are able to start up and bind to a specific address when your link is down. /proc/sys/net/ipv4/ipfrag_low_thresh Minimum memory used to reassemble IP fragments. /proc/sys/net/ipv4/ipfrag_time Time in seconds to keep an IP fragment in memory. /proc/sys/net/ipv4/tcp_abort_on_overflow A boolean flag controlling the behaviour under lots of incoming connections. When enabled, this causes the kernel to actively send RST packets when a service is overloaded. /proc/sys/net/ipv4/tcp_fin_timeout Time to hold socket in state FIN-WAIT-2, if it was closed by our side. Peer can be broken and never close its side, or even died unexpectedly. Default value is 60sec. 
The usual value in 2.2 was 180 seconds. You may restore it, but remember that even if your machine is an underloaded web server, you risk overflowing memory with kilotons of dead sockets. FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1, because they eat at most 1.5K of memory, but they tend to live longer. Cf. tcp_max_orphans.

/proc/sys/net/ipv4/tcp_keepalive_time
    How often TCP sends out keepalive messages when keepalive is enabled. Default: 2 hours.

/proc/sys/net/ipv4/tcp_keepalive_intvl
    How frequently probes are retransmitted when a probe isn't acknowledged. Default: 75 seconds.

/proc/sys/net/ipv4/tcp_keepalive_probes
    How many keepalive probes TCP will send, until it decides that the connection is broken. Default value: 9. Multiplied with tcp_keepalive_intvl, this gives the time a link can be non-responsive after a keepalive has been sent.

/proc/sys/net/ipv4/tcp_max_orphans
    Maximal number of TCP sockets not attached to any user file handle, held by the system. If this number is exceeded, orphaned connections are reset immediately and a warning is printed. This limit exists only to prevent simple DoS attacks. You _must_ not rely on this or lower the limit artificially, but rather increase it (probably after increasing installed memory) if network conditions require more than the default value, and tune network services to linger and kill such states more aggressively. Let me remind you again: each orphan eats up to ~64K of unswappable memory.

/proc/sys/net/ipv4/tcp_orphan_retries
    How many times to retry before killing a TCP connection closed by our side. Default value 7 corresponds to ~50sec-16min depending on RTO. If your machine is a loaded web server, you should think about lowering this value; such sockets may consume significant resources. Cf. tcp_max_orphans.

/proc/sys/net/ipv4/tcp_max_syn_backlog
    Maximal number of remembered connection requests which have still not received an acknowledgment from the connecting client.
Default value is 1024 for systems with more than 128Mb of memory, and 128 for low memory machines. If the server suffers from overload, try to increase this number. Warning! If you make it greater than 1024, it would be better to change TCP_SYNQ_HSIZE in include/net/tcp.h to keep TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog and to recompile the kernel.

/proc/sys/net/ipv4/tcp_max_tw_buckets
    Maximal number of timewait sockets held by the system simultaneously. If this number is exceeded, the time-wait socket is immediately destroyed and a warning is printed. This limit exists only to prevent simple DoS attacks. You _must_ not lower the limit artificially, but rather increase it (probably after increasing installed memory) if network conditions require more than the default value.

/proc/sys/net/ipv4/tcp_retrans_collapse
    Bug-to-bug compatibility with some broken printers. On retransmit try to send bigger packets to work around bugs in certain TCP stacks.

/proc/sys/net/ipv4/tcp_retries1
    How many times to retry before deciding that something is wrong and it is necessary to report this suspicion to the network layer. The minimal RFC value is 3, which is the default and corresponds to ~3sec-8min depending on RTO.

/proc/sys/net/ipv4/tcp_retries2
    How many times to retry before killing a live TCP connection. [http://www.ietf.org/rfc/rfc1122.txt] RFC 1122 says that the limit should be longer than 100 sec. That is too small a number. Default value 15 corresponds to ~13-30min depending on RTO.

/proc/sys/net/ipv4/tcp_rfc1337
    This boolean enables a fix for 'time-wait assassination hazards in tcp', described in RFC 1337. If enabled, this causes the kernel to drop RST packets for sockets in the time-wait state. Default: 0

/proc/sys/net/ipv4/tcp_sack
    Use Selective ACK, which can be used to signify that specific packets are missing - therefore helping fast recovery.

/proc/sys/net/ipv4/tcp_stdurg
    Use the Host requirements interpretation of the TCP urg pointer field.
Most hosts use the older BSD interpretation, so if you turn this on Linux might not communicate correctly with them. Default: FALSE /proc/sys/net/ipv4/tcp_syn_retries Number of SYN packets the kernel will send before giving up on the new connection. /proc/sys/net/ipv4/tcp_synack_retries To open the other side of the connection, the kernel sends a SYN with a piggybacked ACK on it, to acknowledge the earlier received SYN. This is part 2 of the threeway handshake. This setting determines the number of SYN+ACK packets sent before the kernel gives up on the connection. /proc/sys/net/ipv4/tcp_timestamps Timestamps are used, amongst other things, to protect against wrapping sequence numbers. A 1 gigabit link might conceivably re-encounter a previous sequence number with an out-of-line value, because it was of a previous generation. The timestamp will let it recognize this 'ancient packet'. /proc/sys/net/ipv4/tcp_tw_recycle Enable fast recycling TIME-WAIT sockets. Default value is 1. It should not be changed without advice/request of technical experts. /proc/sys/net/ipv4/tcp_window_scaling TCP/IP normally allows windows up to 65535 bytes big. For really fast networks, this may not be enough. The window scaling options allows for almost gigabyte windows, which is good for high bandwidth*delay products. ----------------------------------------------------------------------------- 13.2.2. Per device settings DEV can either stand for a real interface, or for 'all' or 'default'. Default also changes settings for interfaces yet to be created. /proc/sys/net/ipv4/conf/DEV/accept_redirects If a router decides that you are using it for a wrong purpose (ie, it needs to resend your packet on the same interface), it will send us a ICMP Redirect. This is a slight security risk however, so you may want to turn it off, or use secure redirects. /proc/sys/net/ipv4/conf/DEV/accept_source_route Not used very much anymore. 
You used to be able to give a packet a list of IP addresses it should visit on its way. Linux can be made to honor this IP option. /proc/sys/net/ipv4/conf/DEV/bootp_relay Accept packets with source address 0.b.c.d with destinations not to this host as local ones. It is supposed that a BOOTP relay daemon will catch and forward such packets. The default is 0, since this feature is not implemented yet (kernel version 2.2.12). /proc/sys/net/ipv4/conf/DEV/forwarding Enable or disable IP forwarding on this interface. /proc/sys/net/ipv4/conf/DEV/log_martians See the section on Reverse Path Filtering. /proc/sys/net/ipv4/conf/DEV/mc_forwarding If we do multicast forwarding on this interface /proc/sys/net/ipv4/conf/DEV/proxy_arp If you set this to 1, this interface will respond to ARP requests for addresses the kernel has routes to. Can be very useful when building 'ip pseudo bridges'. Do take care that your netmasks are very correct before enabling this! Also be aware that the rp_filter, mentioned elsewhere, also operates on ARP queries! /proc/sys/net/ipv4/conf/DEV/rp_filter See the section on Reverse Path Filtering. /proc/sys/net/ipv4/conf/DEV/secure_redirects Accept ICMP redirect messages only for gateways, listed in default gateway list. Enabled by default. /proc/sys/net/ipv4/conf/DEV/send_redirects If we send the above mentioned redirects. /proc/sys/net/ipv4/conf/DEV/shared_media If it is not set the kernel does not assume that different subnets on this device can communicate directly. Default setting is 'yes'. /proc/sys/net/ipv4/conf/DEV/tag FIXME: fill this in ----------------------------------------------------------------------------- 13.2.3. Neighbor policy Dev can either stand for a real interface, or for 'all' or 'default'. Default also changes settings for interfaces yet to be created. /proc/sys/net/ipv4/neigh/DEV/anycast_delay Maximum for random delay of answers to neighbor solicitation messages in jiffies (1/100 sec). 
Not yet implemented (Linux does not have anycast support yet).

/proc/sys/net/ipv4/neigh/DEV/app_solicit
Determines the number of requests to send to the user level ARP daemon. Use 0 to turn off.

/proc/sys/net/ipv4/neigh/DEV/base_reachable_time
A base value used for computing the random reachable time value as specified in RFC2461.

/proc/sys/net/ipv4/neigh/DEV/delay_first_probe_time
Delay for the first time probe if the neighbor is reachable. (see gc_stale_time)

/proc/sys/net/ipv4/neigh/DEV/gc_stale_time
Determines how often to check for stale ARP entries. After an ARP entry is stale it will be resolved again (which is useful when an IP address migrates to another machine). When ucast_solicit is greater than 0, it first tries to send an ARP packet directly to the known host. When that fails and mcast_solicit is greater than 0, an ARP request is broadcast.

/proc/sys/net/ipv4/neigh/DEV/locktime
An ARP/neighbor entry is only replaced with a new one if the old one is at least locktime old. This prevents ARP cache thrashing.

/proc/sys/net/ipv4/neigh/DEV/mcast_solicit
Maximum number of retries for multicast solicitation.

/proc/sys/net/ipv4/neigh/DEV/proxy_delay
Maximum time (real time is random [0..proxytime]) before answering an ARP request for which we have a proxy ARP entry. In some cases, this is used to prevent network flooding.

/proc/sys/net/ipv4/neigh/DEV/proxy_qlen
Maximum queue length of the delayed proxy ARP timer. (see proxy_delay).

/proc/sys/net/ipv4/neigh/DEV/retrans_time
The time, expressed in jiffies (1/100 sec), between retransmitted Neighbor Solicitation messages. Used for address resolution and to determine if a neighbor is unreachable.

/proc/sys/net/ipv4/neigh/DEV/ucast_solicit
Maximum number of retries for unicast solicitation.

/proc/sys/net/ipv4/neigh/DEV/unres_qlen
Maximum queue length for a pending ARP request: the number of packets which are accepted from other layers while the ARP address is still being resolved.
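
Because these settings are ordinary files, they can be inspected and changed with standard shell tools. The following is only a minimal sketch: the helper name show_settings is hypothetical, and the interface name eth0 and the value 60 in the commented-out write are illustrative assumptions, not recommendations.

```shell
# Print "name = value" for every readable file in a settings directory.
# show_settings is a hypothetical helper, not part of any distribution.
show_settings() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s = %s\n' "$(basename "$f")" "$(cat "$f" 2>/dev/null)"
    done
}

# Show the defaults applied to interfaces yet to be created.
show_settings /proc/sys/net/ipv4/neigh/default

# Changing a value requires root. eth0 and 60 are assumptions:
# echo 60 > /proc/sys/net/ipv4/neigh/eth0/gc_stale_time
```

Values written this way are lost on reboot, so they usually end up in a boot script once they have been tested.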
Internet QoS: Architectures and Mechanisms for Quality of Service, Zheng Wang, ISBN 1-55860-608-4 Hardcover textbook covering topics related to Quality of Service. Good for understanding basic concepts. ----------------------------------------------------------------------------- 13.2.4. Routing settings /proc/sys/net/ipv4/route/error_burst These parameters are used to limit the warning messages written to the kernel log from the routing code. The higher the error_cost factor is, the fewer messages will be written. Error_burst controls when messages will be dropped. The default settings limit warning messages to one every five seconds. /proc/sys/net/ipv4/route/error_cost These parameters are used to limit the warning messages written to the kernel log from the routing code. The higher the error_cost factor is, the fewer messages will be written. Error_burst controls when messages will be dropped. The default settings limit warning messages to one every five seconds. /proc/sys/net/ipv4/route/flush Writing to this file results in a flush of the routing cache. /proc/sys/net/ipv4/route/gc_elasticity Values to control the frequency and behavior of the garbage collection algorithm for the routing cache. This can be important for when doing fail over. At least gc_timeout seconds will elapse before Linux will skip to another route because the previous one has died. By default set to 300, you may want to lower it if you want to have a speedy fail over. Also see [http://mailman.ds9a.nl/pipermail/lartc/2002q1/002667.html] this post by Ard van Breemen. /proc/sys/net/ipv4/route/gc_interval See /proc/sys/net/ipv4/route/gc_elasticity. /proc/sys/net/ipv4/route/gc_min_interval See /proc/sys/net/ipv4/route/gc_elasticity. /proc/sys/net/ipv4/route/gc_thresh See /proc/sys/net/ipv4/route/gc_elasticity. /proc/sys/net/ipv4/route/gc_timeout See /proc/sys/net/ipv4/route/gc_elasticity. /proc/sys/net/ipv4/route/max_delay Delays for flushing the routing cache. 
/proc/sys/net/ipv4/route/max_size
Maximum size of the routing cache. Old entries will be purged once the cache has reached this size.

/proc/sys/net/ipv4/route/min_adv_mss
FIXME: fill this in

/proc/sys/net/ipv4/route/min_delay
Delays for flushing the routing cache.

/proc/sys/net/ipv4/route/min_pmtu
FIXME: fill this in

/proc/sys/net/ipv4/route/mtu_expires
FIXME: fill this in

/proc/sys/net/ipv4/route/redirect_load
Factors which determine if more ICMP redirects should be sent to a specific host. No redirects will be sent once the load limit or the maximum number of redirects has been reached.

/proc/sys/net/ipv4/route/redirect_number
See /proc/sys/net/ipv4/route/redirect_load.

/proc/sys/net/ipv4/route/redirect_silence
Timeout for redirects. After this period redirects will be sent again, even if they had been stopped because the load or number limit was reached.

-----------------------------------------------------------------------------

Chapter 14. Advanced & less common queueing disciplines

Should you find that you have needs not addressed by the queues mentioned earlier, the kernel contains some other, more specialized queues mentioned here.

-----------------------------------------------------------------------------

14.1. bfifo/pfifo

These classless queues are even simpler than pfifo_fast in that they lack the internal bands: all traffic is really equal. They have one important benefit though: they have some statistics. So even if you don't need shaping or prioritizing, you can use this qdisc to determine the backlog on your interface.

pfifo has a length measured in packets, bfifo in bytes.

-----------------------------------------------------------------------------

14.1.1. Parameters & usage

limit
Specifies the length of the queue. Measured in bytes for bfifo, in packets for pfifo. Defaults to the interface txqueuelen (see the pfifo_fast chapter) packets for pfifo, or txqueuelen*mtu bytes for bfifo.
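
As an illustration of the limit parameter, here is a minimal sketch of attaching these qdiscs and reading their statistics. The interface name eth0 and both limit values are assumptions chosen for the example. The commands are only printed by default, so the sketch can be tried without root.

```shell
# Dry run by default: each command is printed instead of executed.
# Set TC=tc and run as root to apply them for real.
TC=${TC:-echo tc}
DEV=${DEV:-eth0}     # assumed interface name

# Replace the root qdisc with a packet FIFO holding at most 100 packets.
$TC qdisc add dev "$DEV" root pfifo limit 100

# Show the qdisc statistics, including the current backlog.
$TC -s qdisc show dev "$DEV"

# A byte-based FIFO is attached the same way, here limited to 150000 bytes.
$TC qdisc replace dev "$DEV" root bfifo limit 150000
```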
-----------------------------------------------------------------------------

14.2. Clark-Shenker-Zhang algorithm (CSZ)

This is so theoretical that not even Alexey (the main CBQ author) claims to understand it. From his source:

"David D. Clark, Scott Shenker and Lixia Zhang Supporting Real-Time Applications in an Integrated Services Packet Network: Architecture and Mechanism.

As I understand it, the main idea is to create WFQ flows for each guaranteed service and to allocate the rest of bandwidth to dummy flow-0. Flow-0 comprises the predictive services and the best effort traffic; it is handled by a priority scheduler with the highest priority band allocated for predictive services, and the rest --- to the best effort packets.

Note that in CSZ flows are NOT limited to their bandwidth. It is supposed that the flow passed admission control at the edge of the QoS network and it doesn't need further shaping. Any attempt to improve the flow or to shape it to a token bucket at intermediate hops will introduce undesired delays and raise jitter.

At the moment CSZ is the only scheduler that provides true guaranteed service. Another schemes (including CBQ) do not provide guaranteed delay and randomize jitter."

CSZ does not currently seem like a good candidate to use unless you have read and understood the article mentioned.

-----------------------------------------------------------------------------

14.3. DSMARK

Esteve Camps

This text is an extract from my thesis on QoS Support in Linux, September 2000.

Source documents:

  * [ftp://icaftp.epfl.ch/pub/linux/diffserv/misc/dsid-01.txt.gz] Draft-almesberger-wajhak-diffserv-linux-01.txt.

  * Examples in the iproute2 distribution.

  * [http://www.qosforum.com/white-papers/qosprot_v3.pdf] White Paper-QoS protocols and architectures and [http://www.qosforum.com/docs/faq] IP QoS Frequently Asked Questions, both by the Quality of Service Forum.

This chapter was written by Esteve Camps.
-----------------------------------------------------------------------------

14.3.1. Introduction

First of all, it would be a great idea to read the RFCs written about this (RFC2474, RFC2475, RFC2597 and RFC2598) at the [http://www.ietf.org/html.charters/diffserv-charter.html] IETF DiffServ working Group web site and the [http://diffserv.sf.net/] Werner Almesberger web site (he wrote the code to support Differentiated Services on Linux).

-----------------------------------------------------------------------------

14.3.2. What is Dsmark related to?

Dsmark is a queueing discipline that offers the capabilities needed in Differentiated Services (also called DiffServ or, simply, DS). DiffServ is one of the two current QoS architectures (the other one is called Integrated Services). It is based on a value carried by packets in the DS field of the IP header.

One of the first solutions in IP designed to offer some QoS level was the Type of Service field (TOS byte) in the IP header. By changing that value, we could choose a high/low level of throughput, delay or reliability. But this didn't provide sufficient flexibility for the needs of new services (such as real-time applications, interactive applications and others). After this, new architectures appeared. One of these was DiffServ, which kept the TOS bits and renamed them the DS field.

-----------------------------------------------------------------------------

14.3.3. Differentiated Services guidelines

Differentiated Services is group-oriented. I mean, we don't know anything about flows (this is the purpose of Integrated Services); we know about flow aggregations and we will apply different behaviours depending on which aggregation a packet belongs to.

When a packet arrives at an edge node (an entry node to a DiffServ domain), we have to police, shape and/or mark it (marking refers to assigning a value to the DS field. It's just like the cows :-) ).
This will be the mark/value that the internal/core nodes of our DiffServ domain will look at to determine which behaviour or QoS level to apply.

As you can deduce, Differentiated Services involves a domain in which all DS rules have to be applied. You can think of it this way: I classify all the packets entering my domain. Once they enter my domain they are subjected to the rules that my classification dictates, and every traversed node applies that QoS level.

In fact, you can apply your own policies in your local domains, but some Service Level Agreements should be considered when connecting to other DS domains.

At this point you may have a lot of questions. DiffServ is more than I've explained. You will understand that I cannot summarize more than 3 RFCs in just 50 lines :-).

-----------------------------------------------------------------------------

14.3.4. Working with Dsmark

As the DiffServ bibliography specifies, we differentiate boundary nodes and interior nodes. These are two important points in the traffic path. Both types perform a classification when the packets arrive. Its result may be used in different places along the DS process before the packet is released to the network. For this reason the diffserv code uses the sk_buff structure, which includes a new field called skb->tc_index where we store the result of the initial classification, which may be used at several points in the DS treatment.

The skb->tc_index value is initially set by the DSMARK qdisc, which retrieves it from the DS field in the IP header of every received packet. Besides, the cls_tcindex classifier reads all or part of the skb->tc_index value and uses it to select classes.

But, first of all, take a look at the DSMARK qdisc command and its parameters:

+---------------------------------------------------------------------------+
|... dsmark indices INDICES [ default_index DEFAULT_INDEX ] [ set_tc_index ]|
+---------------------------------------------------------------------------+

What do these parameters mean?

  * indices: size of the table of (mask,value) pairs. Maximum value is 2^n, where n>=0.

  * default_index: the default table entry index if the classifier finds no match.

  * set_tc_index: instructs the dsmark discipline to retrieve the DS field and store it in skb->tc_index.

Let's see the DSMARK process.

-----------------------------------------------------------------------------

14.3.5. How SCH_DSMARK works.

This qdisc applies the following steps:

  * If we have declared the set_tc_index option in the qdisc command, the DS field is retrieved and stored in the skb->tc_index variable.

  * The classifier is invoked. It returns a class ID that is stored in the skb->tc_index variable. If no filter matches are found, we take the default_index option as the class ID to store. If neither set_tc_index nor default_index has been declared, results may be unpredictable.

  * After being sent to the internal qdiscs, where you can reuse the result of the filter, the classid returned by the internal qdisc is stored in skb->tc_index. We will use this value in the future to index a mask-value table. The final result to assign to the packet is the result of the following operation:

    +---------------------------------------------------------------+
    |New_Ds_field = ( Old_DS_field & mask ) | value                 |
    +---------------------------------------------------------------+

  * Thus, the new value results from ANDing the ds_field and mask values, and then ORing this result with the value parameter.
See next diagram to understand all this process:

+---------------------------------------------------------------------------------------+
|                                     skb->iph->tos                                     |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >    |
|     |                                                       |     ^                   |
|     | -- If you declare set_tc_index, we set DS             |     |  <-----May change |
|     |    value into skb->tc_index variable                  |     |O       DS field   |
|     |                                                      A|     |R                  |
|   +-|-+      +------+    +---+-+   Internal    +-+     +---N|-----|----+              |
|   | | |      | tc   |--->|   | |-->  . . .  -->| |     |   D|     |    |              |
|   | | |----->|index |--->|   | |     Qdisc     | |---->|    v     |    |              |
|   | | |      |filter|--->| | | +---------------+ |   ---->(mask,value) |              |
|-->| O |      +------+    +-|-+--------------^----+  /  |  (.   ,    .) |              |
|   | | |          ^         |                |       |  |  (.   ,    .) |              |
|   | | +----------|---------|----------------|-------|--+  (.   ,    .) |              |
|   | | sch_dsmark |         |                |       |                  |              |
|   +-|------------|---------|----------------|-------|------------------+              |
|     |            |         | <- tc_index -> |       |                                 |
|     |            |(read)   |   may change   |       |  <--------------Index to the    |
|     |            |         |                |       |                 (mask,value)    |
|     v            |         v                v       |                 pairs table     |
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ->   |
|                                     skb->tc_index                                     |
+---------------------------------------------------------------------------------------+

How to do marking? Just change the mask and value of the class you want to remark. See the next line of code:

+---------------------------------------------------------------------------+
|tc class change dev eth0 classid 1:1 dsmark mask 0x3 value 0xb8            |
+---------------------------------------------------------------------------+

This changes the (mask,value) pair in the hash table, to remark packets belonging to class 1:1. You have to "change" these values because of the default values that (mask,value) gets initially (see table below).

Now we'll explain how the TC_INDEX filter works and how it fits into this. Note that the TCINDEX filter can also be used in configurations other than those including DS services.
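
The remarking operation described in this section can be checked with plain shell arithmetic. This is only a sketch of the (mask,value) formula, not code from the dsmark sources; the function name remark and the sample input values are assumptions made for illustration.

```shell
# Compute the new DS field: New_Ds_field = ( Old_DS_field & mask ) | value
# remark is a hypothetical helper name.
remark() {
    old=$1 mask=$2 value=$3
    printf '0x%02x\n' "$(( ($old & $mask) | $value ))"
}

# With mask 0x3 and value 0xb8, as in the tc class change command:
remark 0x00 0x3 0xb8   # prints 0xb8
remark 0x02 0x3 0xb8   # prints 0xba, the two low bits kept by the mask survive
```

With mask 0x3 only the two least significant bits of the old field are preserved. The six DSCP bits are always overwritten by value.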
-----------------------------------------------------------------------------

14.3.6. TC_INDEX Filter

This is the basic command to declare a TC_INDEX filter:

+---------------------------------------------------------------------------+
|... tcindex [ hash SIZE ] [ mask MASK ] [ shift SHIFT ]                    |
|            [ pass_on | fall_through ]                                     |
|            [ classid CLASSID ] [ police POLICE_SPEC ]                     |
+---------------------------------------------------------------------------+

Next, we show the example used to explain the TC_INDEX operation mode:

tc qdisc add dev eth0 handle 1:0 root dsmark indices 64 set_tc_index
tc filter add dev eth0 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2
tc qdisc add dev eth0 parent 1:0 handle 2:0 cbq bandwidth 10Mbit cell 8 avpkt 1000 mpu 64
# EF traffic class
tc class add dev eth0 parent 2:0 classid 2:1 cbq bandwidth 10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated allot 1514 weight 1 maxburst 10
# Packet fifo qdisc for EF traffic
tc qdisc add dev eth0 parent 2:1 pfifo limit 5
tc filter add dev eth0 parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on

(This code is not complete. It's just an extract from the EFCBQ example included in the iproute2 distribution).

First of all, suppose we receive a packet marked as EF. If you read RFC2598, you'll see that the recommended DSCP value for EF traffic is 101110. This means that the DS field will be 10111000 (remember that the least significant bits of the TOS byte are not used in DS), or 0xb8 in hexadecimal.

[Diagram: the packet enters DSMARK 1:0 and passes the TC INDEX filter (MASK=0xfc, SHIFT=2); it then enters CBQ 2:0, where the filter with handle 0x2E selects the class, and finally leaves DSMARK 1:0.]

The packet arrives, then, with the value 0xb8 in the DS field. As we explained before, the dsmark qdisc, identified by id 1:0 in the example, retrieves the DS field and stores it in the skb->tc_index variable. The next step in the example corresponds to the filter associated with this qdisc (second line in the example). It performs the following operations:

+---------------------------------------------------------------------------+
|Value1 = skb->tc_index & MASK                                              |
|Key = Value1 >> SHIFT                                                      |
+---------------------------------------------------------------------------+

In the example, MASK=0xFC and SHIFT=2.

+---------------------------------------------------------------------------+
|Value1 = 10111000 & 11111100 = 10111000                                    |
|Key = 10111000 >> 2 = 00101110 -> 0x2E in hexadecimal                      |
+---------------------------------------------------------------------------+

The returned value corresponds to a filter handle of the internal qdisc (in the example, the qdisc with identifier 2:0). If a filter with this id exists, policing and metering conditions are verified (in case the filter includes them) and the classid is returned (in our example, classid 2:1) and stored in the skb->tc_index variable.

But if no filter with that identifier is found, the result depends on whether the fall_through flag was declared. If so, the key value is returned as the classid. If not, an error is returned and processing continues with the remaining filters. Be careful if you use the fall_through flag; it only makes sense if a simple relation exists between the values of the skb->tc_index variable and the class ids.

The last parameters to comment on are hash and pass_on.
The first one relates to the hash table size. Pass_on indicates that if no classid equal to the result of this filter is found, the next filter is tried. The default action is fall_through (see the next table).

Finally, let's see which values can be set for all these TCINDEX parameters:

+---------------------------------------------------------------------------+
|TC Name                 Value           Default                            |
|-----------------------------------------------------------------          |
|Hash                    1...0x10000     Implementation dependent           |
|Mask                    0...0xffff      0xffff                             |
|Shift                   0...15          0                                  |
|Fall through / Pass_on  Flag            Fall_through                       |
|Classid                 Major:minor     None                               |
|Police                  .....           None                               |
+---------------------------------------------------------------------------+

This kind of filter is very powerful. It's necessary to explore all possibilities. Besides, this filter is not only used in DiffServ configurations. You can use it as any other kind of filter.

I recommend that you look at all the DiffServ examples included in the iproute2 distribution. I promise I will try to complement this text as soon as I can. Besides, all I have explained is the result of a lot of tests. I would be grateful if you told me where I'm wrong on any point.

-----------------------------------------------------------------------------

14.4. Ingress qdisc

All qdiscs discussed so far are egress qdiscs. Each interface however can also have an ingress qdisc, which is not used to send packets out to the network adaptor. Instead, it allows you to apply tc filters to packets coming in over the interface, regardless of whether they have a local destination or are to be forwarded.

As the tc filters contain a full Token Bucket Filter implementation, and are also able to match on the kernel flow estimator, there is a lot of functionality available. This effectively allows you to police incoming traffic before it even enters the IP stack.

-----------------------------------------------------------------------------

14.4.1. Parameters & usage

The ingress qdisc itself does not require any parameters. It differs from other qdiscs in that it does not occupy the root of a device. Attach it like this:

+---------------------------------------------------------------------------+
|# tc qdisc add dev eth0 ingress                                            |
+---------------------------------------------------------------------------+

This allows you to have other, sending, qdiscs on your device besides the ingress qdisc.

For a contrived example of how the ingress qdisc could be used, see the Cookbook.

-----------------------------------------------------------------------------

14.5. Random Early Detection (RED)

This section is meant as an introduction to backbone routing, which often involves >100 megabit bandwidths and requires a different approach than your ADSL modem at home.

The normal behaviour of router queues on the Internet is called tail-drop. Tail-drop works by queueing up to a certain amount, then dropping all traffic that 'spills over'. This is very unfair, and also leads to retransmit synchronization. When retransmit synchronization occurs, the sudden burst of drops from a router that has reached its fill will cause a delayed burst of retransmits, which will overfill the congested router again.

In order to cope with transient congestion on links, backbone routers will often implement large queues. Unfortunately, while these queues are good for throughput, they can substantially increase latency and cause TCP connections to behave very burstily during congestion.

These issues with tail-drop are becoming increasingly troublesome on the Internet because the use of network-unfriendly applications is increasing. The Linux kernel offers us RED, short for Random Early Detect, also called Random Early Drop, as that is how it works.
RED isn't a cure-all for this. Applications which inappropriately fail to implement exponential backoff still get an unfair share of the bandwidth; however, with RED they do not cause as much harm to the throughput and latency of other connections.

RED statistically drops packets from flows before the queue reaches its hard limit. This causes a congested backbone link to slow more gracefully, and prevents retransmit synchronization. This also helps TCP find its 'fair' speed faster by allowing some packets to get dropped sooner, keeping queue sizes low and latency under control. The probability of a packet being dropped from a particular connection is proportional to its bandwidth usage rather than the number of packets it transmits.

RED is a good queue for backbones, where you can't afford the complexity of per-session state tracking needed by fairness queueing.

In order to use RED, you must decide on three parameters: min, max, and burst. Min sets the minimum queue size in bytes before dropping will begin, max is a soft maximum that the algorithm will attempt to stay under, and burst sets the maximum number of packets that can 'burst through'.

You should set min by calculating the highest acceptable base queueing latency you wish, and multiplying it by your bandwidth. For instance, on my 64kbit/s ISDN link, I might want a base queueing latency of 200ms, so I set min to 1600 bytes. Setting min too small will degrade throughput and too large will degrade latency. Setting a small min is not a replacement for reducing the MTU on a slow link to improve interactive response.

You should make max at least twice min to prevent synchronization. On slow links with small min values it might be wise to make max perhaps four or more times larger than min.

Burst controls how the RED algorithm responds to bursts. Burst must be set larger than min/avpkt. Experimentally, I've found (min+min+max)/(3*avpkt) to work OK.

Additionally, you need to set limit and avpkt.
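
The rules of thumb in this section can be turned into a small calculation. The following is only a sketch under stated assumptions: it uses the 64kbit/s link and 200ms latency from the example, picks max as four times min, and assumes an avpkt of 1000 bytes and a limit of eight times max, which are the values suggested in the surrounding text. The variable names are illustrative.

```shell
# Rule-of-thumb RED parameters for a slow link (sizes in bytes).
bandwidth=64000      # link speed in bit/s (the 64kbit/s ISDN example)
latency_ms=200       # highest acceptable base queueing latency
avpkt=1000           # assumed average packet size
max_factor=4         # assumed: max as a multiple of min (at least 2)

min=$(( bandwidth / 8 * latency_ms / 1000 ))   # latency times bandwidth
max=$(( max_factor * min ))
burst=$(( (min + min + max) / (3 * avpkt) ))   # integer division rounds down
limit=$(( 8 * max ))

echo "min=$min max=$max burst=$burst limit=$limit avpkt=$avpkt"
```

For these inputs the result is min=1600, max=6400, burst=3 and limit=51200. Burst stays above min/avpkt (1.6), as required.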
Limit is a safety value: once there are limit bytes in the queue, RED 'turns into' tail-drop. I typically set limit to eight times max. Avpkt should be your average packet size. 1000 works OK on high speed Internet links with a 1500 byte MTU.

Read [http://www.aciri.org/floyd/papers/red/red.html] the paper on RED queueing by Sally Floyd and Van Jacobson for technical information.

-----------------------------------------------------------------------------

14.6. Generic Random Early Detection

Not a lot is known about GRED. It looks like RED with several internal queues, whereby the internal queue is chosen based on the Diffserv tcindex field. According to a slide found [http://www.davin.ottawa.on.ca/ols/img22.htm] here, it contains the capabilities of Cisco's 'Distributed Weighted RED', as well as Dave Clark's RIO.

Each virtual queue can have its own Drop Parameters specified.

FIXME: get Jamal or Werner to tell us more

-----------------------------------------------------------------------------

14.7. VC/ATM emulation

This is quite a major effort by Werner Almesberger to allow you to build Virtual Circuits over TCP/IP sockets. A Virtual Circuit is a concept from ATM network theory.

For more information, see the [http://linux-atm.sourceforge.net/] ATM on Linux homepage.

-----------------------------------------------------------------------------

14.8. Weighted Round Robin (WRR)

This qdisc is not included in the standard kernels but can be downloaded from [http://wipl-wrr.dkik.dk/wrr/]. Currently the qdisc has only been tested with Linux 2.2 kernels, but it will probably work with 2.4/2.5 kernels too.

The WRR qdisc distributes bandwidth between its classes using the weighted round robin scheme. That is, like the CBQ qdisc, it contains classes into which arbitrary qdiscs can be plugged. All classes which have sufficient demand will get bandwidth proportional to the weights associated with the classes. The weights can be set manually using the tc program.
But they can also be made automatically decreasing for classes transferring much data. The qdisc has a built-in classifier which assigns packets coming from or sent to different machines to different classes. Either the MAC or IP and either source or destination addresses can be used. The MAC address can only be used when the Linux box is acting as an ethernet bridge, however. The classes are automatically assigned to machines based on the packets seen. The qdisc can be very useful at sites such as dorms where a lot of unrelated individuals share an Internet connection. A set of scripts setting up a relevant behavior for such a site is a central part of the WRR distribution. ----------------------------------------------------------------------------- Chapter 15. Cookbook This section contains 'cookbook' entries which may help you solve problems. A cookbook is no replacement for understanding however, so try and comprehend what is going on. ----------------------------------------------------------------------------- 15.1. Running multiple sites with different SLAs You can do this in several ways. Apache has some support for this with a module, but we'll show how Linux can do this for you, and do so for other services as well. These commands are stolen from a presentation by Jamal Hadi that's referenced below. Let's say we have two customers, with http, ftp and streaming audio, and we want to sell them a limited amount of bandwidth. We do so on the server itself. Customer A should have at most 2 megabits, customer B has paid for 5 megabits. We separate our customers by creating virtual IP addresses on our server. +---------------------------------------------------------------------------+ |# ip address add 188.177.166.1 dev eth0 | |# ip address add 188.177.166.2 dev eth0 | +---------------------------------------------------------------------------+ It is up to you to attach the different servers to the right IP address. All popular daemons have support for this. 
We first attach a CBQ qdisc to eth0:

+--------------------------------------------------------------------------------+
|# tc qdisc add dev eth0 root handle 1: cbq bandwidth 10Mbit cell 8 avpkt 1000 \ |
|  mpu 64                                                                        |
+--------------------------------------------------------------------------------+

We then create classes for our customers:

+---------------------------------------------------------------------------+
|# tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 10Mbit rate \ |
|  2Mbit avpkt 1000 prio 5 bounded isolated allot 1514 weight 1 maxburst 21 |
|# tc class add dev eth0 parent 1:0 classid 1:2 cbq bandwidth 10Mbit rate \ |
|  5Mbit avpkt 1000 prio 5 bounded isolated allot 1514 weight 1 maxburst 21 |
+---------------------------------------------------------------------------+

Then we add filters for our two classes:

+-------------------------------------------------------------------------------+
|##FIXME: Why this line, what does it do?, what is a divisor?:                  |
|##FIXME: A divisor has something to do with a hash table, and the number of    |
|## buckets - ahu                                                               |
|# tc filter add dev eth0 parent 1:0 protocol ip prio 5 handle 1: u32 divisor 1 |
|# tc filter add dev eth0 parent 1:0 prio 5 u32 match ip src 188.177.166.1 \    |
|  flowid 1:1                                                                   |
|# tc filter add dev eth0 parent 1:0 prio 5 u32 match ip src 188.177.166.2 \    |
|  flowid 1:2                                                                   |
+-------------------------------------------------------------------------------+

And we're done.

FIXME: why no token bucket filter? is there a default pfifo_fast fallback
somewhere?
-----------------------------------------------------------------------------
15.2. Protecting your host from SYN floods

From Alexey's iproute documentation, adapted to netfilter and with more
plausible paths. If you use this, take care to adjust the numbers to
reasonable values for your system.

If you want to protect an entire network, skip this script, which is best
suited for a single host.
It appears that you need the very latest version of the iproute2 tools to get
this to work with 2.4.0.

+---------------------------------------------------------------------------+
|#! /bin/sh -x                                                              |
|#                                                                          |
|# sample script on using the ingress capabilities                          |
|# this script shows how one can rate limit incoming SYNs                   |
|# Useful for TCP-SYN attack protection. You can use                        |
|# iptables to have more powerful additions to the SYN (eg                  |
|# in addition the subnet)                                                  |
|#                                                                          |
|#path to various utilities;                                                |
|#change to reflect yours.                                                  |
|#                                                                          |
|TC=/sbin/tc                                                                |
|IP=/sbin/ip                                                                |
|IPTABLES=/sbin/iptables                                                    |
|INDEV=eth2                                                                 |
|#                                                                          |
|# tag all incoming SYN packets through $INDEV as mark value 1              |
|############################################################               |
|$IPTABLES -A PREROUTING -i $INDEV -t mangle -p tcp --syn \                 |
|  -j MARK --set-mark 1                                                     |
|############################################################               |
|#                                                                          |
|# install the ingress qdisc on the ingress interface                       |
|############################################################               |
|$TC qdisc add dev $INDEV handle ffff: ingress                              |
|############################################################               |
|                                                                           |
|#                                                                          |
|#                                                                          |
|# SYN packets are 40 bytes (320 bits) so three SYNs equals                 |
|# 960 bits (approximately 1kbit); so we rate limit below                   |
|# the incoming SYNs to 3/sec (not very useful really; but                  |
|#serves to show the point - JHS                                            |
|############################################################               |
|$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 1 fw \   |
|police rate 1kbit burst 40 mtu 9k drop flowid :1                           |
|############################################################               |
|                                                                           |
|                                                                           |
|#                                                                          |
|echo "---- qdisc parameters Ingress ----------"                            |
|$TC qdisc ls dev $INDEV                                                    |
|echo "---- Class parameters Ingress ----------"                            |
|$TC class ls dev $INDEV                                                    |
|echo "---- filter parameters Ingress ----------"                           |
|$TC filter ls dev $INDEV parent ffff:                                      |
|                                                                           |
|#deleting the ingress qdisc                                                |
|#$TC qdisc del dev $INDEV ingress                                          |
+---------------------------------------------------------------------------+
-----------------------------------------------------------------------------
15.3. Rate limit ICMP to prevent DDoS

Recently, distributed denial of service attacks have become a major nuisance
on the Internet. By properly filtering and rate limiting your network, you
can avoid becoming either a casualty or the cause of these attacks.

You should filter your networks so that you do not allow non-local IP source
addressed packets to leave your network. This stops people from anonymously
sending junk to the Internet.

Rate limiting goes much as shown earlier. To refresh your memory, our
ASCIIgram again:

+---------------------------------------------------------------------------+
|[The Internet] ------ [Linux router] --- [Office+ISP]                      |
|                  eth1              eth0                                   |
+---------------------------------------------------------------------------+

We first set up the prerequisite parts:

+-----------------------------------------------------------------------------+
|# tc qdisc add dev eth0 root handle 10: cbq bandwidth 10Mbit avpkt 1000      |
|# tc class add dev eth0 parent 10:0 classid 10:1 cbq bandwidth 10Mbit rate \ |
|  10Mbit allot 1514 prio 5 maxburst 20 avpkt 1000                            |
+-----------------------------------------------------------------------------+

If you have 100Mbit, or more, interfaces, adjust these numbers. Now you need
to determine how much ICMP traffic you want to allow. You can perform
measurements with tcpdump, by having it write to a file for a while, and
seeing how much ICMP passes your network. Do not forget to raise the snapshot
length!

If measurement is impractical, you might want to choose 5% of your available
bandwidth.
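As a rough sketch of the 5% rule of thumb, the arithmetic for the 10Mbit
interface used above can be done in the shell. The helper name
percent_of_kbit is made up for this illustration, and the tcpdump invocation
in the comment is only one plausible way to take the measurement (it needs
root; eth0 and the output file name are assumptions).

```shell
#!/bin/sh
# Hypothetical helper (not part of tc): N percent of a link speed in kbit/s.
percent_of_kbit() {
    # $1 = link speed in kbit/s, $2 = percentage
    echo $(( $1 * $2 / 100 ))
}

LINK_KBIT=10000   # the 10Mbit interface from the example above
echo "ICMP budget at 5%: $(percent_of_kbit "$LINK_KBIT" 5)kbit"

# To measure real ICMP volume instead (run as root, stop with Ctrl-C):
#   tcpdump -i eth0 -s 1500 -w /tmp/icmp.pcap icmp
```

For a 10Mbit link this works out to 500kbit. The class that follows in this
section is deliberately stricter, at 100Kbit.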
Let's set up our class:

+-------------------------------------------------------------------------------+
|# tc class add dev eth0 parent 10:1 classid 10:100 cbq bandwidth 10Mbit rate \ |
|  100Kbit allot 1514 weight 800Kbit prio 5 maxburst 20 avpkt 250 \             |
|  bounded                                                                      |
+-------------------------------------------------------------------------------+

This limits at 100Kbit. Now we need a filter to assign ICMP traffic to this
class:

+---------------------------------------------------------------------------+
|# tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip \   |
|  protocol 1 0xFF flowid 10:100                                            |
+---------------------------------------------------------------------------+
-----------------------------------------------------------------------------
15.4. Prioritizing interactive traffic

If lots of data is coming down your link, or going up for that matter, and
you are trying to do some maintenance via telnet or ssh, this may not go too
well. Other packets are blocking your keystrokes. Wouldn't it be great if
there were a way for your interactive packets to sneak past the bulk traffic?
Linux can do this for you!

As before, we need to handle traffic going both ways. Evidently, this works
best if there are Linux boxes on both ends of your link, although other
Unixes are able to do this too. Consult your local Solaris/BSD guru for this.

The standard pfifo_fast scheduler has 3 different 'bands'. Traffic in band 0
is transmitted first, after which traffic in band 1 and 2 gets considered. It
is vital that our interactive traffic be in band 0!

We blatantly adapt from the (soon to be obsolete) ipchains HOWTO:

There are four seldom-used bits in the IP header, called the Type of Service
(TOS) bits. They affect the way packets are treated; the four bits are
"Minimum Delay", "Maximum Throughput", "Maximum Reliability" and "Minimum
Cost". Only one of these bits is allowed to be set.
Rob van Nieuwkerk, the author of the ipchains TOS-mangling code, puts it as follows: +---------------------------------------------------------------------------+ |Especially the "Minimum Delay" is important for me. I switch it on for | |"interactive" packets in my upstream (Linux) router. I'm | |behind a 33k6 modem link. Linux prioritizes packets in 3 queues. This | |way I get acceptable interactive performance while doing bulk | |downloads at the same time. | +---------------------------------------------------------------------------+ The most common use is to set telnet & ftp control connections to "Minimum Delay" and FTP data to "Maximum Throughput". This would be done as follows, on your upstream router: +---------------------------------------------------------------------------+ |# iptables -A PREROUTING -t mangle -p tcp --sport telnet \ | | -j TOS --set-tos Minimize-Delay | |# iptables -A PREROUTING -t mangle -p tcp --sport ftp \ | | -j TOS --set-tos Minimize-Delay | |# iptables -A PREROUTING -t mangle -p tcp --sport ftp-data \ | | -j TOS --set-tos Maximize-Throughput | +---------------------------------------------------------------------------+ Now, this only works for data going from your telnet foreign host to your local computer. The other way around appears to be done for you, ie, telnet, ssh & friends all set the TOS field on outgoing packets automatically. Should you have an application that does not do this, you can always do it with netfilter. 
On your local box:

+---------------------------------------------------------------------------+
|# iptables -A OUTPUT -t mangle -p tcp --dport telnet \                     |
|  -j TOS --set-tos Minimize-Delay                                          |
|# iptables -A OUTPUT -t mangle -p tcp --dport ftp \                        |
|  -j TOS --set-tos Minimize-Delay                                          |
|# iptables -A OUTPUT -t mangle -p tcp --dport ftp-data \                   |
|  -j TOS --set-tos Maximize-Throughput                                     |
+---------------------------------------------------------------------------+
-----------------------------------------------------------------------------
15.5. Transparent web-caching using netfilter, iproute2, ipchains and squid

This section was sent in by reader Ram Narula from Internet for Education
(Thailand).

The usual way to accomplish this in Linux is probably to use ipchains AFTER
making sure that the "outgoing" port 80 (web) traffic gets routed through the
server running squid.

There are 3 common methods to make sure "outgoing" port 80 traffic gets
routed to the server running squid, and a 4th one is introduced here.

Making the gateway router do it.
    If you can, tell your gateway router to match packets that have an
    outgoing destination port of 80 and send them to the IP address of the
    squid server.

    BUT

    This would put additional load on the router and some commercial routers
    might not even support this.

Using a Layer 4 switch.
    Layer 4 switches can handle this without any problem.

    BUT

    The cost for this equipment is usually very high. A typical layer 4
    switch would normally cost more than a typical router plus a good Linux
    server.

Using cache server as network's gateway.
    You can force ALL traffic through the cache server.

    BUT

    This is quite risky because Squid does utilize lots of CPU power, which
    might result in slower over-all network performance, or the server itself
    might crash and no one on the network will be able to access the Internet
    if that occurs.

Linux+NetFilter router.
By using NetFilter another technique can be implemented which is using NetFilter for "mark"ing the packets with destination port 80 and using iproute2 to route the "mark"ed packets to the Squid server. +---------------------------------------------------------------------------+ ||----------------| | || Implementation | | ||----------------| | | | | Addresses used | | 10.0.0.1 naret (NetFilter server) | | 10.0.0.2 silom (Squid server) | | 10.0.0.3 donmuang (Router connected to the Internet) | | 10.0.0.4 kaosarn (other server on network) | | 10.0.0.5 RAS | | 10.0.0.0/24 main network | | 10.0.0.0/19 total network | | | ||---------------| | ||Network diagram| | ||---------------| | | | |Internet | || | |donmuang | || | |------------hub/switch---------- | || | | | | |naret silom kaosarn RAS etc. | +---------------------------------------------------------------------------+ First, make all traffic pass through naret by making sure it is the default gateway except for silom. Silom's default gateway has to be donmuang (10.0.0.3) or this would create web traffic loop. (all servers on my network had 10.0.0.1 as the default gateway which was the former IP address of donmuang router so what I did was changed the IP address of donmuang to 10.0.0.3 and gave naret ip address of 10.0.0.1) +---------------------------------------------------------------------------+ |Silom | |----- | |-setup squid and ipchains | +---------------------------------------------------------------------------+ Setup Squid server on silom, make sure it does support transparent caching/ proxying, the default port is usually 3128, so all traffic for port 80 has to be redirected to port 3128 locally. 
This can be done by using ipchains with the following:

+---------------------------------------------------------------------------+
|silom# ipchains -N allow1                                                  |
|silom# ipchains -A allow1 -p TCP -s 10.0.0.0/19 -d 0/0 80 -j REDIRECT 3128 |
|silom# ipchains -I input -j allow1                                         |
+---------------------------------------------------------------------------+

Or, in netfilter lingo:

+-----------------------------------------------------------------------------------------+
|silom# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128|
+-----------------------------------------------------------------------------------------+

(note: you might have other entries as well)

For more information on setting up the Squid server, please refer to the
Squid FAQ page at http://squid.nlanr.net.

Make sure ip forwarding is enabled on this server and the default gateway for
this server is donmuang router (NOT naret).

+---------------------------------------------------------------------------+
|Naret                                                                      |
|-----                                                                      |
|-setup iptables and iproute2                                               |
|-disable icmp REDIRECT messages (if needed)                                |
+---------------------------------------------------------------------------+

1. "Mark" packets of destination port 80 with value 2

+--------------------------------------------------------------------+
|                                                                    |
|naret# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 80 \ |
|  -j MARK --set-mark 2                                              |
+--------------------------------------------------------------------+

2. Setup iproute2 so it will route packets with "mark" 2 to silom

+----------------------------------------------------------------+
|naret# echo 202 www.out >> /etc/iproute2/rt_tables              |
|naret# ip rule add fwmark 2 table www.out                       |
|naret# ip route add default via 10.0.0.2 dev eth0 table www.out |
|naret# ip route flush cache                                     |
+----------------------------------------------------------------+

If donmuang and naret are on the same subnet then naret should not send out
icmp REDIRECT messages.
In this case it is, so icmp REDIRECTs has to be disabled by: +---------------------------------------------------------------+ |naret# echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects | |naret# echo 0 > /proc/sys/net/ipv4/conf/default/send_redirects | |naret# echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects | +---------------------------------------------------------------+ The setup is complete, check the configuration +--------------------------------------------------------------------------------------+ |On naret: | | | |naret# iptables -t mangle -L | |Chain PREROUTING (policy ACCEPT) | |target prot opt source destination | |MARK tcp -- anywhere anywhere tcp dpt:www MARK set 0x2 | | | |Chain OUTPUT (policy ACCEPT) | |target prot opt source destination | | | |naret# ip rule ls | |0: from all lookup local | |32765: from all fwmark 2 lookup www.out | |32766: from all lookup main | |32767: from all lookup default | | | |naret# ip route list table www.out | |default via 203.114.224.8 dev eth0 | | | |naret# ip route | |10.0.0.1 dev eth0 scope link | |10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.1 | |127.0.0.0/8 dev lo scope link | |default via 10.0.0.3 dev eth0 | | | |(make sure silom belongs to one of the above lines, in this case | |it's the line with 10.0.0.0/24) | | | ||------| | ||-DONE-| | ||------| | | | +--------------------------------------------------------------------------------------+ ----------------------------------------------------------------------------- 15.5.1. 
Traffic flow diagram after implementation +---------------------------------------------------------------------------+ ||-----------------------------------------| | ||Traffic flow diagram after implementation| | ||-----------------------------------------| | | | |INTERNET | |/\ | ||| | |\/ | |-----------------donmuang router--------------------- | |/\ /\ || | ||| || || | ||| \/ || | |naret silom || | |*destination port 80 traffic=========>(cache) || | |/\ || || | ||| \/ \/ | |\\===================================kaosarn, RAS, etc. | +---------------------------------------------------------------------------+ Note that the network is asymmetric as there is one extra hop on general outgoing path. +---------------------------------------------------------------------------+ |Here is run down for packet traversing the network from kaosarn | |to and from the Internet. | | | |For web/http traffic: | |kaosarn http request->naret->silom->donmuang->internet | |http replies from Internet->donmuang->silom->kaosarn | | | |For non-web/http requests(eg. telnet): | |kaosarn outgoing data->naret->donmuang->internet | |incoming data from Internet->donmuang->kaosarn | +---------------------------------------------------------------------------+ ----------------------------------------------------------------------------- 15.6. Circumventing Path MTU Discovery issues with per route MTU settings For sending bulk data, the Internet generally works better when using larger packets. Each packet implies a routing decision, when sending a 1 megabyte file, this can either mean around 700 packets when using packets that are as large as possible, or 4000 if using the smallest default. However, not all parts of the Internet support full 1460 bytes of payload per packet. It is therefore necessary to try and find the largest packet that will 'fit', in order to optimize a connection. This process is called 'Path MTU Discovery', where MTU stands for 'Maximum Transfer Unit.' 
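The packet counts quoted above are easy to check. This is only a sketch: the
helper name packets_needed is hypothetical, and the 256-byte payload is an
assumption standing in for "the smallest default" (it corresponds to an MTU
of 296 minus 40 bytes of headers).

```shell
#!/bin/sh
# Hypothetical helper: packets needed to carry $1 bytes at $2 payload bytes each.
packets_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))   # integer ceiling division
}

FILE_BYTES=1048576   # 1 megabyte
echo "1460-byte payload: $(packets_needed "$FILE_BYTES" 1460) packets"
# Assumption: 256 bytes of payload stands in for "the smallest default".
echo "256-byte payload:  $(packets_needed "$FILE_BYTES" 256) packets"
```

This gives 719 and 4096 packets respectively, in line with the "around 700"
and "4000" figures in the text.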
When a router encounters a packet that's too big to send in one piece, AND it
has been flagged with the "Don't Fragment" bit, it returns an ICMP message
stating that it was forced to drop a packet because of this. The sending host
acts on this hint by sending smaller packets, and by iterating it can find
the optimum packet size for a connection over a certain path.

This used to work well until the Internet was discovered by hooligans who do
their best to disrupt communications. This in turn led administrators to
either block or shape ICMP traffic in a misguided attempt to improve security
or robustness of their Internet service.

What has happened now is that Path MTU Discovery is working less and less
well and fails for certain routes, which leads to strange TCP/IP sessions
which die after a while.

Although I have no proof for this, two sites who I used to have this problem
with both run Alteon Acedirectors before the affected systems - perhaps
somebody more knowledgeable can provide clues as to why this happens.
-----------------------------------------------------------------------------
15.6.1. Solution

When you encounter sites that suffer from this problem, you can disable Path
MTU discovery by setting the MTU manually. Koos van den Hout, slightly
edited, writes:

The following problem: I set the mtu/mru of my leased line running ppp to 296
because it's only 33k6 and I cannot influence the queueing on the other side.
At 296, the response to a key press is within a reasonable time frame.

And, on my side I have a masqrouter running (of course) Linux.

Recently I split 'server' and 'router' so most applications are run on a
different machine than the routing happens on.

I then had trouble logging into irc. Big panic! Some digging did find out
that I got connected to irc, even showed up as 'connected' on irc but I did
not receive the motd from irc.
I checked what could be wrong and noted that I already had some previous trouble reaching certain websites related to the MTU, since I had no trouble reaching them when the MTU was 1500, the problem just showed when the MTU was set to 296. Since irc servers block about every kind of traffic not needed for their immediate operation, they also block icmp. I managed to convince the operators of a webserver that this was the cause of a problem, but the irc server operators were not going to fix this. So, I had to make sure outgoing masqueraded traffic started with the lower mtu of the outside link. But I want local ethernet traffic to have the normal mtu (for things like nfs traffic). Solution: +-----------------------------------------------------------------------+ |ip route add default via 10.0.0.1 mtu 296 | +-----------------------------------------------------------------------+ (10.0.0.1 being the default gateway, the inside address of the masquerading router) In general, it is possible to override PMTU Discovery by setting specific routes. For example, if only a certain subnet is giving problems, this should help: +---------------------------------------------------------------------------+ |ip route add 195.96.96.0/24 via 10.0.0.1 mtu 1000 | +---------------------------------------------------------------------------+ ----------------------------------------------------------------------------- 15.7. Circumventing Path MTU Discovery issues with MSS Clamping (for ADSL, cable, PPPoE & PPtP users) As explained above, Path MTU Discovery doesn't work as well as it should anymore. If you know for a fact that a hop somewhere in your network has a limited (<1500) MTU, you cannot rely on PMTU Discovery finding this out. Besides MTU, there is yet another way to set the maximum packet size, the so called Maximum Segment Size. This is a field in the TCP Options part of a SYN packet. 
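To make the relationship between MTU and MSS concrete: for plain IPv4 and TCP
headers without options, the MSS is the MTU minus 40 bytes (20 for IP, 20 for
TCP). A small sketch; the helper name mss_for_mtu and the 1492-byte PPPoE MTU
are illustrative assumptions, not values taken from this text.

```shell
#!/bin/sh
# Hypothetical helper: MSS for a given MTU, assuming 20 bytes of IPv4 header
# and 20 bytes of TCP header, with no options.
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

echo "Ethernet (MTU 1500): MSS $(mss_for_mtu 1500)"
echo "PPPoE    (MTU 1492): MSS $(mss_for_mtu 1492)"
```

The first result, 1460, is the "full 1460 bytes of payload" mentioned in the
previous section.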
Recent Linux kernels, and a few PPPoE drivers (notably, the excellent Roaring Penguin one), feature the possibility to 'clamp the MSS'. The good thing about this is that by setting the MSS value, you are telling the remote side unequivocally 'do not ever try to send me packets bigger than this value'. No ICMP traffic is needed to get this to work. The bad thing is that it's an obvious hack - it breaks 'end to end' by modifying packets. Having said that, we use this trick in many places and it works like a charm. In order for this to work you need at least iptables-1.2.1a and Linux 2.4.3 or higher. The basic command line is: +-----------------------------------------------------------------------------------+ |# iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu| +-----------------------------------------------------------------------------------+ This calculates the proper MSS for your link. If you are feeling brave, or think that you know best, you can also do something like this: +----------------------------------------------------------------------------+ |# iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 128| +----------------------------------------------------------------------------+ This sets the MSS of passing SYN packets to 128. Use this if you have VoIP with tiny packets, and huge http packets which are causing chopping in your voice calls. ----------------------------------------------------------------------------- 15.8. The Ultimate Traffic Conditioner: Low Latency, Fast Up & Downloads Note: This script has recently been upgraded and previously only worked for Linux clients in your network! So you might want to update if you have Windows machines or Macs in your network and noticed that they were not able to download faster while others were uploading. 
I attempted to create the holy grail:

Maintain low latency for interactive traffic at all times
    This means that downloading or uploading files should not disturb SSH or
    even telnet. These are the most important things, even 200ms latency is
    sluggish to work over.

Allow 'surfing' at reasonable speeds while up or downloading
    Even though http is 'bulk' traffic, other traffic should not drown it out
    too much.

Make sure uploads don't harm downloads, and the other way around
    This is a much observed phenomenon where upstream traffic simply destroys
    download speed.

It turns out that all this is possible, at the cost of a tiny bit of
bandwidth. The reason that uploads, downloads and ssh hurt each other is the
presence of large queues in many domestic access devices like cable or DSL
modems.

The next section explains in depth what causes the delays, and how we can fix
them. You can safely skip it and head straight for the script if you don't
care how the magic is performed.
-----------------------------------------------------------------------------
15.8.1. Why it doesn't work well by default

ISPs know that they are benchmarked solely on how fast people can download.
Besides available bandwidth, download speed is influenced heavily by packet
loss, which seriously hampers TCP/IP performance. Large queues can help
prevent packet loss, and speed up downloads. So ISPs configure large queues.

These large queues however damage interactivity. A keystroke must first
travel the upstream queue, which may be seconds (!) long and go to your
remote host. It is then displayed, which leads to a packet coming back, which
must then traverse the downstream queue, located at your ISP, before it
appears on your screen.

This HOWTO teaches you how to mangle and process the queue in many ways, but
sadly, not all queues are accessible to us. The queue over at the ISP is
completely off-limits, whereas the upstream queue probably lives inside your
cable modem or DSL device.
You may or may not be able to configure it. Most probably not.

So, what next? As we can't control either of those queues, they must be
eliminated, and moved to your Linux router. Luckily this is possible.

Limit upload speed
    By limiting our upload speed to slightly less than the truly available
    rate, no queues are built up in our modem. The queue is now moved to
    Linux.

Limit download speed
    This is slightly trickier as we can't really influence how fast the
    internet ships us data. We can however drop packets that are coming in
    too fast, which causes TCP/IP to slow down to just the rate we want.
    Because we don't want to drop traffic unnecessarily, we configure a
    'burst' size we allow at higher speed.

Now, once we have done this, we have eliminated the downstream queue totally
(except for short bursts), and gain the ability to manage the upstream queue
with all the power Linux offers.

What remains to be done is to make sure interactive traffic jumps to the
front of the upstream queue. To make sure that uploads don't hurt downloads,
we also move ACK packets to the front of the queue. This is what normally
causes the huge slowdown observed when generating bulk traffic both ways. The
ACKnowledgements for downstream traffic must compete with upstream traffic,
and get delayed in the process.
If we do all this we get the following measurements using an excellent ADSL connection from xs4all in the Netherlands: +---------------------------------------------------------------------------+ |Baseline latency: | |round-trip min/avg/max = 14.4/17.1/21.7 ms | | | |Without traffic conditioner, while downloading: | |round-trip min/avg/max = 560.9/573.6/586.4 ms | | | |Without traffic conditioner, while uploading: | |round-trip min/avg/max = 2041.4/2332.1/2427.6 ms | | | |With conditioner, during 220kbit/s upload: | |round-trip min/avg/max = 15.7/51.8/79.9 ms | | | |With conditioner, during 850kbit/s download: | |round-trip min/avg/max = 20.4/46.9/74.0 ms | | | |When uploading, downloads proceed at ~80% of the available speed. Uploads | |at around 90%. Latency then jumps to 850 ms, still figuring out why. | +---------------------------------------------------------------------------+ What you can expect from this script depends a lot on your actual uplink speed. When uploading at full speed, there will always be a single packet ahead of your keystroke. That is the lower limit to the latency you can achieve - divide your MTU by your upstream speed to calculate. Typical values will be somewhat higher than that. Lower your MTU for better effects! Next, two versions of this script, one with Devik's excellent HTB, the other with CBQ which is in each Linux kernel, unlike HTB. Both are tested and work well. ----------------------------------------------------------------------------- 15.8.2. The actual script (CBQ) Works on all kernels. Within the CBQ qdisc we place two Stochastic Fairness Queues that make sure that multiple bulk streams don't drown each other out. Downstream traffic is policed using a tc filter containing a Token Bucket Filter. You might improve on this script by adding 'bounded' to the line that starts with 'tc class add .. classid 1:20'. If you lowered your MTU, also lower the allot & avpkt numbers! 
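Before lowering the MTU, it may help to estimate the latency floor described
in the previous section (MTU divided by upstream speed). A sketch, using the
UPLINK=220 value from the scripts in this section and an assumed MTU of 1500
bytes; the helper name latency_floor_ms is hypothetical, and the result is
rounded down to whole milliseconds.

```shell
#!/bin/sh
# Hypothetical helper: time to transmit one full-size packet, in ms.
# $1 = MTU in bytes, $2 = uplink speed in kbit/s (1 kbit/s = 1 bit per ms).
latency_floor_ms() {
    echo $(( $1 * 8 / $2 ))
}

UPLINK=220   # kbit/s, as in the scripts in this section
echo "MTU 1500: about $(latency_floor_ms 1500 "$UPLINK") ms"
echo "MTU 576:  about $(latency_floor_ms 576 "$UPLINK") ms"
```

At 220kbit/s a 1500-byte packet takes roughly 54 ms to send, so a keystroke
queued behind one such packet cannot do better than that.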
+-----------------------------------------------------------------------------+
#!/bin/bash

# The Ultimate Setup For Your Internet Connection At Home
#
#
# Set the following values to somewhat less than your actual download
# and uplink speed. In kilobits
DOWNLINK=800
UPLINK=220
DEV=ppp0

# clean existing down- and uplink qdiscs, hide errors
tc qdisc del dev $DEV root    2> /dev/null > /dev/null
tc qdisc del dev $DEV ingress 2> /dev/null > /dev/null

###### uplink

# install root CBQ

tc qdisc add dev $DEV root handle 1: cbq avpkt 1000 bandwidth 10mbit

# shape everything at $UPLINK speed - this prevents huge queues in your
# DSL modem which destroy latency:
# main class

tc class add dev $DEV parent 1: classid 1:1 cbq rate ${UPLINK}kbit \
   allot 1500 prio 5 bounded isolated

# high prio class 1:10:

tc class add dev $DEV parent 1:1 classid 1:10 cbq rate ${UPLINK}kbit \
   allot 1600 prio 1 avpkt 1000

# bulk and default class 1:20 - gets slightly less traffic,
# and a lower priority:

tc class add dev $DEV parent 1:1 classid 1:20 cbq rate $((9*UPLINK/10))kbit \
   allot 1600 prio 2 avpkt 1000

# both get Stochastic Fairness:
tc qdisc add dev $DEV parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev $DEV parent 1:20 handle 20: sfq perturb 10

# start filters
# TOS Minimum Delay (ssh, NOT scp) in 1:10:
tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \
   match ip tos 0x10 0xff flowid 1:10

# ICMP (ip protocol 1) in the interactive class 1:10 so we
# can do measurements & impress our friends:
tc filter add dev $DEV parent 1:0 protocol ip prio 11 u32 \
   match ip protocol 1 0xff flowid 1:10

# To speed up downloads while an upload is going on, put ACK packets in
# the interactive class:

tc filter add dev $DEV parent 1: protocol ip prio 12 u32 \
   match ip protocol 6 0xff \
   match u8 0x05 0x0f at 0 \
   match u16 0x0000 0xffc0 at 2 \
   match u8 0x10 0xff at 33 \
   flowid 1:10

# rest is 'non-interactive' ie 'bulk' and ends up in 1:20

tc filter add dev $DEV parent 1: protocol ip prio 13 u32 \
   match ip dst 0.0.0.0/0 flowid 1:20

########## downlink #############
# slow downloads down to somewhat less than the real speed to prevent
# queuing at our ISP. Tune to see how high you can set it.
# ISPs tend to have *huge* queues to make sure big downloads are fast
#
# attach ingress policer:

tc qdisc add dev $DEV handle ffff: ingress

# filter *everything* to it (0.0.0.0/0), drop everything that's
# coming in too fast:

tc filter add dev $DEV parent ffff: protocol ip prio 50 u32 match ip src \
   0.0.0.0/0 police rate ${DOWNLINK}kbit burst 10k drop flowid :1
+-----------------------------------------------------------------------------+

If you want this script to be run by ppp on connect, copy it to
/etc/ppp/ip-up.d.

If the last two lines give an error, update your tc tool to a newer version!
-----------------------------------------------------------------------------

15.8.3. The actual script (HTB)

The following script achieves all goals using the wonderful HTB queue, see
the relevant chapter. Well worth patching your kernel for!

+-----------------------------------------------------------------------------+
#!/bin/bash

# The Ultimate Setup For Your Internet Connection At Home
#
#
# Set the following values to somewhat less than your actual download
# and uplink speed. In kilobits
DOWNLINK=800
UPLINK=220
DEV=ppp0

# clean existing down- and uplink qdiscs, hide errors
tc qdisc del dev $DEV root    2> /dev/null > /dev/null
tc qdisc del dev $DEV ingress 2> /dev/null > /dev/null

###### uplink

# install root HTB, point default traffic to 1:20:

tc qdisc add dev $DEV root handle 1: htb default 20

# shape everything at $UPLINK speed - this prevents huge queues in your
# DSL modem which destroy latency:

tc class add dev $DEV parent 1: classid 1:1 htb rate ${UPLINK}kbit burst 6k

# high prio class 1:10:

tc class add dev $DEV parent 1:1 classid 1:10 htb rate ${UPLINK}kbit \
   burst 6k prio 1

# bulk & default class 1:20 - gets slightly less traffic,
# and a lower priority:

tc class add dev $DEV parent 1:1 classid 1:20 htb rate $((9*UPLINK/10))kbit \
   burst 6k prio 2

# both get Stochastic Fairness:
tc qdisc add dev $DEV parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev $DEV parent 1:20 handle 20: sfq perturb 10

# TOS Minimum Delay (ssh, NOT scp) in 1:10:
tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \
   match ip tos 0x10 0xff flowid 1:10

# ICMP (ip protocol 1) in the interactive class 1:10 so we
# can do measurements & impress our friends:
tc filter add dev $DEV parent 1:0 protocol ip prio 10 u32 \
   match ip protocol 1 0xff flowid 1:10

# To speed up downloads while an upload is going on, put ACK packets in
# the interactive class:

tc filter add dev $DEV parent 1: protocol ip prio 10 u32 \
   match ip protocol 6 0xff \
   match u8 0x05 0x0f at 0 \
   match u16 0x0000 0xffc0 at 2 \
   match u8 0x10 0xff at 33 \
   flowid 1:10

# rest is 'non-interactive' ie 'bulk' and ends up in 1:20


########## downlink #############
# slow downloads down to somewhat less than the real speed to prevent
# queuing at our ISP. Tune to see how high you can set it.
# ISPs tend to have *huge* queues to make sure big downloads are fast
#
# attach ingress policer:

tc qdisc add dev $DEV handle ffff: ingress

# filter *everything* to it (0.0.0.0/0), drop everything that's
# coming in too fast:

tc filter add dev $DEV parent ffff: protocol ip prio 50 u32 match ip src \
   0.0.0.0/0 police rate ${DOWNLINK}kbit burst 10k drop flowid :1
+-----------------------------------------------------------------------------+

If you want this script to be run by ppp on connect, copy it to
/etc/ppp/ip-up.d.

If the last two lines give an error, update your tc tool to a newer version!
-----------------------------------------------------------------------------

15.9. Rate limiting a single host or netmask

Although this is described in stupendous detail elsewhere and in our
manpages, this question gets asked a lot and happily there is a simple
answer that does not need full comprehension of traffic control.

This three-line script does the trick:

+--------------------------------------------------------------------------------+
tc qdisc add dev $DEV root handle 1: cbq avpkt 1000 bandwidth 10mbit

tc class add dev $DEV parent 1: classid 1:1 cbq rate 512kbit \
   allot 1500 prio 5 bounded isolated

tc filter add dev $DEV parent 1: protocol ip prio 16 u32 \
   match ip dst 195.96.96.97 flowid 1:1
+--------------------------------------------------------------------------------+

The first line installs a class based queue on your interface, and tells the
kernel that for calculations, it can be assumed to be a 10mbit interface. If
you get this wrong, no real harm is done. But getting it right will make
everything more precise.

The second line creates a 512kbit class with some reasonable defaults. For
details, see the cbq manpages and Chapter 9.

The last line tells which traffic should go to the shaped class. Traffic not
matched by this rule is NOT shaped.
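If you need the same limit for several hosts or rates, the three lines can be wrapped in a small shell function. What follows is only a minimal sketch: the function name rate_limit_cmds and the sample device eth0 are assumptions made for illustration and are not part of tc or of the scripts above. It merely prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/bash
# Hypothetical helper (not part of tc): print the three commands from this
# section for a given device, destination host and rate.
# Nothing is executed here; review the output, then pipe it to "sh" as root.
rate_limit_cmds() {
    local dev=$1 host=$2 rate=$3
    # Root CBQ qdisc; "bandwidth" should describe the real interface speed.
    echo "tc qdisc add dev $dev root handle 1: cbq avpkt 1000 bandwidth 10mbit"
    # Bounded, isolated class that caps traffic at the requested rate.
    echo "tc class add dev $dev parent 1: classid 1:1 cbq rate $rate allot 1500 prio 5 bounded isolated"
    # Only traffic to this destination is sent to the shaped class.
    echo "tc filter add dev $dev parent 1: protocol ip prio 16 u32 match ip dst $host flowid 1:1"
}

# Sample call using the host and rate from the text; eth0 is an assumption.
rate_limit_cmds eth0 195.96.96.97 512kbit
```

Printing instead of executing keeps the sketch harmless on a machine without tc or root access; once the output looks right, 'rate_limit_cmds eth0 195.96.96.97 512kbit | sh' would apply it.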
To make more complicated matches (subnets, source ports, destination ports),
see Section 9.6.2.

If you changed anything and want to reload the script, execute 'tc qdisc del
dev $DEV root' to clean up your existing configuration.

The script can further be improved by adding a last optional line 'tc qdisc
add dev $DEV parent 1:1 sfq perturb 10'. See Section 9.2.3 for details on
what this does.
-----------------------------------------------------------------------------

Chapter 16. Building bridges, and pseudo-bridges with Proxy ARP

Bridges are devices which can be installed in a network without any
reconfiguration. A network switch is basically a many-port bridge. A bridge
is often a 2-port switch. Linux does however support multiple interfaces in
a bridge, making it a true switch.

Bridges are often deployed when confronted with a broken network that needs
to be fixed without any alterations. Because the bridge is a layer-2 device,
one layer below IP, routers and servers are not aware of its existence. This
means that you can transparently block or modify certain packets, or do
shaping.

Another good thing is that a bridge can often be replaced by a cross cable
or a hub, should it break down.

The bad news is that a bridge can cause great confusion unless it is very
well documented. It does not appear in traceroutes, but somehow packets
disappear or get changed from point A to point B ('this network is
HAUNTED!'). You should also wonder if an organization that 'does not want to
change anything' is doing the right thing.

The Linux 2.4/2.5 bridge is documented on [http://bridge.sourceforge.net/]
this page.
-----------------------------------------------------------------------------

16.1. State of bridging and iptables

As of Linux 2.4.14, bridging and iptables do not 'see' each other without
help. If you bridge packets from eth0 to eth1, they do not 'pass' by
iptables. This means that you cannot do filtering, or NAT or mangling or
whatever.
There are several projects going on to fix this; the truly right one is by
the author of the Linux 2.4 bridging code, Lennert Buytenhek. He recently
informed us that as of bridge-nf 0.0.2 (see the url above), the code is
stable and usable in production environments. He is now asking the kernel
people if and how the patch can be merged, stay tuned!
-----------------------------------------------------------------------------

16.2. Bridging and shaping

This does work as advertised. Be sure to figure out which side each
interface is on, otherwise you might be shaping outbound traffic in your
internal interface, which won't work. Use tcpdump if needed.
-----------------------------------------------------------------------------

16.3. Pseudo-bridges with Proxy-ARP

If you just want to implement a Pseudo-bridge, skip down a few sections to
'Implementing it', but it is wise to read a bit about how it works in
practice.

A Pseudo-bridge works a bit differently. By default, a bridge passes packets
unaltered from one interface to the other. It only looks at the hardware
address of packets to determine what goes where. This in turn means that you
can bridge traffic that Linux does not understand, as long as it has a
hardware address that Linux does understand.

A 'Pseudo-bridge' works differently and looks more like a hidden router than
a bridge, but like a bridge, it has little impact on network design.

An advantage of the fact that it is not a bridge lies in the fact that
packets really pass through the kernel, and can be filtered, changed,
redirected or rerouted.

A real bridge can also be made to perform these feats, but it needs special
code, like the Ethernet Frame Diverter, or the above mentioned patch.

Another advantage of a pseudo-bridge is that it does not pass packets it
does not understand - thus cleaning your network of a lot of cruft. In cases
where you need this cruft (like SAP packets, or Netbeui), use a real bridge.
-----------------------------------------------------------------------------

16.3.1. ARP & Proxy-ARP

When a host wants to talk to another host on the same physical network
segment, it sends out an Address Resolution Protocol packet, which, somewhat
simplified, reads like this: 'who has 10.0.0.1, tell 10.0.0.7'. In response
to this, 10.0.0.1 replies with a short 'here' packet.

10.0.0.7 then sends packets to the hardware address mentioned in the 'here'
packet. It caches this hardware address for a relatively long time, and
after the cache expires, it re-asks the question.

When building a Pseudo-bridge, we instruct the bridge to reply to these ARP
packets, which causes the hosts in the network to send their packets to the
bridge. The bridge then processes these packets, and sends them to the
relevant interface.

So, in short, whenever a host on one side of the bridge asks for the
hardware address of a host on the other, the bridge replies with a packet
that says 'hand it to me'.

This way, all data traffic gets transmitted to the right place, and always
passes through the bridge.
-----------------------------------------------------------------------------

16.3.2. Implementing it

In the bad old days, it used to be possible to instruct the Linux Kernel to
perform 'proxy-ARP' for just any subnet. So, to configure a pseudo-bridge,
you would have to specify both the proper routes to both sides of the bridge
AND create matching proxy-ARP rules. This is bad in that it requires a lot
of typing, but also because it easily allows you to make mistakes which make
your bridge respond to ARP queries for networks it does not know how to
route.

With Linux 2.4/2.5 (and possibly 2.2), this possibility has been withdrawn
and has been replaced by a flag in the /proc directory, called 'proxy_arp'.
The procedure for building a pseudo-bridge is then:

 1. Assign an IP address to both interfaces, the 'left' and the 'right' one

 2. Create routes so your machine knows which hosts reside on the left, and
    which on the right

 3. Turn on proxy-ARP on both interfaces:
    echo 1 > /proc/sys/net/ipv4/conf/ethL/proxy_arp,
    echo 1 > /proc/sys/net/ipv4/conf/ethR/proxy_arp,
    where L and R stand for the numbers of your interfaces on the left and
    on the right side

Also, do not forget to turn on the IP forwarding flag
(/proc/sys/net/ipv4/ip_forward)! When converting from a true bridge, you may
find that this flag was turned off as it is not needed when bridging.

Another thing you might note when converting is that you need to clear the
arp cache of computers in the network - the arp cache might contain old
pre-bridge hardware addresses which are no longer correct.

On a Cisco, this is done using the command 'clear arp-cache'; under Linux,
use 'arp -d ip.address'. You can also wait for the cache to expire on its
own, which can take rather long.

You can speed this up using the wonderful 'arping' tool, which on many
distributions is part of the 'iputils' package. Using 'arping' you can send
out unsolicited ARP messages so as to update remote arp caches.

This is a very powerful technique that is also used by 'black hats' to
subvert your routing!

Note: On Linux 2.4, you may need to execute
'echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind' before being able to send out
unsolicited ARP messages!

You may also discover that your network was misconfigured if you are/were in
the habit of specifying routes without netmasks. To explain, some versions
of route may have guessed your netmask right in the past, or guessed wrong
without you noticing. When doing surgical routing as described above, it is
*vital* that you check your netmasks!
-----------------------------------------------------------------------------

Chapter 17. Dynamic routing - OSPF and BGP

Once your network starts to get really big, or you start to consider 'the
internet' as your network, you need tools which dynamically route your data.
Sites are often connected to each other with multiple links, and more are
popping up all the time.

The Internet has mostly standardized on OSPF and BGP4 (rfc1771). Linux
supports both, by way of gated and zebra.

While currently not within the scope of this document, we would like to
point you to the definitive works:

Overview:

Cisco Systems [http://www.cisco.com/univercd/cc/td/doc/cisintwk/idg4/nd2003.htm]
Designing large-scale IP Internetworks

For OSPF:

Moy, John T. "OSPF. The anatomy of an Internet routing protocol" Addison
Wesley. Reading, MA. 1998.

Halabi has also written a good guide to OSPF routing design, but this
appears to have been dropped from the Cisco web site.

For BGP:

Halabi, Bassam "Internet routing architectures" Cisco Press (New Riders
Publishing). Indianapolis, IN. 1997.

also

Cisco Systems [http://www.cisco.com/univercd/cc/td/doc/cisintwk/ics/icsbgp4.htm]
Using the Border Gateway Protocol for interdomain routing

Although the examples are Cisco-specific, they are remarkably similar to the
configuration language in Zebra :-)
-----------------------------------------------------------------------------

Chapter 18. Other possibilities

This chapter is a list of projects having to do with advanced Linux routing
& traffic shaping. Some of these links may deserve chapters of their own,
some are documented very well themselves, and don't need more HOWTO.

802.1Q VLAN Implementation for Linux [http://scry.wanfear.com/~greear/vlan.html] (site)
    VLANs are a very cool way to segregate your networks in a more virtual
    than physical way. Good information on VLANs can be found
    [ftp://ftp.netlab.ohio-state.edu/pub/jain/courses/cis788-97/virtual_lans/index.htm]
    here. With this implementation, you can have your Linux box talk VLANs
    with machines like Cisco Catalyst, 3Com: {Corebuilder, Netbuilder II,
    SuperStack II switch 630}, Extreme Ntwks Summit 48, Foundry:
    {ServerIronXL, FastIron}.

    A great HOWTO about VLANs can be found
    [http://scry.wanfear.com/~greear/vlan/cisco_howto.html] here.

    Update: has been included in the kernel as of 2.4.14 (perhaps 13).

Alternate 802.1Q VLAN Implementation for Linux [http://vlan.sourceforge.net] (site)
    Alternative VLAN implementation for linux. This project was started out
    of disagreement with the 'established' VLAN project's architecture and
    coding style, resulting in a cleaner overall design.

Linux Virtual Server [http://www.LinuxVirtualServer.org/] (site)
    These people are brilliant. The Linux Virtual Server is a highly
    scalable and highly available server built on a cluster of real servers,
    with the load balancer running on the Linux operating system. The
    architecture of the cluster is transparent to end users. End users only
    see a single virtual server.

    In short whatever you need to load balance, at whatever level of
    traffic, LVS will have a way of doing it. Some of their techniques are
    positively evil! For example, they let several machines have the same IP
    address on a segment, but turn off ARP on them. Only the LVS machine
    does ARP - it then decides which of the backend hosts should handle an
    incoming packet, and sends it directly to the right MAC address of the
    backend server. Outgoing traffic will flow directly to the router, and
    not via the LVS machine, which therefore does not need to see your
    5Gbit/s of content flowing to the world, and cannot be a bottleneck.

    The LVS is implemented as a kernel patch in Linux 2.0 and 2.2, but as a
    Netfilter module in 2.4/2.5, so it does not need kernel patches! Their
    2.4 support is still in early development, so beat on it and give
    feedback or send patches.

CBQ.init [ftp://ftp.equinox.gu.net/pub/linux/cbq/] (site)
    Configuring CBQ can be a bit daunting, especially if all you want to do
    is shape some computers behind a router. CBQ.init can help you configure
    Linux with a simplified syntax.
    For example, if you want all computers in your 192.168.1.0/24 subnet (on
    10mbit eth1) to be limited to 28kbit/s download speed, put this in the
    CBQ.init configuration file:

    +---------------------------------------------------------------+
    DEVICE=eth1,10Mbit,1Mbit
    RATE=28Kbit
    WEIGHT=2Kbit
    PRIO=5
    RULE=192.168.1.0/24
    +---------------------------------------------------------------+

    By all means use this program if the 'how and why' don't interest you.
    We're using CBQ.init in production and it works very well. It can even
    do some more advanced things, like time dependent shaping. The
    documentation is embedded in the script, which explains why you can't
    find a README.

Chronox easy shaping scripts [http://www.chronox.de] (site)
    Stephan Mueller (smueller@chronox.de) wrote two useful scripts,
    'limit.conn' and 'shaper'. The first one allows you to easily throttle a
    single download session, like this:

    +---------------------------------------------------------------+
    # limit.conn -s SERVERIP -p SERVERPORT -l LIMIT
    +---------------------------------------------------------------+

    It works on Linux 2.2 and 2.4/2.5.

    The second script is more complicated, and can be used to make lots of
    different queues based on iptables rules, which are used to mark packets
    which are then shaped.

Virtual Router Redundancy Protocol implementation [http://w3.arobas.net/~jetienne/vrrpd/index.html] (site)
    This is purely for redundancy. Two machines with their own IP address
    and MAC Address together create a third IP Address and MAC Address,
    which is virtual. Originally intended purely for routers, which need
    constant MAC addresses, it also works for other servers.

    The beauty of this approach is the incredibly easy configuration. No
    kernel compiling or patching required, all userspace.

    Just run this on all machines participating in a service:

    +---------------------------------------------------------------+
    # vrrpd -i eth0 -v 50 10.0.0.22
    +---------------------------------------------------------------+

    And you are in business! 10.0.0.22 is now carried by one of your
    servers, probably the first one to run the vrrp daemon. Now disconnect
    that computer from the network and very rapidly one of the other
    computers will assume the 10.0.0.22 address, as well as the MAC address.

    I tried this over here and had it up and running in 1 minute. For some
    strange reason it decided to drop my default gateway, but the -n flag
    prevented that.

    This is a 'live' fail over:

    +---------------------------------------------------------------+
    64 bytes from 10.0.0.22: icmp_seq=3 ttl=255 time=0.2 ms
    64 bytes from 10.0.0.22: icmp_seq=4 ttl=255 time=0.2 ms
    64 bytes from 10.0.0.22: icmp_seq=5 ttl=255 time=16.8 ms
    64 bytes from 10.0.0.22: icmp_seq=6 ttl=255 time=1.8 ms
    64 bytes from 10.0.0.22: icmp_seq=7 ttl=255 time=1.7 ms
    +---------------------------------------------------------------+

    Not *one* ping packet was lost! Just after packet 4, I disconnected my
    P200 from the network, and my 486 took over, which you can see from the
    higher latency.
-----------------------------------------------------------------------------

Chapter 19. Further reading

http://snafu.freedom.org/linux2.2/iproute-notes.html
    Contains lots of technical information, comments from the kernel

http://www.davin.ottawa.on.ca/ols/
    Slides by Jamal Hadi Salim, one of the authors of Linux traffic control

http://defiant.coinet.com/iproute2/ip-cref/
    HTML version of Alexey's LaTeX documentation - explains part of iproute2
    in great detail

http://www.aciri.org/floyd/cbq.html
    Sally Floyd has a good page on CBQ, including her original papers. None
    of it is Linux specific, but it does a fair job discussing the theory
    and uses of CBQ. Very technical stuff, but good reading for those so
    inclined.

Differentiated Services on Linux
    This [ftp://icaftp.epfl.ch/pub/linux/diffserv/misc/dsid-01.txt.gz]
    document by Werner Almesberger, Jamal Hadi Salim and Alexey Kuznetsov
    describes DiffServ facilities in the Linux kernel, amongst which are
    TBF, GRED, the DSMARK qdisc and the tcindex classifier.

http://ceti.pl/~kravietz/cbq/NET4_tc.html
    Yet another HOWTO, this time in Polish! You can copy/paste command
    lines, however; they work just the same in every language. The author is
    cooperating with us and may soon author sections of this HOWTO.

[http://www.cisco.com/univercd/cc/td/doc/product/software/ios111/cc111/car.htm] IOS Committed Access Rate
    From the helpful folks of Cisco who have the laudable habit of putting
    their documentation online. Cisco syntax is different but the concepts
    are the same, except that we can do more and do it without routers the
    price of cars :-)

Docum experimental site [http://www.docum.org] (site)
    Stef Coene is busy convincing his boss to sell Linux support, and so he
    is experimenting a lot, especially with managing bandwidth.
    His site has a lot of practical information, examples, tests and also
    points out some CBQ/tc bugs.

TCP/IP Illustrated, volume 1, W. Richard Stevens, ISBN 0-201-63346-9
    Required reading if you truly want to understand TCP/IP. Entertaining as
    well.
-----------------------------------------------------------------------------

Chapter 20. Acknowledgements

It is our goal to list everybody who has contributed to this HOWTO, or
helped us demystify how things work. While there are currently no plans for
a Netfilter type scoreboard, we do like to recognize the people who are
helping.

  * Junk Alins
  * Joe Van Andel
  * Michael T. Babcock
  * Christopher Barton
  * Ard van Breemen
  * Ron Brinker
  * Łukasz Bromirski
  * Lennert Buytenhek
  * Esteve Camps
  * Stef Coene
  * Don Cohen
  * Jonathan Corbet
  * Gerry N5JXS Creager
  * Marco Davids
  * Jonathan Day
  * Martin aka devik Devera
  * Stephan "Kobold" Gehring
  * Jacek Glinkowski
  * Andrea Glorioso
  * Nadeem Hasan
  * Erik Hensema
  * Vik Heyndrickx
  * Spauldo Da Hippie
  * Koos van den Hout
  * Stefan Huelbrock
  * Alexander W. Janssen
  * Gareth John
  * Martin Josefsson
  * Andi Kleen
  * Andreas J. Koenig
  * Pawel Krawczyk
  * Amit Kucheria
  * Edmund Lau
  * Philippe Latu
  * Arthur van Leeuwen
  * Jason Lunz
  * Stuart Lynne
  * Alexey Mahotkin
  * Predrag Malicevic
  * Patrick McHardy
  * Andreas Mohr
  * Andrew Morton
  * Wim van der Most
  * Stephan Mueller
  * Togan Muftuoglu
  * Chris Murray
  * Patrick Nagelschmidt
  * Ram Narula
  * Jorge Novo
  * Patrik
  * Pál Osgyány
  * Lutz Preßler
  * Jason Pyeron
  * Rusty Russell
  * Mihai RUSU
  * Jamal Hadi Salim
  * David Sauer
  * Sheharyar Suleman Shaikh
  * Stewart Shields
  * Nick Silberstein
  * Konrads Smelkov
  * William Stearns
  * Andreas Steinmetz
  * Jason Tackaberry
  * Charles Tassell
  * Glen Turner
  * Tea Sponsor: Eric Veldhuyzen
  * Song Wang
  * Lazar Yanackiev

  GNU/Linux AI & Alife HOWTO
  by John Eikenberry
  v2.0, 17 Feb 2004

  This howto mainly contains information about, and links to, various AI
  related software libraries, applications, etc. that work on the GNU/Linux
  platform. All of it is (at least) free for personal use. The new master
  page for this document is http://zhar.net/gnu-linux/howto/
  ______________________________________________________________________

  Table of Contents

  1. Introduction
     1.1 Purpose
     1.2 What's New
     1.3 Where to find this software
     1.4 Updates and comments
     1.5 Copyright/License

  2. Traditional Artificial Intelligence
     2.1 AI class/code libraries
     2.2 AI software kits, applications, etc.

  3. Connectionism
     3.1 Connectionist class/code libraries
     3.2 Connectionist software kits/applications

  4. Evolutionary Computing
     4.1 EC class/code libraries
     4.2 EC software kits/applications

  5. Alife & Complex Systems
     5.1 Alife & CS class/code libraries
     5.2 Alife & CS software kits, applications, etc.

  6. Agents

  7. Programming languages

  8. MIA
  ______________________________________________________________________

  1. Introduction

  1.1. Purpose

  The GNU/Linux OS has evolved from its origins in hackerdom to a full
  blown UNIX, capable of rivaling any commercial UNIX.
  It now provides an inexpensive base to build a great workstation. It has
  shed its hardware dependencies, having been ported to DEC Alphas, Sparcs,
  PowerPCs, and many others. This potential speed boost along with its
  networking support will make it great for workstation clusters. As a
  workstation it allows for all sorts of research and development,
  including artificial intelligence and artificial life.

  The purpose of this Howto is to provide a source to find out about
  various software packages, code libraries, and anything else that will
  help someone get started working with (and find resources for) artificial
  intelligence, artificial life, etc. All done with GNU/Linux specifically
  in mind.

  1.2. What's New

  ·  v2.0 - Ran linkchecker and for any bad links I either found a new link
     or removed the item. See the new section MIA for a list of the removed
     entries (please let me know if you know of a new home for them).

  ·  v1.9 - One new entry and fixed the link below to the dynamic list.

  ·  v1.8 - Cleaned up bad links, finding new ones where possible and
     eliminating those that seem to have disappeared. Quite a few new
     entries as well.

  ·  v1.7 - Another 9 new entries, a bunch of links fixed, and a few items
     removed that have vanished from the net. New entries include ``UTCS
     Neural Nets Research Group Software''.

  ·  v1.6 - 9 new entries, a couple link fixes and one duplicate item
     removed.

  ·  v1.5 - 26 new entries plus a couple link fixes.

  ·  v1.4 - 10 new updates and fixed some lisp-related links.

  ·  v1.3 - Putting a dent in the backlog, I added 30+ new entries today and
     submitted it to the LDP.

  ·  No record for anything previous. :(

  1.3. Where to find this software

  All this software should be available via the net (ftp || http).
The links to where to find it will be provided in the description of each package. There will also be plenty of software not covered on these pages (which is usually platform independent) located on one of the resources listed on the links section of the Master Site (given above). 1.4. Updates and comments If you find any mistakes, know of updates to one of the items below, or have problems compiling any of the applications, please mail me at: jae@zhar.net and I'll see what I can do. If you know of any AI/Alife applications, class libraries, etc. Please email me about them. Include your name, ftp and/or http sites where they can be found, plus a brief overview/commentary on the software (this info would make things a lot easier on me... but don't feel obligated ;). I know that keeping this list up to date and expanding it will take quite a bit of work. So please be patient (I do have other projects). I hope you will find this document helpful. 1.5. Copyright/License Copyright (c) 1996-2000 John A. Eikenberry LICENSE This document may be reproduced and distributed in whole or in part, in any medium physical or electronic, provided that this license notice is displayed in the reproduction. Commercial redistribution is permitted and encouraged. Thirty days advance notice, via email to the author, of redistribution is appreciated, to give the authors time to provide updated documents. A. REQUIREMENTS OF MODIFIED WORKS All modified documents, including translations, anthologies, and partial documents, must meet the following requirements: · The modified version must be labeled as such. · The person making the modifications must be identified. · Acknowledgement of the original author must be retained. · The location of the original unmodified document be identified. · The original author's name(s) may not be used to assert or imply endorsement of the resulting document without the original author's permission. 
In addition it is requested (not required) that: · The modifications (including deletions) be noted. · The author be notified by email of the modification in advance of redistribution, if an email address is provided in the document. As a special exception, anthologies of LDP documents may include a single copy of these license terms in a conspicuous location within the anthology and replace other copies of this license with a reference to the single copy of the license without the document being considered "modified" for the purposes of this section. Mere aggregation of LDP documents with other documents or programs on the same media shall not cause this license to apply to those other works. All translations, derivative documents, or modified documents that incorporate this document may not have more restrictive license terms than these, except that you may require distributors to make the resulting document available in source format. 2. Traditional Artificial Intelligence Traditional AI is based around the ideas of logic, rule systems, linguistics, and the concept of rationality. At its roots are programming languages such as Lisp and Prolog. Expert systems are the largest successful example of this paradigm. An expert system consists of a detailed knowledge base and a complex rule system to utilize it. Such systems have been used for such things as medical diagnosis support and credit checking systems. 2.1. AI class/code libraries These are libraries of code or classes for use in programming within the artificial intelligence field. They are not meant as stand alone applications, but rather as tools for building your own applications. ACL2 · Web site: www.cliki.net/ACL2 ACL2 (A Computational Logic for Applicative Common Lisp) is a theorem prover for industrial applications. It is both a mathematical logic and a system of tools for constructing proofs in the logic. ACL2 works with GCL (GNU Common Lisp). 
  AI Kernel

     ·  Web site: aikernel.sourceforge.net

     ·  Sourceforge site: sourceforge.net/projects/aikernel/

     The AI Kernel is a re-usable artificial intelligence engine that uses
     natural language processing and an Activator / Context model to allow
     multi tasking between installed cells.

  AI Search II

     ·  Web site: www.bell-labs.com/topic/books/ooai-book/

     Basically, the library offers the programmer a set of search
     algorithms that may be used to solve all kinds of different problems.
     The idea is that when developing problem solving software the
     programmer should be able to concentrate on the representation of the
     problem to be solved and should not need to bother with the
     implementation of the search algorithm that will be used to actually
     conduct the search. This idea has been realized by the implementation
     of a set of search classes that may be incorporated in other software
     through C++'s features of derivation and inheritance. The following
     search algorithms have been implemented:

     ·  depth-first tree and graph search.

     ·  breadth-first tree and graph search.

     ·  uniform-cost tree and graph search.

     ·  best-first search.

     ·  bidirectional depth-first tree and graph search.

     ·  bidirectional breadth-first tree and graph search.

     ·  AND/OR depth tree search.

     ·  AND/OR breadth tree search.

     This library has a corresponding book, "Object-Oriented Artificial
     Intelligence, Using C++".

  Aleph

     ·  Web site: web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/

     This document provides reference information on A Learning Engine for
     Proposing Hypotheses (Aleph). Aleph is an Inductive Logic Programming
     (ILP) system. Aleph is intended to be a prototype for exploring ideas.
     Aleph is an ILP algorithm implemented in Prolog by Dr Ashwin Srinivasan
     at the Oxford University Computing Laboratory, and is written
     specifically for compilation with the YAP Prolog compiler.

  Chess In Lisp (CIL)

     ·  Web site: *found as part of the CLOCC archive at:
        clocc.sourceforge.net

     The CIL (Chess In Lisp) foundation is a Common Lisp implementation of
     all the core functions needed for development of chess applications.
     The main purpose of the CIL project is to get AI researchers interested
     in using Lisp to work in the chess domain.

  DAI

     ·  Web site: starship.python.net/crew/gandalf/DNET/AI/

     A library for the Python programming language that provides an object
     oriented interface to the CLIPS expert system tool. It includes an
     interface to COOL (CLIPS Object Oriented Language) that allows you to:

     ·  Investigate COOL classes

     ·  Create and manipulate COOL instances

     ·  Manipulate COOL message-handlers

     ·  Manipulate Modules

  FFLL

     ·  Web site: ffll.sourceforge.net

     The Free Fuzzy Logic Library (FFLL) is an open source fuzzy logic class
     library and API that is optimized for speed critical applications, such
     as video games. FFLL is able to load files that adhere to the IEC
     61131-7 standard.

  HTK

     ·  Web site: htk.eng.cam.ac.uk

     The Hidden Markov Model Toolkit (HTK) is a portable toolkit for
     building and manipulating hidden Markov models. HTK consists of a set
     of library modules and tools available in C source form. The tools
     provide sophisticated facilities for speech analysis, HMM training,
     testing and results analysis. The software supports HMMs using both
     continuous density mixture Gaussians and discrete distributions and can
     be used to build complex HMM systems. The HTK release contains
     extensive documentation and examples.

  JACK

     ·  Web site: www.pms.informatik.uni-muenchen.de/software/jack/

     JACK is a new library providing constraint programming and search for
     Java.

     ·  JACK consists of three components:

     ·  - JCHR: Java Constraint Handling Rules.
A high-level language to write constraint solvers. · - JASE: Java Abstract Search Engine. A generic search engine for JCHR to solve constraint problems. · - VisualCHR: An interactive tool to visualize JCHR computations. Source and documentation available from link above. LK · Web site: www.cs.utoronto.ca/~neto/research/lk/ LK is an implementation of the Lin-Kernighan heuristic for the Traveling Salesman Problem and for the minimum weight perfect matching problem. It is tuned for 2-d geometric instances, and has been applied to certain instances with up to a million cities. Also included are instance generators and Perl scripts for munging TSPLIB instances. This implementation introduces ``efficient cluster compensation'', an experimental algorithmic technique intended to make the Lin-Kernighan heuristic more robust in the face of clustered data. Nyquist · Web site: www.cs.cmu.edu/afs/cs.cmu.edu/project/music/web/music.html The Computer Music Project at CMU is developing computer music and interactive performance technology to enhance human musical experience and creativity. This interdisciplinary effort draws on Music Theory, Cognitive Science, Artificial Intelligence and Machine Learning, Human Computer Interaction, Real-Time Systems, Computer Graphics and Animation, Multimedia, Programming Languages, and Signal Processing. A paradigmatic example of these interdisciplinary efforts is the creation of interactive performances that couple human musical improvisation with intelligent computer agents in real-time. OpenCyc · Web site: www.opencyc.org · Alt Web site: sourceforge.net/projects/opencyc/ OpenCyc is the open source version of Cyc, the largest and most complete general knowledge base and commonsense reasoning engine. An ontology based on 6000 concepts and 60000 assertions about them. 
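The LK entry above refers to the Lin-Kernighan heuristic for the Traveling Salesman Problem. As a rough illustration of the kind of local search involved, here is a minimal sketch of 2-opt, a much simpler relative of Lin-Kernighan (it is not the LK package's algorithm, and it omits cluster compensation entirely). The function names tour_length and two_opt are hypothetical and chosen only for this example.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Reverse tour segments for as long as doing so shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city, nothing to swap
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Four corners of a unit square, visited in an order that crosses itself.
square = [(0, 0), (1, 1), (1, 0), (0, 1)]
better = two_opt(square, [0, 1, 2, 3])
print(better, tour_length(square, better))
```

Lin-Kernighan generalises this idea by chaining a variable number of edge exchanges per move, which is why it tends to find much better tours on large instances.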
PDKB · Web site: lynx.eaze.net/~pdkb/web/ · SourceForge site: sourceforge.net/projects/pdkb Public Domain Knowledge Bank (PDKB) is an Artificial Intelligence Knowledge Bank of common sense rules and facts. It is based on the Cyc Upper Ontology and the MELD language. Python Fuzzy Logic Module · FTP site: ftp://ftp.csh.rit.edu/pub/members/retrev/ A simple python module for fuzzy logic. The file is 'fuz.tar.gz' in this directory. The author also plans to write a simple genetic algorithm and a neural net library. Check the 00_index file in this directory for release info. QUANT1 · Web site: linux.irk.ru/projects/QUANT/ QUANT/1 stands for type QUANTifier. It aims to be an alternative to Prolog-like (Resolution-like) systems. Main features include no need for quantifier elimination or Skolemisation, ease of comprehension, large scale formulae operation, acceptance of non-Horn formulae, and iterative deepening. The actual library implemented in this project is called ATPPCF (Automatic Theorem Prover in calculus of Positively Constructed Formulae). ATPPCF will be a library (inference engine) and an extension of the Predicate Calculus Language as a new logical language. The library will be embeddable in other software such as TCL, Python and Perl. The engine's primary inference method will be the "search of inference in language of Positively Constructed Formulas (PCFs)" (a subset of Predicate Calculus well translated in both directions). The language will be used as a scripting language for the engine, but it will be possible to replace it with the extension languages of the host software. Screamer · Web site: www.cis.upenn.edu/~screamer-tools/home.html · Latest version is part of CLOCC: clocc.sourceforge.net Screamer is an extension of Common Lisp that adds support for nondeterministic programming. Screamer consists of two levels. The basic nondeterministic level adds support for backtracking and undoable side effects.
On top of this nondeterministic substrate, Screamer provides a comprehensive constraint programming language in which one can formulate and solve mixed systems of numeric and symbolic constraints. Together, these two levels augment Common Lisp with practically all of the functionality of both Prolog and constraint logic programming languages such as CHiP and CLP(R). Furthermore, Screamer is fully integrated with Common Lisp. Screamer programs can coexist and interoperate with other extensions to Common Lisp such as CLOS, CLIM and Iterate. SPASS · Web site: spass.mpi-sb.mpg.de SPASS: An Automated Theorem Prover for First-Order Logic with Equality. If you are interested in first-order logic theorem proving, the formal analysis of software, systems, protocols, formal approaches to AI planning, decision procedures, or modal logic theorem proving, SPASS may offer you the right functionality. ThoughtTreasure · Web site: www.signiform.com/tt/htm/tt.htm ThoughtTreasure is a project to create a database of commonsense rules for use in any application. It consists of a database of a little over 100K rules and a C API to integrate it with your applications. Python, Perl, Java and TCL wrappers are already available. Torch · Web site: www.torch.ch Torch is a machine-learning library, written in C++. Its aim is to provide state-of-the-art implementations of the best algorithms. It is, and it will be, in development forever. · Many gradient-based methods, including multi-layered perceptrons, radial basis functions, and mixtures of experts. Many small "modules" (Linear module, Tanh module, SoftMax module, ...) can be plugged together. · Support Vector Machine, for classification and regression. · Distribution package, includes Kmeans, Gaussian Mixture Models, Hidden Markov Models, and Bayes Classifier, and classes for speech recognition with embedded training. · Ensemble models such as Bagging and Adaboost.
· Non-parametric models such as K-nearest-neighbors, Parzen Regression and Parzen Density Estimator. · Torch is an open library whose authors encourage everybody to develop new packages to be included in future versions on the official website. 2.2. AI software kits, applications, etc. These are various applications, software kits, etc. meant for research in the field of artificial intelligence. Their ease of use will vary, as they were designed to meet some particular research interest more than as an easy to use commercial package. ASA - Adaptive Simulated Annealing · Web site: www.ingber.com/#ASA-CODE · FTP site: ftp.ingber.com/ ASA (Adaptive Simulated Annealing) is a powerful global optimization C-code algorithm especially useful for nonlinear and/or stochastic systems. ASA is developed to statistically find the best global fit of a nonlinear non-convex cost-function over a D-dimensional space. This algorithm permits an annealing schedule for 'temperature' T decreasing exponentially in annealing-time k, T = T_0 exp(-c k^(1/D)). The introduction of re-annealing also permits adaptation to changing sensitivities in the multi-dimensional parameter-space. This annealing schedule is faster than fast Cauchy annealing, where T = T_0/k, and much faster than Boltzmann annealing, where T = T_0/ln k. Babylon · FTP site: ftp.gmd.de/gmd/ai-research/Software/Babylon/ BABYLON is a modular, configurable, hybrid environment for developing expert systems. Its features include objects, rules with forward and backward chaining, logic (Prolog) and constraints. BABYLON is implemented and embedded in Common Lisp. cfengine · Web site: www.iu.hio.no/cfengine/ Cfengine, or the configuration engine, is a very high level language for building expert systems which administrate and configure large computer networks. Cfengine uses the idea of classes and a primitive form of intelligence to define and automate the configuration of large systems in the most economical way possible.
Cfengine is designed to be a part of computer immune systems. CLEARS · Web site: ???? (anyone know where to find this anymore) The CLEARS system is an interactive graphical environment for computational semantics. The tool allows exploration and comparison of different semantic formalisms, and their interaction with syntax. This enables the user to get an idea of the range of possibilities of semantic construction, and also where there is real convergence between theories. CLIPS · Web site: www.ghg.net/clips/CLIPS.html CLIPS is a productive development and delivery expert system tool which provides a complete environment for the construction of rule and/or object based expert systems. CLIPS provides a cohesive tool for handling a wide variety of knowledge with support for three different programming paradigms: rule-based, object-oriented and procedural. Rule-based programming allows knowledge to be represented as heuristics, or "rules of thumb," which specify a set of actions to be performed for a given situation. Object-oriented programming allows complex systems to be modeled as modular components (which can be easily reused to model other systems or to create new components). The procedural programming capabilities provided by CLIPS are similar to capabilities found in languages such as C, Pascal, Ada, and LISP. EMA-XPS - A Hybrid Graphic Expert System Shell · Web site: www.iai.uni-wuppertal.de/EMA-XPS/ EMA-XPS is a hybrid graphic expert system shell based on the ASCII-oriented shell Babylon 2.3 of the German National Research Center for Computer Sciences (GMD). In addition to Babylon's AI-power (object oriented data representation, forward and backward chained rules - collectible into sets, horn clauses, and constraint networks) a graphic interface based on the X11 Window System and the OSF/Motif Widget Library has been provided.
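Several of the expert system shells above (Babylon, CLIPS, EMA-XPS) are built around forward-chained rules: "rules of thumb" that fire when their conditions match the known facts. The following is a minimal sketch of that idea only. It does not use CLIPS syntax or any of these tools' APIs, and the rule set, fact names and the forward_chain function are all invented for illustration.

```python
def forward_chain(facts, rules):
    """Fire rules until no rule can add a new fact; return all known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when every one of its conditions is already known.
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known


# Each rule is (tuple of conditions, conclusion); purely illustrative.
RULES = [
    (("has_fur", "gives_milk"), "mammal"),
    (("mammal", "eats_meat"), "carnivore"),
    (("carnivore", "has_stripes"), "tiger"),
]

derived = forward_chain(
    {"has_fur", "gives_milk", "eats_meat", "has_stripes"}, RULES
)
print(sorted(derived))
```

Real shells avoid rescanning every rule on every pass, typically with a matching network such as Rete; the naive loop here is only meant to show what "forward chaining" means.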
FOOL & FOX · Web site: rhaug.de/fool/ · FTP site: ftp.informatik.uni-oldenburg.de/pub/fool/ FOOL stands for the Fuzzy Organizer OLdenburg. It is the result of a project at the University of Oldenburg. FOOL is a graphical user interface to develop fuzzy rulebases. FOOL will help you to invent and maintain a database that specifies the behavior of a fuzzy-controller or something like that. FOX is a small but powerful fuzzy engine which reads this database, reads some input values and calculates the new control value. FUF and SURGE · Web site: www.cs.bgu.ac.il/research/projects/surge/index.htm · FTP site: ftp.cs.bgu.ac.il/pub/fuf/ FUF is an extended implementation of the formalism of functional unification grammars (FUGs) introduced by Martin Kay, specialized to the task of natural language generation. It adds the following features to the base formalism: · Types and inheritance. · Extended control facilities (goal freezing, intelligent backtracking). · Modular syntax. These extensions allow the development of large grammars which can be processed efficiently and can be maintained and understood more easily. SURGE is a large syntactic realization grammar of English written in FUF. SURGE is developed to serve as a black box syntactic generation component in a larger generation system that encapsulates a rich knowledge of English syntax. SURGE can also be used as a platform for exploration of grammar writing with a generation perspective. The Grammar Workbench · Web site: ??? www.cs.kun.nl/agfl/ Seems to be obsolete??? It's gone from the site, though its parent project is still ongoing. The Grammar Workbench, or GWB for short, is an environment for the comfortable development of Affix Grammars in the AGFL-formalism. Its purposes are: · to allow the user to input, inspect and modify a grammar; · to perform consistency checks on the grammar; · to compute grammar properties; · to generate example sentences; · to assist in performing grammar transformations.
GSM Suite · Web site ???: www.slip.net/~andrewm/gsm/ · Alt site: www.ibiblio.org/pub/Linux/apps/graphics/draw/ The GSM Suite is a set of programs for using Finite State Machines in a graphical fashion. The suite consists of programs that edit, compile, and print state machines. Included in the suite is an editor program, gsmedit, a compiler, gsm2cc, that produces a C++ implementation of a state machine, a PostScript generator, gsm2ps, and two other minor programs. GSM is licensed under the GNU Public License and so is free for your use under the terms of that license. Isabelle · Web site: isabelle.in.tum.de Isabelle is a popular generic theorem prover developed at Cambridge University and TU Munich. Existing logics like Isabelle/HOL provide a theorem proving environment ready to use for sizable applications. Isabelle may also serve as a framework for rapid prototyping of deductive systems. It comes with a large library including Isabelle/HOL (classical higher-order logic), Isabelle/HOLCF (Scott's Logic for Computable Functions with HOL), Isabelle/FOL (classical and intuitionistic first-order logic), and Isabelle/ZF (Zermelo-Fraenkel set theory on top of FOL). Jess, the Java Expert System Shell · Web site: herzberg.ca.sandia.gov/jess/ Jess is a clone of the popular CLIPS expert system shell written entirely in Java. With Jess, you can conveniently give your applets the ability to 'reason'. Jess is compatible with all versions of Java starting with version 1.0.2. Jess implements the following constructs from CLIPS: defrules, deffunctions, defglobals, deffacts, and deftemplates. learn · Web site: www.ibiblio.org/pub/Linux/apps/cai/ Learn is a vocabulary learning program with a memory model. LISA · Web site: lisa.sourceforge.net LISA (Lisp-based Intelligent Software Agents) is a production-rule system heavily influenced by JESS (Java Expert System Shell). It has at its core a reasoning engine based on the Rete pattern matching algorithm.
LISA also provides the ability to reason over ordinary CLOS objects. NICOLE · Web site: nicole.sourceforge.net NICOLE (Nearly Intelligent Computer Operated Language Examiner) is an experiment based on the theory that if a computer is given enough combinations of how words, phrases and sentences are related to one another, it could talk back to you. It is an attempt to simulate a conversation by learning how words are related to other words. A human communicates with NICOLE via the keyboard and NICOLE responds with its own sentences, which are automatically generated based on what NICOLE has stored in its database. Each new sentence that is typed in and that NICOLE doesn't know about is added to NICOLE's database, thus extending the knowledge base of NICOLE. Otter: An Automated Deduction System · Web site: www-unix.mcs.anl.gov/AR/otter/ Our current automated deduction system Otter is designed to prove theorems stated in first-order logic with equality. Otter's inference rules are based on resolution and paramodulation, and it includes facilities for term rewriting, term orderings, Knuth-Bendix completion, weighting, and strategies for directing and restricting searches for proofs. Otter can also be used as a symbolic calculator and has an embedded equational programming system. PVS · Web site: pvs.csl.sri.com/ PVS is a verification system: that is, a specification language integrated with support tools and a theorem prover. It is intended to capture the state-of-the-art in mechanized formal methods and to be sufficiently rugged that it can be used for significant applications. PVS is a research prototype: it evolves and improves as we develop or apply new capabilities, and as the stress of real use exposes new requirements.
SNePS · Web site: www.cse.buffalo.edu/sneps/ · FTP site: ftp.cse.buffalo.edu/pub/sneps/ The long-term goal of The SNePS Research Group is the design and construction of a natural-language-using computerized cognitive agent, and carrying out the research in artificial intelligence, computational linguistics, and cognitive science necessary for that endeavor. The three-part focus of the group is on knowledge representation, reasoning, and natural-language understanding and generation. The group is widely known for its development of the SNePS knowledge representation/reasoning system, and Cassie, its computerized cognitive agent. Soar · Web site: sitemaker.umich.edu/soar Soar has been developed to be a general cognitive architecture. We intend ultimately to enable the Soar architecture to: · work on the full range of tasks expected of an intelligent agent, from highly routine to extremely difficult, open-ended problems · represent and use appropriate forms of knowledge, such as procedural, declarative, episodic, and possibly iconic · employ the full range of problem solving methods · interact with the outside world and · learn about all aspects of the tasks and its performance on them. In other words, our intention is for Soar to support all the capabilities required of a general intelligent agent. TCM · Web site: ??? wwwhome.cs.utwente.nl/~tcm/ · FTP site: ftp.cs.vu.nl/pub/tcm/ TCM (Toolkit for Conceptual Modeling) is our suite of graphical editors. TCM contains graphical editors for Entity-Relationship diagrams, Class-Relationship diagrams, Data and Event Flow diagrams, State Transition diagrams, Jackson Process Structure diagrams and System Network diagrams, Function Refinement trees and various table editors, such as a Function-Entity table editor and a Function Decomposition table editor. TCM is easy to use and performs numerous consistency checks, some of them immediately, some of them upon request. 
Yale · Web site: yale.cs.uni-dortmund.de/index.html.html YALE (Yet Another Learning Environment) is an environment for machine learning experiments. Experiments can be made up of a large number of arbitrarily nestable operators and their setup is described by XML files which can easily be created with a graphical user interface. Applications of YALE cover both research and real-world learning tasks. WEKA · Web site: lucy.cs.waikato.ac.nz/~ml/ WEKA (Waikato Environment for Knowledge Analysis) is a state-of-the-art facility for applying machine learning techniques to practical problems. It is a comprehensive software "workbench" that allows people to analyse real-world data. It integrates different machine learning tools within a common framework and a uniform user interface. It is designed to support a "simplicity-first" methodology, which allows users to experiment interactively with simple machine learning tools before looking for more complex solutions. 3. Connectionism Connectionism is a technical term for a group of related techniques. These techniques include areas such as Artificial Neural Networks, Semantic Networks and a few other similar ideas. My present focus is on neural networks (though I am looking for resources on the other techniques). Neural networks are programs designed to simulate the workings of the brain. They consist of a network of small mathematical-based nodes, which work together to form patterns of information. They have tremendous potential and currently seem to be having a great deal of success with image processing and robot control. 3.1. Connectionist class/code libraries These are libraries of code or classes for use in programming within the Connectionist field. They are not meant as stand alone applications, but rather as tools for building your own applications.
ANSI-C Neural Networks · Web site: www.geocities.com/CapeCanaveral/1624/ This site contains ANSI-C source code for 8 types of neural nets, including: · Adaline Network · Backpropagation · Hopfield Model · (BAM) Bidirectional Associative Memory · Boltzmann Machine · Counterpropagation · (SOM) Self-Organizing Map · (ART1) Adaptive Resonance Theory They were designed to help turn the theory of a particular network model into the design for a simulator implementation, and to help with embedding an actual application into a particular network model. Software for Flexible Bayesian Modeling · Web site: www.cs.utoronto.ca/~radford/fbm.software.html This software implements flexible Bayesian models for regression and classification applications that are based on multilayer perceptron neural networks or on Gaussian processes. The implementation uses Markov chain Monte Carlo methods. Software modules that support Markov chain sampling are included in the distribution, and may be useful in other applications. BELIEF · Web site: www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/reasonng/probabl/belief/ BELIEF is a Common Lisp implementation of the Dempster and Kong fusion and propagation algorithm for Graphical Belief Function Models and the Lauritzen and Spiegelhalter algorithm for Graphical Probabilistic Models. It includes code for manipulating graphical belief models such as Bayes Nets and Relevance Diagrams (a subset of Influence Diagrams) using both belief functions and probabilities as basic representations of uncertainty. It uses the Shenoy and Shafer version of the algorithm, so one of its unique features is that it supports both probability distributions and belief functions. It also has limited support for second order models (probability distributions on parameters). bpnn.py · Web site: http://arctrix.com/nas/python/bpnn.py A simple back-propagation ANN in Python.
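For readers who have not seen back-propagation written out, here is a minimal sketch of a one-hidden-layer network trained that way, using only the Python standard library. It is not the code from bpnn.py or from any package listed here; the class TinyNet, the train helper, the learning rate and the layer sizes are all assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """One hidden layer of sigmoid units, trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra entry that acts as the bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def train_step(self, x, target, rate=0.5):
        xb, hb, out = self.forward(x)
        # Error signal at the output (squared error times sigmoid slope).
        delta_out = (out - target) * out * (1.0 - out)
        # Push the error back through the output weights to the hidden units.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        for j, h in enumerate(hb):
            self.w_out[j] -= rate * delta_out * h
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(xb):
                row[i] -= rate * delta_hidden[j] * v
        return 0.5 * (out - target) ** 2


def train(net, data, epochs=2000, rate=0.5):
    """Run several passes over the data; return the summed error per pass."""
    return [sum(net.train_step(x, t, rate) for x, t in data)
            for _ in range(epochs)]


# XOR is the classic task that needs the hidden layer.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = TinyNet(2, 3)
history = train(net, XOR)
print("first epoch error:", history[0], "last epoch error:", history[-1])
```

Whether XOR is actually learned in the given number of epochs depends on the random starting weights, so the script prints the error rather than promising a result.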
CNNs · Web site: www.isi.ee.ethz.ch/~haenggi/CNNsim.html · Java Version: www.ce.unipr.it/research/pardis/CNN/cnn.html Cellular Neural Networks (CNN) is a massively parallel computing paradigm defined in discrete N-dimensional spaces. CONICAL · Web site: strout.net/conical/ CONICAL is a C++ class library for building simulations common in computational neuroscience. Currently its focus is on compartmental modeling, with capabilities similar to GENESIS and NEURON. A model neuron is built out of compartments, usually with a cylindrical shape. When small enough, these open-ended cylinders can approximate nearly any geometry. Future classes may support reaction-diffusion kinetics and more. A key feature of CONICAL is its cross-platform compatibility; it has been fully co-developed and tested under Unix, DOS, and Mac OS. Jet's Neural Architecture · Web site: www.voltar-confed.org/jneural/ Jet's Neural Architecture is a C++ framework for doing neural net projects. The goals of this project were to make a fast, flexible neural architecture that isn't stuck to one kind of net and to make sure that end users could easily write useful applications. All the documentation is also easily readable. Joone · Web site: joone.sourceforge.net Joone is a neural net framework to create, train and test neural nets. The aim is to create a distributed environment based on JavaSpaces for both enthusiast and professional users, based on the newest Java technologies. Joone is composed of a central engine that is the fulcrum of all applications that already exist or will be developed. The neural engine is modular, scalable, multitasking and tensile. Everyone can write new modules to implement new algorithms or new architectures starting from the simple components distributed with the core engine. The main idea is to create the basis to promote a zillion AI applications that revolve around the core framework.
Matrix Class · FTP site: ftp.cs.ucla.edu/pub/ A simple, fast, efficient C++ Matrix class designed for scientists and engineers. The Matrix class is well suited for applications with complex math algorithms. As a demonstration of the Matrix class, it was used to implement the backward error propagation algorithm for a multi-layer feed-forward artificial neural network. Pulcinella · Web site: iridia.ulb.ac.be/pulcinella/Welcome.html Pulcinella is written in CommonLisp, and appears as a library of Lisp functions for creating, modifying and evaluating valuation systems. Alternatively, the user can choose to interact with Pulcinella via a graphical interface (only available in Allegro CL). Pulcinella provides primitives to build and evaluate uncertainty models according to several uncertainty calculi, including probability theory, the possibility theory of Zadeh, Dubois and Prade, and Dempster-Shafer's theory of belief functions. A User's Manual is available on request. scnANNlib · Web site: www.sentinelchicken.org/projects/scnANNlib/ SCN Artificial Neural Network Library provides a programmer with a simple object-oriented API for constructing ANNs. Currently, the library supports non-recursive networks with an arbitrary number of layers, each with an arbitrary number of nodes. Facilities exist for training with momentum, and there are plans to gracefully extend the functionality of the library in later releases. TresBel · FTP site: iridia.ulb.ac.be/pub/hongxu/software/ Libraries containing (Allegro) Common Lisp code for Belief Functions (aka. Dempster-Shafer evidential reasoning) as a representation of uncertainty. Very little documentation. Has a limited GUI. UTCS Neural Nets Research Group Software · Web site: nn.cs.utexas.edu/pages/software/software.html A bit different from the other entries, this is a reference to a collection of software rather than one application. It was all developed by the UTCS Neural Net Research Group.
Here's a summary of the packages available: · Natural Language Processing · MIR - Tcl/Tk-based rapid prototyping for sentence processing · SPEC - Parsing complex sentences · DISCERN - Processing script-based stories, including · PROC - Parsing, generation, question answering · HFM - Episodic memory organization · DISLEX - Lexical processing · DISCERN - The full integrated model · FGREPNET - Learning distributed representations · Self-Organization · LISSOM - Maps with self-organizing lateral connections. · FM - Generic Self-Organizing Maps · Neuroevolution · Enforced Sub-Populations (ESP) for sequential decision tasks · Non-Markov Double Pole Balancing · Symbiotic, Adaptive NeuroEvolution (SANE; predecessor of ESP) · JavaSANE - Java software package for applying SANE to new tasks · SANE-C - C version, predecessor of JavaSANE · Pole Balancing - Neuron-level SANE on the Pole Balancing task · NeuroEvolution of Augmenting Topologies (NEAT) software for evolving neural networks using structure Various (C++) Neural Networks · Web site: www.dontveter.com/nnsoft/nnsoft.html Example neural net codes from the book, The Pattern Recognition Basics of AI. These are simple example codes of these various neural nets. They work well as a good starting point for simple experimentation and for learning what the code is like behind the simulators. The types of networks available on this site are: (implemented in C++) · The Backprop Package · The Nearest Neighbor Algorithms · The Interactive Activation Algorithm · The Hopfield and Boltzmann machine Algorithms · The Linear Pattern Classifier · ART I · Bi-Directional Associative Memory · The Feedforward Counter-Propagation Network 3.2. Connectionist software kits/applications These are various applications, software kits, etc. meant for research in the field of Connectionism. Their ease of use will vary, as they were designed to meet some particular research interest more than as an easy to use commercial package.
Aspirin - MIGRAINES (am6.tar.Z on ftp site) · FTP site: sunsite.unc.edu/pub/academic/computer-science/neural-networks/programs/Aspirin/ The software that we are releasing now is for creating, and evaluating, feed-forward networks such as those used with the backpropagation learning algorithm. The software is aimed both at the expert programmer/neural network researcher who may wish to tailor significant portions of the system to his/her precise needs, as well as at casual users who will wish to use the system with an absolute minimum of effort. DDLab · Web site: www.santafe.edu/~wuensch/ddlab.html · FTP site: ftp.santafe.edu/pub/wuensch/ DDLab is an interactive graphics program for research into the dynamics of finite binary networks, relevant to the study of complexity, emergent phenomena, neural networks, and aspects of theoretical biology such as gene regulatory networks. A network can be set up with any architecture between regular CA (1d or 2d) and "random Boolean networks" (networks with arbitrary connections and heterogeneous rules). The network may also have heterogeneous neighborhood sizes. GENESIS · Web site: www.genesis-sim.org/GENESIS/ · FTP site: genesis-sim.org/pub/genesis/ GENESIS (short for GEneral NEural SImulation System) is a general purpose simulation platform which was developed to support the simulation of neural systems ranging from complex models of single neurons to simulations of large networks made up of more abstract neuronal components. GENESIS has provided the basis for laboratory courses in neural simulation at both Caltech and the Marine Biological Laboratory in Woods Hole, MA, as well as several other institutions. Most current GENESIS applications involve realistic simulations of biological neural systems. Although the software can also model more abstract networks, other simulators are more suitable for backpropagation and similar connectionist modeling.
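The DDLab entry above mentions "random Boolean networks", in which every node has its own arbitrary input wiring and its own update rule. The sketch below shows one common way to model such a network and to find the attractor a given start state falls into. It is an illustration only and has nothing to do with DDLab's own code or file formats; make_network, step and find_attractor are hypothetical names.

```python
import random


def make_network(n, k, seed=0):
    """Give every node k randomly chosen input nodes and a random truth table."""
    rng = random.Random(seed)
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Update all nodes synchronously from the current state."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for src in node_inputs:
            # Pack the input bits into an index into the node's truth table.
            index = (index << 1) | state[src]
        new_state.append(table[index])
    return tuple(new_state)


def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, cycle length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return seen[state], t - seen[state]


inputs, tables = make_network(n=8, k=2, seed=1)
transient, period = find_attractor((0,) * 8, inputs, tables)
print("transient:", transient, "attractor period:", period)
```

Because the state space is finite and the update is deterministic, every trajectory must eventually revisit a state, so the search always terminates. A regular cellular automaton is the special case where every node uses the same table and takes its inputs from its neighbours.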
JavaBayes · Web site: www.cs.cmu.edu/People/javabayes/index.html/ The JavaBayes system is a set of tools, containing a graphical editor, a core inference engine and a parser. JavaBayes can produce: · the marginal distribution for any variable in a network. · the expectations for univariate functions (for example, expected value for variables). · configurations with maximum a posteriori probability. · configurations with maximum a posteriori expectation for univariate functions. Jbpe · Web site: cs.felk.cvut.cz/~koutnij/studium/jbpe.html Jbpe is a back-propagation neural network editor/simulator. Features · Standard back-propagation networks creation. · Saving network as a text file, which can be edited and loaded back. · Saving/loading binary file · Learning from a text file (with structure specified below), number of learning periods / desired network energy can be specified as a criterion. · Network recall Neural Network Generator · Web site: www.idsia.ch/~rafal/research.html · FTP site: ftp.idsia.ch/pub/rafal The Neural Network Generator is a genetic algorithm for the topological optimization of feedforward neural networks. It implements the Semantic Changing Genetic Algorithm and the Unit-Cluster Model. The Semantic Changing Genetic Algorithm is an extended genetic algorithm that allows fast dynamic adaptation of the genetic coding through population analysis. The Unit-Cluster Model is an approach to the construction of modular feedforward networks with a ''backbone'' structure. NOTE: To compile this on Linux requires one change in the Makefiles. You will need to change '-ltermlib' to '-ltermcap'. Neureka ANS (nn/xnn) · FTP site: ftp.ii.uib.no/pub/neureka/ nn is a high-level neural network specification language. The current version is best suited for feed-forward nets, but recurrent models can and have been implemented, e.g. Hopfield nets, Jordan/Elman nets, etc. In nn, it is easy to change network dynamics.
The nn compiler can generate C code or executable programs (so there must be a C compiler available), with a powerful command line interface (but everything may also be controlled via the graphical interface, xnn). It is possible for the user to write C routines that can be called from inside the nn specification, and to use the nn specification as a function that is called from a C program. Please note that no programming is necessary in order to use the network models that come with the system (`netpack'). xnn is a graphical front end to networks generated by the nn compiler, and to the compiler itself. The xnn graphical interface is intuitive and easy to use for beginners, yet powerful, with many possibilities for visualizing network data. NOTE: You have to run the install program that comes with this to get the license key installed. It gets put (by default) in /usr/lib. If you (like myself) want to install the package somewhere other than in the /usr directory structure (the install program gives you this option) you will have to set up some environment variables (NNLIBDIR & NNINCLUDEDIR are required). You can read about these (and a few other optional variables) in appendix A of the documentation (pg 113). NEURON · Web site: www.neuron.yale.edu/ · FTP site: ftp.neuron.yale.edu/neuron/unix/ NEURON is an extensible nerve modeling and simulation program. It allows you to create complex nerve models by connecting multiple one-dimensional sections together to form arbitrary cell morphologies, and allows you to insert multiple membrane properties into these sections (including channels, synapses, ionic concentrations, and counters). The interface was designed to present the neural modeler with an intuitive environment and hide the details of the numerical methods used in the simulation.
PDP++
  · Web site: www.cnbc.cmu.edu/Resources/PDP++/
  · FTP site (US): cnbc.cmu.edu/pub/pdp++/
  · FTP site (Europe): unix.hensa.ac.uk/mirrors/pdp++/

  As the field of Connectionist modeling has grown, so has the need for a comprehensive simulation environment for the development and testing of Connectionist models. Our goal in developing PDP++ has been to integrate several powerful software development and user interface tools into a general purpose simulation environment that is both user friendly and user extensible. The simulator is built in the C++ programming language, and incorporates a state of the art script interpreter with the full expressive power of C++. The graphical user interface is built with the Interviews toolkit, and allows full access to the data structures and processing modules out of which the simulator is built. We have constructed several useful graphical modules for easy interaction with the structure and the contents of neural networks, and we've made it possible to change and adapt many things. At the programming level, we have set things up in such a way as to make user extensions as painless as possible. The programmer creates new C++ objects, which might be new kinds of units or new kinds of processes; once compiled and linked into the simulator, these new objects can then be accessed and used like any other.

RNS
  · Web site: www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/neural/systems/rns/

  RNS (Recurrent Network Simulator) is a simulator for recurrent neural networks. Regular neural networks are also supported. The program uses a derivative of the back-propagation algorithm, but also includes other (not that well tested) algorithms.
  Features include:
  · freely choosable connections, no restrictions besides memory or CPU constraints
  · delayed links for recurrent networks
  · fixed values or thresholds can be specified for weights
  · (recurrent) back-propagation, Hebb, differential Hebb, simulated annealing and more
  · patterns can be specified with bits, floats, characters, numbers, and random bit patterns with Hamming distances can be chosen for you
  · user definable error functions
  · output results can be used without modification as input

Simple Neural Net (in Python)
  · Web site: http://www.amk.ca/python/unmaintained/

  Simple neural network code, which implements a class for 3-level networks (input, hidden, and output layers). The only learning rule implemented is simple backpropagation. No documentation (or even comments) at all, because this is simply code that I use to experiment with. Includes modules containing sample datasets from Carl G. Looney's NN book. Requires the Numeric extensions.

SCNN
  · Web site: www.uni-frankfurt.de/fb13/iap/e_ag_rt/SCNN/

  SCNN is a universal simulation system for Cellular Neural Networks (CNN). CNN are analog processing neural networks with regular and local interconnections, governed by a set of nonlinear ordinary differential equations. Due to their local connectivity, CNN can be realized as VLSI chips, which operate at very high speed.

Semantic Networks in Python
  · Web site: strout.net/info/coding/python/ai/index.html

  The semnet.py module defines several simple classes for building and using semantic networks. A semantic network is a way of representing knowledge, and it enables the program to do simple reasoning with very little effort on the part of the programmer. The following classes are defined:
  · Entity: This class represents a noun; it is something which can be related to other things, and about which you can store facts.
  · Relation: A Relation is a type of relationship which may exist between two entities.
    One special relation, "IS_A", is predefined because it has special meaning (a sort of logical inheritance).
  · Fact: A Fact is an assertion that a relationship exists between two entities.

  With these three object types, you can very quickly define knowledge about a set of objects, and query them for logical conclusions.

SNNS
  · Web site: www-ra.informatik.uni-tuebingen.de/SNNS/
  · FTP site: ftp.informatik.uni-stuttgart.de/pub/SNNS/

  Stuttgart Neural Net Simulator (version 4.1). An awesome neural net simulator. Better than any commercial simulator I've seen. The simulator kernel is written in C (it's fast!). It supports over 20 different network architectures, has 2D and 3D X-based graphical representations, the 2D GUI has an integrated network editor, and can generate a separate NN program in C. SNNS is very powerful, though a bit difficult to learn at first. To help with this it comes with example networks and tutorials for many of the architectures. ENZO, a supplementary system, allows you to evolve your networks with genetic algorithms.

SPRLIB/ANNLIB
  · Web site: www.ph.tn.tudelft.nl/~sprlib/

  SPRLIB (Statistical Pattern Recognition Library) was developed to support the easy construction and simulation of pattern classifiers. It consists of a library of functions (written in C) that can be called from your own program. Most of the well-known classifiers are present (k-nn, Fisher, Parzen, ....), as well as error estimation and dataset generation routines.

  ANNLIB (Artificial Neural Networks Library) is a neural network simulation library based on the data architecture laid down by SPRLIB. The library contains numerous functions for creating, training and testing feed-forward networks. Training algorithms include back-propagation, pseudo-Newton, Levenberg-Marquardt, conjugate gradient descent, BFGS.... Furthermore, it is possible - due to the data structures' general applicability - to build Kohonen maps and other more exotic network architectures using the same data types.
TOOLDIAG
  · Web site: www.inf.ufes.br/~thomas/home/soft.html
  · Alt site: http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/neural/systems/tooldiag/0.html

  TOOLDIAG is a collection of methods for statistical pattern recognition. The main area of application is classification. The application area is limited to multidimensional continuous features, without any missing values. No symbolic features (attributes) are allowed. The program is implemented in the 'C' programming language and was tested in several computing environments.

XNBC
  · Web site: www.b3e.jussieu.fr/xnbc/

  XNBC v8 is a user-friendly software package for neuroscientists interested in simulating biological neural networks. Four neuron models are available: three phenomenological models (xnbc, leaky integrator and conditional burster) and an ion-conductance based model. Inputs to the simulated neurons can be provided by experimental data stored in files, allowing the creation of `hybrid' networks.

4. Evolutionary Computing

Evolutionary computing is actually a broad term for a vast array of programming techniques, including genetic algorithms, complex adaptive systems, evolutionary programming, etc. The main thrust of all these techniques is the idea of evolution: the idea that a program can be written that will evolve toward a certain goal. This goal can be anything from solving some engineering problem to winning a game.

4.1. EC class/code libraries

These are libraries of code or classes for use in programming within the evolutionary computation field. They are not meant as stand alone applications, but rather as tools for building your own applications.

daga
  · Web site: garage.cps.msu.edu/software/software-index.html

  daga is an experimental release of a 2-level genetic algorithm compatible with the GALOPPS GA software.
  It is a meta-GA which dynamically evolves a population of GAs to solve a problem presented to the lower-level GAs. When multiple GAs (with different operators, parameter settings, etc.) are simultaneously applied to the same problem, the ones showing better performance have a higher probability of surviving and "breeding" to the next macro-generation (i.e., spawning new "daughter"-GAs with characteristics inherited from the parental GA or GAs). In this way, we try to encourage good problem-solving strategies to spread to the whole population of GAs.

Ease
  · Web site: www.sprave.com/Ease/Ease.html

  Ease - Evolutionary Algorithms Scripting Environment - is an extension to the Tcl scripting language, providing commands to create, modify, and evaluate populations of individuals represented by real number vectors and/or bit strings.

EO
  · Web site: eodev.sourceforge.net

  EO is a templates-based, ANSI-C++ compliant evolutionary computation library. It contains classes for any kind of evolutionary computation (especially genetic algorithms) you might come up with. It is component-based, so that if you don't find the class you need in it, it is very easy to subclass an existing abstract or concrete class.

FORTRAN GA
  · Web site: cuaerospace.com/carroll/ga.html

  This program is a FORTRAN version of a genetic algorithm driver. This code initializes a random sample of individuals with different parameters to be optimized using the genetic algorithm approach, i.e. evolution via survival of the fittest. The selection scheme used is tournament selection with a shuffling technique for choosing random pairs for mating. The routine includes binary coding for the individuals, jump mutation, creep mutation, and the option for single-point or uniform crossover. Niching (sharing) and an option for the number of children per pair of parents have been added. More recently, an option for the use of a micro-GA has been added.
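
  The operators named in the FORTRAN GA entry are easy to show in miniature. The following Python sketch is not taken from the FORTRAN driver; every function name, the bit-counting ("OneMax") fitness function and all parameter values are assumptions made for illustration. It shows shuffle-based tournament selection, single-point and uniform crossover, and jump (bit-flip) mutation on binary-coded individuals. Creep mutation, niching and the micro-GA option are left out.

```python
import random

def tournament_pass(population, fitness, rng):
    """Shuffle the population, let adjacent pairs compete, return the winners."""
    order = list(range(len(population)))
    rng.shuffle(order)
    winners = []
    for i in range(0, len(order) - 1, 2):
        a, b = population[order[i]], population[order[i + 1]]
        winners.append(a if fitness(a) >= fitness(b) else b)
    return winners

def single_point_crossover(mum, dad, rng):
    """Child takes mum's bits before a random cut point and dad's bits after it."""
    cut = rng.randrange(1, len(mum))
    return mum[:cut] + dad[cut:]

def uniform_crossover(mum, dad, rng):
    """Each bit of the child comes from either parent with equal probability."""
    return [m if rng.random() < 0.5 else d for m, d in zip(mum, dad)]

def jump_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=1):
    """Toy GA maximising the number of 1 bits; pop_size is assumed to be even."""
    rng = random.Random(seed)
    fitness = sum  # hypothetical fitness function: count of 1 bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Two shuffled tournament passes yield pop_size parents in total.
        parents = tournament_pass(pop, fitness, rng) + tournament_pass(pop, fitness, rng)
        children = []
        for i in range(0, len(parents) - 1, 2):
            mum, dad = parents[i], parents[i + 1]
            # Two children per pair keep the population size constant.
            for first, second in ((mum, dad), (dad, mum)):
                child = single_point_crossover(first, second, rng)
                children.append(jump_mutation(child, rate, rng))
        pop = children
    return max(pop, key=fitness)
```

  Calling evolve() returns the fittest bit string of the final generation. Replacing single_point_crossover with uniform_crossover inside the loop switches the crossover scheme without touching the rest of the sketch.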
GAlib: Matthew's Genetic Algorithms Library
  · Web site: lancet.mit.edu/ga/
  · FTP site: lancet.mit.edu/pub/ga/
  · Register GAlib at: lancet.mit.edu/ga/Register.html

  GAlib contains a set of C++ genetic algorithm objects. The library includes tools for using genetic algorithms to do optimization in any C++ program using any representation and genetic operators. The documentation includes an extensive overview of how to implement a genetic algorithm as well as examples illustrating customizations to the GAlib classes.

GALOPPS
  · Web site: garage.cps.msu.edu/software/software-index.html
  · FTP site: garage.cps.msu.edu/pub/GA/galopps/

  GALOPPS is a flexible, generic GA, in 'C'. It was based upon Goldberg's Simple Genetic Algorithm (SGA) architecture, in order to make it easier for users to learn to use and extend. GALOPPS extends the SGA capabilities several fold:
  · (optional) A new Graphical User Interface, based on TCL/TK, for Unix users, allowing easy running of GALOPPS 3.2 (single or multiple subpopulations) on one or more processors. The GUI writes/reads "standard" GALOPPS input and master files, and displays graphical output (during or after run) of user-selected variables.
  · 5 selection methods: roulette wheel, stochastic remainder sampling, tournament selection, stochastic universal sampling, linear-ranking-then-SUS.
  · Random or superuniform initialization of "ordinary" (non-permutation) binary or non-binary chromosomes; random initialization of permutation-based chromosomes; or user-supplied initialization of arbitrary types of chromosomes.
  · Binary or non-binary alphabetic fields on value-based chromosomes, including different user-definable field sizes.
  · 3 crossovers for value-based representations: 1-pt, 2-pt, and uniform, all of which operate at field boundaries if a non-binary alphabet is used.
  · 4 crossovers for order-based reps: PMX, order-based, uniform order-based, and cycle.
  · 4 mutations: fast bitwise, multiple-field, swap and random sublist scramble.
  · Fitness scaling: linear scaling, Boltzmann scaling, sigma truncation, window scaling, ranking.
  · Plus a whole lot more....

GAS
  · Web site: starship.skyport.net/crew/gandalf

  GAS means "Genetic Algorithms Stuff". GAS is freeware. The purpose of GAS is to explore and exploit artificial evolutions. The primary implementation language of GAS is Python. The GAS software package is meant to be a Python framework for applying genetic algorithms. It contains an example application that attempts to breed Python program strings. This special problem falls into the category of Genetic Programming (GP), and/or Automatic Programming. Nevertheless, GAS tries to be useful for other applications of Genetic Algorithms as well.

GAUL
  · Web site: gaul.sourceforge.net

  The Genetic Algorithm Utility Library (GAUL) is a flexible programming library designed to aid development of applications that require the use of genetic algorithms. Features include:
  · Darwinian, Lamarckian or Baldwinian evolutionary schemes.
  · Both steady-state and generation-based GAs included.
  · The island model of evolution is available.
  · Chromosome datatype agnostic. A selection of common chromosome types is built in.
  · Allows user-defined crossover, mutation, selection, adaptation and replacement operators.
  · Support for multiple, simultaneously evolved populations.
  · Choice of high-level or low-level interface functions.
  · Additional, non-GA, optimisation algorithms are built in for local optimisation or comparative purposes.
  · Trivial to extend using external code via the built-in code hooks.
  · May be driven by, or extended by, powerful S-Lang scripts.
  · Support for multiprocessor calculations.
  · Written using highly portable C code.

GECO
  · FTP site: ftp://ftp.aic.nrl.navy.mil/pub/galist/src/

  GECO (Genetic Evolution through Combination of Objects) is an extensible object-oriented toolbox for constructing genetic algorithms (in Lisp). It provides a set of extensible classes and methods designed for generality.
  Some simple examples are also provided to illustrate the intended use.

Genetic
  · Web site: ???
  · You can get it from the Debian repository: packages.qa.debian.org/g/genetic.html

  This is a package for genetic algorithms and AI in Python. Genetic can typically solve any problem that consists of minimizing a function. It also includes several demos / examples, like the TSP (traveling salesman problem).

GPdata
  · FTP site: ftp.cs.bham.ac.uk/pub/authors/W.B.Langdon/gp-code/
  · Documentation (GPdata-icga-95.ps): cs.ucl.ac.uk/genetic/papers/

  GPdata-3.0.tar.gz (C++) contains a version of Andy Singleton's GP-Quick version 2.1 which has been extensively altered to support:
  · Indexed memory operation (cf. Teller)
  · multi tree programs
  · ADFs
  · parameter changes without recompilation
  · populations partitioned into demes
  · (A version of) Pareto fitness

  This ftp site also contains a small C++ program (ntrees.cc) to calculate the number of different trees there are of a given length and given function and terminal set.

gpjpp Genetic Programming in Java
  · The code can be found in the tarball linked from "GP and Othello Java code and READMEs" on this page: http://www1.cs.columbia.edu/~evs/ml/hw4.html

  gpjpp is a Java package I wrote for doing research in genetic programming. It is a port of the gpc++ kernel written by Adam Fraser and Thomas Weinbrenner. Included in the package are four of Koza's standard examples: the artificial ant, the hopping lawnmower, symbolic regression, and the boolean multiplexer.
  Here is a partial list of its features:
  · graphic output of expression trees
  · efficient diversity checking
  · Koza's greedy over-selection option for large populations
  · extensible GPRun class that encapsulates most details of a genetic programming test
  · more robust and efficient streaming code, with automatic checkpoint and restart built into the GPRun class
  · an explicit complexity limit that can be set on each GP
  · additional configuration variables to allow more testing without recompilation
  · support for automatically defined functions (ADFs)
  · tournament and fitness proportionate selection
  · demetic grouping
  · optional steady state population
  · subtree crossover
  · swap and shrink mutation

jaga
  · Web site: cs.felk.cvut.cz/~koutnij/studium/jaga/jaga.html

  Simple genetic algorithm package written in Java.

lil-gp
  · Web site: GARAGe.cps.msu.edu/software/software-index.html#lilgp
  · FTP site: garage.cps.msu.edu/pub/GA/lilgp/

patched lil-gp *
  · Web site: www.cs.umd.edu/users/seanl/gp/

  lil-gp is a generic 'C' genetic programming tool. It was written with a number of goals in mind: speed, ease of use and support for a number of options including:
  · Generic 'C' program that runs on UNIX workstations
  · Support for multiple population experiments, using arbitrary and user settable topologies for exchange, for a single processor (i.e., you can do multiple population gp experiments on your PC).
  · lil-gp manipulates trees of function pointers which are allocated in single, large memory blocks for speed and to avoid swapping.

  * The patched lil-gp kernel is strongly-typed, with modifications on multithreading, coevolution, and other tweaks and features.

Lithos
  · Web site: www.esatclear.ie/~rwallace/lithos.html

  Lithos is a stack based evolutionary computation system. Unlike most EC systems, its representation language is computationally complete, while also being faster and more compact than the S-expressions used in genetic programming.
  The version presented here applies the system to the game of Go, but can be changed to other problems by simply plugging in a different evaluation function. ANSI C source code is provided.

PGAPack Parallel Genetic Algorithm Library
  · Web site: www-fp.mcs.anl.gov/CCST/research/reports_pre1998/comp_bio/stalk/pgapack.html
  · FTP site: ftp.mcs.anl.gov/pub/pgapack/

  PGAPack is a general-purpose, data-structure-neutral, parallel genetic algorithm library. It is intended to provide most capabilities desired in a genetic algorithm library, in an integrated, seamless, and portable manner. Key features in PGAPack V1.0 include:
  · Callable from Fortran or C.
  · Runs on uniprocessors, parallel computers, and workstation networks.
  · Binary-, integer-, real-, and character-valued native data types.
  · Full extensibility to support custom operators and new data types.
  · Easy-to-use interface for novice and application users.
  · Multiple levels of access for expert users.
  · Parameterized population replacement.
  · Multiple crossover, mutation, and selection operators.
  · Easy integration of hill-climbing heuristics.
  · Extensive debugging facilities.
  · Large set of example problems.
  · Detailed users guide.

PIPE
  · Web site: www.idsia.ch/~rafal/research.html
  · FTP site: ftp.idsia.ch/pub/rafal

  Probabilistic Incremental Program Evolution (PIPE) is a novel technique for automatic program synthesis. The software is written in C. It
  · is easy to install (comes with an automatic installation tool).
  · is easy to use: setting up PIPE_V1.0 for different problems requires a minimal amount of programming. User-written, application-independent program parts can easily be reused.
  · is efficient: PIPE_V1.0 has been tuned to speed up performance.
  · is portable: comes with source code (optimized for SunOS 5.5.1).
  · is extensively documented(!) and contains three example applications.
  · supports statistical evaluations: it facilitates running multiple experiments and collecting results in output files.
  · includes a testing tool for testing generalization of evolved programs.
  · supports floating point and integer arithmetic.
  · has extensive output features.
  · For lil-gp users: Problems set up for lil-gp 1.0 can be easily ported to PIPE_V1.0. The testing tool can also be used to process programs evolved by lil-gp 1.0.

pygp
  · Web site: pygp.sourceforge.net/

  Your basic genetic algorithm package for Python.

Sugal
  · Web site: www.trajan-software.demon.co.uk/sugal.htm

  Sugal [soo-gall] is the SUnderland Genetic ALgorithm system. The aim of Sugal is to support research and implementation in Genetic Algorithms on a common software platform. As such, Sugal supports a large number of variants of Genetic Algorithms, and has extensive features to support customization and extension.

4.2. EC software kits/applications

These are various applications, software kits, etc. meant for research in the field of evolutionary computing. Their ease of use will vary, as they were designed to meet some particular research interest more than as an easy to use commercial package.

ADATE
  · Web site: www-ia.hiof.no/~rolando/adate_intro.html

  ADATE (Automatic Design of Algorithms Through Evolution) is a system for automatic programming, i.e., inductive inference of algorithms, which may be the best way to develop artificial and general intelligence. The ADATE system can automatically generate non-trivial and novel algorithms. Algorithms are generated through large scale combinatorial search that employs sophisticated program transformations and heuristics. The ADATE system is particularly good at synthesizing symbolic, functional programs and has several unique qualities.

esep & xesep
  · Web site (esep): www.iit.edu/~elrad/esep.html
  · Web site (xesep): www.iit.edu/~elrad/xesep.html

  This is a new scheduler, called Evolution Scheduler, based on Genetic Algorithms and Evolutionary Programming.
  It lives alongside the original Linux priority scheduler. This means you don't have to reboot to change the scheduling policy. You may simply use the manager program esep to switch between them at any time, and esep itself is an all-in-one tool for scheduling status, commands, and administration. We didn't intend to remove the original priority scheduler; instead, esep provides you with another choice: a more intelligent scheduler, which carries out natural competition in an easy and effective way.

  Xesep is a graphical user interface to esep (Evolution Scheduling and Evolving Processes). It is intended to show users how to start, play with, and get a feel for Evolution Scheduling and Evolving Processes. It includes sub-programs that periodically display system status, evolving process status, queue status, and evolution scheduling status, at intervals as small as one millisecond.

Corewars
  · Web site: corewars.sourceforge.net/
  · SourceForge site: sourceforge.net/projects/corewars/

  Corewars is a game which simulates a virtual machine with a number of programs. Each program tries to crash the others. The program that lasts the longest time wins. A number of sample programs are provided and new programs can be written by the player. Screenshots are available at the Corewars homepage.

Corewar VM
  · Web site: www.jedi.claranet.fr/

  This is a virtual machine written in Java (so it is a virtual machine for another virtual machine!) for a Corewar game.

Grany-3
  · Web site: guillaume.cottenceau.free.fr/html/grany.html

  Grany-3 is a full-featured cellular automaton simulator, made in C++ with Gtk--, flex++/bison++, doxygen and gettext, useful to granular media physicists.

JCASim
  · Web site: www.jweimar.de/jcasim/

  JCASim is a general-purpose system for simulating cellular automata in Java. It includes a stand-alone application and an applet for web presentations. The cellular automata can be specified in Java, in CDL, or using an interactive dialogue.
  The system supports many different lattice geometries (1-D, 2-D square, hexagonal, triangular, 3-D), neighborhoods, boundary conditions, and can display the cells using colors, text, or icons.

JGProg
  · Web site: jgprog.sourceforge.net

  Genetic Programming (JGProg) is an open-source Java implementation of a strongly-typed Genetic Programming experimentation platform. Two example "worlds" are provided, in which a population evolves and solves the problem.

5. Alife & Complex Systems

Alife takes yet another approach to exploring the mysteries of intelligence. It has many aspects similar to EC and Connectionism, but takes these ideas and gives them a meta-level twist. Alife emphasizes the development of intelligence through emergent behavior of complex adaptive systems. Alife stresses the social or group based aspects of intelligence. It seeks to understand life and survival. By studying the behaviors of groups of 'beings' Alife seeks to discover the way intelligence or higher order activity emerges from seemingly simple individuals. Cellular Automata and Conway's Game of Life are probably the most commonly known applications of this field.

Complex Systems (abbreviated CS) are very similar to alife in the way they are approached, just more general in definition (i.e., alife is a type of complex system). Usually complex system software takes the form of a simulator.

5.1. Alife & CS class/code libraries

These are libraries of code or classes for use in programming within the artificial life field. They are not meant as stand alone applications, but rather as tools for building your own applications.

AgentFarms
  · Web site: www.agentfarms.net

  Agent Farms is a system for modelling and simulation of complex, multi-agent based systems.
  The system can be used for:
  · Creating models of multi-agent systems
  · Interactive and distributed simulation
  · Observation and visualisation of the simulation
  · Population modification and migration

Biome
  · Web site: wosx30.eco-station.uni-wuerzburg.de/~martin/biome/

  Biome is a C++ library aimed at individual-based/agent-based simulations. It is somewhat similar to Swarm, EcoSim or Simex but tries to be more efficient and less monolithic without compromising object-oriented design. Currently there is an event based scheduling system, a C++ified Mersenne-Twister RNG, several general analysis classes, some Qt-based GUI classes, a very basic persistence/database framework (used also for parameter storage) and many other small useful things.

CAGE
  · Web site: www.alcyone.com/software/cage/

  CAGE is a fairly generic and complete cellular automaton simulation engine in Python. It supports both 1D and 2D automata, a variety of prepackaged rules, and the concept of "agents" which can move about independently on the map for implementing agent behavior.

CASE
  · Web site: www.iu.hio.no/~cell/
  · FTP site: ftp.iu.hio.no/pub/

  CASE (Cellular Automaton Simulation Environment) is a C++ toolkit for visualizing discrete models in two dimensions: so-called cellular automata. The aim of this project is to create an integrated framework for creating generalized cellular automata using the best, standardized technology of the day.

EcoSim
  · Web site: www.offis.de/projekte/ig/ecotools/index3_e.php

  In EcoSim an ecosystem is described by all static and dynamic properties of the individuals involved in the system as well as time varying properties of the environment. Individuals change their state over time or due to internal and external events. The environment is also defined via dynamic objects which can change. EcoSim supports on the fly analysis and animation of generated data. It is a C++ class library designed to support individual-oriented modelling and simulation of ecological systems.
Integrating Modelling Toolkit
  · Web site: sourceforge.net/projects/imt/

  The Integrating Modelling Toolkit (IMT) is a generic, comprehensive, and extensible set of abstractions allowing definition and use of interoperable model components. Modellers create an IMT "world" made of IMT "agents" that will each perform a particular phase of a modelling task. The core set of IMT agents can describe generic, modular, distributed model components, either native to the IMT or integrating existing simulation toolkits, specialized for tasks that range from simple calculation of functions in an interpreted language to spatially explicit simulation, model optimization, GIS analysis, visualization and advanced statistical analysis. IMT agents are designed to easily "glue" together in higher-level simulations integrating different modelling paradigms and toolkits. The IMT can be easily extended by users and developers through a convenient plug-in mechanism.

MAML
  · Web site: www.maml.hu

  The current version of MAML is basically an extension to Objective-C (using the Swarm libraries). It consists of a couple of remaining must be filled with pure swarm-code. A MAML-to-Swarm compiler (named xmc) is also being developed, which compiles the source code into a Swarm application.

Swarm
  · Web site: www.swarm.org
  · FTP site: ftp.swarm.org/pub/swarm/

  The Swarm Alife simulation kit. Swarm is a simulation environment which facilitates development and experimentation with simulations involving a large number of agents behaving and interacting within a dynamic environment. It consists of a collection of classes and libraries written in Objective-C and allows great flexibility in creating simulations and analyzing their results. It comes with three demos and good documentation.

5.2. Alife & CS software kits, applications, etc.

These are various applications, software kits, etc. meant for research in the field of artificial life.
Their ease of use will vary, as they were designed to meet some particular research interest more than as an easy to use commercial package.

Achilles
  · Web site: achilles.sourceforge.net

  Achilles is an evolution simulation based on Larry Yaeger's PolyWorld. It uses Hebbian neural networks, and an extremely simplified physical model that allows virtual organisms to interact freely in a simulated environment.

Avida
  · Web site: dllab.caltech.edu/avida/

  The computer program avida is an auto-adaptive genetic system designed primarily for use as a platform in Artificial Life research. The avida system is based on concepts similar to those employed by the tierra program, that is to say it is a population of self-reproducing strings with a Turing-complete genetic basis subjected to Poisson-random mutations. The population adapts to the combination of an intrinsic fitness landscape (self-reproduction) and an externally imposed (extrinsic) fitness function provided by the researcher. By studying this system, one can examine evolutionary adaptation, general traits of living systems (such as self-organization), and other issues pertaining to theoretical or evolutionary biology and dynamic systems.

BugsX
  · FTP site: http://surf.de.uu.net/zooland/download/packages/bugsx/

  Display and evolve biomorphs. It is a program which draws the biomorphs based on parametric plots of Fourier sine and cosine series and lets you play with them using the genetic algorithm.

The Cellular Automata Simulation System
  · Web site: staff.vbi.vt.edu/dana/ca/cellular.shtml

  The system consists of a compiler for the Cellang cellular automata programming language, along with the corresponding documentation, viewer, and various tools. Cellang has been undergoing refinement for the last several years (1991-1995), with corresponding upgrades to the compiler. Postscript versions of the tutorial and language reference manual are available for those wanting more detailed information.
  The most important distinguishing features of Cellang include support for:
  · any number of dimensions;
  · compile time specification of each dimension's size;
  · cell neighborhoods of any size (though bounded at compile time) and shape;
  · positional and time dependent neighborhoods;
  · associating multiple values (fields), including arrays, with each cell;
  · associating a potentially unbounded number of mobile agents [Agents are mobile entities based on a mechanism of the same name in the Creatures system, developed by Ian Stephenson (ian@ohm.york.ac.uk).] with each cell; and
  · local interactions only, since it is impossible to construct automata that contain any global control or references to global variables.

Creatures Docking Station
  · Linux info: http://www.simons-rock.edu/~rlovison/

  This is a free version of the Creatures3 ALife game. It has fewer species and a small 'space-station' world, but can connect to other worlds over the internet and (if you have the Windows version of the game) can connect to your C3 world. The game itself revolves around breeding and training the alife creatures, 'Norns'. It strikes a pretty nice balance between fun and science, or so I'm told.

dblife & dblifelib
  · FTP site: ibiblio.org/pub/Linux/science/ai/life/

  dblife: Sources for a fancy Game of Life program for X11 (and curses). It is not meant to be incredibly fast (use xlife for that:-). But it IS meant to allow the easy editing and viewing of Life objects and has some powerful features. The related dblifelib package is a library of Life objects to use with the program.

  dblifelib: This is a library of interesting Life objects, including oscillators, spaceships, puffers, and other weird things. The related dblife package contains a Life program which can read the objects in the Library.

Drone
  · Web site: pscs.physics.lsa.umich.edu/Software/Drone/

  Drone is a tool for automatically running batch jobs of a simulation program.
It allows sweeps over arbitrary sets of parameters, as well as multiple runs for each parameter set, with a separate random seed for each run. The runs may be executed either on a single computer or over the Internet on a set of remote hosts. Drone is written in Expect (an extension to the Tcl scripting language) and runs under Unix. It was originally designed for use with the Swarm agent-based simulation framework, but Drone can be used with any simulation program that reads parameters from the command line or from an input file. EcoLab · Web site: parallel.hpc.unsw.edu.au/rks/ecolab/ EcoLab is a system that implements an abstract ecology model. It is written as a set of Tcl/Tk commands so that the model parameters can easily be changed on the fly by means of editing a script. The model itself is written in C++. Game Of Life (GOL) · FTP site: ibiblio.org/pub/Linux/science/ai/life/ GOL is a simulator for Conway's Game of Life (a simple cellular automaton), and other simple rule sets. The emphasis here is on speed and scale; in other words, you can set up large and fast simulations. gant · Web site: gant.sourceforge.net This project is an ANSI C++ implementation of the Generalized Langton Ant, which lives on a torus. gLife · Web site: glife.sourceforge.net · SourceForge site: sourceforge.net/projects/glife/ This program is similar to "Conway's Game of Life" and yet very different. It takes "Conway's Game of Life" and applies it to a society (human society). This means there is a very different (and much larger) ruleset than in the original game. Things need to be taken into account such as the terrain, age, sex, culture, movement, etc. Langton's Ant · Web site: www.theory.org/software/ant/ Langton's Ant is an example of a finite-state cellular automaton. The ant (or ants) start out on a grid. Each cell is either black or white. If the ant is on a black square, it turns right 90 degrees and moves forward one unit.
If the ant is on a white square, it turns left 90 degrees and moves forward one unit. And when the ant leaves a square, it inverts the color. The neat thing about Langton's Ant is that, for every starting pattern tried so far, it eventually builds a "road," which is a cycle of 104 steps that repeats indefinitely, each cycle leaving the ant displaced diagonally. LEE · Web site: www.informatics.indiana.edu/fil/LEE/ LEE (Latent Energy Environments) is both an Alife model and a software tool to be used for simulations within the framework of that model. We hope that LEE will help understand a broad range of issues in theoretical, behavioral, and evolutionary biology. The LEE tool described here consists of approximately 7,000 lines of C code and runs on both Unix and Macintosh platforms. MATREM · Web site: www.phys.uu.nl/~romans/ Matrem is a computer program that simulates life. It belongs to the emerging science of "artificial life", which studies evolution and complex systems in general by simulation. Matrem is also a game, where players compete to create the fittest lifeform. Their efforts are the driving force behind the program. Noble Ape · Web site: www.nobleape.com/sim/ The Noble Ape Simulation has been developed (as the Nervana Simulation) since 1996. The aim of the simulation is to create a detailed biological environment and a cognitive simulation. The Simulation is intended as a palette for open source development. It provides a stable means of simulating large scale environments and cognitive processes. For MacOS Classic and X, with Java, Windows and Linux (Motif) versions in beta. Features a non-polygonal graphics engine (Ocelot) and a command-line version. POSES++ · Web site: www.gpc.de/eindex.html The POSES++ software tool supports the development and simulation of models. Regarding the simulation technique, models are suitable reproductions of real or planned systems for their simulative investigation.
In all industrial sectors or branches POSES++ can model and simulate any arbitrary system which is based on a discrete and discontinuous behaviour. Continuous systems can also mostly be handled like discrete systems, e.g., by quantity discretization and batch processing. Tierra · Web site: www.his.atr.jp/~ray/tierra/ Tierra is written in the C programming language. This source code creates a virtual computer and its operating system, whose architecture has been designed in such a way that the executable machine codes are evolvable. This means that the machine code can be mutated (by flipping bits at random) or recombined (by swapping segments of code between algorithms), and the resulting code remains functional enough of the time for natural (or presumably artificial) selection to be able to improve the code over time. TIN · Web site: www.jetlag.demon.nl This program simulates primitive life-forms, equipped with some basic instincts and abilities, in a 2D environment consisting of cells. By mutation, new generations can prove their success, thus passing on "good family values". The brain of a TIN can be seen as a collection of processes, each representing drives or impulses to behave a certain way, depending on the state/perception of the environment (e.g. presence of food, walls, neighbors, scent traces). These behavior processes currently are: eating, moving, mating, relaxing, tracing others, gathering food and killing. The process with the highest impulse value takes control, or in other words: the TIN will act according to its most urgent need. XLIFE · FTP site: surf.de.uu.net/zooland/download/packages/xlife/ This program will evolve patterns for John Horton Conway's game of Life. It will also handle general cellular automata with the orthogonal neighborhood and up to 8 states (it's possible to recompile for more states, but very expensive in memory). Transition rules and sample patterns are provided for the 8-state automaton of E. F.
Codd, the Wireworld automaton, and a whole class of `Prisoner's Dilemma' games. Xtoys · Web site: www.physics.mun.ca/~johnw/xtoys.html xtoys contains a set of cellular automata simulators for X windows. Programs included are: · xising --- a two dimensional Ising model simulator, · xpotts --- the two dimensional Potts model, · xautomalab --- a totalistic cellular automaton simulator, · xsand --- for the Bak, Tang, Wiesenfeld sandpile model, · xwaves --- demonstrates three different wave equations, · schrodinger --- play with the Schrodinger equation in an adjustable potential. 6. Agents Also known as intelligent software agents or just agents, this area of AI research deals with simple applications of small programs that aid the user in his/her work. They can be mobile (able to stop their execution on one machine and resume it on another) or static (live in one machine). They are usually specific to the task (and therefore fairly simple) and meant to help the user much as an assistant would. The most popular (i.e., widely known) use of this type of application to date is the web robots that many of the indexing engines (e.g., WebCrawler) use. Agent · FTP site: www.cpan.org/modules/by-category/23_Miscellaneous_Modules/Agent/ The Agent is a prototype for an Information Agent system. It is both platform and language independent, as it stores contained information in simple packed strings. It can be packed and shipped across any network with any format, as it freezes itself in its current state. agentTool · Web site: en.afit.af.mil/ai/agentool.htm · Download site: en.afit.af.mil/ai/_vti_bin/shtml.dll/registration.htm Another Java based agent development framework. Fairly unique in that it emphasizes the use of a GUI for designing the system which will "semi-automatically synthesize multiagent systems to meet those requirements". You need a Java enabled browser to download.
:P Aglets Workbench · Web site: www.trl.ibm.com/aglets/index_e.htm An aglet is a Java object that can move from one host on the Internet to another. That is, an aglet that executes on one host can suddenly halt execution, dispatch to a remote host, and resume execution there. When the aglet moves, it takes along its program code as well as its state (data). A built-in security mechanism makes it safe for a computer to host untrusted aglets. The Java Aglet API (J-AAPI) is a proposed public standard for interfacing aglets and their environment. J-AAPI contains methods for initializing an aglet, message handling, and dispatching, retracting, deactivating/activating, cloning, and disposing of the aglet. J-AAPI is simple, flexible, and stable. Application developers can write platform-independent aglets and expect them to run on any host that supports J-AAPI. A.L.I.C.E. · Web site: www.alicebot.org The ALICE software implements AIML (Artificial Intelligence Markup Language), a non-standard evolving markup language for creating chat robots. The primary design feature of AIML is minimalism. Compared with other chat robot languages, AIML is perhaps the simplest. The pattern matching language is very simple, for example permitting only one wild-card ('*') match character per pattern. AIML is an XML language, implying that it obeys certain grammatical meta-rules. The choice of XML syntax permits integration with other tools such as XML editors. Another motivation for XML is its familiar look and feel, especially to people with HTML experience. Ara · Web site: wwwagss.informatik.uni-kl.de/Projekte/Ara/index_e.html Ara is a platform for the portable and secure execution of mobile agents in heterogeneous networks. Mobile agents in this sense are programs with the ability to change their host machine during execution while preserving their internal state. This enables them to handle interactions locally which otherwise had to be performed remotely. 
Ara's specific aim in comparison to similar platforms is to provide full mobile agent functionality while retaining as much as possible of established programming models and languages. BattleBots · Web site: www.bluefire.nu/battlebots/ AI programming game where you design the bot by selecting hardware and programming its CPU, then competing with other bots. Competitions can have teams and special rules for a game. The hardware for use in your bot includes weapons, engine, scanners, CPU, etc. The programming language is dependent on the CPU type and is similar to an assembly language. Bee-gent · Web site: www2.toshiba.co.jp/beegent/index.htm Bee-gent is a new type of development framework in that it is a 100% pure agent system. As opposed to other systems which make only some use of agents, Bee-gent completely "Agentifies" the communication that takes place between software applications. The applications become agents, and all messages are carried by agents. Thus, Bee-gent allows developers to build flexible open distributed systems that make optimal use of existing applications. Bond · Web site: bond.cs.ucf.edu Yet another Java agent system... Bond is a Java based distributed object system and agent framework. It implements a message based middleware and associated services like directory, persistence, monitoring and security. Bond makes it easy to build multi-agent, distributed applications. Another application of Bond will be a Virtual Laboratory supporting data annotation and metacomputing. Cadaver · Web site: www.erikyyy.de/cadaver/ Cadaver is a simulated world of cyborgs and nature in realtime. The battlefield consists of forests, grain, water, grass, carcass (of course) and lots of other things. The game server manages the game and the rules. You start a server and connect some clients. The clients communicate with the server using a very primitive protocol. They can order cyborgs to harvest grain, attack enemies or cut forest.
The game is not intended to be played by humans! There is too much to control. Only for die-hards: Just telnet to the server and you can enter commands by hand. Instead the idea is that you write artificial intelligence clients to beat the other artificial intelligences. You can use the language (and operating system) of your choice for that task. It is enough to write a program that communicates on standard input and standard output channels. Then you can use programs like "socket" to connect your clients to the server. You do NOT need to write TCP/IP code, although I did so :) The battle shall not be boring, and so there is the so-called spyboss client that displays the action graphically on screen. Cougaar · Web site: www.cougaar.org/ Cougaar is a Java-based architecture for the construction of large-scale distributed agent-based applications. It is the product of a multi-year DARPA research project into large scale agent systems and includes not only the core architecture but also a variety of demonstration, visualization and management components to simplify the development of complex, distributed applications. [Yet another Java based agent system -- ed.] D'Agent (was AGENT TCL) · Web site: agent.cs.dartmouth.edu/software/agent2.0/ · FTP site: ftp.cs.dartmouth.edu/pub/agents/ A transportable agent is a program that can migrate from machine to machine in a heterogeneous network. The program chooses when and where to migrate. It can suspend its execution at an arbitrary point, transport to another machine and resume execution on the new machine. For example, an agent carrying a mail message migrates first to a router and then to the recipient's mailbox. The agent can perform arbitrarily complex processing at each machine in order to ensure that the message reaches the intended recipient. DIET Agents · Web site: diet-agents.sourceforge.net DIET Agents is a lightweight, scalable and robust multi-agent platform in Java.
It is especially suitable for rapidly developing P2P prototype applications and/or adaptive, distributed applications that use bottom-up, nature-inspired techniques. Dunce · Web site: www.boswa.com/boswabits/ Dunce is a simple chatterbot (conversational AI) and a language for programming such chatterbots. It uses a basic regex pattern matching and a semi-neural rule/response firing mechanism (with excitement/decay cycles). Dunce is listed about halfway down the page. FIPA-OS · Web site: fipa-os.sourceforge.net · Secondary Web site: www.nortelnetworks.com/products/announcements/fipa/ FIPA-OS is an open source implementation of the mandatory elements contained within the FIPA specification for agent interoperability. In addition to supporting the FIPA interoperability concepts, FIPA-OS also provides a component based architecture to enable the development of domain specific agents which can utilise the services of the FIPA Platform agents. It is implemented in Java. FishMarket · Web site: www.iiia.csic.es/Projects/fishmarket/ FM - The FishMarket project conducted at the Artificial Intelligence Research Institute (IIIA-CSIC) attempts to contribute in that direction by developing FM, an agent-mediated electronic auction house which has been evolved into a test-bed for electronic auction markets. The framework, conceived and implemented as an extension of FM96.5 (a Java-based version of the Fishmarket auction house), allows to define trading scenarios based on fish market auctions (Dutch auctions). FM provides the framework wherein agent designers can perform controlled experimentation in such a way that a multitude of experimental market scenarios--that we regard as tournament scenarios due to the competitive nature of the domain-- of varying degrees of realism and complexity can be specified, activated, and recorded; and trading (buyer and seller) heterogeneous (human and software) agents compared, tuned and evaluated. 
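The Dutch (descending-price) auction that the FishMarket scenarios are built around can be illustrated with a minimal Python sketch. This is not FM code and does not use any FM interface: the function name run_dutch_auction, the fixed price step, the reserve price, and the idea that each bidder simply has a private valuation are all assumptions made up for illustration. The auctioneer calls a high price and lowers it step by step until some bidder accepts.

```python
from typing import Dict, Optional, Tuple


def run_dutch_auction(
    valuations: Dict[str, float],
    start_price: float,
    step: float,
    reserve_price: float,
) -> Optional[Tuple[str, float]]:
    """Descending-price auction; every name here is hypothetical, not FM's.

    The first bidder whose private valuation meets the called price wins.
    Returns (winner, price), or None if the price drops below the reserve.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    price = start_price
    while price >= reserve_price:
        # Bidders willing to buy at the price currently being called.
        takers = [name for name, value in valuations.items() if value >= price]
        if takers:
            # Ties are broken by name so the result is deterministic;
            # a real auction house would need its own collision rule.
            return min(takers), price
        price -= step
    return None


if __name__ == "__main__":
    buyers = {"alice": 40.0, "bob": 55.0, "carol": 30.0}
    print(run_dutch_auction(buyers, start_price=100.0, step=10.0, reserve_price=20.0))
```

With the sample buyers the price falls from 100 in steps of 10 until it reaches 50, the first call that some valuation (bob's 55) meets. Raising the reserve above every valuation makes the lot go unsold.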
GNU Robots · Web site: www.gnu.org/software/robots/robots.html GNU Robots is a game/diversion where you construct a program for a little robot, then watch him explore a world. The world is filled with baddies that can hurt you, objects that you can bump into, and food that you can eat. The goal of the game is to collect as many prizes as possible before you are killed by a baddie or you run out of energy. Robots can be written in Guile Scheme or using a GUI. Grasshopper · Web site: www.grasshopper.de/ Another Java agent system. Full featured and actively developed. Commercial, but free. Historically targeted at embedded systems. Hive · Web site: hive.sourceforge.net Hive is a Java software platform for creating distributed applications. Using Hive, programmers can easily create systems that connect and use data from all over the Internet. At its heart, Hive is an environment for distributed agents to live, communicating and moving to fulfill applications. We are trying to make the Internet alive. ICM · Web site: www.nar.fujitsulabs.com/ · SourceForge site: sourceforge.net/projects/networkagent/ The Inter-Agent Communication Model (ICM) is a communication mechanism that can be used for sending messages between agents in an asynchronous fashion. Its intended application area is as a transportation mechanism for agent communication languages (ACLs), such as KQML and FIPA's ACL. Jacomma · Web site: jacomma.sourceforge.net · SourceForge site: sourceforge.net/projects/jacomma/ Jacomma is an agent development platform/framework for developing distributed, mobile, and reactive information agents with heterogeneous communication capabilities, in Java and JPython. Jacomma provides a development framework and an execution environment, which sits on top of the Inter-Agent Communication Model infrastructure. The ICM defines a communication protocol, a store and forward messaging architecture, and low level communication infrastructure for message exchange.
Communication is truly asynchronous, based on TCP sockets. ICM has an entry in this howto, or you can find it via a link off the site. Jade · Web site: sharon.cselt.it/projects/jade/ JADE (Java Agent DEvelopment Framework) is a software framework fully implemented in the Java language. It simplifies the implementation of multi-agent systems through a middle-ware that claims to comply with the FIPA specifications and through a set of tools that supports the debugging and deployment phase. The agent platform can be distributed across machines (which need not even share the same OS) and the configuration can be controlled via a remote GUI. The configuration can even be changed at run-time by moving agents from one machine to another, as and when required. JAM Agent · Web site: www.marcush.net/IRS/irs_downloads.html JAM supports both top-down, goal-based reasoning and bottom-up data-driven reasoning. JAM selects goals and plans based on maximal priority if metalevel reasoning is not used, or user-developed metalevel reasoning plans if they exist. JAM's conceptualization of goals and goal achievement is more classically defined (UMPRS is more behavioral performance-based than truly goal-based) and makes the distinction between plans to achieve goals and plans that simply encode behaviors. Goal types implemented include achievement (attain a specified world state), maintenance (re-attain a specified world state), and performance. Execution of multiple simultaneous goals is supported, with suspension and resumption capabilities for each goal (i.e., intention) thread. JAM plans have explicit precondition and runtime attributes that restrict their applicability, a postcondition attribute, and a plan attributes section for specifying plan/domain-specific plan features.
Available plan constructs include: sequencing, iteration, subgoaling, atomic (i.e., non-interruptable) plan segments, n-branch deterministic and non-deterministic conditional execution, parallel execution of multiple plan segments, goal-based or world state-based synchronization, an explicit failure-handling section, and Java primitive function definition through building it into JAM as well as the invocation of predefined (i.e., legacy) class members via Java's reflection capabilities without having to build it into JAM. JASA · Web site: www.csc.liv.ac.uk/~sphelps/jasa · Alt Web site: sourceforge.net/projects/jasa/ JASA is a high performance auction simulator suitable for conducting experiments in agent-based computational economics. It implements various auction mechanisms, trading strategies and experiments described in the computational economics literature, and as the software matures we hope that it will become a repository for reference implementations of commonly used mechanisms, strategies and learning algorithms. Jason · Web site: jason.sourceforge.net A Java-based interpreter for an extended version of AgentSpeak. Unlike other BDI (Beliefs-Desires-Intentions) agent tools, Jason implements the operational semantics of AgentSpeak, a BDI logic programming language extensively discussed in the literature. It is available as Open Source under GNU LGPL. JATLite · Web site: java.stanford.edu/ JATLite provides a set of Java packages that make it easy to build multi-agent systems using Java. JATLite provides only a light-weight, small set of packages so that developers can handle all the packages with little effort. For flexibility JATLite provides four different layers from abstract to Router implementation. A user can access any layer we are providing. Each layer has a different set of assumptions. The user can choose an appropriate layer according to the assumptions on the layer and the user's application.
The introduction page contains JATLite features and the set of assumptions for each layer. JATLiteBeans · Web site: waitaki.otago.ac.nz/JATLiteBean/ · Improved, easier-to-use interface to JATLite features including KQML message parsing, receiving, and sending. · Extensible architecture for message handling and agent "thread of control" management · Useful functions for parsing of simple KQML message content · JATLiteBean supports automatic advertising of agent capabilities to facilitator agents · Automatic, optional, handling of the "forward" performative · Generic configuration file parser · KQML syntax checker Java(tm) Agent Template · Web site: www-cdr.stanford.edu/ABE/JavaAgent.html The JAT provides a fully functional template, written entirely in the Java language, for constructing software agents which communicate peer-to-peer with a community of other agents distributed over the Internet. Although portions of the code which define each agent are portable, JAT agents are not migratory but rather have a static existence on a single host. This behavior is in contrast to many other "agent" technologies. (However, using the Java RMI, JAT agents could dynamically migrate to a foreign host via an agent resident on that host.) Currently, all agent messages use KQML as a top-level protocol or message wrapper. The JAT includes functionality for dynamically exchanging "Resources", which can include Java classes (e.g. new languages and interpreters, remote services, etc.), data files and information inlined into the KQML messages. Khepera Simulator · Web site: diwww.epfl.ch/lami/team/michel/khep-sim/ Khepera Simulator is a public domain software package written by Olivier MICHEL during the preparation of his Ph.D. thesis, at the Laboratoire I3S, URA 1376 of CNRS and University of Nice-Sophia Antipolis, France.
It allows you to write your own controllers for the mobile robot Khepera in C or C++ and to test them in a simulated environment, and it features a nice colorful X11 graphical interface. Moreover, if you own a Khepera robot, it can drive the real robot using the same control algorithm. It is mainly oriented toward researchers studying autonomous agents. lyntin · Web site: lyntin.sourceforge.net/ Lyntin is an extensible Mud client and framework for the creation of autonomous agents, or bots, as well as mudding in general. Lyntin is centered around Python, a dynamic, object-oriented, and fun programming language, and based on TinTin++, a lovely mud client. Mole · Web site: mole.informatik.uni-stuttgart.de/ Mole is an agent system supporting mobile agents programmed in Java. Mole's agents consist of a cluster of objects, which have no references to the outside, and as a whole work on tasks given by the user or another agent. They have the ability to roam a network of "locations" autonomously. These "locations" are an abstraction of real, existing nodes in the underlying network. They can use location-specific resources by communicating with dedicated agents representing these services. Agents are able to use services provided by other agents and to provide services as well. Narval · Web site: www.logilab.org Narval is the acronym of "Network Assistant Reasoning with a Validating Agent Language". It is a personal network assistant based on artificial intelligence and agent technologies. It executes recipes (sequences of actions) to perform tasks. It is easy to specify a new action using XML and to implement it using Python. Recipes can be built and debugged using a graphical interface. NeL · Web site: www.nevrax.org NeL is actually a game development library (for massive multi-player games), but I'm including it here as it (will) include a fairly sizable AI library.
Here's a blurb from the whitepaper: The purpose of the AI library is to provide a pragmatic approach to creating a distributed agents platform. Its focus is agents: individual entities that communicate regardless of location, using an action-reaction model. OAA · Web site: www.ai.sri.com/~oaa/ The Open Agent Architecture is a framework in which a community of software agents running on distributed machines can work together on tasks assigned by human or non-human participants in the community. Distributed cooperation and high-level communication are two ideas central to the foundation of the OAA. It defines an interagent communication language and supports multiple platforms and programming languages. PAI · Web site: utenti.quipo.it/claudioscordino/pai.html PAI (Programmable Artificial Intelligence) is a program capable of having a conversation in its mother tongue, English. Written in C++. Penguin! · FTP site: http://www.cpan.org/modules/by-category/23_Miscellaneous_Modules/Penguin/FSG/ Penguin is a Perl 5 module. It provides you with a set of functions which allow you to: · send encrypted, digitally signed Perl code to a remote machine to be executed. · receive code and, depending on who signed it, execute it in an arbitrarily secure, limited compartment. The combination of these functions enables direct Perl coding of algorithms to handle safe internet commerce, mobile information-gathering agents, "live content" web browser helper apps, distributed load-balanced computation, remote software update, distance machine administration, content-based information propagation, Internet-wide shared-data applications, network application builders, and so on. Ps-i · Web site: ps-i.sourceforge.net Ps-i is an environment for running agent-based simulations. It is cross-platform, with binaries available for Win32.
Features include: · declarative language for model specification · industry standard Tcl/Tk scripting with built-in routine optimization, speculative evaluation and xf86 JIT compiler · users can create complex models without sacrificing performance · user friendly interface · save and restore program runs · change model parameters on the fly · data visualization: field display with multiple agent shapes and color, statistics window, agent viewer, routine browser and highlight agents tool RealTimeBattle · Web site: www.lysator.liu.se/realtimebattle/ RealTimeBattle is a programming game, in which robots controlled by programs are fighting each other. The goal is to destroy the enemies, using the radar to examine the environment and the cannon to shoot. · Game progresses in real time, with the robot programs running as child processes to RealTimeBattle. · The robots communicate with the main program using the standard input and output. · Robots can be constructed in almost any programming language. · Maximum number of robots can compete simultaneously. · A simple messaging language is used for communication, which makes it easy to start constructing robots. · Robots behave like real physical objects. · You can create your own arenas. · Highly configurable. Remembrance Agents · Web site: www.remem.org Remembrance Agents are a set of applications that watch over a user's shoulder and suggest information relevant to the current situation. While query-based memory aids help with direct recall, remembrance agents are an augmented associative memory. For example, the word-processor version of the RA continuously updates a list of documents relevant to what's being typed or read in an emacs buffer. These suggested documents can be any text files that might be relevant to what you are currently writing or reading. They might be old emails related to the mail you are currently reading, or abstracts from papers and newspaper articles that discuss the topic of your writing.
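The idea of suggesting stored documents that relate to the text currently being written can be sketched in a few lines of Python. This is only an illustration: the description above does not say how Remembrance Agents ranks documents, so the function names tokens and suggest, the crude tokenizer, and the Jaccard word-overlap score are all assumptions chosen for brevity, not the RA's actual method.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased set of words; a deliberately crude, hypothetical tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest(context: str, documents: Dict[str, str], limit: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by Jaccard word overlap with the current context.

    Assumed scoring scheme for illustration only. Documents sharing no
    word with the context are left out of the suggestions.
    """
    ctx = tokens(context)
    scored = []
    for name, body in documents.items():
        words = tokens(body)
        union = ctx | words
        score = len(ctx & words) / len(union) if union else 0.0
        if score > 0:
            scored.append((name, score))
    # Highest score first; the name breaks ties so the order is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


if __name__ == "__main__":
    docs = {
        "mail.txt": "meeting about the robot arena schedule",
        "paper.txt": "cellular automata and the game of life",
        "notes.txt": "shopping list",
    }
    for name, score in suggest("notes on the game of life", docs):
        print(f"{name}: {score:.2f}")
```

Calling suggest again each time the buffer changes would give the continuously updated list described above; a real tool would want better term weighting than raw overlap.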
RoboTournament · Web site: robotournament.sourceforge.net/ RoboTournament is a RoboRally inspired game where players program their robots to vanquish their opponents. RoboTournament features: Multiple Game Types: Death Match, Rally, and Capture The Flag. Multi-Player through TCP/IP, Six weapons including BFG, Map Editor, and a wide variety of board elements. SimRobot · Web site: www.informatik.uni-bremen.de/~simrobot/ · FTP site: ftp.uni-bremen.de/pub/ZKW/INFORM/simrobot/ SimRobot is a program for simulation of sensor based robots in a 3D environment. It is written in C++, runs under UNIX and X11 and needs the graphics toolkit XView. · Simulation of robot kinematics · Hierarchically built scene definition via a simple definition language · Various sensors built in: camera, facet eye, distance measurement, light sensor, etc. · Objects defined as polyhedra · Emitter abstractly defined; can be interpreted e.g. as light or sound · Camera images computed according to the raytracing or Z-buffer algorithms known from computer graphics · Specific sensor/motor software interface for communicating with the simulation · Texture mapping onto the object surfaces: bitmaps in various formats · Comprehensive visualization of the scene: wire frame w/o hidden lines, sensor and actor values · Interactive as well as batch driven control of the agents and operation in the environment · Collision detection · Extendability with user defined object types · Possible socket communication to e.g. the Khoros image processing software Sulawesi · Web site ???: wearables.essex.ac.uk/sulawesi/ A framework called Sulawesi has been designed and implemented to tackle what have been considered to be important challenges in a wearable user interface: the ability to accept input from any number of modalities and, if necessary, to translate it to any number of modal outputs. It does this primarily through a set of proactive agents that act on the input.
TclRobots · Web site: www.nyx.net/~tpoindex/ TclRobots is a programming game, similar to 'Core War'. To play TclRobots, you must write a Tcl program that controls a robot. The robot's mission is to survive a battle with other robots. Two, three, or four robots compete during a battle, each running different programs (or possibly the same program in different robots). Each robot is equipped with a scanner, a cannon, and a drive mechanism. A single match continues until one robot is left running. Robots may compete individually, or combine in a team-oriented battle. A tournament can be run with any number of robot programs, each robot playing every other in a round-robin fashion, one-on-one. A battle simulator is available to help debug robot programs. The TclRobots program provides a physical environment, imposing certain game parameters to which all robots must adhere. TclRobots also provides a view on a battle, and a controlling user interface. TclRobots requirements: a wish interpreter built from Tcl 7.4 and Tk 4.0. TKQML · Web site: www.csee.umbc.edu/tkqml/ TKQML is a KQML application/addition to Tcl/Tk, which allows Tcl based systems to communicate easily with a powerful agent communication language. The Tacoma Project · Web site: www.tacoma.cs.uit.no/ An agent is a process that may migrate through a computer network in order to satisfy requests made by clients. Agents are an attractive way to describe network-wide computations. The TACOMA project focuses on operating system support for agents and how agents can be used to solve problems traditionally addressed by operating systems. We have implemented a series of prototype systems to support agents. TACOMA Version 1.2 is based on UNIX and TCP. The system supports agents written in C, Tcl/Tk, Perl, Python, and Scheme (Elk). It is implemented in C. This TACOMA version has been in public domain since April 1996. We are currently focusing on heterogeneity, fault-tolerance, security and management issues.
Also, several TACOMA applications are under construction. We implemented StormCast 4.0, a wide-area network weather monitoring system accessible over the internet, using TACOMA and Java. We are now in the process of evaluating this application, and plan to build a new StormCast version to be completed by June 1997.

UMPRS Agent

· Web site: http://www.marcush.net/IRS/

UMPRS supports top-down, goal-based reasoning and selects goals and plans based on maximal priority. Execution of multiple simultaneous goals is supported, with suspension and resumption capabilities for each goal (i.e., intention) thread. UMPRS plans have an integrated precondition/runtime attribute that constrains their applicability. Available plan constructs include: sequencing, iteration, subgoaling, atomic (i.e., non-interruptible) blocks, n-branch deterministic conditional execution, explicit failure-handling section, and C++ primitive function definition.

Virtual Secretary Project (ViSe) (Tcl/Tk)

· Web site: www.vise.cs.uit.no/vise/

The motivation of the Virtual Secretary project is to construct user-model-based intelligent software agents, which could in most cases replace humans for secretarial tasks, based on modern mobile computing and computer networks. The project includes two different phases: the first phase (ViSe1) focuses on information filtering and process migration, and its goal is to create a secure environment for software agents using the concept of user models; the second phase (ViSe2) concentrates on agents' intelligent and efficient cooperation in a distributed environment, and its goal is to construct cooperative agents for achieving high intelligence. (Implemented in Tcl/TclX/Tix/Tk)

VWORLD

· Web site: zhar.net/projects/vworld/

Vworld is a simulated environment for research with autonomous agents written in prolog. It is currently in something of a beta stage. It works well with SWI-prolog, but should work with Quintus-prolog with only a few changes.
It is being designed to serve as an educational tool for class projects dealing with prolog and autonomous agents. It comes with three demo worlds or environments, along with sample agents for them. There are two versions now. One written for SWI-prolog and one written for LPA-prolog. Documentation is roughly done (with a student/professor framework in mind), and a graphical interface is planned.

WebMate

· Web site: www.cs.cmu.edu/~softagents/webmate/

WebMate is a personal agent for World-Wide Web browsing and searching. It accompanies you when you travel on the internet and provides you what you want. Features include:

· Searching enhancement, including parallel search, searching keywords refinement using our relevant keywords extraction technology, relevant feedback, etc.
· Browsing assistant, including learning your current interests, recommending you new URLs according to your profile and selected resources, monitoring bookmarks of Netscape or IE, sending the current browsing page to your friends, etc.
· Offline browsing, including downloading the following pages from the current page for offline browsing.
· Filtering HTTP header, including recording http header and all the transactions between your browser and WWW servers, etc.
· Checking the HTML page to find the errors or dead links, etc.
· Programmed in Java, independent of operating system, running multi-threaded.

Zeus

· Web site: more.btexact.com/projects/agents/zeus/

The construction of multi-agent systems involves long development times and requires solutions to some considerable technical difficulties. This has motivated the development of the ZEUS toolkit, which provides a library of software components and tools that facilitate the rapid design, development and deployment of agent systems.

7. Programming languages

While any programming language can be used for artificial intelligence/life research, these are programming languages which are used extensively for, if not specifically made for, artificial intelligence programming.

Allegro CL

· Web site: www.franz.com

Franz Inc's free linux version of their lisp development environment. You can download it or they will mail you a CD free (you don't even have to pay for shipping). It is generally considered to be one of the better lisp platforms.

APRIL

· Web site: sourceforge.net/projects/networkagent/

APRIL is a symbolic programming language that is designed for writing mobile, distributed and agent-based systems especially in an Internet environment. It has advanced features such as a macro sub-language, asynchronous message sending and receiving, code mobility, pattern matching, higher-order functions and strong typing. The language is compiled to byte-code which is then interpreted by the APRIL runtime-engine. APRIL now requires the InterAgent Communications Model (ICM) to be installed before it can be installed. [Ed. ICM can be found at the same web site]

Ciao Prolog

· Web site: www.clip.dia.fi.upm.es/Software/Ciao/

Ciao is a complete Prolog system subsuming ISO-Prolog with a novel modular design which allows both restricting and extending the language. Ciao extensions currently include feature terms (records), higher-order, functions, constraints, objects, persistent predicates, a good base for distributed execution (agents), and concurrency. Libraries also support WWW programming, sockets, and external interfaces (C, Java, TCL/Tk, relational databases, etc.). An Emacs-based environment, a stand-alone compiler, and a toplevel shell are also provided.

DHARMI

· Web site: http://megazone.bigpanda.com/~wolf/DHARMI/

DHARMI is a high level spatial, tinker-toy like language whose components are transparently administered by a background process called the Habitat.
As the name suggests, the language was designed for modelling prototypes and handling living data. Programs can be modified while running. This is accomplished by blurring the distinction between source code, program, and data.

ECLiPSe

· Web site: www.icparc.ic.ac.uk/eclipse/

ECLiPSe is a software system for the cost-effective development and deployment of constraint programming applications, e.g. in the areas of planning, scheduling, resource allocation, timetabling, transport etc. It is also ideal for teaching most aspects of combinatorial problem solving, e.g. problem modelling, constraint programming, mathematical programming, and search techniques. It contains several constraint solver libraries, a high-level modelling and control language, interfaces to third-party solvers, an integrated development environment and interfaces for embedding into host environments.

ECoLisp

· Web site (???): www.di.unipi.it/~attardi/software.html

ECoLisp (Embeddable Common Lisp) is an implementation of Common Lisp designed for being embeddable into C based applications. ECL uses standard C calling conventions for Lisp compiled functions, which allows C programs to easily call Lisp functions and vice versa. No foreign function interface is required: data can be exchanged between C and Lisp with no need for conversion. ECL is based on a Common Runtime Support (CRS) which provides basic facilities for memory management, dynamic loading and dumping of binary images, and support for multiple threads of execution. The CRS is built into a library that can be linked with the code of the application. ECL is modular: main modules are the program development tools (top level, debugger, trace, stepper), the compiler, and CLOS. A native implementation of CLOS is available in ECL: one can configure ECL with or without CLOS. A runtime version of ECL can be built with just the modules which are required by the application.
The ECL compiler compiles from Lisp to C, and then invokes the GCC compiler to produce binaries.

ESTEREL

· Web site: www-sop.inria.fr/meije/esterel/

Esterel is both a programming language, dedicated to programming reactive systems, and a compiler which translates Esterel programs into finite-state machines. It is particularly well-suited to programming reactive systems, including real-time systems and control automata. Only the binary is available for the language compiler. :P

Gödel

· Web page: www.cs.bris.ac.uk/~bowers/goedel.html

Gödel is a declarative, general-purpose programming language in the family of logic programming languages. It is a strongly typed language, the type system being based on many-sorted logic with parametric polymorphism. It has a module system. Gödel supports infinite precision integers, infinite precision rationals, and also floating-point numbers. It can solve constraints over finite domains of integers and also linear rational constraints. It supports processing of finite sets. It also has a flexible computation rule and a pruning operator which generalizes the commit of the concurrent logic programming languages. Considerable emphasis is placed on Gödel's meta-logical facilities which provide significant support for meta-programs that do analysis, transformation, compilation, verification, debugging, and so on.

CLisp (Lisp)

· Web page: clisp.sourceforge.net
· Alt Web site: clisp.cons.org/

CLISP is a Common Lisp implementation by Bruno Haible and Michael Stoll. It mostly supports the Lisp described by Common LISP: The Language (2nd edition) and the ANSI Common Lisp standard. CLISP includes an interpreter, a byte-compiler, a large subset of CLOS (Object-Oriented Lisp), a foreign language interface and, for some machines, a screen editor. The user interface language (English, German, French) is chosen at run time. Major packages that run in CLISP include CLX & Garnet. CLISP needs only 2 MB of memory.
CMU Common Lisp · Web page: www.cons.org/cmucl/ · Linux Installation: www.telent.net/lisp/howto.html CMU Common Lisp is a public domain "industrial strength" Common Lisp programming environment. Many of the X3j13 changes have been incorporated into CMU CL. Wherever possible, this has been done so as to transparently allow the use of either CLtL1 or proposed ANSI CL. Probably the new features most interesting to users are SETF functions, LOOP and the WITH-COMPILATION-UNIT macro. GCL (Lisp) · FTP site: ftp.ma.utexas.edu/pub/gcl/ GNU Common Lisp (GCL) has a compiler and interpreter for Common Lisp. It used to be known as Kyoto Common Lisp. It is very portable and extremely efficient on a wide class of applications. It compares favorably in performance with commercial Lisps on several large theorem-prover and symbolic algebra systems. It supports the CLtL1 specification but is moving towards the proposed ANSI definition. GCL compiles to C and then uses the native optimizing C compilers (e.g., GCC). A function with a fixed number of args and one value turns into a C function of the same number of args, returning one value, so GCL is maximally efficient on such calls. It has a conservative garbage collector which allows great freedom for the C compiler to put Lisp values in arbitrary registers. It has a source level Lisp debugger for interpreted code, with display of source code in an Emacs window. Its profiling tools (based on the C profiling tools) count function calls and the time spent in each function. GNU Prolog · Web site: gnu-prolog.inria.fr · Web site: pauillac.inria.fr/~diaz/gnu-prolog/ GNU Prolog is a free Prolog compiler with constraint solving over finite domains developed by Daniel Diaz. GNU Prolog accepts Prolog+constraint programs and produces native binaries (like gcc does from a C source). The obtained executable is then stand-alone. 
The size of this executable can be quite small since GNU Prolog can avoid linking the code of most unused built-in predicates. The performance of GNU Prolog is very encouraging (comparable to commercial systems). Besides the native-code compilation, GNU Prolog offers a classical interactive interpreter (top-level) with a debugger. The Prolog part conforms to the ISO standard for Prolog with many extensions very useful in practice (global variables, OS interface, sockets,...). GNU Prolog also includes an efficient constraint solver over Finite Domains (FD). This opens constraint logic programming to the user, combining the power of constraint programming with the declarativity of logic programming.

lush

· Web site: lush.sourceforge.net

Lush is an object-oriented programming language designed for researchers, experimenters, and engineers interested in large-scale numerical and graphic applications. Lush is designed to be used in situations where one would want to combine the flexibility of a high-level, weakly-typed interpreted language, with the efficiency of a strongly-typed, natively-compiled language, and with the easy integration of code written in C, C++, or other languages.

Maude

· Web site: maude.cs.uiuc.edu

Maude is a high-performance reflective language and system supporting both equational and rewriting logic specification and programming for a wide range of applications. Maude has been influenced in important ways by the OBJ3 language, which can be regarded as an equational logic sublanguage. Besides supporting equational specification and programming, Maude also supports rewriting logic computation.

Mercury

· Web page: www.cs.mu.oz.au/research/mercury/

Mercury is a new, purely declarative logic programming language. Like Prolog and other existing logic programming languages, it is a very high-level language that allows programmers to concentrate on the problem rather than the low-level details such as memory management.
Unlike Prolog, which is oriented towards exploratory programming, Mercury is designed for the construction of large, reliable, efficient software systems by teams of programmers. As a consequence, programming in Mercury has a different flavor than programming in Prolog.

Mozart

· Web page: www.mozart-oz.org/

The Mozart system provides state-of-the-art support in two areas: open distributed computing and constraint-based inference. Mozart implements Oz, a concurrent object-oriented language with dataflow synchronization. Oz combines concurrent and distributed programming with logical constraint-based inference, making it a unique choice for developing multi-agent systems. Mozart is an ideal platform for both general-purpose distributed applications as well as for hard problems requiring sophisticated optimization and inferencing abilities. We have developed applications in scheduling and time-tabling, in placement and configuration, in natural language and knowledge representation, multi-agent systems and sophisticated collaborative tools.

SWI Prolog

· Web page: www.swi-prolog.org

SWI is a free version of prolog in the Edinburgh Prolog family (thus making it very similar to Quintus and many other versions). It comes with a large library of built in predicates, a module system, garbage collection, a two-way interface with the C language, plus many other features. It is meant as an educational language, so its compiled code isn't the fastest, although its similarity to Quintus allows for easy porting. XPCE is freely available in binary form for the Linux version of SWI-prolog. XPCE is an object oriented X-windows GUI development package/environment.

Kali Scheme

· Web site: www.neci.nj.nec.com/PLS/Kali.html

Kali Scheme is a distributed implementation of Scheme that permits efficient transmission of higher-order objects such as closures and continuations.
The integration of distributed communication facilities within a higher-order programming language engenders a number of new abstractions and paradigms for distributed computing. Among these are user-specified load-balancing and migration policies for threads, incrementally-linked distributed computations, agents, and parameterized client-server applications. Kali Scheme supports concurrency and communication using first-class procedures and continuations. It integrates procedures and continuations into a message-based distributed framework that allows any Scheme object (including code vectors) to be sent and received in a message.

RScheme

· Web site: www.rscheme.org
· FTP site: ftp.rscheme.org/pub/rscheme/

RScheme is an object-oriented, extended version of the Scheme dialect of Lisp. RScheme is freely redistributable, and offers reasonable performance despite being extraordinarily portable. RScheme can be compiled to C, and the C can then be compiled with a normal C compiler to generate machine code. By default, however, RScheme compiles to bytecodes which are interpreted by a (runtime) virtual machine. This ensures that compilation is fast and keeps code size down. In general, we recommend using the (default) bytecode code generation system, and only compiling your time-critical code to machine code. This allows a nice adjustment of space/time tradeoffs. (see web site for details)

Scheme 48

· Web site: s48.org/

Scheme 48 is a Scheme implementation based on a virtual machine architecture. Scheme 48 is designed to be straightforward, flexible, reliable, and fast. It should be easily portable to 32-bit byte-addressed machines that have POSIX and ANSI C support. In addition to the usual Scheme built-in procedures and a development environment, library software includes support for hygienic macros (as described in the Revised^4 Scheme report), multitasking, records, exception handling, hash tables, arrays, weak pointers, and FORMAT.
Scheme 48 implements and exploits an experimental module system loosely derived from Standard ML and Scheme Xerox. The development environment supports interactive changes to modules and interfaces.

SCM (Scheme)

· Web site: www-swiss.ai.mit.edu/~jaffer/SCM.html

SCM conforms to the Revised^4 Report on the Algorithmic Language Scheme and the IEEE P1178 specification. Scm is written in C. It uses the following utilities (all available at the ftp site).

· SLIB (Standard Scheme Library) is a portable Scheme library which is intended to provide compatibility and utility functions for all standard Scheme implementations, including SCM, Chez, Elk, Gambit, MacScheme, MITScheme, scheme->C, Scheme48, T3.1, and VSCM, and is available as the file slib2c0.tar.gz. Written by Aubrey Jaffer.
· JACAL is a symbolic math system written in Scheme, and is available as the file jacal1a7.tar.gz.
· Interfaces to standard libraries including REGEX string regular expression matching and the CURSES screen management package.
· Available add-on packages including an interactive debugger, database, X-window graphics, BGI graphics, Motif, and OpenWindows packages.
· A compiler (HOBBIT, available separately) and dynamic linking of compiled modules.

Shift

· Web site: www.path.berkeley.edu/shift/

Shift is a programming language for describing dynamic networks of hybrid automata. Such systems consist of components which can be created, interconnected and destroyed as the system evolves. Components exhibit hybrid behavior, consisting of continuous-time phases separated by discrete-event transitions. Components may evolve independently, or they may interact through their inputs, outputs and exported events. The interaction network itself may evolve.

YAP Prolog

· Web site: www.ncc.up.pt/~vsc/Yap/

YAP is a high-performance Prolog compiler developed at LIACC/Universidade do Porto. Its Prolog engine is based on the WAM (Warren Abstract Machine), with several optimizations for better performance.
YAP follows the Edinburgh tradition, and is largely compatible with DEC-10 Prolog, Quintus Prolog, and especially with C-Prolog. Work on the more recent version of YAP strives toward several goals:

· Portability: The whole system is now written in C. YAP compiles on popular 32 bit machines, such as Suns and Linux PCs, and on 64 bit machines, the Alphas running OSF Unix and Linux.
· Performance: We have optimised the emulator to obtain performance comparable to or better than well-known Prolog systems. In fact, the current version of YAP performs better than the original one, written in assembly language.
· Robustness: We have tested the system with a large array of Prolog applications.
· Extensibility: YAP was designed internally from the beginning to encapsulate manipulation of terms. These principles were used, for example, to implement a simple and powerful C-interface. The new version of YAP extends these principles to accommodate extensions to the unification algorithm, that we believe will be useful to implement extensions such as constraint programming.
· Completeness: YAP has for a long time provided most builtins expected from an Edinburgh Prolog implementation. These include I/O functionality, data-base operations, and modules. Work on YAP aims now at being compatible with the Prolog standard.
· Openness: We would like to make new development of YAP open to the user community.
· Research: YAP has been a vehicle for research within and outside our group. Currently research is going on in parallelisation and tabulation, and we have started work to support constraint handling.

8. MIA

These are entries for which I no longer have a valid home page. If you have any information regarding where I can find these now please let me know.

CLIG

· Web site: www.ags.uni-sb.de/~konrad/clig.html

CLIG is an interactive, extendible grapher for visualizing linguistic data structures like trees, feature structures, Discourse Representation Structures (DRS), logical formulas etc.
All of these can be freely mixed and embedded into each other. The grapher has been designed both to be stand-alone and to be used as an add-on for linguistic applications which display their output in a graphical manner.

Illuminator

· Web site: documents.cfar.umd.edu/resources/source/illuminator.html

Illuminator is a toolset for developing OCR and Image Understanding applications. Illuminator has two major parts: a library for representing, storing and retrieving OCR information, hereafter called dafslib, and an X-Windows "DAFS" file viewer, called illum. Illuminator and dafslib were designed to supplant existing OCR formats and become a standard in the industry. In particular, they are extensible to handle more than just English. The features of this release:

· 5 magnification levels for images
· flagged characters and words
· unicode support -- American, British, French, German, Greek, Italian, MICR, Norwegian, Russian, Spanish, Swedish, keyboards
· reads DAFS, TIFF's, PDA's (image only)
· save to DAFS, ASCII/UTF or Unicode
· Entity Viewer - shows properties, character choices, bounding boxes image fragment for a selected entity, change type, change content, hierarchy mode

Symbolic Probabilistic Inference (SPI)

· FTP site: ftp.engr.orst.edu/pub/dambrosi/spi/
· Paper (ijar-94.ps): ftp.engr.orst.edu/pub/dambrosi/

Contains Common Lisp function libraries to implement SPI type Bayesian nets. Documentation is very limited. Features:

· Probabilities, Local Expression Language Utilities, Explanation, Dynamic Models, and a TCL/TK based GUI.

IDEAL

· Web site: yoda.cis.temple.edu:8080/ideal/

IDEAL is a test bed for work in influence diagrams and Bayesian networks. It contains various inference algorithms for belief networks and evaluation algorithms for influence diagrams. It contains facilities for creating and editing influence diagrams and belief networks. IDEAL is written in pure Common Lisp and so it will run in Common Lisp on any platform.
The emphasis in writing IDEAL has been on code clarity and providing high level programming abstractions. It thus is very suitable for experimental implementations which need or extend belief network technology. At the highest level, IDEAL can be used as a subroutine library which provides belief network inference and influence diagram evaluation as a package. The code is documented in a detailed manual and so it is also possible to work at a lower level on extensions of belief network methods. IDEAL comes with an optional graphic interface written in CLIM. If your Common Lisp also has CLIM, you can run the graphic interface.

Ummon

· Web site: www.spacetide.com/projects/ummon/

Ummon is an advanced Open Source chatterbot. The main principle of the bot is that it has no initial knowledge of either words or grammar; it learns everything "on the fly." Numerous AI techniques will be explored in the development of Ummon to achieve realistic "human" communication with support for different, customizable personalities.

Brief Introduction to Alpha Systems and Processors
Neal Crook, Digital Equipment (Editor: David Mosberger)
V0.11, 6 June 1997

This document is a brief overview of existing Alpha CPUs, chipsets and systems. It has something of a hardware bias, reflecting my own area of expertise. Although I am an employee of Digital Equipment Corporation, this is not an official statement by Digital and any opinions expressed are mine and not Digital's.
______________________________________________________________________

Table of Contents

1. What is Alpha
2. What is Digital Semiconductor
3. Alpha CPUs
4. 21064 performance vs 21066 performance
5. A Few Notes On Clocking
6. The chip-sets
7. The Systems
8. Bytes and all that stuff
9. PALcode and all that stuff
10. Porting
11. More Information
12. References
______________________________________________________________________

1. What is Alpha

"Alpha" is the name given to Digital's 64-bit RISC architecture.
The Alpha project in Digital began in mid-1989, with the goal of providing a high-performance migration path for VAX customers. This was not the first RISC architecture to be produced by Digital, but it was the first to reach the market. When Digital announced Alpha, in March 1992, it made the decision to enter the merchant semiconductor market by selling Alpha microprocessors. Alpha is also sometimes referred to as Alpha AXP, for obscure and arcane reasons that aren't worth pursuing. Suffice it to say that they are one and the same.

2. What is Digital Semiconductor

Digital Semiconductor (DS) is the business unit within Digital Equipment Corporation (Digital - we don't like the name DEC) that sells semiconductors on the merchant market. Digital's products include CPUs, support chipsets, PCI-PCI bridges and PCI peripheral chips for comms and multimedia.

3. Alpha CPUs

There are currently 2 generations of CPU core that implement the Alpha architecture:

o EV4
o EV5

Opinions differ as to what "EV" stands for (Editor's note: the true answer is of course "Electro Vlassic" ``[1]''), but the number represents the first generation of Digital's CMOS technology that the core was implemented in. So, the EV4 was originally implemented in CMOS4. As time goes by, a CPU tends to get a mid-life performance kick by being optically shrunk into the next generation of CMOS process. EV45, then, is the EV4 core implemented in CMOS5 process. There is a big difference between shrinking a design into a particular technology and implementing it from scratch in that technology (but I don't want to go into that now). There are a few other wildcards in here: there is also a CMOS4S (optical shrink in CMOS4) and a CMOS5L. True technophiles will be interested to know that CMOS4 is a 0.75 micron process, CMOS5 is a 0.5 micron process and CMOS6 is a 0.35 micron process.
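
The core-naming convention just described (core generation, optionally followed by the CMOS generation it was shrunk into) can be sketched in a few lines of Python. This is only an illustration: the helper name decode_core and the lookup table are assumptions made for this example, not anything published by Digital, and the "wildcard" variants such as EV4S or the LCA cores are deliberately rejected rather than guessed at.

```python
# Feature sizes quoted in the text, in microns, keyed by CMOS generation.
CMOS_FEATURE_MICRONS = {4: 0.75, 5: 0.5, 6: 0.35}


def decode_core(name):
    """Split a core name such as "EV45" into (core generation, CMOS generation).

    Hypothetical helper for illustration only. One digit means the core is
    still in the process it was first implemented in (EV4 -> CMOS4); a second
    digit names the process it was shrunk into (EV45 -> EV4 core in CMOS5).
    """
    digits = name[2:]
    if not name.startswith("EV") or not digits.isdigit() or len(digits) not in (1, 2):
        raise ValueError("unrecognised core name: " + name)
    core = int(digits[0])
    process = int(digits[-1])
    return core, process


for name in ("EV4", "EV45", "EV5", "EV56"):
    core, process = decode_core(name)
    microns = CMOS_FEATURE_MICRONS[process]
    print("%s = EV%d core in CMOS%d (%s micron)" % (name, core, process, microns))
```

Names with a letter suffix (for example the optically shrunk EV4S) raise ValueError here, since the text gives no general rule for them.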
To map these CPU cores to chips we get:

21064-150,166        EV4 (originally), EV4S (now)
21064-200            EV4S
21064A-233,275,300   EV45
21066                LCA4S (EV4 core, with EV4 FPU)
21066A-233           LCA45 (EV4 core, but with EV45 FPU)
21164-233,300,333    EV5
21164A-417           EV56
21264                EV6

The EV4 core is a dual-issue (it can issue 2 instructions per CPU clock) superpipelined core with integer unit, floating point unit and branch prediction. It is fully bypassed and has 64-bit internal data paths and tightly coupled 8Kbyte caches, one each for Instruction and Data. The caches are write-through (they never get dirty).

The EV45 core has a couple of tweaks to the EV4 core: it has a slightly improved floating point unit, and 16KB caches, one each for Instruction and Data (it also has cache parity). (Editor's note: Neal Crook indicated in a separate mail that the changes to the floating point unit (FPU) improve the performance of the divider. The EV4 FPU divider takes 34 cycles for a single-precision divide and 63 cycles for a double-precision divide (non data-dependent). In contrast, the EV45 divider takes typically 19 cycles (34 cycles max) for single-precision and typically 29 cycles (63 cycles max) for a double-precision division (data-dependent).)

The EV5 core is a quad-issue core, also superpipelined, fully bypassed etc etc. It has tightly-coupled 8Kbyte caches, one each for I and D. These caches are write-through. It also has a tightly-coupled 96Kbyte on-chip second-level cache (the Scache) which is 3-way set associative and write-back (it can be dirty). The EV4->EV5 performance increase is better than just the increase achieved by clock speed improvements. As well as the bigger caches and quad issue, there are microarchitectural improvements to reduce producer/consumer latencies in some paths.

The EV56 core is fundamentally the same microarchitecture as the EV5, but it adds some new instructions for 8 and 16-bit loads and stores (see Section ``Bytes and all that stuff'').
These are primarily intended for use by device drivers. The EV56 core is implemented in CMOS6, which is a 2.0V process.

The 21064 was announced in March 1992. It uses the EV4 core, with a 128-bit bus interface. The bus interface supports the 'easy' connection of an external second-level cache, with a block size of 256-bits (2 data beats on the bus). The Bcache timing is completely software configurable. The 21064 can also be configured to use a 64-bit external bus, (but I'm not sure if any shipping system uses this mode). The 21064 does not impose any policy on the Bcache, but it is usually configured as a write-back cache. The 21064 does contain hooks to allow external hardware to maintain cache coherence with the Bcache and internal caches, but this is hairy.

The 21066 uses the EV4 core and integrates a memory controller and PCI host bridge. To save pins, the memory controller has a 64-bit data bus (but the internal caches have a block size of 256 bits, just like the 21064, therefore a block fill takes 4 beats on the bus). The memory controller supports an external Bcache and external DRAMs. The timing of the Bcache and DRAMs is completely software configurable, and can be controlled to the resolution of the CPU clock period. Having a 4-beat process to fill a cache block isn't as bad as it sounds because the DRAM access is done in page mode. Unfortunately, the memory controller doesn't support any of the new esoteric DRAMs (SDRAM, EDO or BEDO) or synchronous cache RAMs. The PCI bus interface is fully rev2.0 compliant and runs at up to 33MHz.

The 21164 has a 128-bit data bus and supports split reads, with up to 2 reads outstanding at any time (this allows 100% data bus utilisation under best-case dream-on conditions, i.e., you can theoretically transfer 128-bits of data on every bus clock). The 21164 supports easy connection of an external third-level cache (Bcache) and has all the hooks to allow external systems to maintain full cache coherence with all caches.
Therefore, symmetric multiprocessor designs are 'easy'.

The 21164A was announced in October, 1995. It uses the EV56 core. It is nominally pin-compatible with the 21164, but requires split power rails; all of the power pins that were +3.3V power on the 21164 have now been split into two groups; one group provides 2.0V power to the CPU core, the other group supplies 3.3V to the I/O cells. Unlike older implementations, the 21164 pins are not 5V-tolerant. The end result of this change is that 21164 systems are, in general, not upgradeable to the 21164A (though note that it would be relatively straightforward to design a 21164A system that could also accommodate a 21164). The 21164A also has a couple of new pins to support the new 8 and 16-bit loads and stores. It also improves the 21164 support for using synchronous SRAMs to implement the external Bcache.

4. 21064 performance vs 21066 performance

The 21064 and the 21066 have the same (EV4) CPU core. If the same program is run on a 21064 and a 21066, at the same CPU speed, then the difference in performance comes only as a result of system Bcache/memory bandwidth. Any code thread that has a high hit-rate on the internal caches will perform the same. There are 2 big performance killers:

1. Code that is write-intensive. Even though the 21064 and the 21066 have write buffers to swallow some of the delays, code that is write-intensive will be throttled by write bandwidth at the system bus. This arises because the on-chip caches are write-through.

2. Code that wants to treat floats as integers. The Alpha architecture does not allow register-register transfers from integer registers to floating point registers. Such a conversion has to be done via memory (and therefore, because the on-chip caches are write-through, via the Bcache). (Editor's note: it seems that both the EV4 and EV45 can perform the conversion through the primary data cache (Dcache), provided that the memory is cached already.
In such a case, the store in the conversion sequence will update the Dcache and the subsequent load is, under certain circumstances, able to read the updated Dcache value, thus avoiding a costly roundtrip to the Bcache. In particular, it seems best to execute the stq/ldt or stt/ldq instructions back-to-back, which is somewhat counter-intuitive.)

If you make the same comparison between a 21064A and a 21066A, there is an additional factor due to the different Icache and Dcache sizes between the two chips.

Now, the 21164 solves both these problems: it achieves much higher system bus bandwidths (despite having the same number of signal pins - yes, I know it's got about twice as many pins as a 21064, but all those extra ones are power and ground! (yes, really!!)) and it has write-back caches. The only remaining problem is the answer to the question "how much does it cost?"

5. A Few Notes On Clocking

All of the current Alpha CPUs use high-speed clocks, because their microarchitectures have been designed as so-called short-tick designs. None of the system busses have to run at horrendous speeds as a result though:

o on the 21066(A), 21064(A), 21164 the off-chip cache (Bcache) timing is completely programmable, to the resolution of the CPU clock. For example, on a 275MHz CPU, the Bcache read access time can be controlled with a resolution of 3.6ns.

o on the 21066(A), the DRAM timing is completely programmable, to the resolution of the CPU clock (not the PCI clock, the CPU clock).

o on the 21064(A), 21164(A), the system bus frequency is a sub-multiple of the CPU clock frequency. Most of the 21064 motherboards use a 33MHz system bus clock.

o Systems that use the 21066 can run the PCI at any frequency relative to the CPU. Generally, the PCI runs at 33MHz.

o Systems that use the APECs chipset (see Section ``'') always have their CPU system bus equal to their PCI bus frequency.
This means that both busses tend to run at either 25MHz or 33MHz (since these are the frequencies that scale up to match the CPU frequencies). On APECs systems, the DRAM controller timings are software programmable in terms of the CPU system bus frequency.

Aside: someone suggested that they were getting bad performance on a 21066 because the 21066 memory controller was only running at 33MHz. Actually, it's the superfast 21064A systems that have memory controllers that 'only' run at 33MHz.

6. The chip-sets

DS sells two CPU support chipsets. The 2107x chipset (aka APECS) is a 21064(A) support chipset. The 2117x chipset (aka ALCOR) is a 21164 support chipset. There will also be a 2117xA chipset (aka ALCOR 2) as a 21164A support chipset.

Both chipsets provide memory controllers and PCI host bridges for their CPU. APECS provides a 32-bit PCI host bridge, ALCOR provides a 64-bit PCI host bridge which (in accordance with the requirements of the PCI spec) can support both 32-bit and 64-bit PCI devices.

APECS consists of 6, 208-pin chips (4, 32-bit data slices (DECADE), 1 system controller (COMANCHE), 1 PCI controller (EPIC)). It provides a DRAM controller (128-bit memory bus) and a PCI interface. It also does all the work to maintain memory coherence when a PCI device DMAs into (or out of) memory.

ALCOR consists of 5 chips (4, 64-bit data slices (Data Switch, DSW) - 208-pin PQFP and 1 control (Control, I/O Address, CIA) - a 383 pin plastic PGA). It provides a DRAM controller (256-bit memory bus) and a PCI interface. It also does all the work required to support an external Bcache and to maintain memory coherence when a PCI device DMAs into (or out of) memory.

There is no support chipset for the 21066, since the memory controller and PCI host bridge functionality are integrated onto the chip.

7. The Systems

The applications engineering group in DS produces example designs using the CPUs and support chipsets.
These are typically PC-AT size motherboards, with all the functionality that you'd typically find on a high-end Pentium motherboard. Originally, these example designs were intended to be used as starting points for third-parties to produce motherboard designs from. These first-generation designs were called Evaluation Boards (EBs). As the amount of engineering required to build a motherboard has increased (due to higher-speed clocks and the need to meet RF emission and susceptibility regulations) the emphasis has shifted towards providing motherboards that are suitable for volume manufacture.

Digital's system groups have produced several generations of machines using Alpha processors. Some of these systems use support logic that is designed by the systems groups, and some use commodity chipsets from DS. In some cases, systems use a combination of both.

Various third-parties build systems using Alpha processors. Some of these companies design systems from scratch, and others use DS support chipsets, clone/modify DS example designs or simply package systems using built and tested boards from DS.

The EB64: Obsolete design using 21064 with memory controller implemented using programmable logic. I/O provided by using programmable logic to interface a 486<->ISA bridge chip. On-board Ethernet, SuperI/O (2S, 1P, FD), Ethernet and ISA. PC-AT size. Runs from standard PC power supply.

The EB64+: Uses 21064 or 21064A and APECs. Has ISA and PCI expansion (3 ISA, 2 PCI, one pair are on a shared slot). Supports 36-bit DRAM SIMMs. ISA bus generated by Intel SaturnI/O PCI-ISA bridge. On-board SCSI (NCR 810 on PCI), Ethernet (Digital 21040), KBD, MOUSE (PS2 style), SuperI/O (2S, 1P, FD), RTC/NVRAM. Boot ROM is EPROM. PC-AT size. Runs from standard PC power supply.

The EB66: Uses 21066 or 21066A. I/O sub-system is identical to EB64+. Baby PC-AT size. Runs from standard PC power supply.
The EB66 schematic was published as a marketing poster advertising the 21066 as "the first microprocessor in the world with embedded PCI" (for trivia fans: there are actually 2 versions of this poster - I drew the circuits and wrote the spiel for the first version, and some Americans mauled the spiel for the second version).

The EB164: Uses 21164 and ALCOR. Has ISA and PCI expansion (3 ISA slots, 2 64-bit PCI slots (one is shared with an ISA slot) and 2 32-bit PCI slots). Uses plug-in Bcache SIMMs. I/O sub-system provides SuperI/O (2S, 1P, FD), KBD, MOUSE (PS2 style), RTC/NVRAM. Boot ROM is Flash. PC-AT-sized motherboard. Requires power supply with 3.3V output.

The AlphaPC64 (aka Cabriolet): derived from EB64+ but now baby-AT with Flash boot ROM, no on-board SCSI or Ethernet. 3 ISA slots, 4 PCI slots (one pair are on a shared slot), uses plug-in Bcache SIMMs. Requires power supply with 3.3V output.

The AXPpci33 (aka NoName) is based on the EB66. This design is produced by Digital's Technical OEM (TOEM) group. It uses the 21066 processor running at 166MHz or 233MHz. It is baby-AT size, and runs from a standard PC power supply. It has 5 ISA slots and 3 PCI slots (one pair are a shared slot). There are 2 versions, with either PS/2 or large DIN connectors for the keyboard.

Other 21066-based motherboards: most if not all other 21066-based motherboards on the market are also based on EB66 - there really aren't many system options when designing a 21066 system, because all the control is done on-chip.

Multia (aka the Universal Desktop Box): This is a very compact pedestal desktop system based on the 21066. It includes 2 PCMCIA sockets, 21030 (TGA) graphics, 21040 Ethernet and NCR 810 SCSI disk along with floppy, 2 serial ports and a parallel port. It has limited expansion capability (one PCI slot) due to its compact size.
(There is some restriction on when you can use the PCI slot, can't remember what.) (Note that 21066A-based and Pentium-based Multias are also available.)

DEC PC 150 AXP (aka Jensen): This is a very old Digital system - one of the first-generation Alpha systems. It is only mentioned here because a number of these systems seem to be available on the second-hand market. The Jensen is a floor-standing tower system which used a 150MHz 21064 (later versions used faster CPUs but I'm not sure what speeds). It used programmable logic to interface a 486 EISA I/O bridge to the CPU.

Other 21064(A) systems: There are 3 or 4 motherboard designs around (I'm not including Digital systems here) and all the ones I know of are derived from the EB64+ design. These include:

o EB64+ (some vendors package the board and sell it unmodified); AT form-factor.

o Aspen Systems motherboard: EB64+ derivative; baby-AT form-factor.

o Aspen Systems server board: many PCI slots (includes PCI bridge).

o AlphaPC64 (aka Cabriolet), baby-AT form-factor.

Other 21164(A) systems: The only one I'm aware of that isn't simply an EB164 clone is a system made by DeskStation. That system is implemented using a memory and I/O controller proprietary to DeskStation. I don't know what their attitude towards Linux is.

8. Bytes and all that stuff

When the Alpha architecture was introduced, it was unique amongst RISC architectures for eschewing 8-bit and 16-bit loads and stores. It supported 32-bit and 64-bit loads and stores (longword and quadword, in Digital's nomenclature). The co-architects (Dick Sites, Rich Witek) justified this decision by citing the advantages:

1. Byte support in the cache and memory sub-system tends to slow down accesses for 32-bit and 64-bit quantities.

2. Byte support makes it hard to build high-speed error-correction circuitry into the cache/memory sub-system.

Alpha compensates by providing powerful instructions for manipulating bytes and byte groups within 64-bit registers.
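The idea of working on bytes inside a 64-bit register can be illustrated with plain shift-and-mask arithmetic. What follows is only a sketch in shell arithmetic, not Alpha assembly; the function names extract_byte and insert_byte are made up for this example, and a shell with 64-bit arithmetic is assumed.

```shell
# Hypothetical helpers (not Alpha instructions): they show the shift-and-mask
# idea behind picking a byte out of, or putting a byte into, a quadword.
# Byte 0 is the least significant byte.

# extract_byte QUADWORD INDEX -> prints the selected byte in decimal
extract_byte() {
    echo $(( ($1 >> (8 * $2)) & 0xff ))
}

# insert_byte QUADWORD INDEX VALUE -> prints the quadword with that byte replaced
insert_byte() {
    echo $(( ($1 & ~(0xff << (8 * $2))) | (($3 & 0xff) << (8 * $2)) ))
}

extract_byte 0x1122334455667788 0   # lowest byte, 0x88
extract_byte 0x1122334455667788 7   # highest byte, 0x11
```

The point of the sketch is that the quadword is loaded once and each byte is then selected by register arithmetic, with no byte-wide access to memory.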
Standard benchmarks for string operations (e.g., some of the Byte benchmarks) show that Alpha performs very well on byte manipulation.

The absence of byte loads and stores impacts some software semaphores and impacts the design of I/O sub-systems. Digital's solution to the I/O problem is to use some low-order address lines to specify the data size during I/O transfers, and to decode these as byte enables. This so-called Sparse Addressing wastes address space and has the consequence that I/O space is non-contiguous (more on the intricacies of Sparse Addressing when I get around to writing it). Note that I/O space, in this context, refers to all system resources present on the PCI and therefore includes both PCI memory space and PCI I/O space.

With the 21164A introduction, the Alpha architecture was ECO'd to include byte addressing. Executing these new instructions on an earlier CPU will cause an OPCDEC PALcode exception, so that the PALcode will handle the access. This will have a performance impact. The ramification of this is that use of these new instructions (IMO) should be restricted to device drivers rather than applications code.

These new byte loads and stores mean that future support chipsets will be able to support contiguous I/O space.

9. PALcode and all that stuff

This is a placeholder for a section explaining PALcode. I will write it if there is sufficient interest.

10. Porting

The ability of any Alpha-based machine to run Linux is really only limited by your ability to get information on the gory details of its innards. Since there are Linux ports for the EB66, EB64+ and EB164 boards, all systems based on the 21066, 21064/APECS or 21164/ALCOR should run Linux with little or no modification.

The major thing that is different between any of these motherboards is the way that they route interrupts.
There are three sources of interrupts:

o on-board devices

o PCI devices

o ISA devices

All the systems use an Intel System I/O bridge (SIO) to act as a bridge between PCI and ISA (the main I/O bus is PCI, the ISA bus is a secondary bus used to support slow-speed and 'legacy' I/O devices). The SIO contains the traditional pair of daisy-chained 8259s.

Some systems (e.g., the Noname) route all of their interrupts through the SIO and thence to the CPU. Some systems have a separate interrupt controller and route all PCI interrupts plus the SIO interrupt (8259 output) through that, and all ISA interrupts through the SIO.

Other differences between the systems include:

o how many slots they have

o what on-board PCI devices they have

o whether they have Flash or EPROM

11. More Information

All of the DS evaluation boards and motherboard designs are license-free and the whole documentation kit for a design costs about $50. That includes all the schematics, programmable parts sources, data sheets for CPU and support chipset. The doc kits are available from Digital Semiconductor distributors. I'm not suggesting that many people will want to rush out and buy this, but I do want to point out that the information is available.

Hope that was helpful. Comments/updates/suggestions for expansion to Neal Crook.

12. References

[1] Bill Hamburgen, Jeff Mogul, Brian Reid, Alan Eustace, Richard Swan, Mary Jo Doherty, and Joel Bartlett. Characterization of Organic Illumination Systems. DEC WRL, Technical Note 13, April 1989.

Alsa-sound-mini-HOWTO
Valentijn Sessink valentyn@alsa-project.org
v2.0-pre1, 12 November 1999

Describes the installation of the ALSA sound drivers for Linux. These sound drivers can be used as a replacement for the regular sound drivers, as they are fully compatible.
______________________________________________________________________

Table of Contents

1.
Introduction 1.1 Acknowledgments 1.2 Revision History 1.3 New versions of this document 1.4 Feedback 1.5 Distribution Policy 2. NOWTO - a quick install guide 2.1 Installing ALSA for kernels 2.2.x 2.2 Playing and recording sound 2.3 Installing ALSA for 2.0.x 2.4 Playing and recording sound 3. Before you start 3.1 Introduction 3.2 General information about the ALSA drivers 3.3 Supported hardware 3.4 Other HOWTO's 3.4.1 Sound cards 3.4.2 Plug and Play cards 3.4.3 Loadable modules 3.4.4 Kerneld 4. How to install ALSA sound drivers 4.1 What you need 4.2 Getting the drivers 4.3 ALSA versions 4.4 Extracting 4.5 Compiling 4.6 Preparing the devices 5. Loading the driver 5.1 Inserting with modprobe 5.2 Which module for which card? 5.2.1 Gravis UltraSound Extreme 5.2.2 Gravis UltraSound MAX 5.2.3 ESS AudioDrive 5.2.4 ESS AudioDrive 18xx 5.2.5 Gravis UltraSound PnP 5.2.6 UltraSound 32-Pro 5.2.7 Soundblaster 5.2.8 Soundblaster 16 5.2.9 OAK Mozart 5.2.10 OPTi 82C9xx 5.2.11 AD1847/48 and CS4248 5.2.12 Yamaha OPL3-SA2/SA3 soundcards 5.2.13 S3 SonicVibes 5.2.14 Ensoniq/Soundblaster PCI64 5.2.15 CS4231 5.2.16 CS4232/4232A 5.2.17 4235 and higher 5.2.18 4610/4612/4615 and 4680 5.2.19 ESS Solo 1 5.2.20 Trident 4DWave DX/NX 5.2.21 ForteMedia FM801 5.3 modprobe for drivers without auto-probing 5.3.1 OPL3-SA2 and OPL3-SA3 5.3.2 CS4231 chips 5.3.3 CS4232/CS4232A chips 5.3.4 CS4235/CS4236/CS4236B/CS4237B/CS4238B/CS4239 chips 5.4 The kerneld approach 5.5 Backwards compatibility 6. Testing and using 6.1 The /proc filesystem 6.2 The mixer 6.2.1 Mixer settings for playing 6.2.2 Mixer parts 6.2.3 Mixer settings for recording 6.2.4 Other mixer settings 6.3 The /dev/snd/ devices 6.4 Additional information 6.4.1 /proc/asound/#/pcm#0 6.4.2 /proc/asound/#card#/sb16 7. 
Tips and Troubleshooting 7.1 Compiling the driver 7.1.1 Linux kernel sourcetree 7.1.2 Cannot create executables 7.2 Loading the driver 7.2.1 Sound devices 7.2.2 Sound card compatibility 7.2.3 ``Device busy'' or ``unresolved symbols'' 7.2.3.1 2.0 kernels 7.2.3.2 2.2 kernels 7.2.4 References to other drivers 7.2.5 Unresolved symbols revisited 7.2.6 Check the PnP setup 7.2.7 Are your parameters right ? 7.3 Driver loaded... but no (or hardly any) sound 7.3.1 Unmuting 7.3.2 Gain 7.3.3 OSS/Linux compatibility 7.3.4 Cannot open mixer 7.4 General suggestions 7.4.1 Try using ``insmod'' 7.4.2 Read the INSTALL file. 7.4.3 Debug messages 7.4.4 If all else fails... 7.5 Bug reports 7.6 Tip: playing CD's 7.7 Tip: installing the MIDI serial driver 7.8 Tip: new kernel? New modules! 7.9 Tip: KDE and ALSA drivers 7.10 Tip: use the ALSA devices 7.11 Tip: removing all modules
______________________________________________________________________

1. Introduction

This is the ALSA Sound drivers mini-HOWTO. It gives you information about installing and using the ALSA sound drivers for your soundcard. The ALSA drivers are fully modularized sound drivers that support kerneld and kmod. They are compatible with, but surpass the possibilities of, the current OSS API. In other words: compatible, but better.

1.1. Acknowledgments

This document contains information I got from the ALSA driver page. The structure was ripped off the SB-mini-HOWTO, mainly because it had about the structure I was looking for.

Thanks to the SGML Tools package, this HOWTO is available in several formats, all generated from a common source file.

Thanks to Erik Warmelink for proofreading, and to Alfred Munnikes for a couple of questions and helpful suggestions. Thanks to Yamahata Isaku for the Japanese translation and to Miodrag Vallat for the French one. Later on, Steve Crowder did a great job by reading and editing the whole text.
Thanks to Cserna Zsolt for the Hungarian translation and Marco Meloni for the Italian one.

Thanks to Mohamed Ismail Mohamed-Ibrahim, who sent me a document about the Trident 4DWave DX/NX soundcard with a lot of useful information, and to Gerard Haagh, who sent me a lot of useful information and also pointed out a few unclear sections. Thanks to Marc-Aurèle Darche, Piotr Ingling, Juergen Kahrs, Tim Pearce, Patrick Stoddard, Rutger de Graaf, Shuly Wintner, Jyrki Saarela, Jonas Lofwander, Kumar Sankaran and many others for useful tips and additions.

1.2. Revision History

Version 2.0-pre1 - November 12, 1999. Updated a couple of sections to ALSA 0.4.1e, added various links.

Version 1.7 - July 29, 1999. A few fixes.

Version 1.6 - July 26, 1999. Added a section about ALSA versions.

Version 1.5 - May 21, 1999. Changed the mixer section, added a quick install section.

Version 1.4 - May 18, 1999. Included the URL to the French version, changed more URLs.

Version 1.3 - May 16, 1999. Thanks to Jaroslav this HOWTO has found a home at the ALSA-project website. As a result of that, some updates in mail and web addresses.

Version 1.2 - May 11, 1999. Several updates.

Version 1.1 - March 11, 1999. Added a couple of sound cards from the new 0.3 series drivers, wrote a bit about the 2.2 series kernel.

Version 1.0 - February 8, 1999. Added a few things to the troubleshooting section, but we seem fairly complete.

Version 0.3 beta - January 20, 1999. A link on the ALSA-homepage. Ha, we're official!

Version 0.2 alpha - Mid January 1999, first .sgml-version.

Version 0.1 alpha - January 1999, first version, mostly HTML.

Still: please submit any patches in plain English, you native speakers!

There are a couple of additions that still need to be made to the HOWTO. Notably, Mohamed Ismail Mohamed-Ibrahim and Gerard Haagh wrote wonderful additions to the HOWTO that will keep me off my regular work for some more time. So this is 2.0-pre1 and more pre's are to follow.

1.3.
New versions of this document

The latest version can be found at http://www.alsa-project.org./~valentyn

Other formats (full size html, sgml, txt) are in the directory other-formats. Unfortunately, I have not succeeded in compiling a Postscript version, as the sgml2latex-script returns a bunch of errors.

Yamahata Isaku has translated a Japanese version, which will be available at the Japanese ALSA site, http://plaza21.mbn.or.jp/~momokuri/alsa/index.html

Miodrag Vallat translated a French version, which is available at http://www.freenix.fr/unix/linux/HOWTO/mini/Alsa.html.

Cserna Zsolt has translated the Hungarian version of the ALSA-HOWTO. You can find it at http://kib4.vein.hu/~zsolt/alsa.html.

Marco Meloni did an Italian version; you can get it at http://pluto.linux.it/ildp/index.html.

If you make a translation of this document into another language, let me know and I'll include a reference to it here.

A Dutch version is welcome too; I have no time to write one myself. Long live the queen!

1.4. Feedback

I rely on you, the reader, to make this HOWTO useful. If you have any suggestions, corrections or comments, please send them to me (alsa-howto@alsa-project.org), and I will try to incorporate them in the next revision.

Please note: I do not get a lot of mail about the ALSA drivers and any addition is welcome. Even a ``thank you for'' is appreciated - maybe it's not too much work to add a ``I appreciated most'' or ``this-or-that was not immediately clear to me''-section.

If you publish this document on a CD-ROM or in hardcopy form, a complimentary copy would be appreciated. Mail me for my postal address. Also consider making a donation to the Linux Documentation Project to help support free documentation for Linux. Contact the Linux HOWTO co-ordinator, Tim Bynum linux-howto@metalab.unc.edu, for more information.

1.5.
Distribution Policy

Copyright 1998/1999 Valentijn Sessink

This HOWTO is free documentation; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This document is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

You can obtain a copy of the GNU General Public License by writing to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

2. NOWTO - a quick install guide

If you want sound and you want it NOW! and not after reading this HOWTO, this quick tour through the ALSA driver installation might be of help. Please note: there are a couple of differences between the ALSA versions that support 2.0 kernels and those that support 2.2 kernels.

2.1. Installing ALSA for kernels 2.2.x

You will probably want to use the ALSA 0.4.1e (or later) version if your kernel is 2.2.x. If your kernel is older, please use 0.3.0-pre4 and see below.

It is just the all-time ``./configure - make - make install'' routine. Do this for drivers, library and utilities. You need all three because the utilities help you to unmute your card.

Kernels 2.2.x need to have general sound support in the kernel (without choosing a specific card).

The ALSA drivers have their own devices; you can make them using the ./snddevices script.

You need to load the module for your card (or use kmod) and if you want sound to be backwards compatible with the Linux kernel sound drivers (yes you want this) you need two other modules called snd-pcm1-oss and snd-mixer-oss. See the section ``Which module for which card'' to find out which module to load.

After loading, you can look in /proc/asound for various information about the ALSA drivers.

2.2.
Playing and recording sound

A few remarks. ALSA has its own devices in /dev/snd, for example /dev/snd/pcmC0D1 is Card 0, Device 1. You can use the old /dev/pcmXY devices if you loaded snd-pcm1-oss for backwards compatibility. You'll also want to use /dev/mixer, so load snd-mixer-oss as well.

Before you can play any sound, you need to unmute the card with ``amixer''. Type ``amixer groups'', then try something like

     amixer set PCM 100 unmute

Generally you can use options ``mute'' or ``unmute'', ``capture'' or ``nocapture'' and numbers.

That's it! Now if it works, it works. If it doesn't work, you may need to actually read this HOWTO...

2.3. Installing ALSA for 2.0.x

The ALSA drivers versions 0.3.0, 0.3.1 and 0.3.2 have various problems due to the restructuring of the mixer interface. Later versions do not support kernel 2.0.x, so you definitely will want to use version 0.3.0-pre4 if you have a 2.0 version kernel.

It is just the all-time ``./configure - make - make install'' routine. Do this for drivers, library and utilities. You need all three because the utilities help you to unmute your card.

Kernels 2.0.x need to have all sound support disabled in the kernel setup.

The ALSA drivers have their own devices; you can make them using the ./snddevices script.

You need to load the module for your card (or use kmod) and if you want sound to be backwards compatible with the Linux kernel sound drivers (yes you want this) you need another module called snd-pcm1-oss. See the section ``Which module for which card'' to find out which module to load.

After loading, you can look in /proc/asound for various information about the ALSA drivers.

2.4. Playing and recording sound

A few remarks. ALSA has its own devices in /dev/snd, for example /dev/snd/pcmC0D1 is Card 0, Device 1. You can use the old /dev/pcmXY devices if you loaded snd-pcm1-oss for backwards compatibility.

Before you can play any sound, you need to unmute the card with ``amixer''.
Type ``amixer'', then try something like

     amixer pcm 100 unmute

Generally you can use options ``mute'' or ``unmute'', ``rec'' or ``norec'', numbers or left:right.

That's it! Now if it works, it works. If it doesn't work, you may need to actually read this HOWTO...

3. Before you start

3.1. Introduction

This document tries to help you install and use the ALSA sound drivers in your Linux system. The reference system is a Slackware 4.0 distribution of Linux on an AMD/K6 computer (x86 compatible), but it should work with any other Linux distribution. I do not know if the ALSA drivers work on other platforms; according to the documentation, Alpha has been tested and proven to work. I have only x86 PC's here, so any additional information you may have would be appreciated.

It might be handy to read the Linux Sound HOWTO (see section Other HOWTO's), but that HOWTO focuses on the built-in kernel drivers.

3.2. General information about the ALSA drivers

The ALSA sound driver was originally written as a replacement for the Linux kernel sound for Gravis UltraSound (GUS) cards. As this GUS replacement proved to be a success, the author started the ALSA project for a generic driver for several sound chips, with fully modularized design.

It is compatible with the OSS/Free and OSS/Linux sound drivers (the drivers in the kernel), but has its own interface that is even better than the OSS drivers. A list of features can be found at http://www.alsa-project.org/intro.html

Please note that the ALSA drivers are still under development. Things may change over time, and some programs that rely on ALSA only work under specific versions of it. Apart from that: I think they're great. I have used ALSA for 10 months now and will never go back to the dark ages of closed source sound drivers - hint ;)

The main page of the ALSA project is http://www.alsa-project.org/

3.3. Supported hardware

The ALSA drivers support only a subset of all sound cards available.
At the time of writing, the following cards are supported.

· Cards with a Trident 4D Wave DX/NX chipset, thanks to Trident Microsystems who offered ALSA ``first cut'' GPL'd drivers (MIXER and PCM devices only) and documentation for their 4D Wave PCI audio chipsets. See http://www.tridentmicro.com/HTML/products%20folder/audio.htm for more information. Cards using this chipset include: Best Union Miss Melody 4DWave PCI, HIS 4DWave PCI, Warpspeed ONSpeed 4DWave PCI, AzTech PCI 64-Q3D, Addonics SV 750, CHIC True Sound 4Dwave, Shark Predator4D-PCI and Jaton SonicWave 4D.

· Gravis Ultrasound (GUS): ``PnP'', Extreme, Classic/ACE, MAX

· Cards with a GUS chipset: Dynasonic 3-D, STB Sound Rage 32, UltraSound 32-Pro (STB), ExpertColor MED3201 and others with AMD InterWave(TM) chip, notably some STB cards by Compaq

· Soundblaster: 1.0, 2.0, Pro, 16, AWE32/64, PCI64

· ESS AudioDrive ESx688

· ESS ES968 chip based cards (PnP only).

· ESS ES18xx (chipsets). Please note that I personally experienced a lot of trouble with the ESS1888. The developer of the driver for this card did his best, but to no avail.

· ESS Solo-1 ES1938 and ES1946. Only one of the two channels works, which means that recording is not possible. The author of the ES1938 code ``is aware of the problem and is currently investigating it''.

· Yamaha: OPL3-SA2, OPL3-SA3 (chipsets)

· OAK Mozart

· Schubert 32 PCI (PINE, S3 SonicVibes PCI chipset)

· Ensoniq AudioPCI ES1370/1371 PCI soundcards (Soundblaster PCI64)

· SonicVibes PCI soundcards (PINE Schubert 32 PCI)

· ForteMedia FM801 based cards (in 0.3.2)

· OPTi 82C9xx chipset based soundcards

· AD1847, AD1848 and CS4248 chipset based cards

· AZT2320 chip based soundcards (PnP only).

· Advance Logic ALS100/ALS120 based cards

· C-Media CMI8330 based cards

In addition, a whole lot of Crystal Semiconductors-based sound boards are supported. These chips can be found in a lot of hardware, in separate cards (some Philips PCA series) and on motherboards (e.g.
IBM Aptiva, Dell computers). Boards based on the following chipsets are supported:

· 4231

· 4232

· 4232A

· 4235

· 4236B

· 4237B

· 4238B

· 4239

· 4280

· 4610

· 4612

· 4614

· 4615

· 4680

The best thing is: ALSA now supports computers without a soundcard to produce video! This is done with a dummy driver that tricks programs like Realplayer into thinking that there is a sound card available.

A more recent list may be found inside the driver package itself, in doc/SOUNDCARDS.

3.4. Other HOWTO's

This ALSA-sound-mini-HOWTO is just mini - although it seems to grow fast. Other HOWTO's may help you out in case this one is too terse. I will name a few things you may come across while trying to install the ALSA drivers.

HOWTO's can generally be found at mirrors of Metalab (the former Sunsite). So take a look at http://metalab.unc.edu/LDP/mirrors.html and pick out your closest mirror site. You can find HOWTO's in the directory LDP/HOWTO/.

Please note: the links in this document will all be relative to /LDP/HOWTO/mini. If you look at this document from a reasonably good mirror site, you will find the HOWTO's.

Then a note for the 2.2.x kernel series. For the 2.2.x kernel series, sound support is like any other support: it works, but it is different from what you used to do. From version 2.0pre1 on, this HOWTO (like any other HOWTO) will concentrate on the 2.2 series kernel, although I'll try to point out the differences.

3.4.1. Sound cards

Perhaps you bought a sound card already, or maybe it has been installed in your computer for ages. And now you are going to use it! Have a look at the Sound-HOWTO to see if this is all worth the trouble. (You might want to buy this new Mega-Rumble-Blaster first, then try the ALSA drivers.)

3.4.2. Plug and Play cards

Most modern sound cards for the Intel platform are ISA PnP cards; PnP is an abbreviation for ``Plug and Play''. This means that the card has to be configured by the operating system.
This has to be done through an initialization routine at boot time.

You probably need to configure your card with the PnP-utils-package. Every recent Linux distribution includes these tools. For usage have a look at the Plug-and-Play-HOWTO.

The ALSA-drivers seem to have built in their own ISA-PnP-support for a couple of sound cards. Unfortunately, as I cannot find documentation about this, I cannot tell you how it works. If anyone out there wants to try ALSA sound support while deliberately not using the ISA-PnP-tools, please drop me a line.

3.4.3. Loadable modules

The ALSA sound drivers are built as modules. You can find more information about modules in the Kernel-HOWTO. There is also a module-HOWTO, but that is unmaintained at the moment; take a look at the unmaintained section of the Howto-HOWTO. There is a Modules-mini-HOWTO though that may be useful.

3.4.4. Kerneld

Another HOWTO that will be useful for some is the Kerneld-mini-HOWTO. Kerneld is a daemon that installs and removes kernel modules as needed. (I have zero experience with it, so additional information on the topic is welcome. The ALSA driver documentation contains some information about configuration of the kerneld; this has been included in this mini-HOWTO.)

As the kernel module loader is included in kernel 2.2.x, things have changed. But as I am one of those guys who would rather modprobe something than have some daemon handle it, I have no info on this.

4. How to install ALSA sound drivers

4.1. What you need

· a functional Linux system (e.g. the Slackware distribution), with the "Development" packages installed (i.e. gcc, make etc.)

· some knowledge about Linux (meaning you know how to use "ls", "cd", "tar" etc.)

· a root-account

The great thing is: you don't need a supported sound card anymore, as ALSA now has a dummy driver that does nothing! (No, it really does nothing, but some programs will work now that they believe there is a sound card available).
If you have a PnP card, you will also need: · the isapnptools software package. The INSTALL text in the driver directory suggests that for some cards, PnP support is native. I also received a suggestion from Jaroslav about this. When I get further information about this topic I will add it to this mini-HOWTO. Please note that you should not have any sound drivers active when you want to use the ALSA drivers. If you have a kernel with sound drivers compiled in, you'll need a kernel recompilation. If you have the old "sound.o" module active, you need to deactivate it. If you use kerneld, this probably means deleting sound.o from the /lib/modules//misc directory. Newer RedHat systems have a different sound approach, with several sound modules active. You need to deactivate them all. The 2.2 series kernel has a new approach to sound. You should include sound support here ! Yep, that's right: you add sound support to the kernel, but do not include any sound card. Then compile and install the kernel and after that, compile the ALSA-drivers. 4.2. Getting the drivers The ALSA drivers are available from ftp://ftp.alsa-project.org/pub/ and there are mirrors at · US: ftp://ftp.silug.org/pub/alsa · US: ftp://ftp.eecs.umich.edu/pub/linux/alsa · Netherlands: ftp://linux.a2000.nl/alsa · Poland: ftp://ftp.task.gda.pl/pub/linux/misc/alsa · Germany: ftp://ftp.tu-clausthal.de/pub/linux/alsa · Slovakia: ftp://ftp.phacka.sk/pub/alsa · Australia: ftp://ftp.suburbia.com.au/pub/alsa For a fully functional ALSA-installation, you will need the driver, the libs and the utilities; e.g if you chose the A2000 mirror you would get ftp://linux.a2000.nl/alsa/driver/alsa-driver-0.4.1e.tar.gz, ftp://linux.a2000.nl/alsa/lib/alsa-lib-0.4.1d.tar.gz and ftp://linux.a2000.nl/alsa/utils/alsa-utils-0.4.1.tar.gz 4.3. ALSA versions The ALSA drivers have come a long way. Development started during the 2.0 version kernel, then the 2.2 series showed up (with their own sound kernel). 
As the 0.4 versions work perfectly for me, I think it is safe to use 0.4.1e (or newer, if you want).

If you have a 2.0.x kernel, you will definitely not want to use 0.3.0 or later. Instead, use alsa-driver-0.3.0-pre4, alsa-lib-0.3.0-pre4 and alsa-utils-0.3.0-pre3.

The older versions, 0.2.0-pre10p3 and older, do work under 2.0.x, but I cannot get them to work under 2.2.x (probably due to the lack of interfacing with the soundcore module of the kernel).

4.4. Extracting

You extract the drivers by some reasonable command, like the all-time tar -zxf . Most likely you would do that in the /usr/src directory, so you need root privileges for this. Type ``su'' and then the root password to become root. But please note: it is unwise to use your system as the ``root'' user if it is not necessary. So:

    cd /usr/src
    tar -zxf ~/alsa-driver-0.4.1e.tar.gz
    tar -zxf ~/alsa-lib-0.4.1d.tar.gz
    tar -zxf ~/alsa-utils-0.4.1.tar.gz

Also working and more fun:

    find ~ -name 'alsa*' -exec tar -zxf {} \;

(Don't try this at home kids, it's just an example).

Note that when downloading the drivers with Netscape, you may accidentally get unpacked drivers with a ".tgz" extension. If tar complains about the file format, you may get better results by leaving off the "z" in the tar options.

4.5. Compiling

You need the drivers before you can compile and use the libs. You need the libs before you can compile or use the utils. So let's begin:

    cd alsa-driver-0.4.1e

(and for those not so experienced: try pressing the "tab" key after typing "alsa-d". That's called command line completion.)

    ./configure

If you want to use the built-in PnP interfacing, you should use

    ./configure --with-isapnp=yes

instead. Then:

    make

Now you need to be 'root' to install the stuff (you probably were "root" already)

    make install

If this tells you that something like ``version.h'' cannot be found, then you probably do not have a proper kernel source tree. You need a couple of files of your kernel source to be able to compile the ALSA drivers.
Unpack your favorite linux-2.x.y.tar.gz in /usr/src, and issue a make menuconfig. (Actually, make symlinks may be enough).

Now compile the libraries:

    cd ../alsa-lib-0.4.1d
    ./configure
    make
    make install

OK, you're getting it, the utilities:

    cd ../alsa-utils-0.4.1
    ./configure
    make
    make install

Note: you can leave out the "make install" for the utilities at first. You could even leave out the whole library-making and utility-making, just to check if the driver works.

4.6. Preparing the devices

There is a script in the driver-directory that will install the ALSA sound devices in your /dev directory. Type

    ./snddevices

from the driver-directory. There should be a /dev/snd subdirectory now (test if it is there. If you are not familiar with even the "ls" command, please consider reading other HOWTO's first. You should have some basic Linux knowledge to install these drivers).

Now you're ready to insert the driver, so please turn to the next section.

5. Loading the driver

There are two ways to use the ALSA sound modules. I personally prefer the manual method, meaning that I insert the driver at startup. The ALSA drivers were designed as loadable/unloadable modules (for instance, they do not reset the mixer after loading), so you can easily use the kerneld approach.

Please do read the section ``Backwards Compatibility''. You need it to have sound support ``the old way''.

5.1. Inserting with modprobe

Note: If you have a PnP audio card, you first need to set it to the right (or at least some known) IO/IRQ/DMA. See the Plug-and-Play-HOWTO.

Did you configure your Plug-and-Play soundcard? Ok, then read on please.

The main part is: do a "modprobe snd-card-". This should do the trick.

Please note that not all distributions include /sbin in your path. If you get a "bash: modprobe: command not found", this will most likely mean that modprobe is not in your path. Try ``/sbin/modprobe snd-card-sb16'', or try to find the modprobe utility elsewhere.
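If modprobe is not in your path, a small shell function can look for it in a few likely directories. This is a sketch only: the function name find_modprobe is invented for this example, and the directory list in the usage line is an assumption that may not match your distribution.

```shell
# Print the full path of the first executable called "modprobe" found in the
# directories given as arguments; return 1 if none of them contains one.
# (find_modprobe is a hypothetical helper name, chosen for illustration.)
find_modprobe() {
    for dir in "$@"; do
        if [ -x "$dir/modprobe" ]; then
            printf '%s\n' "$dir/modprobe"
            return 0
        fi
    done
    return 1
}

# Example call (the directory list is an assumption, adjust it as needed):
# find_modprobe /sbin /usr/sbin /usr/local/sbin
```

The path it prints can then be used in place of the bare ``modprobe'' in the commands of this section.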
The most important difficulty is with the Crystal chipsets: for these, the ALSA drivers do not auto-probe. More recent information may be found in the INSTALL file in the driver-directory.

Two examples, then a list:

Gravis UltraSound (GUS) and compatibles:

    /sbin/modprobe snd-card-gusclassic

For all 16-bit SoundBlaster cards (SoundBlaster 16 (PnP), SoundBlaster AWE 32 (PnP), SoundBlaster AWE 64 (PnP)):

    /sbin/modprobe snd-card-sb16

However, if you have a 0.3.0-pre4 package, the GUS Classic driver is called ``snd-gusclassic'' and the SoundBlaster 16 module is called ``snd-sb16'' (so, without the ``card'' part).

5.2. Which module for which card?

Please note that ALSA versions before 0.4.x sometimes had different names. You need to leave out the ``card'' part for those drivers. This is indicated by an asterisk (*).

5.2.1. Gravis UltraSound Extreme``(*)''

    modprobe snd-card-gusextreme

5.2.2. Gravis UltraSound MAX``(*)''

    modprobe snd-card-gusmax

5.2.3. ESS AudioDrive``(*)''

ESS AudioDrive ES-1688 and ES-688 soundcards

    modprobe snd-card-audiodrive1688

5.2.4. ESS AudioDrive 18xx``(*)''

ESS AudioDrive ES-18xx based soundcards

    modprobe snd-card-audiodrive18xx

5.2.5. Gravis UltraSound PnP``(*)''

Gravis UltraSound PnP, Dynasonic 3-D/Pro, STB Sound Rage 32, ExpertColor MED3201 and other soundcards based on AMD InterWave(TM) chip.

    modprobe snd-card-interwave

5.2.6. UltraSound 32-Pro``(*)''

UltraSound 32-Pro (soundcard from STB used by Compaq) and other soundcards based on AMD InterWave (tm) chip with TEA6330T circuit for extended control of bass, treble and master volume

    modprobe snd-card-interwave-stb

5.2.7. Soundblaster``(*)''

8-bit Soundblaster cards (SoundBlaster 1.0, SoundBlaster 2.0, SoundBlaster Pro)

    modprobe snd-card-sb8

5.2.8. Soundblaster 16``(*)''

16-bit SoundBlaster cards (SoundBlaster 16 (PnP), SoundBlaster AWE 32 (PnP), SoundBlaster AWE 64 (PnP)). Please note: this module does not support the SoundBlaster VibraX16 soundcard.

    modprobe snd-card-sb16

5.2.9.
OAK Mozart``(*)'' modprobe snd-mozart 5.2.10. OPTi 82C9xx``(*)'' Various sound cards that use the OPTi 82C9xx chipset, like Audio 16 Pro EPC-SOUN9301 (82C930 based), ExpertColor MED-3931 v2.0 (82C931 based), ExpertMedia Sound 16 MED-1600 (82C928 based - AD1848), Mozart S601206-G (OPTI601 based - CS4231) and Sound Player S-928 modprobe snd-card-opti9xx 5.2.11. AD1847/48 and CS4248 modprobe snd-card-ad1848 5.2.12. Yamaha OPL3-SA2/SA3 soundcards``(*)'' Just "modprobe snd-opl3sa" will not work, this driver does not do autoprobing. See below. 5.2.13. S3 SonicVibes``(*)'' S3 SonicVibes PCI soundcards. (PINE Schubert 32 PCI) modprobe snd-card-sonicvibes 5.2.14. Ensoniq/Soundblaster PCI64``(*)'' Ensoniq AudioPCI ES1370/1371 PCI soundcards. (SoundBlaster PCI 64) modprobe snd-card-audiopci 5.2.15. CS4231 Just ``modprobe snd-card-cs4231'' will not work, no auto-probing. See below. 5.2.16. CS4232/4232A All soundcards based on CS4232/CS4232A chips. Just "modprobe snd- card-cs4232" will not work, no auto-probing. See below. 5.2.17. 4235 and higher All soundcards based on CS4235/CS4236/CS4236B/CS4237B/CS4238B/CS4239 chips. Just "modprobe snd-card-cs4236" will not work, no auto- probing. See below. 5.2.18. 4610/4612/4615 and 4680 modprobe snd-card-cs461x 5.2.19. ESS Solo 1``(*)'' ESS Solo-1, 128iPCI card (es1938, ESS-SOLO-1). Jonas Lofwander sent me a link to a document that will help you installing this card - which is, basically, nothing more than modprobe snd-card-esssolo1 ... but http://dice.shopcenter.nu/alsa/ can be of help. If you have an IBM Thinkpad 1412 you can also refer to http://www.geocities.com/SiliconValley/Peaks/3649/1412.html, thanks to Kumar Sankaran. 5.2.20. Trident 4DWave DX/NX``(**)'' Best Union Miss Melody 4DWave PCI, HIS 4DWave PCI, Warpspeed ONSpeed 4DWave PCI, AzTech PCI 64-Q3D, Addonics SV 750, CHIC True Sound 4Dwave, Shark Predator4D-PCI, Jaton SonicWave 4D. modprobe snd-card-trident 5.2.21. ForteMedia FM801 These are PCI cards based on the FM801 chip. 
modprobe snd-card-fm801 (*) For ALSA version 0.3.0-pre4, you need to leave out the ``card-'' part in most (not all!) of the drivernames. So ``snd-card-sb16'' becomes ``snd-sb16'', however, ``snd-card-cs4232'' remains ``snd-card- cs4232'' (modprobe snd-cs4232 will do something, but it will not produce any sound!) (**) In older ALSA versions this driver was called ``snd-card- trid4wave'' and ``snd-trid4wave''. 5.3. modprobe for drivers without auto-probing If you have a non-autoprobing driver, you need to supply additional info at startup to have the driver work. More information can be found in the file INSTALL in the driver directory. 5.3.1. OPL3-SA2 and OPL3-SA3 According to the INSTALL file you need to supply all the information for this driver. If you initialized the card with the isapnp-tools, you can probably get info from the /etc/isapnp.conf file for the following values: snd_port - control port # for OPL3-SA chip snd_wss_port - WSS port # for OPL3-SA chip (0x530,0xe80,0xf40,0x604) snd_midi_port - port # for MPU-401 UART (0x300,0x330), -1 = disable snd_fm_port - FM port # for OPL3-SA chip (0x388), -1 = disable snd_irq - IRQ # for OPL3-SA chip (5,7,9,10) snd_dma1 - first DMA # for Yamaha OPL3-SA chip (0,1,3) snd_dma1_size - max first DMA size in kB (4-64kB) snd_dma2 - second DMA # for Yamaha OPL3-SA chip (0,1,3), -1 = disable snd_dma2_size - max second DMA size in kB (4-64kB) You would do a "modprobe snd-card-opl3sa snd_port=0xNNN snd_wss_port=0x530 snd_midi_port=-1 snd_fm_port=0x388 snd_irq=5 snd_dma1=0 snd_dma1_size=NN snd_dma2=1 snd_dma2_size=NN" to load this driver (without midi-support. I am still convinced that midi-support is the thing you need when you have synthesizers and stuff and want to connect them to your Linux box. Never needed Midi-support even to play midi-files.) Note that the "NN" values need to be supplied, only I do not know what would be reasonable values. I do not know if the dma size option is really required. 
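The port values in the parameter list above are written in hexadecimal (0x530, 0x388 and so on), and leaving out the 0x prefix gives the driver a different number. The sketch below shows one way to check a value before putting it on the modprobe command line. It assumes a POSIX shell whose printf accepts C-style numbers; the function names port_to_decimal and is_hex_port are invented for this example.

```shell
# Print the decimal value of a port number written in C-style notation
# (0x prefix for hexadecimal, no prefix for decimal).
# (Both function names are hypothetical helpers, chosen for illustration.)
port_to_decimal() {
    printf '%d\n' "$1"
}

# Succeed only if the argument looks like a hexadecimal port such as 0x530.
is_hex_port() {
    case $1 in
        0x*[!0-9a-fA-F]*) return 1 ;;   # a non-hex character after 0x
        0x?*) return 0 ;;               # 0x followed by hex digits only
        *) return 1 ;;                  # no 0x prefix at all
    esac
}

# Example calls:
# port_to_decimal 0x530    # prints 1328
# port_to_decimal 530      # prints 530, a different port
# is_hex_port 0x530 && echo "looks like a hex port"
```

Comparing the two decimal results makes it plain why a value copied from /etc/isapnp.conf has to keep its 0x prefix.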
If you happen to have an IBM Thinkpad with this chipset, then http://www.cirs.org/patrick/index.html might be of help.

If you use the driver from 0.3.0-pre4, then leave out the ``card-'' part in the name.

5.3.2. CS4231 chips

According to the INSTALL file you need to supply the main port for this card. Note that with the driver for 4235/6/7/8/9 cards, described below, I ended up supplying all information (except DMA-size), otherwise the driver did not work. So you may as well use the whole command line to insert the driver.

If you initialized the card with the isapnp-tools, you can probably get info from the /etc/isapnp.conf file for the following values:

    snd_port      - port # for CS4232 chip (PnP setup - 0x534)
    snd_mpu_port  - port # for MPU-401 UART (PnP setup - 0x300), -1 = disable
    snd_irq       - IRQ # for CS4232 chip (5,7,9,11,12,15)
    snd_mpu_irq   - IRQ # for MPU-401 UART (9,11,12,15)
    snd_dma1      - first DMA # for CS4232 chip (0,1,3)
    snd_dma1_size - max first DMA size in kB (4-64kB)
    snd_dma2      - second DMA # for Yamaha CS4232 chip (0,1,3), -1 = disable
    snd_dma2_size - max second DMA size in kB (4-64kB)

You would do a

    modprobe snd-card-cs4231 snd_port=0x534 snd_mpu_port=-1 snd_irq=5 snd_dma1=0 snd_dma1_size=NN snd_dma2=1 snd_dma2_size=NN

to load the driver for a "standard configured" soundcard. (Without midi-support, see the note at Yamaha OPL-3). If you used different values in /etc/isapnp.conf, then you would use the values here also (Note: it can be wise to use your brains anyway ;)

Note that the "NN" values need to be supplied, only I do not know what would be reasonable values. I do not know if the dma size option is really required.

5.3.3. CS4232/CS4232A chips

According to the INSTALL file you need to supply the main port for this card. Note that with the driver for 4235/6/7/8/9 cards, described below, I ended up supplying all information (except DMA-size), otherwise the driver did not work. So you may as well use the whole command line to insert the driver.
If you initialized the card with the isapnp-tools, you can probably get info from the /etc/isapnp.conf file for the following values: snd_port - port # for CS4232 chip (PnP setup - 0x534) snd_cport - control port # for CS4232 chip (PnP setup - 0x120) snd_mpu_port - port # for MPU-401 UART (PnP setup - 0x300), -1 = disable snd_fm_port - FM port # for CS4232 chip (PnP setup - 0x388), -1 = disable snd_jport - joystick port for CS4232 chip (PnP setup - 0x200), -1 = disable snd_irq - IRQ # for CS4232 chip (5,7,9,11,12,15) snd_mpu_irq - IRQ # for MPU-401 UART (9,11,12,15) snd_dma1 - first DMA # for CS4232 chip (0,1,3) snd_dma1_size - max first DMA size in kB (4-64kB) snd_dma2 - second DMA # for Yamaha CS4232 chip (0,1,3), -1 = disable snd_dma2_size - max second DMA size in kB (4-64kB) You would do a "modprobe snd-card-cs4232 snd_port=0x534 snd_cport=0x120 snd_mpu_port=-1 snd_fm_port=0x388 snd_jport=-1 snd_irq=5 snd_dma1=0 snd_dma1_size=NN snd_dma2=1 snd_dma2_size=NN" to load the driver for a "standard configured" soundcard. (Without midi- support, see the note at Yamaha OPL-3, and no joystick support). If you used different values in /etc/isapnp.conf, then you would use the values here also (Note: it can be wise to use your brains anyway ;) Note that the "NN" values need to be supplied, only I do not know what would be reasonable values. I do not know if the dma size option is really required. 5.3.4. CS4235/CS4236/CS4236B/CS4237B/CS4238B/CS4239 chips According to the INSTALL file you need to supply the main port and control ports for this card. Note that with a CS4237B card, I ended up supplying all information (except DMA-size), otherwise the driver did not work. So you may as well use the whole command line to insert the driver, and not only supply snd_port and snd_cport. 
If you initialized the card with the isapnp-tools, you can probably get info from the /etc/isapnp.conf file for the following values: snd_port - port # for CS4232 chip (PnP setup - 0x534) snd_cport - control port # for CS4232 chip (PnP setup - 0x120) snd_mpu_port - port # for MPU-401 UART (PnP setup - 0x300), -1 = disable snd_fm_port - FM port # for CS4232 chip (PnP setup - 0x388), -1 = disable snd_jport - joystick port for CS4232 chip (PnP setup - 0x200), -1 = disable snd_irq - IRQ # for CS4232 chip (5,7,9,11,12,15) snd_mpu_irq - IRQ # for MPU-401 UART (9,11,12,15) snd_dma1 - first DMA # for CS4232 chip (0,1,3) snd_dma1_size - max first DMA size in kB (4-64kB) snd_dma2 - second DMA # for Yamaha CS4232 chip (0,1,3), -1 = disable snd_dma2_size - max second DMA size in kB (4-64kB) You would do a "modprobe snd-card-cs4236 snd_port=0x534 snd_cport=0x120 snd_mpu_port=-1 snd_fm_port=0x388 snd_jport=-1 snd_irq=5 snd_dma1=0 snd_dma1_size=NN snd_dma2=1 snd_dma2_size=NN" to load the driver. (Without midi-support, see the note at Yamaha OPL-3, and no joystick support). Notes: · the "NN" values need to be supplied, only I do not know what would be reasonable values. · my CS4237B works fine without explicit dma size option. 5.4. The kerneld approach kerneld is a daemon that inserts modules on request, and unloads them once they are not in use anymore. Since I have no experience with kerneld, I do not know if the information below is accurate. The info comes from the INSTALL file in the ALSA-drivers package. Excellent information about kerneld can be found in the kerneld-mini-HOWTO. Follow these steps: · Edit your /etc/conf.modules (see below for examples) · Run 'modprobe snd-card' where card is name of your card [Which I find rather strange, since kerneld is supposed to load them? 
VS]

Example for /etc/conf.modules for Gravis UltraSound PnP soundcard:

    alias char-major-14 snd
    alias snd-minor-oss-0 snd-interwave
    alias snd-minor-oss-3 snd-pcm1-oss
    alias snd-minor-oss-4 snd-pcm1-oss
    alias snd-minor-oss-5 snd-pcm1-oss
    alias snd-minor-oss-12 snd-pcm1-oss
    alias snd-card-0 snd-interwave
    options snd snd_major=14 snd_cards_limit=1
    options snd-interwave snd_index=1 snd_id="guspnp" snd_port=0x220 snd_irq=5 snd_dma1=5 snd_dma2=6

Example if you want to use more soundcards in one machine (configuration below is for Sound Blaster 16 and Gravis UltraSound Classic):

    alias char-major-14 snd
    alias snd-minor-oss-0 snd-mixer
    alias snd-minor-oss-3 snd-pcm1-oss
    alias snd-minor-oss-4 snd-pcm1-oss
    alias snd-minor-oss-5 snd-pcm1-oss
    alias snd-minor-oss-12 snd-pcm1-oss
    alias snd-card-0 snd-sb16
    alias snd-card-1 snd-gusclassic
    options snd snd_major=14 snd_cards_limit=2
    options snd-sb16 snd_index=1 snd_port=0x220 snd_irq=5 snd_dma8=1 snd_dma16=5
    options snd-gusclassic snd_index=2 snd_irq=11 snd_dma1=6 snd_dma2=7

Example if two Gravis UltraSound Classic soundcards are present in system:

    alias char-major-14 snd
    alias snd-minor-oss-0 snd-mixer
    alias snd-minor-oss-3 snd-pcm1-oss
    alias snd-minor-oss-4 snd-pcm1-oss
    alias snd-minor-oss-5 snd-pcm1-oss
    alias snd-minor-oss-12 snd-pcm1-oss
    alias snd-card-0 snd-gusclassic
    alias snd-card-1 snd-gusclassic
    options snd snd_major=14 snd_cards_limit=2
    options snd-gusclassic snd_index=1,2 snd_port=0x220,0x260 snd_irq=5,11 snd_dma1=5,6 snd_dma2=7,3

5.5. Backwards compatibility

If you want to preserve OSS/Free or OSS/Linux compatibility, you need to insert one more driver: the snd-pcm1-oss driver for OSS compatibility. Issue a

    modprobe snd-pcm1-oss

This will give you /dev/audio and /dev/dsp support, just as the OSS/Free (kernel) drivers and OSS/Linux (the $25 ones) do. Note that this is only an emulation.

6. Testing and using

Now you should test if the sound driver really is available, then try to use it.

6.1.
The /proc filesystem You can find a lot of useful information about your system in the /proc subdirectory. /proc is a "virtual" filesystem, meaning that it does not exist in real life, but merely is a mapping to various processes and tasks in your computer. In order for /proc to work, you need to have support for it compiled into your kernel. Most linux distributions have this as a default, but if you compiled a kernel and left /proc out obviously there won't be anything in /proc. /proc/modules gives information about loaded modules. Once the ALSA sound drivers are loaded, if you type cat /proc/modules you should see something like: snd-pcm1-oss 4 0 snd-sb16 1 1 snd-sb-dsp 4 [snd-sb16] 0 snd-pcm1 4 [snd-pcm1-oss snd-sb-dsp] 0 snd-pcm 3 [snd-pcm1-oss snd-sb16 snd-sb-dsp snd-pcm1] 0 snd-mixer 3 [snd-pcm1-oss snd-sb16 snd-sb-dsp] 1 snd-mpu401-uart 1 [snd-sb16] 0 snd-midi 4 [snd-sb16 snd-sb-dsp snd-mpu401-uart] 0 snd-opl3 1 [snd-sb16] 0 snd-synth 1 [snd-sb16 snd-opl3] 0 snd-timer 1 [snd-opl3] 0 snd 8 [snd-pcm1-oss snd-sb16 snd-sb-dsp snd-pcm1 snd-pcm snd-mixer snd-mpu401-uart snd-midi snd-opl3 snd-synth snd-timer] 0 If something went wrong during the installation of the driver, you will still see a couple of "snd" devices, but there won't be sound support. For example (Note: you should never issue this command as follows, the cs4236 driver needs options): win3:~# modprobe snd-card-cs4236 /lib/modules/2.0.35/misc/snd-card-cs4236.o: init_module: Device or resource busy snd-mixer: Device or resource busy win3:~# cat /proc/modules snd-cs4236 2 0 snd-cs4231 3 [snd-cs4236] 0 snd-timer 1 [snd-cs4231] 0 snd-pcm1 4 [snd-cs4236 snd-cs4231] 0 snd-mixer 3 [snd-cs4236 snd-cs4231] 0 snd-pcm 3 [snd-cs4236 snd-cs4231 snd-pcm1] 0 snd-mpu401-uart 1 0 snd-midi 4 [snd-mpu401-uart] 0 snd-opl3 1 0 snd-synth 1 [snd-opl3] 0 snd-timer 1 [snd-cs4231 snd-opl3] 0 snd 8 [snd-cs4231 snd-timer snd-pcm1 snd-mixer snd-pcm] 0 You can check the existence of a soundcard by looking in /proc/asound/cards. 
For example:

    bash$ cat /proc/asound/cards
    0 [card1 ]: SB16 - Sound Blaster 16
                Sound Blaster 16 at 0x220, irq 5, dma 1&5

In the previous example (where I forgot the options) the output would have been:

    win3:~# cat /proc/asound/cards
    --- no soundcards ---

A working CS4236 card would produce

    0 [card1 ]: CS4236 - CS4237B
                CS4237B at 0x534, irq 7, dma 1&0

If you checked and doublechecked your settings and still see no sound card, take a look at the troubleshooting section.

The /proc/asound/ virtual directory shows lots of other information about the driver. Please note that /proc/asound/ will only exist after you inserted the first ALSA module. If there is no /proc/asound, it simply means that the "snd" module was not loaded properly.

You can find installed cards in /proc/asound/cards, then find information about card0 in /proc/asound/0, /proc/asound/1 for card1 etcetera.

If cat /proc/asound/card1/pcm0 shows something like

    ES1370 DAC2/ADC
    Playback isn't active.
    Record isn't active.

this means that your driver is ready to go, but is not doing anything right now. (So everything went well).

For users of a 2.0.x kernel there is a third method to find information about the sound devices: if you inserted the OSS compatible driver, there is a /dev/sndstat device. The ALSA drivers kindly request that you not rely on this information, as it is only there for compatibility with the OSS drivers and better information can easily be obtained from /proc/asound/. In kernel 2.2.x ALSA uses the kernel soundcore and therefore cannot emulate /dev/sndstat, since it would interfere with the OSS drivers.

6.2. The mixer

Once the drivers for your sound card have been installed and your /proc filesystem tells you so, you can try to make a real sound. To do this, you need to set the mixer volumes to a reasonable value. You need the ``amixer'' from the alsa-utils package for this.
First of all, install the utility package, or at least put the "amixer" command in some reasonable place (like /usr/local/bin).

Version 0.3.2 and later have an interface that differs from the OSS drivers. If you type just ``amixer'' you will see the mixer elements and their values. One of these elements could be ``Master volume'' for example, and could look like:

    Group 'Master',0
      Capabilities: volume
      Channels: Front-Left Front-Right
      Limits: min = 0, max = 31
      Front-Left: 31 [100%] [on] [---]
      Front-Right: 26 [84%] [on] [---]

Unfortunately, I do not know how to set left and right volumes independently.

With amixer, you can change volumes with the ``amixer set'' command. For example, to change the Master volume, you would issue a

    amixer set Master 15

Please note that the names of the elements can be different for different types of sound cards. Also note that amixer is case sensitive, so ``amixer set masteR 10'' will not work. For more information, please look in the amixer man page.

If you have a 0.3.0-pre4 ALSA, then amixer works just like normal mixer programs. You can look at the mixer settings by typing ``amixer''. This command lists the ``mixer settings'', or as you would normally call it, the volume settings of the various parts of the soundcard. The output from amixer can greatly differ from card to card. My Soundblaster 16 shows:

    Master      0 % (-14.00dB) : 0 % (-14.00dB)
    Bass        0 % (-14.00dB) : 0 % (-14.00dB)
    Treble      0 % (-14.00dB) : 0 % (-14.00dB)
    Synth       0 % (-62.00dB) : 0 % (-62.00dB)
    PCM         0 % (-62.00dB) : 0 % (-62.00dB)
    Line-In     0 % (-62.00dB) : 0 % (-62.00dB) Mute
    MIC         0 % (-62.00dB) : 0 % (-62.00dB) Mute
    CD          0 % (-62.00dB) : 0 % (-62.00dB) Mute
    In-Gain     0 % (-18.00dB) : 0 % (-18.00dB)
    Out-Gain    0 % (-18.00dB) : 0 % (-18.00dB)
    PC Speaker  0 % (-18.00dB) : 0 % (-18.00dB)

If you only get a message like ``amixer: Specify command...'', then you are using the ALSA 0.3.2 utilities. I suggest you upgrade to 0.4.1e or later, or go back to 0.3.0-pre4.

6.2.1.
Mixer settings for playing You have noticed the "Mute" entry for some devices. This means that this particular device will be zeroed out, whatever volume setting you use. Some cards (the CS4237B in the example) even mute their master channel. So, for the CS4237B, I would have to type amixer set "Master d" unmute to even be able to produce any sound at all. The Soundblaster does not have muted output, but amixer set Master 100 unmute would set the volume to 100% - and unmute it if it would have been muted. You can use a number, a word like "mute" or "unmute", or both. Type amixer set "Master d" 100; amixer set PCM 100 unmute to set the CS4237B card to maximum master volume and unmute PCM volume and set it to maximum. If you use an older version of amixer, you need to leave out the ``set'' part of the command, so you would just type amixer "master d" 100 6.2.2. Mixer parts The various mixer parts may confuse you if you have no knowledge of digital sound production. The sound-HOWTO may help a bit, but a very short introduction is here. You will probably only need few mixer elements: one of them is the ``CD'' setting (this is analog sound of your CD player, most CD players are connected with a 3 or 4 wire red/white/black cable). The ``PCM'' setting is used for most applications. Programs like mpg123, xmms, speakfreely, realplayer and most others use the PCM channel. ``MIC'' stands for microphone, ``line-in'' is an (optional) extra input at the back of your sound card. The various ``gain'' parts offer extra amplification for various uses and are pretty self-explanatory. (Like: record-gain is extra amplification for the recording channel, which can be useful if you use a microphone). 6.2.3. Mixer settings for recording You would set the CD channel to record by typing amixer set CD capture and stop the recording setting again by typing amixer set CD nocapture. Note that older amixer programs use ``amixer CD rec'' and ``amixer CD norec'' for this. 
If you would like to record something from the microphone, you would probably use

    amixer set "Input Gain" 100; amixer set Mic 100 capture mute

(Using the microphone input unmuted will produce a loud high-pitched sound if your mic picks up its own signal from the speakers again). Most microphones have a ``gain'' setting to boost the microphone volume; you are most likely going to need it to pick up any sound from the microphone at all. Again, older amixer programs use ``amixer "input gain" 100; amixer mic 100 rec mute''.

6.2.4. Other mixer settings

Unfortunately I have not been able to change the volume of the "3d center" and "3d space" settings with amixer 0.3.0-pre4. I haven't tried yet with 0.4.1e (this particular machine is still running 2.0.38). If anyone succeeds please let me know. I can use alsamixer for this job, but alsamixer was not ported to the 0.4.1e version yet.

The ALSA FAQ says that it is possible to restore mixer settings with cat > /proc/asound/#/mixerC0D0, where was obtained from /proc/asound/#/mixerC0D0. I have not been able to reproduce this as my system complains about non-existing devices.

Then there is the ``alsactl'' program, which I don't use. I invite you (yes, you!) to write this section.

6.3. The /dev/snd/ devices

The ALSA drivers have native sound devices in the /dev/snd/ directory. If you have one card you might see the following devices:

    /dev/snd/pcmC0D0     - the raw audio device for the card
    /dev/snd/mixerC0D0   - the mixer for card 0
    /dev/snd/controlC0D0 - the control device for card 0

The first number means the number of the soundcard, the second number (if any) is the number of the device. A sound card with two PCM devices would have a pcmC0D0 and pcmC0D1 device.

Please note: the ALSA device names have changed since the previous version. Older ALSA drivers use /dev/snd/pcm00 (first number is the card, second number is the device). If this HOWTO uses the older notation, please drop me a line so I can correct it.
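The naming rule above (card number after the C, device number after the D) can be written down as a pair of small shell functions. The same sketch also computes the OSS-compatible minor numbers from the mapping listed later in this section, where each further card adds 16 to the base minor. The function names are invented for this example and are not part of ALSA.

```shell
# Build the native ALSA PCM device name for a card and device number,
# following the pcmC<card>D<device> pattern described above.
# (All three function names are hypothetical helpers, chosen for illustration.)
alsa_pcm_dev() {
    printf '/dev/snd/pcmC%dD%d\n' "$1" "$2"
}

# Build the mixer device name for a card, as in /dev/snd/mixerC0D0.
alsa_mixer_dev() {
    printf '/dev/snd/mixerC%dD0\n' "$1"
}

# Compute an OSS minor number: base minor (3 for dsp, 4 for audio,
# 12 for adsp) plus 16 for every card after the first.
oss_minor() {
    echo $(( $1 + 16 * $2 ))
}

# Example calls:
# alsa_pcm_dev 0 1     # second PCM device of the first card
# oss_minor 3 1        # minor of /dev/dsp1
```

For instance, alsa_pcm_dev 0 1 gives /dev/snd/pcmC0D1, and oss_minor 3 1 gives 19, which matches the /dev/dsp1 entry in the mapping.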
Now you are ready to put any soundfile you want into the PCM device of the first card. So try to cat any textfile (any file) to /dev/snd/pcmC0D0, like this: cat > /dev/snd/pcmC0D0. The filename can be any file, as long as it has some length. If you have a soundfile lying around somewhere, you could try that. You could also get the file at http://www.ldp.org/sounds/english.au; this is Linus Torvalds saying how to pronounce Linux.

The default setting of your sound device is 8000 Hz, 8 bit. That means that the "english.au" file mentioned above will produce speech; other test files will probably just produce noise. If you do not hear anything, check your speakers, try to run "amixer" again or consult a doctor. (Later on you can easily use the full 48 kHz, 16 bit features of your sound card, by using your favourite sound player like sox or mpg123).

If you loaded the ``snd-pcm1-oss'' module, you can also use the OSS compatibility to access your sound card. The following mappings are made:

    /dev/snd/pcmC0D0 -> /dev/audio0 (/dev/audio) -> minor 4
    /dev/snd/pcmC0D0 -> /dev/dsp0 (/dev/dsp)     -> minor 3
    /dev/snd/pcmC0D1 -> /dev/adsp0 (/dev/adsp)   -> minor 12
    /dev/snd/pcmC1D0 -> /dev/audio1              -> minor 4+16 = 20
    /dev/snd/pcmC1D0 -> /dev/dsp1                -> minor 3+16 = 19
    /dev/snd/pcmC1D1 -> /dev/adsp1               -> minor 12+16 = 28
    /dev/snd/pcmC2D0 -> /dev/audio2              -> minor 4+32 = 36
    /dev/snd/pcmC2D0 -> /dev/dsp2                -> minor 3+32 = 35
    /dev/snd/pcmC2D1 -> /dev/adsp2               -> minor 12+32 = 44

You probably want to use the ``snd-mixer-oss'' module as well, so you can use the backwards compatible mixer.

6.4. Additional information

The INSTALL file in the ALSA driver directory mentions some tricks to tell the driver which settings to use. Whether you need these commands depends on the application you use to play sound. Regular sound playing applications, like mpg123, sox (mostly called with the ``play'' command), or X11 applications like RealPlayer will probably do fine without these. I never used these anyway.

6.4.1.
/proc/asound/#/pcm#0

    "Playback erase" - erase all additional information about OSS applications
    "Playback []"
    "Record erase"   - erase all additional information about OSS applications
    "Record []"

    - name of application with (higher priority) or without path
    - number of fragments or zero if auto
    - size of fragment in bytes or zero if auto
    - optional parameters
        WR_ONLY - if the application tries to open the pcm device with O_RDWR, the driver rewrites this to O_WRONLY (playback) - good for Quake etc...

Examples:

    echo "Playback x11amp 128 16384" > /proc/asound/0/pcm0o
    echo "Playback squake 0 0 WR_ONLY" > /proc/asound/0/pcm0o

6.4.2. /proc/asound/#card#/sb16

    "Playback 8"              -> driver will always use the 8-bit DMA channel for playback.
    "Playback 16"             -> driver will always use the 16-bit DMA channel for playback.
    "Playback auto" (default) -> driver will use auto mode (first opened direction will use 16-bit DMA channel).
    "Record 8"                -> driver will always use the 8-bit DMA channel for record.
    "Record 16"               -> driver will always use the 16-bit DMA channel for record.
    "Record auto" (default)   -> driver will use auto mode (first opened direction will use 16-bit DMA channel).

Example:

    echo "Record 16" > /proc/asound/0/sb16

For further reference, please consult the INSTALL file.

7. Tips and Troubleshooting

Please take a look at the FAQ file in the sound driver directory. This section is still under construction.

7.1. Compiling the driver

7.1.1. Linux kernel sourcetree

If your ALSA drivers do not compile correctly and tell you things about ``version.h'' or other header files that cannot be found, this can mean that you do not have the kernel header files. Take a look at the kernel-HOWTO, unpack a recent kernel in /usr/src and issue a make config.

7.1.2. Cannot create executables

The utils also contain code written in C++.
Most of us have a C++ compiler, either from gcc or egcs, but make sure you also have the libstdc++-devel package installed. Otherwise, when you run the configure script for the utils, your system will stump you with an error message saying your ``c++ compiler cannot create executables''.

7.2. Loading the driver

Please check the following items.

7.2.1. Sound devices

ALSA uses special devices in the /dev tree. Make sure you have run the ./snddevices script in the alsa-drivers source directory.

7.2.2. Sound card compatibility

Are you 100% sure that your sound card IS supported? Do check it again. Sometimes an X123 is not exactly an X123b and you might be wasting time. On the other hand, even a supported card can give you trouble - it took me two hours to figure out the installation of a CS4237B which was, after all, just a fine example of RTFM.

7.2.3. ``Device busy'' or ``unresolved symbols''

You might have a 2.0.x kernel with sound support compiled in, or the OSS/Lite (kernel) sound driver could be loaded (check with cat /proc/modules). Remove the driver or recompile the kernel (have a look at the Kernel-HOWTO). The sound module in the 2.0 series kernel is called ``sound.o'' and should not be active. (The ALSA driver ``snd.o'' is OK, though.) If you have a 2.2.x series kernel without sound support compiled in, the ALSA drivers will not work either.

7.2.3.1. 2.0 kernels

I know this is confusing, so let me try to explain it one more time. If you have a 2.0.x series kernel (the command ``uname -a'' tells you something like ``Linux penguin 2.0.35 #6 Wed Sep 23 10:19:16 CEST 1998 i686 unknown'') then you need to leave the sound drivers out of the kernel. ALSA 0.4.x and later do not work with the 2.0 series kernel.

7.2.3.2. 2.2 kernels

If you have a 2.2.x series kernel you do need the sound drivers. A 2.2 series kernel should be compiled with sound support, but without any sound card driver.
So you select sound support but make sure that no specific sound card driver will be compiled.

7.2.4. References to other drivers

Another reason why the driver complains that the device is busy could be that the file /etc/conf.modules still has references to the old soundcard drivers. You should delete these and leave only the references to the ALSA driver. (If there are other non-sound-related drivers there, then you can probably leave these as-is.)

7.2.5. Unresolved symbols revisited

Another source of ``unresolved symbols'' messages could be a new kernel with older drivers. Please recompile the ALSA drivers after you compile a new kernel. This will make sure that the drivers match your new kernel.

7.2.6. Check the PnP setup

Are you sure that your card is active? Take another look at the PnP-HOWTO and check whether you activated your sound card correctly.

7.2.7. Are your parameters right?

Check and double-check your sound card parameters. Please note: 534 is not 543, nor is 0x534 the same as 534. Also, some sound card drivers must be loaded by a different name than might be expected. Take a break, a beer or whatever, and look again at your ``modprobe'' command. For example, the Crystal 4232 driver should be inserted with ``modprobe snd-card-cs4232'', not ``snd-cs4231'', and the SoundBlaster PCI 64 should be loaded with ``snd-card-audiopci'', not ``snd-es1370''. (It's all in the docs, and even though I wrote the HOWTO, I once spent an evening trying to persuade snd-cs4231 to make sound.)

7.3. Driver loaded... but no (or hardly any) sound

7.3.1. Unmuting

The ALSA drivers can use the ``muting'' facilities that most soundcards have. If you loaded the sound drivers and everything is fine but you get nothing but silence, then you probably forgot to unmute your card. You need ``amixer'' or ``alsamixer'' for this, both from the ALSA-util package. Just typing

     amixer set -c 1 Master 70 unmute
     amixer set -c 1 PCM 70 unmute
     amixer set -c 1 CD 70 unmute

should do for most applications.
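The unmute commands differ only in the channel name, so they can be wrapped in a small loop. The following is a minimal sketch, not part of the ALSA utilities: the function name unmute_channels and the AMIXER variable are invented for this example, and the argument order simply copies the amixer command form used in this HOWTO, so it may need adjusting for other amixer versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ALSA utilities): unmute several
# mixer channels on one card, using the command form shown in this HOWTO.
# AMIXER can be overridden (e.g. AMIXER=echo) to print the arguments
# instead of touching the mixer.
AMIXER="${AMIXER:-amixer}"

# usage: unmute_channels CARD VOLUME CHANNEL...
unmute_channels() {
    card="$1"
    volume="$2"
    shift 2
    for channel in "$@"; do
        # Stop at the first channel that cannot be set.
        "$AMIXER" set -c "$card" "$channel" "$volume" unmute || return 1
    done
}
```

After sourcing the snippet, a call such as ``unmute_channels 1 70 Master PCM CD'' issues the same three commands; setting AMIXER=echo first gives a dry run that only prints them.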
Please note that for the older amixer command you need to leave out the ``set'' in the command line.

7.3.2. Gain

Most sound cards have a separate mixer part for extra input or output boosting. This entry is most likely called the ``gain'': ``in-gain'' for input and ``out-gain'' for output. Setting this gain to an appropriate level will greatly help you get the maximum volume out of your speakers (think about your parents/neighbours/ears though). So a command like

     amixer set out-gain 100 unmute

will probably help.

7.3.3. OSS/Linux compatibility

If this is the first time you use the ALSA drivers and you used the built-in sound drivers before, you probably want to have backwards compatible sound (i.e. use the /dev/pcmX devices). You need to load the ``OSS compatibility driver'' for this:

     modprobe snd-pcm1-oss && modprobe snd-mixer-oss

(See the end of the section about loading the driver.) Please note: snd-pcm1-oss is not the same as snd-pcm1. You need snd-pcm1-oss for old-fashioned sound support and snd-mixer-oss for (you guessed it) the mixer.

7.3.4. Cannot open mixer

If you have tried to install a couple of different ALSA versions, then sometimes the mixer cannot be opened anymore. This happens if you have tried 0.3.2 and want to downgrade to 0.3.0-pre4 (IIRC). You should delete all libasound files and links from /usr/lib and then recompile the libraries and utils:

     rm /usr/lib/libasound.*

Just to be safe, remove all ALSA sound drivers afterwards, then recompile, install and reload the drivers.

7.4. General suggestions

7.4.1. Try using ``insmod''

It can always be useful to start with "insmod" instead of kerneld. You may then actually see the error on screen.

7.4.2. Read the INSTALL file

A lot of information can be found in the INSTALL file in the drivers directory. If your driver won't work, check whether there is additional information available.

7.4.3. Debug messages

As a last resort, you can rebuild the driver and tell it to send debug information to /var/log/messages. Go to the driver directory with cd /usr/src/alsa-driver-.... and type:

     ./configure --with-debug=detect; make clean; make

Remove the driver (if it is active; see below for a general remove statement). Then use the "modprobe" statement you used before to insert the newly compiled driver. Check /var/log/messages for any messages.

7.4.4. If all else fails...

If these messages do not help you, send a message to the ALSA users mailing list, alsa-user@alsa-project.org. Include the following information:

  · soundcard name and the names of the chips present on your soundcard
  · relevant sections of your isapnp.conf if you have an ISA PnP soundcard
  · your conf.modules, or the line with which you activate the ALSA driver
  · all messages from /var/log/messages which could be relevant to the ALSA driver

7.5. Bug reports

If you found a bug, the ALSA developers would like to know the following things (at minimum):

  1. driver + kernel version: 'cat /proc/asound/version'
  2. soundcard info
     · soundcard name provided by the manufacturer
     · list of chips on the soundcard
     · contents of 'cat /proc/asound/cards'
  3. all messages from /var/log/messages which could be relevant to the ALSA driver
  4. problem description

7.6. Tip: playing CDs

If you use kmod/kerneld and the ALSA drivers to play CDs, then kmod/kerneld will probably not load the drivers as expected. This is because a command line CD player only tells the CD-ROM drive to start playing, without opening any of the sound devices that would tell kmod/kerneld that sound is about to be played. Using modprobe may be your only solution to this problem.

7.7. Tip: installing the MIDI serial driver

Normally, the IO port of the serial device is owned by the standard serial device driver. So before you can do ``modprobe snd-serial'' you have to tell the serial driver to release the serial device. Here is the procedure.

     setserial /dev/ttyS0 uart none
     modprobe snd-serial

(Replace /dev/ttyS0 with the appropriate /dev/ttySx device if your MIDI device uses a different serial device.)

7.8. Tip: new kernel? New modules!

After you upgrade your kernel, you probably need to recompile the ALSA drivers. If they are still in the original /usr/src directory, then please do not forget to issue a make clean before you do the ./configure, make, make install thing. Oh, and then there is this anomaly in kernel numbering: a ``2.2.0ac1'' kernel is ``not a number'', says the configure script. I think this was resolved in newer scripts; otherwise you may have to change the kernel version in the source.

7.9. Tip: KDE and ALSA drivers

Suppose you have KDE up and running and sound works in general, but you cannot get system sounds to work, such as those for opening windows, changing desktops, etc. If your CD player, mp3 player and mixer all work, then it's probably just "kwmsound" that's missing. So make sure "kwmsound" is in your start script ($KDEDIR/bin/startkde).

7.10. Tip: use the ALSA devices

If you had sound support in your Linux before, then your applications will probably all point to /dev/pcm0, /dev/audio and /dev/mixer. This is fine if you use OSS compatibility with the snd-pcm1-oss module. It might be better, however, to use the real ALSA devices, those found in /dev/snd/.

7.11. Tip: removing all modules

Removing 10+ modules one by one is not the way to go. Luckily, all module names start with the "snd-" prefix, so a little command line programming will do. You can easily remove ALSA sound by issuing a command like:

     cat /proc/modules|gawk '/^snd-/{print $1}'|xargs -i rmmod {}

Juergen Kahrs wrote: ``I have a script that also removes soundcore and soundlow and sound if present and if they are not in use. This script processes /proc/modules three times so there should not be too many modules left after processing''.
His solution is

     awk '/^snd/||/^sound/&&($3==0){system("rmmod " $1)}' /proc/modules /proc/modules /proc/modules

Please note: if some module is dependent on another module you cannot just remove the "higher" one. This means that you might need to issue a second removal statement. (I never encountered this situation though, it seems that you can remove the ALSA modules in the order they appear in /proc/modules).

Antares RAID SparcLinux Howto

Thom Coates (1), Carl Munio, Jim Ludemann

Revised 25 May 2002

--------------------------------------

(1) Thomas D. Coates, Jr. PhD, c/o Neuropunk.Org, PO Box 910385, Lexington KY 40591-0385, tcoates@neuropunk.org

Table of Contents

  * Chapter 1  Preliminaries   + 1.1  Preamble   + 1.2  Acknowledgements and Thanks   + 1.3  New Versions   * Chapter 2  Introduction   + 2.1  5070 Main Features   + 2.2  Background   o 2.2.1  RAID Levels   o 2.2.2  RAID Linear   o 2.2.3  Level 1   o 2.2.4  Striping   o 2.2.5  Level 0   o 2.2.6  Level 2 and 3   o 2.2.7  Level 4   o 2.2.8  Level 5   * Chapter 3  Installation   + 3.1  SBUS Controller Compatibility   + 3.2  Hardware Installation Procedure   o 3.2.1  Serial Terminal   o 3.2.2  Hard Drive Plant   + 3.3  5070 Onboard Configuration   o 3.3.1  Main Screen Options   o 3.3.2  [Q]uit   o 3.3.3  [R]aidSets:   o 3.3.4  [H]ostports:   o 3.3.5  [S]pares:   o 3.3.6  [M]onitor:   o 3.3.7  [G]eneral:   o 3.3.8  [P]robe   o 3.3.9  Example RAID Configuration Session   + 3.4  Linux Configuration   o 3.4.1  Existing Linux Installation   o 3.4.2  New Linux Installation   + 3.5  Maintenance   o 3.5.1  Activating a spare   o 3.5.2  Re-integrating a repaired drive into the RAID (levels 3 and 5)   + 3.6  Troubleshooting / Error Messages   o 3.6.1  Out of band temperature detected...   o 3.6.2  ... failed ... cannot have more than 1 faulty backend.
  o 3.6.3  When booting I see: ... Sun disklabel: bad magic 0000 ... unknown partition table.   + 3.7  Bugs   + 3.8  Frequently Asked Questions   o 3.8.1  How do I reset/erase the onboard configuration?   o 3.8.2  How can I tell if a drive in my RAID has failed?   + 3.9  Advanced Topics: 5070 Command Reference   o 3.9.1  AUTOBOOT - script to automatically create all raid sets and scsi monitors   o 3.9.2  AUTOFAULT - script to automatically mark a backend faulty after a drive failure   o 3.9.3  AUTOREPAIR - script to automatically allocate a spare and reconstruct a raid set   o 3.9.4  BIND - combine elements of the namespace   o 3.9.5  BUZZER - get the state or turn on or off the buzzer   o 3.9.6  CACHE - display information about and delete cache ranges   o 3.9.7  CACHEDUMP - Dump the contents of the write cache to battery backed-up ram   o 3.9.8  CACHERESTORE - Load the cache with data from battery backed-up ram   o 3.9.9  CAT - concatenate files and print on the standard output   o 3.9.10  CMP - compare the contents of 2 files   o 3.9.11  CONS - console device for Husky   o 3.9.12  DD - copy a file (disk, etc)   o 3.9.13  DEVSCMP - Compare a file's size against a given value   o 3.9.14  DFORMAT - Perform formatting functions on a backend disk drive   o 3.9.15  DIAGS - script to run a diagnostic on a given device   o 3.9.16  DPART - edit a scsihd disk partition table   o 3.9.17  DUP - open file descriptor device   o 3.9.18  ECHO - display a line of text   o 3.9.19  ENV - environment variables file system   o 3.9.20  ENVIRON - RaidRunner Global environment variables - names and effects   o 3.9.21  EXEC - cause arguments to be executed in place of this shell   o 3.9.22  EXIT - exit a K9 process   o 3.9.23  EXPR - evaluation of numeric expressions   o 3.9.24  FALSE - returns the K9 false status   o 3.9.25  FIFO - bi-directional fifo buffer of fixed size   o 3.9.26  GET - select one value from list   o 3.9.27  GETIV - get the value an internal
RaidRunner variable   o 3.9.28  HELP - print a list of commands and their synopses   o 3.9.29  HUSKY - shell for K9 kernel   o 3.9.30  HWCONF - print various hardware configuration details   o 3.9.31  HWMON - monitoring daemon for temperature, fans, PSUs.   o 3.9.32  INTERNALS - Internal variables used by RaidRunner to change dynamics of running kernel   o 3.9.33  KILL - send a signal to the nominated process   o 3.9.34  LED- turn on/off LED's on RaidRunner   o 3.9.35  LFLASH- flash a led on RaidRunner   o 3.9.36  LINE - copies one line of standard input to standard output   o 3.9.37  LLENGTH - return the number of elements in the given list   o 3.9.38  LOG - like zero with additional logging of accesses   o 3.9.39  LRANGE - extract a range of elements from the given list   o 3.9.40  LS - list the files in a directory   o 3.9.41  LSEARCH - find the a pattern in a list   o 3.9.42  LSUBSTR - replace a character in all elements of a list   o 3.9.43  MEM - memory mapped file (system)   o 3.9.44  MDEBUG - exercise and display statistics about memory allocation   o 3.9.45  MKDIR - create directory (or directories)   o 3.9.46  MKDISKFS - script to create a disk filesystem   o 3.9.47  MKHOSTFS - script to create a host port filesystem   o 3.9.48  MKRAID - script to create a raid given a line of output of rconf   o 3.9.49  MKRAIDFS - script to create a raid filesystem   o 3.9.50  MKSMON - script to start the scsi monitor daemon smon   o 3.9.51  MKSTARGD - script to initialize a scsi target daemon for a given raid set   o 3.9.52  MSTARGD - monitor for stargd   o 3.9.53  NICE - Change the K9 run-queue priority of a K9 process   o 3.9.54  NULL- file to throw away output in   o 3.9.55  PARACC - display information about hardware parity accelerator   o 3.9.56  PEDIT - Display/modify SCSI backend Mode Parameters Pages   o 3.9.57  PIPE - two way interprocess communication   o 3.9.58  PRANKS - print or set the accessible backend ranks for the current controller   o 3.9.59  PRINTENV 
- print one or all GLOBAL environment variables   o 3.9.60  PS - report process status   o 3.9.61  PSCSIRES - print SCSI-2 reservation table for all or specific monikers   o 3.9.62  PSTATUS - print the values of hardware status registers   o 3.9.63  RAIDACTION- script to gather/reset stats or stop/start a raid set's stargd   o 3.9.64  RAID0 - raid 0 device   o 3.9.65  RAID1 - raid 1 device   o 3.9.66  RAID3 - raid 3 device   o 3.9.67  RAID4 - raid 4 device   o 3.9.68  RAID5 - raid 5 device   o 3.9.69  RAM - ram based file system   o 3.9.70  RANDIO - simulate random reads and writes   o 3.9.71  RCONF, SPOOL, HCONF, MCONF, CORRUPT-CONFIG - raid configuration and spares management   o 3.9.72  REBOOT - exit K9 on target hardware + return to monitor   o 3.9.73  REBUILD - raid set reconstruction utility   o 3.9.74  REPAIR - script to allocate a spare to a raid set's failed backend   o 3.9.75  REPLACE - script to restore a backend in a raid set   o 3.9.76  RM - remove the file (or files)   o 3.9.77  RMON - Power-On Diagnostics and Bootstrap   o 3.9.78  RRSTRACE - disassemble scsihpmtr monitor data   o 3.9.79  RSIZE - estimate the memory usage for a given raid set   o 3.9.80  SCN2681 - access a scn2681 (serial IO device) as console   o 3.9.81  SCSICHIPS - print various details about a controller's scsi chips   o 3.9.82  SCSIHD - SCSI hard disk device (a SCSI initiator)   o 3.9.83  SCSIHP - SCSI target device   o 3.9.84  SET - set (or clear) an environment variable   o 3.9.85  SCSIHPMTR - turn on host port debugging   o 3.9.86  SETENV - set a GLOBAL environment variable   o 3.9.87  SDLIST - Set or display an internal list of attached disk drives   o 3.9.88  SETIV - set an internal RaidRunner variable   o 3.9.89  SHOWBAT - display information about battery backed-up ram   o 3.9.90  SHUTDOWN - script to place the RaidRunner into a shutdown or quiescent state   o 3.9.91  SLEEP - sleep for the given number of seconds   o 3.9.92  SMON - RaidRunner SCSI monitor daemon   o 3.9.93  
SOS - pulse the buzzer to emit sos's   o 3.9.94  SPEEDTST - Generate a set number of sequential writes then reads   o 3.9.95  SPIND - Spin up or down a disk device   o 3.9.96  SPINDLE - Modify Spindle Synchronization on a disk device   o 3.9.97  SRANKS - set the accessible backend ranks for a controller   o 3.9.98  STARGD - daemon for SCSI-2 target   o 3.9.99  STAT - get status information on the named files (or stdin)   o 3.9.100     STATS - Print cumulative performance statistics on a Raid Set or Cache Range   o 3.9.101     STRING - perform a string operation on a given value   o 3.9.102     SUFFIX - Suffixes permitted on some big decimal numbers   o 3.9.103     SYSLOG - device to send system messages for logging   o 3.9.104     SYSLOGD - initialize or access messages in the system log area   o 3.9.105     TEST - condition evaluation command   o 3.9.106     TIME - Print the number of seconds since boot (or reset of clock)   o 3.9.107     TRAP - intercept a signal and perform some action   o 3.9.108     TRUE - returns the K9 true status   o 3.9.109     STTY or TTY - print the user's terminal mount point or terminfo status   o 3.9.110     UNSET - delete one or more environment variables   o 3.9.111     UNSETENV - unset (delete) a GLOBAL environment variable   o 3.9.112     VERSION - print out the version of the RaidRunner kernel   o 3.9.113     WAIT - wait for a process (or my children) to terminate   o 3.9.114     WARBLE - periodically pulse the buzzer   o 3.9.115     XD- dump given file(s) in hexa-decimal to standard out   o 3.9.116     ZAP - write zeros to a file   o 3.9.117     ZCACHE - Manipulate the zone optimization IO table of a Raid Set's cache   o 3.9.118     ZERO - file when read yields zeros continuously   o 3.9.119     ZLABELS - Write zeros to the front and end of Raid Sets   + 3.10  Advanced Topics: SCSI Monitor Daemon (SMON)   + 3.11  Further Reading Chapter 1  Preliminaries This document describes how to install, configure, and maintain a hardware 
RAID built around the 5070 SBUS host based RAID controller by Antares Microsystems. Other topics of discussion include RAID levels, the 5070 controller GUI, and the 5070 command line. A complete command reference for the 5070's K9 kernel and Bourne-like shell is included.

1.1  Preamble

Copyright 2000 by Thomas D. Coates, Jr. This document's source is licensed under the terms of the GNU General Public License agreement. Permission to use, copy, modify, and distribute this document without fee for any purpose commercial or non-commercial is hereby granted, provided that the author's names and this notice appear in all copies and/or supporting documents; and that the location where a freely available unmodified version of this document may be obtained is given.

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, either expressed or implied. While every effort has been taken to ensure the accuracy of the information documented herein, the author(s)/editor(s)/maintainer(s)/contributor(s) assume NO RESPONSIBILITY for any errors, or for any damages, direct or consequential, as a result of the use of the information documented herein.

A complete copy of the GNU Public License agreement may be obtained from: Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Portions of this document are adapted and/or re-printed from the 5070 installation guide and man pages with permission of Antares Microsystems, Inc., Campbell CA.

1.2  Acknowledgements and Thanks

  * Carl and Jim at Antares for the hardware, man pages, and other support/contributions they provided during the writing of this document.
  * Penn State University - Hershey Medical Center, Department of Radiology, Section of Clinical Image Management (My home away from my home away from home).
  * The software-raid-HOWTO Copyright 1997 by Linas Vepstas under the GNU public license agreement.
    The software-raid-HOWTO is available from: http://www.linuxdoc.org

1.3  New Versions

  * The location of the most recent version of this document is posted on my homepage: http://members.iglou.com/tcoates/
  * Other versions may be found in different formats at the LDP homepage: http://www.linuxdoc.org and mirror sites.

Chapter 2  Introduction

The Antares 5070 is a high performance, versatile, yet relatively inexpensive host based RAID controller. Its embedded operating system (K9 kernel) is modelled on the Plan 9 operating system, whose design is discussed in several papers from AT&T (see the "Further Reading" section). K9 is a kernel targeted at embedded controllers of small to medium complexity (e.g. ISDN-ethernet bridges, RAID controllers, etc). It supports multiple lightweight processes (i.e. without memory management) on a single CPU with a non-preemptive scheduler. Device driver architecture is based on Plan 9 (and Unix SVR4) streams. Concurrency control mechanisms include semaphores and signals.

The 5070 has three single ended ultra 1 SCSI channels and two onboard serial interfaces, one of which provides command line access via a connected serial terminal or modem. The other is used to upgrade the firmware. The command line is robust, implementing many of the essential Unix commands (e.g. dd, ls, cat, etc.) and a scaled down Bourne shell for scripting. The Unix command set is augmented with RAID specific configuration commands and scripts. In addition to the command line interface, an ASCII text based GUI is provided to permit easy configuration of level 0, 1, 3, 4, and 5 RAIDs.

2.1  5070 Main Features

  * RAID levels 0, 1, 3, 4, and 5 are supported.
  * Text based GUI for easy configuration of all supported RAID levels.
  * A multidisk RAID volume appears as an individual SCSI drive to the operating system and can be managed with the standard utilities (fdisk, mkfs, fsck, etc.).
    RAID volumes may be assigned to different SCSI IDs or the same SCSI IDs but different LUNs.
  * No special RAID drivers required for the host operating system.
  * Multiple RAID volumes of different levels can be mixed among the drives forming the physical plant. For example, in a hypothetical drive plant consisting of 9 drives:
      + 2 drives form a level 3 RAID assigned to SCSI ID 5, LUN 0
      + 2 drives form a level 0 RAID assigned to SCSI ID 5, LUN 1
      + 5 drives form a level 5 RAID assigned to SCSI ID 6, LUN 0
  * Three single ended SCSI channels which can accommodate 6 drives each (18 drives total).
  * Two serial interfaces. The first permits configuration/control/monitoring of the RAID from a local serial terminal. The second serial port is used to upload new programming into the 5070 (using PPP and TFTP).
  * Robust Unix-like command line and NVRAM based file system.
  * Configurable ASCII SCSI communication channel for passing commands to the 5070's command line interpreter. This allows programs running on the host OS to directly configure/control/monitor all parameters of the 5070.

2.2  Background

Much of the information/knowledge pertaining to RAID levels in this section is adapted from the software-raid-HOWTO by Linas Vepstas. See the acknowledgements section for the URL where the full document may be obtained.

RAID is an acronym for "Redundant Array of Inexpensive Disks" and is used to create large, reliable disk storage systems out of individual hard disk drives. There are two basic ways of implementing a RAID: software or hardware. The main advantage of a software RAID is low cost. However, since the OS of the host system must manage the RAID directly, there is a substantial penalty in performance. Furthermore, if the RAID is also the boot device, a drive failure could prove disastrous since the operating system and utility software needed to perform the recovery are located on the RAID.
The primary advantages of hardware RAID are performance and improved reliability. Since all RAID operations are handled by a dedicated CPU on the controller, the host system's CPU is never bothered with RAID related tasks. In fact the host OS is completely oblivious to the fact that its SCSI drives are really virtual RAID drives. When a drive fails on the 5070 it can be replaced on-the-fly with a drive from the spares pool and its data reconstructed without the host's OS ever knowing anything has happened.

2.2.1  RAID Levels

The different RAID levels have different performance, redundancy, storage capacity, reliability and cost characteristics. Most, but not all, levels of RAID offer redundancy against drive failure. There are many different levels of RAID which have been defined by various vendors and researchers. The following describes the first 7 RAID levels in the context of the Antares 5070 hardware RAID implementation.

2.2.2  RAID Linear

RAID-linear is a simple concatenation of drives to create a larger virtual drive. It is handy if you have a number of small drives and wish to create a single, large drive. This concatenation offers no redundancy, and in fact decreases the overall reliability: if any one drive fails, the combined drive will fail.

SUMMARY

  * Enables construction of a large virtual drive from a number of smaller drives
  * No protection, less reliable than a single drive
  * RAID 0 is a better choice due to better I/O performance

2.2.3  Level 1

Also referred to as "mirroring". Two (or more) drives, all of the same size, each store an exact copy of all data, disk-block by disk-block. Mirroring gives strong protection against drive failure: if one drive fails, there is another with an exact copy of the same data. Mirroring can also help improve performance in I/O-laden systems, as read requests can be divided up between several drives.
Unfortunately, mirroring is also one of the least efficient in terms of storage: two mirrored drives can store no more data than a single drive.

SUMMARY

  * Good read/write performance
  * Inefficient use of storage space (half the total space available for data)
  * RAID 6 may be a better choice due to better I/O performance.

2.2.4  Striping

Striping is the underlying concept behind all of the other RAID levels. A stripe is a contiguous sequence of disk blocks. A stripe may be as short as a single disk block, or may consist of thousands. The RAID drivers split up their component drives into stripes; the different RAID levels differ in how they organize the stripes, and what data they put in them. The interplay between the size of the stripes, the typical size of files in the file system, and their location on the drive is what determines the overall performance of the RAID subsystem.

2.2.5  Level 0

Similar to RAID-linear, except that the component drives are divided into stripes and then interleaved. Like RAID-linear, the result is a single larger virtual drive. Also like RAID-linear, it offers no redundancy, and therefore decreases overall reliability: a single drive failure will knock out the whole thing. However, the 5070 hardware RAID 0 is the fastest of any of the schemes listed here.

SUMMARY:

  * Use RAID 0 to combine smaller drives into one large virtual drive.
  * Best Read/Write performance of all the schemes listed here.
  * No protection from drive failure.
  * ADVICE: Buy very reliable hard disk drives if you plan to use this scheme.

2.2.6  Level 2 and 3

RAID-2 is seldom used anymore, and to some degree has been made obsolete by modern hard disk technology. RAID-2 is similar to RAID-4, but stores ECC information instead of parity. Since all modern disk drives incorporate ECC under the covers, this offers little additional protection.
RAID-2 can offer greater data consistency if power is lost during a write; however, battery backup and a clean shutdown can offer the same benefits. RAID-3 is similar to RAID-4, except that it uses the smallest possible stripe size.

SUMMARY

  * RAID 2 is largely obsolete
  * Use RAID 3 to combine separate drives together into one large virtual drive.
  * Protection against single drive failure.
  * Good read/write performance.

2.2.7  Level 4

RAID-4 interleaves stripes like RAID-0, but it requires an additional drive to store parity information. The parity is used to offer redundancy: if any one of the drives fails, the data on the remaining drives can be used to reconstruct the data that was on the failed drive. Given N data disks and one parity disk, the parity stripe is computed by taking one stripe from each of the data disks and XOR'ing them together. Thus, the storage capacity of an (N+1)-disk RAID-4 array is N, which is a lot better than mirroring (N+1) drives, and is almost as good as a RAID-0 setup for large N. Note that for N = 1, where there is one data disk and one parity disk, RAID-4 is a lot like mirroring, in that each of the two disks is a copy of the other. However, RAID-4 does NOT offer the read performance of mirroring, and offers considerably degraded write performance. In brief, this is because updating the parity requires a read of the old parity before the new parity can be calculated and written out. In an environment with lots of writes, the parity disk can become a bottleneck, as each write must access the parity disk.

SUMMARY

  * Similar to RAID 0
  * Protection against single drive failure.
  * Poorer I/O performance than RAID 3
  * Less of the combined storage space is available for data [than RAID 3] since an additional drive is needed for parity information.

2.2.8  Level 5

RAID-5 avoids the write-bottleneck of RAID-4 by alternately storing the parity stripe on each of the drives.
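
The XOR parity described for RAID-4, which RAID-5 reuses with rotating placement, can be illustrated with a few lines of shell arithmetic. This is only a sketch under simplifying assumptions: each "stripe" is modelled as a single small integer, the function names are invented for the example, and the rotation rule (parity disk = stripe number modulo number of disks) is just one simple possibility, not necessarily the layout the 5070 uses.

```shell
#!/bin/sh
# Illustrative only: stripes are modelled as small integers.

# parity STRIPE... : XOR all given stripes together.
parity() {
    acc=0
    for stripe in "$@"; do
        acc=$(( acc ^ stripe ))
    done
    echo "$acc"
}

# parity_disk STRIPE_NO NDISKS : disk holding the parity for a stripe
# under a simple round-robin rotation (an assumed layout).
parity_disk() {
    echo $(( $1 % $2 ))
}

d0=165; d1=60; d2=15            # three data stripes
p=$(parity "$d0" "$d1" "$d2")   # parity stripe

# Disk 1 fails: XOR the surviving data stripes with the parity stripe
# to get the lost stripe back.
rebuilt=$(parity "$d0" "$d2" "$p")
echo "parity=$p rebuilt_d1=$rebuilt"
```

Run as written, this prints ``parity=150 rebuilt_d1=60'': XOR'ing the surviving stripes with the parity stripe returns the lost value, which is the same property a RAID-4 or RAID-5 set relies on to reconstruct a failed drive.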
However, write performance is still not as good as for mirroring, as the parity stripe must still be read and XOR'ed before it is written. Read performance is also not as good as it is for mirroring, as, after all, there is only one copy of the data, not two or more. RAID-5's principal advantage over mirroring is that it offers redundancy and protection against single-drive failure, while offering far more storage capacity when used with three or more drives.

SUMMARY

  * Use RAID 5 if you need to make the best use of your available storage space while gaining protection against single drive failure.
  * Slower I/O performance than RAID 3

Chapter 3  Installation

NOTE: The installation procedure given here for the SBUS controller is similar to that found in the manual. It has been modified so minor variations in the SPARCLinux installation may be included.

3.1  SBUS Controller Compatibility

The 5070 / Linux 2.2 combination was tested on SPARCstation (5, 10, & 20), Ultra 1, and Ultra 2 Creator. The 5070 was also tested on Linux with Symmetrical Multiprocessing (SMP) support on a dual processor Ultra 2 Creator 3D with no problems. Other 5070 / Linux / hardware combinations may work as well.

3.2  Hardware Installation Procedure

If your system is already up and running, you must halt the operating system.

GNOME:

  1. From the login screen right click the "Options" button.
  2. On the popup menu select System -> Halt.
  3. Click "Yes" when the verification box appears.

KDE:

  1. From the login screen right click shutdown.
  2. On the popup menu select shutdown by right clicking its radio button.
  3. Click OK.

XDM:

  1. Login as root.
  2. Left click on the desktop to bring up the pop-up menu.
  3. Select "New Shell".
  4. When the shell opens type "halt" at the prompt and press return.

Console Login (systems without X windows):

  1. Login as root.
  2. Type "halt".

All Systems:

Wait for the message "power down" or "system halted" before proceeding.
Turn off your SPARCstation system (Note: Your system may have turned itself off following the power down directive), its video monitor, external disk expansion boxes, and any other peripherals connected to the system. Be sure to check that the green power LED on the front of the system enclosure is not lit and that the fans inside the system are not running. Do not disconnect the system power cord. SPARCstation 4, 5, 10, 20 & UltraSPARC Systems: 1. Remove the top cover on the CPU enclosure. On a SPARCstation 10, this is done by loosening the captive screw at the top right corner of the back of the CPU enclosure, then tilting the top of the enclosure forward while using a Phillips screwdriver to press the plastic tab on the top left corner. 2. Decide which SBUS slot you will use. Any slot will do. Remove the filler panel for that slot by removing the two screws and rectangular washers that hold it in. 3. Remove the SBUS retainer (commonly called the handle) by pressing outward on one leg of the retainer while pulling it out of the hole in the printed circuit board. 4. Insert the board into the SBUS slot you have chosen. To insert the board, first engage the top of the 5070 RAIDium backpanel into the backpanel of the CPU enclosure, then rotate the board into a level position and mate the SBUS connectors. Make sure that the SBUS connectors are completely engaged. 5. Snap the nylon board retainers inside the SPARCstation over the 5070 RAIDium board to secure it inside the system. 6. Secure the 5070 RAIDium SBUS backpanel to the system by replacing the rectangular washers and screws that held the original filler panel in place. 7. Replace the top cover by first mating the plastic hooks on the front of the cover to the chassis, then rotating the cover down over the unit until the plastic tab in back snaps into place. Tighten the captive screw on the upper right corner. Ultra Enterprise Servers, SPARCserver 1000 & 2000 Systems, SPARCserver 6XO MP Series: 1. 
Remove the two Allen screws that secure the CPU board to the card cage. These are located at each end of the CPU board backpanel. 2. Remove the CPU board from the enclosure and place it on a static-free surface. 3. Decide which SBUS slot you will use. Any slot will do. Remove the filler panel for that slot by removing the two screws and rectangular washers that hold it in. Save these screws and washers. 4. Remove the SBUS retainer (commonly called the handle) by pressing outward on one leg of the retainer while pulling it out of the hole in the printed circuit board. 5. Insert the board into the SBUS slot you have chosen. To insert the board, first engage the top of the 5070 RAIDium backpanel into the backpanel of the CPU enclosure, then rotate the board into a level position and mate the SBUS connectors. Make sure that the SBUS connectors are completely engaged. 6. Secure the 5070 RAIDium board to the CPU board with the nylon screws and standoffs provided on the CPU board. The standoffs may have to be moved so that they match the holes used by the SBUS retainer, as the standoffs are used in different holes for an MBus module. Replace the screws and rectangular washers that originally held the filler panel in place, securing the 5070 RAIDium SBus backpanel to the system enclosure. 7. Re-insert the CPU board into the CPU enclosure and re-install the Allen-head retaining screws that secure the CPU board. All Systems: 1. Mate the external cable adapter box to the 5070 RAIDium and gently tighten the two screws that extend through the cable adapter box. 2. Connect the three cables from your SCSI devices to the three 68-pin SCSI-3 connectors on the Antares 5070 RAIDium. The three SCSI cables must always be reconnected in the same order after a RAID set has been established, so you should clearly mark the cables and disk enclosures for future disassembly and reassembly. 3. 
Configure the attached SCSI devices to use SCSI target IDs other than 7, as that is taken by the 5070 RAIDium itself. Configuring the target number is done differently on various devices. Consult the manufacturer's installation instructions to determine the method appropriate for your device.
  4. As you are likely to be installing multiple SCSI devices, make sure that all SCSI buses are properly terminated. This means a terminator is installed only at each end of each SCSI bus daisy chain.

Verifying the Hardware Installation:

These steps are optional but recommended. First, power on your system and interrupt the booting process by pressing the "Stop" and "a" keys simultaneously (or the "break" key if you are on a serial terminal) as soon as the Solaris release number is shown on the screen. This will force the system to run the Forth Monitor in the system EPROM, which will display the "ok" prompt. This gives you access to many useful low-level commands, including:

    ok show-devs
    . . .
    /iommu@f,e0000000/sbus@f,e000100SUNW, isp@1,8800000
    . . .

The device path line in the response shown above means that the 5070 RAIDium host adapter has been properly recognized. If you don't see a line like this, you may have a hardware problem.

Next, to see a listing of all the SCSI devices in your system, you can use the probe-scsi-all command, but first you must prepare your system as follows:

    ok setenv auto-boot? false
    ok reset
    ok probe-scsi-all

This will tell you the type, target number, and logical unit number of every SCSI device recognized in your system. The 5070 RAIDium board will report itself attached to an ISP controller at target 0 with two Logical Unit Numbers (LUNs): 0 for the virtual hard disk drive, and 7 for the connection to the Graphical User Interface (GUI).

Note: the GUI communication channel on LUN 7 is currently unused under Linux. See the discussion under "SCSI Monitor Daemon (SMON)" in the "Advanced Topics" section for more information.
REQUIRED: Perform a reconfiguration boot of the operating system:

    ok boot -r

If no image appears on your screen within a minute, you most likely have a hardware installation problem. In this case, go back and check each step of the installation procedure. This completes the hardware installation procedure.

3.2.1  Serial Terminal

If you have a serial terminal at your disposal (e.g. DEC-VT420) it may be connected to the controller's serial port using a 9 pin DIN male to DB25 male serial cable. Otherwise you will need to supplement the above cable with a null modem adapter to connect the RAID controller's serial port to the serial port on either the host computer or a PC. The terminal emulators I have successfully used include Minicom (on Linux), Kermit (on Caldera's DR-DOS), and HyperTerminal (on a Windows CE palmtop); however, any decent terminal emulation software should work. The basic settings are 9600 baud, no parity, 8 data bits, and 1 stop bit.

3.2.2  Hard Drive Plant

Choosing the brand and capacity of the drives that will form the hard drive physical plant is up to you. I do have some recommendations:

  * Remember, you generally get what you pay for. I strongly recommend paying the extra money for better (i.e. more reliable) hardware, especially if you are setting up a RAID for a mission critical project. For example, consider purchasing drive cabinets with redundant hot-swappable power supplies, etc.
  * You will also want a UPS for your host system and drive cabinets. Remember, RAID levels 3 and 5 protect you from data loss due to drive failure, NOT power failure.
  * The drive cabinet you select should have hot swappable drive bays; these cost more but are definitely worth it when you need to add/change drives.
  * Make sure the cabinet(s) have adequate cooling when fully loaded with drives.
  * Keep your SCSI cables (internal and external) as short as possible.
  * Mark the drives/cabinet(s) in such a way that you will be able to reconnect them to the controller in their original configuration. Once the RAID is configured you cannot re-organize your drives without re-configuring the RAID (and subsequently erasing the data stored on it).
  * Keep in mind that although it is physically possible to connect/configure up to 6 drives per channel, performance will sharply decrease for RAIDs with more than three drives per channel. This is due to the 25 MHz bandwidth limitation of the SBUS. Therefore, if read/write performance is an issue, go with a small number of large drives. If you need a really large RAID (~ 1 terabyte) then you will have no other choice but to load the channels to capacity and pay the performance penalty. NOTE: if you are serving files over a 10/100 Base T network you may not notice the performance decrease, since the network is usually the bottleneck, not the SBUS.

3.3  5070 Onboard Configuration

Before diving into the RAID configuration I need to define a few terms.

  * "RaidRunner" is the name given to the 5070 controller board.
  * "Husky" is the name given to the shell which produces the ":raid;" command prompt. It is a command language interpreter that executes commands read from the standard input or from a file. Husky is a scaled down model of Unix's Bourne shell (sh). One major difference is that husky has no concept of a current working directory. For more information on the husky shell and command prompt see the "Advanced Topics" section.
  * The "host port" is the SCSI ID assigned to the controller card itself. This is usually ID 7.
  * A "backend" is a drive attached to the controller on a given channel.
  * A "rank" is a collection of all the backends from each channel with the same SCSI ID (i.e.
rank 0 would consist of all the drives with SCSI ID 0 on each channel).
  * Each of the backends is identified by a three digit number where the first digit is the channel, the second the SCSI ID of the drive, and the third the LUN of the drive. The numbers are separated by periods. The identifier is prefixed with a "D" if it is a disk or "T" if it is a tape (e.g. D0.1.0). This scheme is referred to as <device_type_c.s.l> in the following documentation.
  * A "RAID set" consists of a given number of backends (there are certain requirements which I'll come to later).
  * A "spare" is a drive which is unused until there is a failure in one of the RAID drives. At that time the damaged drive is automatically taken offline and replaced with the spare. The data is then reconstructed on the spare and the RAID resumes normal operation.
  * Spares may either be "hot" or "warm" depending on user configuration. Hot spares are spun up when the RAID is started, which shortens the replacement time when a drive failure occurs. Warm spares are spun up when needed, which saves wear on the drive.

The text-based GUI can be started by typing "agui" at the husky prompt on the serial terminal (or emulator):

    :raid; agui

Agui is a simple ASCII based GUI that runs on the RaidRunner console port and enables one to configure the RaidRunner. The only argument agui takes is the type of terminal that is connected to the RaidRunner console. Currently supported terminals are dtterm, vt100 and xterm. The default is dtterm.

Each agui screen is split into two areas, data and menu. The data area, which generally uses all but the last line of the screen, displays the details of the information under consideration. The menu area, which generally is the bottom line of the screen, displays a strip menu with a title followed by a list of options or sub-menus. Each option has one character enclosed in square brackets (e.g. [Q]uit) which is the character to type to select that option.
Each menu line allows you to refresh the screen data (in case another process on the RaidRunner writes to the console). The refresh character may also be used during data entry if the screen is overwritten. The refresh character is either or .

When agui starts, it reads the configuration of the RaidRunner and probes for every possible backend. As it probes for each backend, its "name" is displayed in the bottom left corner of the screen.

3.3.1  Main Screen Options

The Main screen (Figure 3.1) is the first screen displayed. It provides a summary of the RaidRunner configuration. At the top is the RaidRunner model, version and serial number. Next is a line displaying, for each controller, the SCSI IDs for each host port (labeled A, B, C, etc) and the total and currently available amounts of memory. The next set of lines displays the ranks of devices on the RaidRunner. Each device follows the nomenclature of <device_type_c.s.l> where device_type_ can be D for disk or T for tape, c is the internal channel the device is attached to, s is the SCSI ID (Rank) of the device on that channel, and l is the SCSI LUN of the device (typically 0).

---------------------------------------------------------
[antares-RAID-SparcLinux-HOWTO001]
Figure 3.1: The main screen of the 5070 onboard configuration utility
---------------------------------------------------------

The next set of lines provides a summary of the Raid Sets configured on the RaidRunner. The summary includes the raid set name, its type, its size, the amount of cache allocated to it and a comma separated list of its backends. See rconf in the "Advanced Topics" section for a full description of the above.

Next, the spare devices are displayed. Each spare is named (device_type_c.s.l format), followed by its size (in 512-byte blocks), its spin state (Hot or Warm), its controller allocation, and finally its current status (Used/Unused, Faulty/Working). If used, the raid set that uses it is nominated.
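
The device naming scheme used on this screen can be illustrated with a short Python sketch. It is not part of the RaidRunner software; the names Backend, parse_backend and group_by_rank are invented for this illustration, and the only facts taken from the text are the <device_type_c.s.l> layout and the rule that backends sharing a SCSI ID form a rank.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Backend(NamedTuple):
    """Hypothetical container for the fields of a <device_type_c.s.l> name."""
    device_type: str  # "D" for disk, "T" for tape
    channel: int      # internal channel the device is attached to
    scsi_id: int      # SCSI ID of the device, which is also its rank
    lun: int          # SCSI LUN, typically 0


def parse_backend(name: str) -> Backend:
    """Split a name such as "D0.1.0" into its four fields."""
    if not name or name[0] not in ("D", "T"):
        raise ValueError(f"unknown device type in {name!r}")
    parts = name[1:].split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected <type>c.s.l, got {name!r}")
    channel, scsi_id, lun = (int(p) for p in parts)
    return Backend(name[0], channel, scsi_id, lun)


def group_by_rank(names: Iterable[str]) -> Dict[int, List[str]]:
    """Map each rank (SCSI ID) to the backend names that share it."""
    ranks: Dict[int, List[str]] = defaultdict(list)
    for name in names:
        ranks[parse_backend(name).scsi_id].append(name)
    return dict(ranks)
```

For instance, group_by_rank(["D0.0.0", "D1.0.0", "D0.1.0"]) places D0.0.0 and D1.0.0 in rank 0 and D0.1.0 in rank 1.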
At the bottom of the data area, the number of controllers, channels, ranks and devices are displayed. The menu line allows one to quit agui or select further actions or sub-menus.

  * [Q]uit: Exit the main screen and return to the husky prompt.
  * [R]aidSets: Enter the RaidSet configuration screen.
  * [H]ostports: Enter the Host Port configuration screen.
  * [S]pares: Enter the Spare Device configuration screen.
  * [M]onitor: Enter the SCSI Monitor configuration screen.
  * [G]eneral: Enter the General configuration/information screen.
  * [P]robe: Re-probe the device backends on the RaidRunner. As each backend is probed its "name" (c.s.l format) is displayed in the bottom left corner of the screen.

These selections are described in detail below.

3.3.2  [Q]uit

Exit the agui main screen and return to the husky ( :raid; ) prompt.

3.3.3  [R]aidSets:

The Raid Set Configuration screen (Figure 3.2) displays a Raid Set in the data area and provides a menu which allows you to Add, Delete, Modify, Install (changes) and Scroll through all other raid sets (First, Last, Next and Previous). If no raid sets have been configured, only the screen title and menu are displayed. All attributes of the raid set are displayed. For information on each attribute of the raid set, see the rconf command in the "Advanced Topics" section. The menu line allows one to leave the Raid Set Configuration screen or select further actions.

---------------------------------------------------------
[antares-RAID-SparcLinux-HOWTO002]
Figure 3.2: The RAIDSet configuration screen.
---------------------------------------------------------

  * [Q]uit: Exit the Raid Set Configuration screen and return to the Main screen. If you have modified, deleted or added a raid set and have not installed the changes you will be asked to confirm this. If you select Yes to continue the exit, all changes made since the last install action will be discarded.
  * [I]nst: This action installs (into the RaidRunner configuration area) any changes that may have been made to raid sets, be that deletion, addition or modification. If you exit prior to installing, all changes made since the last installation will be discarded. The installation process takes time. It is complete once the typed "i" character is cleared from the menu line.
  * [M]od: This action allows you to modify the displayed raid set. You will be prompted for each Raid Set attribute that can be changed. The prompt includes allowable options or formats required. If you don't wish to change a particular attribute, then press the RETURN or TAB key. The attributes you can change are the raid set name, I/O mode, status (Active to Inactive), bootmode, spares usage, backend zone table usage, IO size (if the raid set has never been used - i.e. just added), cache size, I/O queues length, host interfaces and additional stargd arguments. If you wish to change a single attribute then use the RETURN or TAB key to skip all other options. The changed attribute will be re-displayed as soon as you press the RETURN key. When specifying cache size, you may suffix the number with 'm' or 'M' to indicate the number is in Megabytes or with 'k' or 'K' to indicate the number is in Kilobytes. Note you can only enter whole integer values. When specifying io size, you may suffix the number with 'k' or 'K' to indicate the number is in Kilobytes. When you enter data, it is checked for correctness; if it is incorrect, a message is displayed, all changes are discarded and you will have to start again. Remember you must install ([I]nst.) any changes.
  * [A]dd: When this option is selected you will be prompted for various attributes of the new raid set.
These attributes are the raid set name, the raid set type, the initial host interface the raid set is to appear on (in c.h.l format where c is the controller number, h is the host port (0, 1, 2 etc) and l is the SCSI LUN) and finally a list of backends. When backends are to be entered, the screen displays a list of available backends, each with a numeric index (commencing at 0). You select each backend by entering its index and, once complete, enter q for Quit. As each backend index is entered, its backend name is displayed in a comma separated list. When you enter data, it is checked for correctness; if it is incorrect, a message is displayed, the addition is ignored and you will have to start again. Once the backends are complete, the newly created raid set will be displayed on the screen with supplied and default attributes. You can then modify the raid set to change other attributes. Remember you must install ([I]nst.) any new raid sets.
  * [D]elete: This action will delete the currently displayed raid set. If this raid set is Active, then you will not be allowed to delete it. You will have to make it Inactive (via the [M]od. option) then delete it. You will be prompted to confirm the deletion. Once you confirm the deletion, the screen will be cleared and the next raid set will be displayed, if configured. Remember you must install ([I]nst.) any changes.
  * [F]irst, [L]ast, [N]ext and [P]rev allow you to scroll through the configured raid sets.

3.3.4  [H]ostports:

The Host Port Configuration screen (Figure 3.3) displays, for each controller, each host port (labelled A, B, C, etc for port number 0, 1, 2, etc) and the assigned SCSI ID. If the RaidRunner you use has external switches for host port SCSI ID selection, you may only exit ([Q]uit) from this screen. If the RaidRunner you use does NOT have external switches for host port SCSI ID selection, then you may modify (and hence install) the SCSI ID for any host port.
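
The two notations met so far for host ports, the letter labels shown on this screen and the c.h.l host interface string entered when adding a raid set, can be sketched in Python as follows. This is only an illustration: the function names are invented, and the limit of 26 ports is an assumption made so that a single letter always suffices, not a documented RaidRunner limit.

```python
import string
from typing import Tuple


def host_port_label(port: int) -> str:
    """Return the letter shown for a host port number (0 -> A, 1 -> B, ...)."""
    # Assumption: no more than 26 host ports, so one letter is enough.
    if not 0 <= port < len(string.ascii_uppercase):
        raise ValueError(f"host port out of range: {port}")
    return string.ascii_uppercase[port]


def parse_host_interface(text: str) -> Tuple[int, int, int]:
    """Split a c.h.l string such as "0.0.0" into (controller, host port, LUN)."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected c.h.l, got {text!r}")
    controller, host_port, lun = (int(p) for p in parts)
    return controller, host_port, lun
```

With these helpers, the host interface "0.0.0" parses to controller 0, host port 0 (label A), LUN 0.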
The menu line allows one to leave the Host Port Configuration screen or select further actions (if there are NO external host port SCSI ID switches).

---------------------------------------------------------
[antares-RAID-SparcLinux-HOWTO003]
Figure 3.3: The host port configuration screen.
---------------------------------------------------------

  * [Q]uit: Exit the Host Port Configuration screen and return to the Main screen. If you have modified a host port SCSI ID assignment and have not installed the changes you will be asked to confirm this. If you select Yes to continue the exit, all changes made since the last install action will be discarded.
  * [I]nstall: This action installs (into the RaidRunner configuration area) any changes that may have been made to host port SCSI ID assignments. If you exit prior to installing, all changes made since the last installation will be discarded. The installation process takes time. It is complete once the typed "i" character is cleared from the menu line.
  * [M]odify: This action allows you to modify the host port SCSI ID assignments for each host port on each controller (if there are NO external host port SCSI ID switches). You will be prompted for the SCSI ID for each host port. You can enter either a SCSI ID (0 thru 15), the minus "-" character to clear the SCSI ID assignment, or RETURN to skip. As you enter data, it is checked for correctness; if it is incorrect, a message will be printed, although previously correctly entered data will be retained. Remember you must install ([I]nst.) any changes.

3.3.5  [S]pares:

The Spare Device Configuration screen (Figure 3.4) displays all configured spare devices in the data area and provides a menu which allows you to Add, Delete, Modify and Install (changes) spare devices. If no spare devices have been configured, only the screen title and menu are displayed.
Each spare device displayed shows its name (in device_type_c.s.l format), its size in 512-byte blocks, its spin status (Hot or Warm), its controller allocation, and finally its current status (Used/Unused, Faulty/Working). If used, the raid set that uses it is nominated. For information on each attribute of a spare device, see the rconf command in the "Advanced Topics" section. The menu line allows one to leave the Spare Device Configuration screen or select further actions.

---------------------------------------------------------
[antares-RAID-SparcLinux-HOWTO004]
Figure 3.4: The spare device configuration screen.
---------------------------------------------------------

  * [Q]uit: Exit the Spare Device Configuration screen and return to the Main screen. If you have modified, deleted or added a spare device and have not installed the changes you will be asked to confirm this. If you select Yes to continue the exit, all changes made since the last install action will be discarded.
  * [I]nstall: This action installs (into the RaidRunner configuration area) any changes that may have been made to the spare devices, be that deletion, addition or modification. If you exit prior to installing, all changes made since the last installation will be discarded. The installation process takes time. It is complete once the typed "i" character is cleared from the menu line.
  * [M]odify: This action allows you to modify the unused spare devices. You will be prompted for each spare device attribute that can be changed. The prompt includes allowable options or formats required. If you don't wish to change a particular attribute, then press the RETURN key. The attributes you can change are the new size (in 512-byte blocks), the spin state (H for Hot or W for Warm), and the controller allocation (A for any, 0 for controller 0, 1 for controller 1, etc).
If you wish to change a single attribute of a spare device, then use the RETURN key to skip all other attributes for each spare device. The changed attribute will not be re-displayed until the last prompted attribute is entered (or skipped). When you enter data, it is checked for correctness; if it is incorrect, a message is displayed, all changes are discarded and you will have to start again. Remember you must install ([I]nstall) any changes.
  * [A]dd: When adding a spare device, the list of available devices is displayed and you are required to type in the device name. Once entered, the spare is added with defaults which you can change, if required, via the [M]odify option. Remember you must install ([I]nstall) any changes.
  * [D]elete: When deleting a spare device, the list of spare devices allowed to be deleted is displayed and you are required to type in the required device name. Once entered, the spare is deleted from the screen. Remember you must install ([I]nstall) any changes.

3.3.6  [M]onitor:

The SCSI Monitor Configuration screen (Figure 3.5) displays a table of SCSI monitors configured for the RaidRunner. Up to four SCSI monitors may be configured. The table columns are entitled Controller, Host Port, SCSI LUN and Protocol, and each line of the table shows the appropriate SCSI Monitor attribute. For details on SCSI Monitor attributes, see the rconf command in the "Advanced Topics" section. The menu line allows one to leave the SCSI Monitor Configuration screen or modify and install the table.

---------------------------------------------------------
[antares-RAID-SparcLinux-HOWTO005]
Figure 3.5: The SCSI monitor configuration screen.
---------------------------------------------------------

  * [Q]uit: Exit the SCSI Monitor Configuration screen and return to the Main screen. If you have made changes and have not installed them you will be asked to confirm this.
If you select Yes to continue the exit, all changes made since the last install action will be discarded.
  * [I]nstall: This action installs (into the RaidRunner configuration area) any changes that may have been made to the SCSI Monitor configuration. If you exit prior to installing, all changes made since the last installation will be discarded. The installation process takes time. It is complete once the typed "i" character is cleared from the menu line.
  * [M]odify: This action allows you to modify the SCSI Monitor configuration. The cursor will be moved around the table, prompting you for input. If you do not want to change an attribute, enter RETURN to skip. If you want to delete a SCSI monitor then enter the minus "-" character when prompted for the controller number. If you want to use the default protocol list, then enter RETURN at the Protocol List prompt. As you enter data, it is checked for correctness; if it is incorrect, a message will be printed and any previously entered data is discarded. You will have to re-enter the data. Remember you must install ([I]nstall) any changes.

3.3.7  [G]eneral:

The General screen (Figure 3.6) has a blank data area and a menu which allows one to Quit and return to the main screen, or to select further sub-menus which provide information about Devices, the System Message Logger, Global Environment variables and throughput Statistics.

---------------------------------------------------------
[antares-RAID-SparcLinux-HOWTO006]
Figure 3.6: The General Screen. The options accessible from here allow you to view information on the attached devices (SCSI hard drives and tape units), browse the system logs, and examine environment variables.
---------------------------------------------------------

  * [Q]uit: Exit the General screen and return to the Main screen.
  * [D]evices: Enter the Device information screen (Figure 3.7). The Devices screen displays the names of all devices on the RaidRunner.
The menu line allows one to Quit and return to the General screen or display information about the devices.

-----------------------------------------------------
[antares-RAID-SparcLinux-HOWTO007]
Figure 3.7: The device information screen.
-----------------------------------------------------

      + [Q]uit: Exit the Devices screen and return to the General screen.
      + [I]nformation: The Device Information screen (Figure 3.8) displays information about each device. You can scroll through the devices. For disks, the information displayed includes the device name, serial number, vendor name, product id, speed, version, sector size, sector count, total device size in MB, number of cylinders, heads and sectors per track, and the zone/notch partitions. The menu line allows one to leave the Device Information screen or browse through devices.

-------------------------------------------------
[antares-RAID-SparcLinux-HOWTO008]
Figure 3.8: Example of the information displayed for a hard drive device.
-------------------------------------------------

          o [Q]uit: Exit the Device Information screen and return to the Devices screen.
          o [F]irst, [L]ast, [N]ext and [P]rev allow you to scroll through the devices and hence display their current data.

  * Sys[L]og: Enter the System Logger Messages screen (Figure 3.9).

-----------------------------------------------------
[antares-RAID-SparcLinux-HOWTO009]
Figure 3.9: The system logger messages screen. An example message is shown; there is one message per screen.
-----------------------------------------------------

      + [Q]uit: Exit the System Logger Messages screen and return to the General screen.
      + [F]irst, [L]ast, [N]ext and [P]rev allow you to scroll through the system log.

  * [E]nvironment: Enter the Global Environment Variable configuration screen (Figure 3.10).
The Environment Variable Configuration screen displays all configured Global Environment Variables and provides a menu which allows you to Add, Delete, Modify and Install (changes) variables. Each variable name is displayed followed by an equals sign "=" and the value assigned to that variable enclosed in braces - "{" .. "}". The menu line allows you to Quit and return to the General screen or select further actions.

-----------------------------------------------------
[antares-RAID-SparcLinux-HOWTO010]
Figure 3.10: The global environment variable configuration screen.
-----------------------------------------------------

      + [Q]uit: Exit the Environment Variable Configuration screen and return to the General screen. If you have modified, deleted or added an environment variable and have not installed the changes you will be asked to confirm this. If you select Yes to continue the exit, all changes made since the last install action will be discarded.
      + [I]nst: This action installs (into the RaidRunner configuration area) any changes that may have been made to environment variables, be that deletion, addition or modification. If you exit prior to installing, all changes made since the last installation will be discarded. The installation process takes time. It is complete once the typed "i" character is cleared from the menu line.
      + [M]od: This action allows you to modify an environment variable's value. You will be prompted for the name of the environment variable and then prompted for its new value. If the environment variable entered is not found, a message will be printed and you will not be prompted for a new value. If you do not enter a new value (i.e. just press RETURN), no change will be made. Remember you must install ([I]nstall) any changes.
      + [A]dd: When adding a new environment variable, you will be prompted for its name and value.
Provided the variable name is not already used and you enter a value, the new variable will be added and displayed. Remember you must install ([I]nstall) any changes.
      + [D]elete: When deleting an environment variable, you will be prompted for the variable name and, if valid, the environment variable will be deleted. Remember you must install ([I]nstall) any changes.

  * [S]tats: Enter the Statistics monitoring screen (Figure 3.11). The Statistics screen displays various general and specific statistics about raid sets configured and running on the RaidRunner. The first section of the data area displays the current temperature in degrees Celsius and the current speed of the fans in the RaidRunner. The next section of the data area displays various statistics about the named raid set. The statistics are the current cache hit rate, the cumulative number of reads, read failures, writes and write failures for each backend of the raid set, and finally the read and write throughput for each stargd process (indicated by its process id) that fronts the raid set. The menu line allows one to leave the Statistics screen or select further actions.

-----------------------------------------------------
[antares-RAID-SparcLinux-HOWTO011]
Figure 3.11: The statistics monitoring screen
-----------------------------------------------------

      + [Q]uit: Exit the Statistics screen and return to the General screen.
      + [F]irst, [L]ast, [N]ext and [P]rev allow you to scroll through the statistics.
      + [R]efresh: This option will get the statistics for the given raid set and re-display the current statistics on the screen.
      + [Z]ero: This option will zero the cumulative statistics for the currently displayed raid set.
      + [C]ontinuous: This option will start a background process that will update the statistics of the currently displayed raid set every 2 seconds. A loop counter is also created and updated every 2 seconds.
To interrupt this continuous mode of gathering statistics, just press any character. If you need to refresh the display, then press the refresh characters - or .

3.3.8  [P]robe

The probe option re-scans the SCSI channels and updates the backend list with the hardware it finds.

3.3.9  Example RAID Configuration Session

The generalized procedure for configuration consists of three steps arranged in the following order:

 1. Configuring the Host Port(s)
 2. Assigning Spares
 3. Configuring the RAID set

Note that there is a minimum number of backends required for the various supported RAID levels:

  * Level 0 : 2 backends
  * Level 3 : 2 backends
  * Level 5 : 3 backends

In this example we will configure a RAID 5 using six 2.04 gigabyte drives. The total capacity of the virtual drive will be approximately 10.2 gigabytes (the equivalent of one drive is used for redundancy, leaving 5 x 2.04 gigabytes). This same configuration procedure can be used to configure other levels of RAID sets by changing the type parameter.

 1. Power on the computer with the serial terminal connected to the RaidRunner's serial port.
 2. When the husky ( :raid; ) prompt appears, start the GUI by typing "agui" and pressing return.
 3. When the main screen appears, select "H" for [H]ostport configuration.
 4. On some models of RaidRunner the host port is not configurable. If you have only a [Q]uit option here then there is nothing further to be done for the host port configuration; note the values and skip to step 6. If you have add/modify options then your host port is software configurable.
 5. If there is no entry for a host port on this screen, add an entry with the parameters: controller=0, hostport=0, SCSI ID=0. Don't forget to [I]nstall your changes. If there is already an entry present, note the values (they will be used in a later step).
 6. From this point onward I will assume the following hardware configuration:
     a. There are seven 2.04 gig drives connected as follows:
         i.
2 drives on SCSI channel 0 with SCSI IDs 0 and 1 (backends 0.0.0 and 0.1.0, respectively).
         ii. 3 drives on SCSI channel 1 with SCSI IDs 0, 1 and 5 (backends 1.0.0, 1.1.0, and 1.5.0).
         iii. 2 drives on SCSI channel 2 with SCSI IDs 0 and 1 (backends 2.0.0 and 2.1.0).
     b. Therefore:
         i. Rank 0 consists of backends 0.0.0, 1.0.0, 2.0.0
         ii. Rank 1 consists of backends 0.1.0, 1.1.0, 2.1.0
         iii. Rank 5 contains only the backend 1.5.0
     c. The RaidRunner is assigned to controller 0, hostport 0
 7. Press Q to [Q]uit the hostports screen and return to the Main screen.
 8. Press S to enter the [S]pares screen.
 9. Select A to [A]dd a new spare to the spares pool. A list of available backends will be displayed and you will be prompted for the following information:
        Enter the device name to add to spares - from above:
    Enter D1.5.0
10. Select I to [I]nstall your changes.
11. Select Q to [Q]uit the spares screen and return to the Main screen.
12. Select R from the Main screen to enter the [R]aidsets screen.
13. Select A to [A]dd a new RAID set. You will be prompted for each of the RAID set parameters. The prompts and responses are given below.
     a. Enter the name of Raid Set: cim_homes (or whatever you want to call it).
     b. Raid set type [0,1,3,5]: 5
     c. Enter initial host interface - ctlr,hostport,scsilun: 0.0.0
        Now a list of the available backends will be displayed in the form:
            0 - D0.0.0 1 - D1.0.0 2 - D2.0.0 3 - D0.1.0 4 - D1.1.0 5 - D2.1.0
     d. Enter index from above - Q to Quit:
            1 press return
            2 press return
            3 press return
            4 press return
            5 press return
            Q
14. After pressing Q you will be returned to the Raid Sets screen. You should see the newly configured Raid set displayed in the data area (Figure 3.12).
15. Press I to [I]nstall the changes.

-----------------------------------------------------
[antares-RAID-SparcLinux-HOWTO012]
Figure 3.12: The RaidSets screen of the GUI showing the newly configured RAID 5
-----------------------------------------------------

16.
Press Q to exit the RaidSet screen and return to the Main screen.
17. Press Q to [Q]uit agui and exit to the husky prompt.
18. Type "reboot" then press enter. This will reboot the RaidRunner (not the host machine).
19. When the RaidRunner reboots it will prepare the drives for the newly configured RAID. NOTE: Depending on the size of the RAID this could take a few minutes to a few hours. For the above example it takes the 5070 approximately 10 - 20 minutes to stripe the RAID set.
20. Once you see the husky prompt again the RAID is ready for use. You can then proceed with the Linux configuration.

3.4  Linux Configuration

These instructions cover setting up the virtual RAID drives on RedHat Linux 6.1. Setting it up under other Linux distributions should not be a problem; the same general instructions apply. If you are new to Linux you may want to consider installing Linux from scratch, since the RedHat installer will do most of the configuration work for you. If so, skip to the section titled "New Linux Installation." Otherwise go to the "Existing Linux Installation" section (next).

3.4.1  Existing Linux Installation

Follow these instructions if you already have RedHat Linux installed on your system and you do not want to re-install. If you are installing the RAID as part of a new RedHat Linux installation (or are re-installing) skip to the "New Linux Installation" section.

QLogic SCSI Driver

The driver can either be loaded as a module or compiled into your kernel. If you want to boot from the RAID then you may want to use a kernel with compiled-in QLogic support (see the kernel-HOWTO available from http://www.linuxdoc.org). To use the modular driver become the superuser and add the following lines to /etc/conf.modules:

    alias qlogicpti /lib/modules/preferred/scsi/qlogicpti

Change the above path to wherever your SCSI modules live.
Then add the following line to your /etc/fstab (with the appropriate changes for device and mount point; see the fstab man page if you are unsure):

    /dev/sdc1 /home ext2 defaults 1 2

Or, if you prefer to use a SYSV initialization script, create a file called ``raid'' in the /etc/rc.d/init.d directory with the following contents (NOTE: while there are a few good reasons to start the RAID using a script, one of the aforementioned methods would be preferable):

    #!/bin/bash
    # Load/unload the QLogic driver and mount/unmount the RAID volume.
    case "$1" in
      start)
        echo "Loading raid module"
        /sbin/modprobe qlogicpti
        echo
        echo "Checking and Mounting raid volumes..."
        mount -t ext2 -o check /dev/sdc1 /home
        touch /var/lock/subsys/raid
        ;;
      stop)
        echo "Unmounting raid volumes"
        umount /home
        echo "Removing raid module(s)"
        /sbin/rmmod qlogicpti
        rm -f /var/lock/subsys/raid
        echo
        ;;
      restart)
        "$0" stop
        "$0" start
        ;;
      *)
        echo "Usage: raid {start|stop|restart}"
        exit 1
        ;;
    esac
    exit 0

You will need to edit this example and substitute your device name(s) in place of /dev/sdc1 and mount point(s) in place of /home. The next step is to make the script executable by root by doing:

    chmod 0700 /etc/rc.d/init.d/raid

Now use your run level editor of choice (tksysv, ksysv, etc.) to add the script to the appropriate run level.

Device mappings

Linux uses dynamic device mappings. You can determine if the drives were found by typing:

    more /proc/scsi/scsi

One or more of the entries should look something like this:

    Host: scsi1 Channel: 00 Id: 00 Lun: 00
      Vendor: ANTARES Model: CX106 Rev: 0109
      Type: Direct-Access ANSI SCSI revision: 02

There may also be one which looks like this:

    Host: scsi1 Channel: 00 Id: 00 Lun: 07
      Vendor: ANTARES Model: CX106-SMON Rev: 0109
      Type: Direct-Access ANSI SCSI revision: 02

This is the SCSI monitor communications channel, which is currently unused under Linux (see SMON in the advanced topics section below). To locate the drives (following reboot) type:

    dmesg | more

Locate the section of the boot messages pertaining to your SCSI devices.
You should see something like this:

    qpti0: IRQ 53 SCSI ID 7 (Firmware v1.31.32)(Firmware 1.25 96/10/15) [Ultra Wide, using single ended interface]
    QPTI: Total of 1 PTI Qlogic/ISP hosts found, 1 actually in use.
    scsi1 : PTI Qlogic,ISP SBUS SCSI irq 53 regs at fd018000 PROM node ffd746e0

This indicates that the SCSI controller was properly recognized. Below this look for the disk section:

    Vendor ANTARES Model: CX106 Rev: 0109
    Type: Direct-Access ANSI SCSI revision: 02
    Detected scsi disk sdc at scsi1, channel 0, id 0, lun 0
    SCSI device sdc: hdwr sector= 512 bytes. Sectors= 20971200 [10239 MB] [10.2 GB]

Note the line that reads "Detected scsi disk sdc ...". This tells you that this virtual disk has been mapped to device /dev/sdc. Following partitioning, the first partition will be /dev/sdc1, the second will be /dev/sdc2, etc. There should be one of the above disk sections for each virtual disk that was detected. There may also be an entry like the following:

    Vendor ANTARES Model: CX106-SMON Rev: 0109
    Type: Direct-Access ANSI SCSI revision: 02
    Detected scsi disk sdd at scsi1, channel 0, id 0, lun 7
    SCSI device sdd: hdwr sector= 512 bytes. Sectors= 20971200 [128 MB] [128.2 MB]

BEWARE: this is not a drive. DO NOT try to fdisk, mkfs, or mount it!! Doing so WILL hang your system.

Partitioning

A virtual drive appears to the host operating system as a large but otherwise ordinary SCSI drive. Partitioning is performed using fdisk or your favorite utility. You will have to give the virtual drive a disk label when fdisk is started. Using the choice ``Custom with autoprobed defaults'' seems to work well. See the man page for the given utility for details.
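The boot-message check described under "Device mappings" can be sketched as a small shell filter. This is only an illustration, not part of the 5070 software: the function names list_raid_disks and sectors_to_mb are hypothetical, and the filter assumes the boot messages use the layout shown above, with the "Model:" line and the "Detected scsi disk" line on separate lines and no leading timestamps. Any disk whose model name contains "SMON" is flagged so that it is not partitioned or mounted by mistake.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the 5070 tools.

# list_raid_disks: read boot messages on stdin and print one line per
# detected SCSI disk in the form "<device> <model> <status>".
# Assumes the message layout quoted in this HOWTO.
list_raid_disks() {
    awk '
        /Model:/ {
            # Remember the model named on the most recent Model: line.
            for (i = 1; i < NF; i++) if ($i == "Model:") model = $(i + 1)
        }
        /Detected scsi disk/ {
            # The fourth word is the device name, e.g. "sdc".
            status = (model ~ /SMON/) ? "SMON-do-not-use" : "usable"
            print "/dev/" $4, model, status
        }
    '
}

# sectors_to_mb: convert a count of 512-byte sectors to whole megabytes
# (2048 sectors per megabyte), rounding down as the kernel message does.
sectors_to_mb() {
    echo $(( $1 / 2048 ))
}
```

Typical use would be "dmesg | list_raid_disks", followed by "sectors_to_mb 20971200" to compare the reported sector count against the size you expect for the RAID set.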
Installing a filesystem

Installing a filesystem is no different from any other SCSI drive:

    mkfs -t FILESYSTEM-TYPE /dev/DEVICE

for example:

    mkfs -t ext2 /dev/sdc1

Mounting

If QLogic SCSI support is compiled into your kernel OR you are loading the "qlogicpti" module at boot from /etc/conf.modules then add the following line(s) to /etc/fstab:

    /dev/DEVICE MOUNT-POINT ext2 defaults 1 1

If you are using a SystemV initialization script to load/unload the module you must mount/unmount the drives there as well. See the example script above.

3.4.2  New Linux Installation

This is the easiest way to install the RAID since the RedHat installer program will do most of the work for you.

 1. Configure the host port, RAID sets, and spares as outlined in "Onboard Configuration." Your computer must be on to perform this step since the 5070 is powered from the SBUS. It does not matter if the computer has an operating system installed at this point; all we need is power to the controller card.
 2. Begin the RedHat SparcLinux installation.
 3. The installation program will auto-detect the 5070 controller and load the Qlogic driver.
 4. Your virtual RAID drives will appear as ordinary SCSI hard drives to be partitioned and formatted during the installation. NOTE: When using the graphical partitioning utility during the RedHat installation DO NOT designate any partition on the virtual drives as type RAID, since they are already hardware managed virtual RAID drives. The RAID selection on the partitioning utility's screen is for setting up a software RAID. IMPORTANT NOTE: you may see a small SCSI drive (usually ~128 MB) on the list of available drives. DO NOT select this drive for use. It is the SMON communication channel, NOT a drive. If setup tries to use it, it will hang the installer.
 5. That's it, the installation program takes care of everything else!
3.5  Maintenance

3.5.1  Activating a spare

When running a RAID 3 or 5 (if you configured one or more drives to be spares) the 5070 will detect when a drive goes offline and automatically select a spare from the spares pool to replace it. The data will be rebuilt on-the-fly. The RAID will continue operating normally during the reconstruction process (i.e. it can be read from and written to just as if nothing has happened). When a backend fails you will see messages similar to the following displayed on the 5070 console:

    930 secs: Redo:1:1 Retry:1 (DIO_cim_homes_D1.1.0_q1) CDB=28(Read_10)Re-/Selection Time-out @682400+16
    932 secs: Redo:1:1 Retry:2 (DIO_cim_homes_D1.1.0_q1) CDB=28(Read_10)Re-/Selection Time-out @682400+16
    933 secs: Redo:1:1 Retry:3 (DIO_cim_homes_D1.1.0_q1) CDB=28(Read_10)Re-/Selection Time-out @682400+16
    934 secs: CIO_cim_homes_q3 R5_W(3412000, 16): Pre-Read drive 4 (D1.1.0) fails with result "Re-/Selection Time-out"
    934 secs: CIO_cim_homes_q2 R5: Drained alternate jobs for drive 4 (D1.1.0)
    934 secs: CIO_cim_homes_q2 R5: Drained alternate jobs for drive 4 (D1.1.0) RPT 1/0
    934 secs: CIO_cim_homes_q2 R5_W(524288, 16): Initial Pre-Read drive 4 (D1.1.0) fails with result "Re-/Selection Time-out"
    935 secs: Redo:1:0 Retry:1 (DIO_cim_homes_D1.0.0_q1) CDB=28(Read_10)SCSI Bus ~Reset detected @210544+16
    936 secs: Failed:1:1 Retry:0 (rconf) CDB=2A(Write_10)Re-/Selection Time-out @4194866+128
    ...

Then you will see the spare being pulled from the spares pool, spun up, tested, engaged, and the data reconstructed.

    937 secs: autorepair pid=1149 /raid/cim_homes: Spinning up spare device
    938 secs: autorepair pid=1149 /raid/cim_homes: Testing spare device/dev/hd/1.5.0/data
    939 secs: autorepair pid=1149 /raid/cim_homes: engaging hot spare ...
    939 secs: autorepair pid=1149 /raid/cim_homes: reconstructing drive 4 ...
    939 secs: 1054
    939 secs: Rebuild on /raid/cim_homes/repair: Max buffer 2800 in 7491 reads, priority 6 sleep 500
    ...
The rebuild script will print out its progress after every 10% of the job is completed:

    939 secs: Rebuild on /raid/cim_homes/repair @ 0/7491
    1920 secs: Rebuild on /raid/cim_homes/repair @ 1498/7491
    2414 secs: Rebuild on /raid/cim_homes/repair @ 2247/7491
    2906 secs: Rebuild on /raid/cim_homes/repair @ 2996/7491

3.5.2  Re-integrating a repaired drive into the RAID (levels 3 and 5)

After you have replaced the bad drive you must re-integrate it into the RAID set using the following procedure.

 1. Start the text GUI.
 2. Look at the list of backends for the RAID set(s).
 3. Backends that have been marked faulty will have a (-) to the right of their ID (e.g. D1.1.0-).
 4. If you set up spares, the ID of the faulty backend will be followed by the ID of the spare that has replaced it (e.g. D1.1.0-D1.5.0).
 5. Write down the ID(s) of the faulty backend(s) (NOT the spares).
 6. Press Q to exit agui.
 7. At the husky prompt type:

        replace NAME BACKEND

    where NAME is whatever you named the raid set and BACKEND is the ID of the backend that is being re-integrated into the RAID. If a spare was in use it will be automatically returned to the spares pool. Be patient, reconstruction can take from a few minutes to several hours depending on the RAID level and the size. Fortunately, you can use the RAID as you normally would during this process.

3.6  Troubleshooting / Error Messages

3.6.1  Out of band temperature detected...

  * Probable Cause: The 5070 SBUS card is not adequately cooled.
  * Solution: Try to improve cooling inside the case. Clean dust from the fans, re-organize the cards so the raid card is closest to the fan, etc. On some of the "pizza box" Sun cases (e.g. SPARC 20) you may need to add supplementary cooling fans, especially if you have it loaded with cards.

3.6.2  ... failed ... cannot have more than 1 faulty backend.

  * Cause: More than one backend in the RAID 3/4/5 has failed (i.e. there is no longer sufficient redundancy to enable the lost data to be reconstructed).
  * Solution: You're hosed ... Sorry. If you did not assign spares when you configured your RAID 3/4/5, now may be a good time to reconsider the wisdom of that decision. Hopefully you have been making regular backups, since you will now have to replace the defective drives, re-configure the RAID, and restore the data from a secondary source.

3.6.3  When booting I see: ... Sun disklabel: bad magic 0000 ... unknown partition table.

  * Suspected Cause: Incorrect settings in the disk label set by fdisk (or whatever partitioning utility you used). This message seems to happen when you choose one of the preset disk labels rather than "Custom with autoprobed defaults."
  * Solution: Since this error does not seem to affect the operation of the drive, you can choose to do nothing and be ok. If you want to correct it you can try re-labeling the disk or re-partitioning the disk and choosing "Custom with autoprobed defaults." If you are installing RedHat Linux from scratch the installer will get all of this right for you.

3.7  Bugs

None yet! Please send bug reports to tcoates@neuropunk.org

3.8  Frequently Asked Questions

3.8.1  How do I reset/erase the onboard configuration?

At the husky prompt issue the following command:

    rconf -init

This will delete all of the RAID configuration information but not the global variables and scsi monitors. To remove ALL configuration information type:

    rconf -fullinit

Use these commands with caution!

3.8.2  How can I tell if a drive in my RAID has failed?

In the text GUI faulty backends appear with a (-) to the right of their ID. For example the list of backends:

    D0.0.0,D1.0.0-,D2.0.0,D0.1.0,D1.1.0,D2.1.0

indicates that backend (drive) D1.0.0 is either faulty or not present. If you assigned spares (RAID 3 or 5) then you should also see that one or more spares are in use. Both the Main and the RaidSets screens will show information on faulty/not present drives in a RAID set.
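If you copy a backend list off the screen, the faulty entries can be picked out mechanically. The following is a minimal sketch run on the host, not a 5070 command: the function name faulty_backends is hypothetical, and it assumes the list is a single comma-separated string in the form shown above, with a trailing "-" marking a faulty backend and an optional spare ID after the "-".

```shell
#!/bin/sh
# faulty_backends (hypothetical helper): print the ID of every backend in a
# comma-separated list that carries the "-" fault marker. Any spare ID that
# follows the marker (e.g. D1.1.0-D1.5.0) is dropped, so only the faulty
# backend itself is reported.
faulty_backends() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^\(D[0-9.]*\)-.*$/\1/p'
}

# Example with the list quoted in the text:
faulty_backends 'D0.0.0,D1.0.0-,D2.0.0,D0.1.0,D1.1.0,D2.1.0'
```

With the list from the text this prints D1.0.0, the ID you would write down before replacing the drive.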
3.9  Advanced Topics: 5070 Command Reference

In addition to the text based GUI, the RAID configuration may also be manipulated from the husky prompt (the :raid; prompt) of the onboard controller. This section describes commands that a user can input interactively or via a script file to the K9 kernel. Since K9 is an ANSI C Application Programming Interface (API), a shell is needed to interpret user input and form output. Only one shell is currently available and it is called husky.

The K9 kernel is modelled on the Plan 9 operating system, whose design is discussed in several papers from AT&T (see the "Further Reading" section for more information). K9 is a kernel targeted at embedded controllers of small to medium complexity (e.g. ISDN-ethernet bridges, RAID controllers, etc). It supports multiple lightweight processes (i.e. without memory management) on a single CPU with a non-pre-emptive scheduler. Device driver architecture is based on Plan 9 (and Unix SVR4) STREAMS. Concurrency control mechanisms include semaphores and signals.

The husky shell is modelled on a scaled down Unix Bourne shell. Using the built-in commands the user can write new scripts, thus extending the functionality of the 5070. The commands (adapted from the 5070 man pages) are extensive and are described below.

3.9.1  AUTOBOOT - script to automatically create all raid sets and scsi monitors

  * SYNOPSIS: autoboot
  * DESCRIPTION: autoboot is a husky script which is typically executed when a RaidRunner boots. The following steps are taken:
     1. Start all configured scsi monitor daemons (smon).
     2. Test to see that the total cache required by all the raid sets that are to boot is not more than 90% of available memory.
     3. Start all the scsi target daemons (stargd) and set each daemon's mode to "spinning-up", which enables it to respond to all non medium access commands from the host.
This is done to allow hosts to gain knowledge about the RaidRunner's scsi targets as quickly as possible.
     4. Bind into the root (ram) filesystem all unused spare backend devices.
     5. Build all raid sets.
     6. If battery backed-up ram is present, check for any saved writes and restore them into the just built raid sets.
     7. Finally, set the state of all scsi target daemons to "spun-up", enabling hosts to fully access the raid sets behind them.

3.9.2  AUTOFAULT - script to automatically mark a backend faulty after a drive failure

  * SYNOPSIS: autofault raidset
  * DESCRIPTION: autofault is a husky script which is typically executed by a raid file system upon the failure of a backend of that raid set, when that raid file system cannot use spare backends or has been configured not to use spare backends. After parsing its arguments (command and environment) autofault issues a rconf command to mark a given backend as faulty.
  * OPTIONS:
      + raidset: The bind point of the raid set whose backend failed.
      + $DRIVE_NUMBER: The index of the backend that failed. The first backend in a raid set is 0. This option is passed as an environment variable.
      + $BLOCK_SIZE: The raid set's io block size in bytes. (Ignored). This option is passed as an environment variable.
      + $QUEUE_LENGTH: The raid set's queue length. (Ignored). This option is passed as an environment variable.
  * SEE ALSO: rconf

3.9.3  AUTOREPAIR - script to automatically allocate a spare and reconstruct a raid set

  * SYNOPSIS: autorepair raidset size
  * DESCRIPTION: autorepair is a husky script which is typically executed by a raid type 1, 3 or 5 file system upon the failure of a backend of that raid set. After parsing its arguments (command and environment) autorepair gets a spare device from the RaidRunner's spares pool. It then engages it in write-only mode and reads the complete raid device, which reconstructs the data on the spare. The read is from the raid file system repair entrypoint.
Reading from this entrypoint causes a read of a block immediately followed by a write of that block. The read/write sequence is atomic (i.e. is not interruptible). Once the reconstruction has completed, a check is made to ensure the spare did not fail during reconstruction and, if not, the access mode of the spare device is set to the access mode of the raid set. The process that reads the repair entrypoint is rebuild.

    This device reconstruction will take anywhere from 10 minutes to one and a half hours depending on both the size and speed of the backends and the amount of activity the host is generating. During device reconstruction, pairs of numbers will be printed indicating each 10% of data reconstructed. The pairs of numbers are separated by a slash character, the first number being the number of blocks reconstructed so far and the second being the number of blocks to be reconstructed. Further status about the rebuild can be gained from running rebuild.

    When the spare is allocated, both the number of spares currently used on the backend and the spare device name are printed. The number of spares on a backend is referred to as the depth of spares on the backend. Thus, prior to re-engaging the spare after a reconstruction, a check can be made to see if the depth is the same. If it is not, then the spare reconstruction failed and reconstruction using another spare is underway (or no spares are available), and hence we don't re-engage the drive.
  * OPTIONS:
      + raidset: The bind point of the raid set whose backend failed.
      + size: The size of the raid set in 512 byte blocks.
      + $DRIVE_NUMBER: The index of the backend that failed. The first backend in a raid set is 0. This option is passed as an environment variable.
      + $BLOCK_SIZE: The raid set's io block size in bytes. This option is passed as an environment variable.
      + $QUEUE_LENGTH: The raid set's queue length. This option is passed as an environment variable.
  * SEE ALSO: rconf, rebuild

3.9.4  BIND - combine elements of the namespace

  * SYNOPSIS: bind [-k] new old
  * DESCRIPTION: bind replaces the existing old file (or directory) with the new file (or directory). If the "-k" switch is given then new must be a kernel recognized device (file system). Section 7k of the manual pages documents the devices (sometimes called file systems) that can be bound using the "-k" switch.

3.9.5  BUZZER - get the state or turn on or off the buzzer

  * SYNOPSIS: buzzer or buzzer on|off|mute
  * DESCRIPTION: buzzer will either print the state of the buzzer, turn the buzzer on or off, or mute it. If no arguments are given then the state of the buzzer is printed, that is, on or off will be printed if the buzzer is currently on or off respectively. If the buzzer has been muted, then you will be informed of this. If the buzzer has not been used since the RaidRunner booted then the special state, unused, is printed. If the argument on is given the buzzer is turned on; if off, the buzzer is turned off. If the argument mute is given then the muted state of the buzzer is changed.
  * SEE ALSO: warble, sos

3.9.6  CACHE - display information about and delete cache ranges

  * SYNOPSIS: cache [-D moniker] [-I moniker] [-F] [-g moniker first|last] lastoffset
  * DESCRIPTION: cache will print (to standard output) information about the given cache range, delete a given cache range, flush the cache or return the last offset of all cache ranges.
  * OPTIONS:
      + -F: Flush all cache buffers to their backends (typically raid sets).
      + -D moniker: Delete the cache range with moniker (name) moniker.
      + -I moniker: Invalidate the cache for the given cache range (moniker). This is only useful for debugging or elaborate benchmarks.
      + -g moniker first|last: Print either the first or last block number of a cache range with moniker (name) moniker.
      + lastoffset: Print the last offset of all cache ranges.
The last offset is the last block number of all cache ranges.

3.9.7  CACHEDUMP - Dump the contents of the write cache to battery backed-up ram

  * SYNOPSIS: cachedump
  * DESCRIPTION: cachedump causes all unwritten data in the RaidRunner's cache to be written out to the battery backed-up ram. No data will be written to battery backed-up ram if there is currently valid data already stored there. This command is typically executed when there is something wrong with the data (or its organization) in battery backed-up ram and you need to re-initialize it. cachedump will always return a NULL status.
  * SEE ALSO: showbat, cacherestore

3.9.8  CACHERESTORE - Load the cache with data from battery backed-up ram

  * SYNOPSIS: cacherestore
  * DESCRIPTION: cacherestore will check the RaidRunner's battery backed-up ram for any data it has stored as a result of a power failure. It will copy any data directly into the cache. This command is typically executed automatically at boot time, prior to the RaidRunner making its data available to a host. Having successfully copied any data from battery backed-up ram into the cache, it flushes the cache and then re-initializes battery backed-up ram to indicate it holds no data. cacherestore will return a NULL status on success or 1 if an error occurred during the loading (with a message written to standard error).
  * SEE ALSO: showbat

3.9.9  CAT - concatenate files and print on the standard output

  * SYNOPSIS: cat [ file... ]
  * DESCRIPTION: cat writes the contents of each given file, or standard input if none are given or when a file named `-' is given, to standard output. If the nominated file is a directory then the filenames contained in that directory are sent to standard out (one per line). More information on a file (e.g. its size) can be obtained by using stat. The script file ls uses cat and stat to produce directory listings.
  * SEE ALSO: echo, ls, stat

3.9.10  CMP - compare the contents of 2 files

  * SYNOPSIS: cmp [-b blockSize] [-c count] [-e] [-x] file1 file2
  * DESCRIPTION: cmp compares the contents of the 2 named files. If file1 is "-" then standard input is used for that file. If the files are the same length and contain the same values then nothing is written to standard output and the exit status NIL (i.e. true) is set. Where the 2 files differ, the first bytes that differ and the position are output to standard out and the exit status is set to "differ" (i.e. false). The position is given by a block number (origin 0) followed by a byte offset within that block (origin 0).

    The optional "-b" switch allows the blockSize of each read operation to be set. The default blockSize is 512 (bytes). For big compares involving disks a relatively large blockSize may be useful (e.g. 64k). See suffix for allowable suffixes.

    The optional "-c" switch allows the count of blocks read to be fixed. A value of 0 for count is interpreted as read to the end of file (EOF). To compare the first 64 Megabytes of 2 files the switches "-b 64k -c 1k" could be used. See suffix for allowable suffixes.

    The optional "-e" switch instructs cmp to output to standard out (usually overwriting the same line) the count of blocks compared, each time a multiple of 100 is reached. The final block count is also output.

    The optional "-x" switch instructs cmp to continue after a comparison error (but not a file error) and keep a count of blocks in error. If any errors are detected only the last one will be output when the command exits. If the "-e" switch is also given then the current count of blocks in error is output to the right of the multiple of 100 blocks compared.

    This command is designed to compare very large files. Two buffers of blockSize are allocated dynamically, so their size is bounded by the amount of memory (i.e. RAM in the target) available at the time of command execution.
The count could be up to 2G. The number of bytes compared is the product of blockSize and count (i.e. big enough).
  * SEE ALSO: suffix

3.9.11  CONS - console device for Husky

  * SYNOPSIS: bind -k cons bind_point
  * DESCRIPTION: cons allows an interpreter (e.g. Husky) to route console input and output to an appropriate device. That console input and output is available at bind_point in the K9 namespace. The special file cons should always be available.
  * EXAMPLES: Husky does the following in its initialization:

        bind -k cons /dev/cons

    On a Unix system this is equivalent to:

        bind -k unixfd /dev/cons

    On a DOS system this is equivalent to:

        bind -k doscon /dev/cons

    On target hardware using a SCN2681 chip this is equivalent to:

        bind -k scn2681 /dev/cons

  * SEE ALSO: unixfd, doscon, scn2681

3.9.12  DD - copy a file (disk, etc)

  * SYNOPSIS: dd [if=file] [of=file] [ibs=bytes] [obs=bytes] [bs=bytes] [skip=blocks] [seek=blocks] [count=blocks] [flags=verbose]
  * DESCRIPTION: dd copies a file (from the standard input to the standard output, by default) with a user-selectable blocksize.
  * OPTIONS:
      + if=file: Read from file instead of the standard input.
      + of=file: Write to file instead of the standard output.
      + ibs=bytes: Read given number of bytes at a time.
      + obs=bytes: Write given number of bytes at a time.
      + bs=bytes: Read and write given number of bytes at a time. Overrides ibs and obs.
      + skip=blocks: Skip ibs-sized blocks at start of input.
      + seek=blocks: By-pass obs-sized blocks at start of output.
      + count=blocks: Copy only ibs-sized input blocks.
      + flags=verbose: Print (to standard output) the number of blocks copied every ten percent of the copy. The output is of the form X/T where X is the number of blocks copied so far and T is the total number of blocks to copy. This option can only be used if both the count= and of= options are also given.

    The decimal numbers given to "ibs", "obs", "bs", "skip", "seek" and "count" must not be negative.
These numbers can optionally have a suffix (see suffix). dd outputs to standard out in all cases. A successful copy of 8 (full) blocks would cause the following output:

        8+0 records in
        8+0 records out

    The number after the "+" is the number of fractional blocks (i.e. blocks that are less than the block size) involved. This number will usually be zero (and is otherwise when physical media with alignment requirements is involved). A write failure outputting the last block on the previous example would cause the following output:

        Write failed
        8+0 records in
        7+0 records out

  * SEE ALSO: suffix

3.9.13  DEVSCMP - Compare a file's size against a given value

  * SYNOPSIS: devscmp filename size
  * DESCRIPTION: devscmp will find the size of the given file and compare its size in 512-byte blocks to the given size (also in 512-byte blocks). If the size of the file is less than the given value, then -1 is printed; if equal, then 0 is printed; and if the size of the given file is greater than the given size, then 1 is printed. This routine is used in internal scripts to ensure that backends of raid sets are of an appropriate size.

3.9.14  DFORMAT - Perform formatting functions on a backend disk drive

  * SYNOPSIS:
      + dformat -p c.s.l -R bnum
      + dformat -p c.s.l -pdA|-pdP|-pdG
      + dformat -p c.s.l -S [-v] [-B firstbn]
      + dformat -p c.s.l -F
      + dformat -p c.s.l -D file
  * DESCRIPTION: In its first form dformat will reassign a block on a nominated disk drive via the SCSI-2 REASSIGN BLOCKS command. The second form will allow you to print out the current manufacturer's defect list (-pdP), the grown defect list (-pdG) or both defect lists (-pdA). Each printed list is sorted with one defect per line in Physical Sector Format - Cylinder Number, Head Number and Defect Sector Number. The third form causes the drive to be scanned in a destructive write/read/compare manner.
If a read, write or data comparison error occurs then an attempt is made to identify the bad sector(s). Typically the drive is scanned from block 0 to the last block on the drive. You can optionally give an alternative starting block number. The fourth form causes a low level format on the specified device. The fifth form allows you to download a device's microcode into the device.
  * OPTIONS:
    + -R bnum: Specify a logical block number to reassign to the drive's grown defect list.
    + -pdA: Print both the manufacturer's and grown defect list.
    + -pdP: Print the manufacturer's defect list.
    + -pdG: Print the grown defect list.
    + -S: Perform a destructive scan of the disk reporting I/O errors.
    + -B firstbn: Specify the first logical block number to start a scan from.
    + -v: Turn on verbose mode, which prints the current block number being scanned.
    + -F: Issue a low-level SCSI format command to the given device. This will take some time.
    + -D file: Download the given file into the specified device. The download is effected by a single SCSI Write-Buffer command in save microcode mode. This allows users to update a device's microcode. Use this command carefully as you could destroy the device by loading an incorrect file.
    + -p c.s.l: Identify the disk device by specifying its channel, SCSI ID (rank) and SCSI LUN provided in the format "c.s.l"
  * SEE ALSO: Product manual for disk drives used in your RAID.
3.9.15  DIAGS - script to run a diagnostic on a given device
  * SYNOPSIS: diags disk -C count -L length -M io-mode -T io-type -D device
  * DESCRIPTION: diags is a husky script which is used to run the randio diagnostic on a given device. When randio is executed, it is executed in verbose mode.
  * OPTIONS:
    + disk: This is the device type of diagnostic we are to run.
    + -C count: Specify the number of times to execute the diagnostic.
    + -L length: Specify the "length" of the diagnostic to execute.
This can be either short, medium or long and is specified with the letters s, m or l respectively. In the case of a disk, a short test will test the first 10% of the device, a medium test the first 50% and a long test the whole (100%) of the disk.
    + -M io-mode: Specify a destructive (read-write) or non-destructive (read-only) test. Use either read-write or read-only.
    + -T io-type: Specify a type of io - either sequential or random.
    + -D device: Specify the device to test.
  * SEE ALSO: randio, scsihdfs
3.9.16  DPART - edit a scsihd disk partition table
  * SYNOPSIS:
    + dpart -a|d|l|m -D file [-N name] [-F firstblock] [-L lastblock]
    + dpart -a -D file -N name -F firstblock -L lastblock
    + dpart -d -D file -N name
    + dpart -l -D file
    + dpart -m -D file -N name -F firstblock -L lastblock
  * DESCRIPTION: Each scsihd device (typically a SCSI disk drive) can be divided up into eight logical partitions. By default when a scsihd device is bound into the RaidRunner's file system, it has four partitions: the whole device (raw), typically named bindpoint/raw, the partition file (bindpoint/partition), the RaidRunner backup configuration file (bindpoint/rconfig), and the "data" portion of the disk (bindpoint/data) which represents the whole device less the backup configuration area and partition file. For more information, see scsihdfs. If other partitions are added, then they will appear as bindpoint/partitionname. dpart allows you to edit or list the partition table on a scsihd device (typically a disk).
  * OPTIONS:
    + -a: Add a partition. When adding a partition, you need to specify the partition name (-N) and the partition range from the first block (-F) to the last block (-L).
    + -d: Delete a named (-N) partition.
    + -l: List all partitions.
    + -m: Modify an existing partition. You will need to specify the partition name (-N) and BOTH its first (-F) and last (-L) block numbers even if you are just modifying the last block number.
+ -D file: Specify the partition file to be edited. Typically, this is the bindpoint/partition file.   + -N name: Specify the partition name.   + -F firstblock: Specify the first block number of the partition.   + -L lastblock: Specify the last block number of the partition.   * SEE ALSO: scsihd 3.9.17  DUP - open file descriptor device   * SYNOPSIS: bind -k dup bind_point   * DESCRIPTION: The dup device makes a one level directory with an entry in that directory for every open file descriptor of the invoking K9 process. These directory "entries" are the numbers. Thus a typical process (script) binding a dup device would at least make these files in the namespace: "bind_point/0", "bind_point/1" and "bind_point/2". These would correspond to its open standard in, standard out and standard error file descriptors. A dup device allows other K9 processes to access the open file descriptors of the invoking process. To do this the other processes simply "open" the required dup device directory entry whose name (a number) corresponds to the required file descriptor. 3.9.18  ECHO - display a line of text   * SYNOPSIS: echo [string ...]   * DESCRIPTION: echo writes each given string to the standard output, with a space between them and a newline after the last one. Note that all the string arguments are written in a single write kernel call. The following backslash-escaped characters in the strings are converted as follows: \b backspace \c suppress trailing newline \f form feed \n new line \r carriage return \t horizontal tab \v vertical tab \\ backslash \nnn the character whose ASCII code is nnn (octal)   * SEE ALSO: cat 3.9.19  ENV- environment variables file system   * SYNOPSIS: bind -k env bind_point   * DESCRIPTION: env file system associates a one level directory with the bind_point in the K9 namespace. Each file name in that directory is the name of the environment variable while the contents of the file is that variable's current value. 
Conceptually each process sees its own copy of the env file system. This copy is either empty or inherited from the process's parent at spawn time (depending on the flags to spawn).
3.9.20  ENVIRON - RaidRunner Global environment variables - names and effects
  * DESCRIPTION: The RaidRunner uses GLOBAL environment variables to control the functionality of automatic actions. GLOBAL environment variables are saved in the Raid configuration area so they retain their values between reboots/power downs. Certain RaidRunner internal run-time variables can also be set as GLOBAL environment variables. See the internals manual entry for details. The table below describes those GLOBAL environment variables that are used by the RaidRunner in its normal operation.
    + RebuildPri This variable, if set, controls the priority used when drive reconstruction occurs via the rebuild program. If the variable is not set then the default rebuild priority is used. The variable is a comma separated list of raid set names and their associated rebuild priorities and sleep periods (colon separated). The form is
          Rname_1:Pri_1:Sleep_1,Rname_2:Pri_2:Sleep_2,...,Rname_N:Pri_N:Sleep_N
      where Pri_1 is the priority the rebuild program runs with when run on raid set Rname_1, Sleep_1 is the period, in milliseconds, to sleep between each rebuild action on the raid set, Pri_2 is the priority for raid set Rname_2, and so forth. For example, if the value of RebuildPri is
          R:5:30000
      then if a rebuild occurs (via replace, repair or autorepair) on raid set R then the rebuild will run with priority 5 (via the -p rebuild option) and will sleep 30000 milliseconds (30 seconds) between each rebuild action (specified via the -S rebuild option). The priority given must be valid for the rebuild program.
    + BackendRanks On certain RaidRunners where multiple controllers may exist, you can restrict a controller's access to the backend ranks of devices available.
For example, you may have 2 controllers and 4 ranks of backend devices. You can specify that the first controller can only access the first two ranks and the second controller, the second two ranks. This variable along with other associated commands allows you to set up this restriction. Additionally, you may only have a single controller RaidRunner which is in an enclosure with multiple ranks. By default the controller will attempt to probe for all devices on all ranks. If you have only populated the RaidRunner with, say, half its possible complement of backend devices, then the RaidRunner will still probe for the other half. Setting this variable appropriately will prevent this unneeded (and on occasion time consuming) process. This variable takes the form
          controller_id:ranklist controller_id:ranklist ...
      where controller_id is the controller number (from 0 upwards) and ranklist is a comma separated list of backend ranks which the given controller will access. Note that the backend rank is the scsi-id of that rank. For example, on a 2 rank (rank 1 and 2 - i.e. scsi id 1 for the first rank and scsi id 2 for the second), 1 controller RaidRunner where only the first rank has devices you could prevent the controller from attempting to access the (empty) second rank by setting BackendRanks to
          0:1
      Typically, you would not set this variable directly, but use supporting commands to set it. These commands are pranks and sranks. See these manual entries for details.
    + RAIDn_reference_PBUFS Raid types 3, 4 and 5 all make use of memory for temporary parity buffers when they need to create parity data. This memory is in addition to that allocated to a raid set's cache. When a raid set is created, it will also create a default number of parity buffers (which are the same size as a raid set's iosize).
Sometimes, if the iosize of the raid set is large there will not be enough memory to create this default number of parity buffers. To overcome this situation, you can set GLOBAL environment variables to over-ride the default number of parity buffers that all raid sets of a particular type or a specific raid set will use. You need to set these variables before you define the raid set via agui. If you delete the variables but not the raid set, then the affected raid sets may not boot and hence will not be accessible by a host. The variables are of the form
          RAIDn_reference_PBUFS
      where n is the raid type (3, 4 or 5), and reference is the raid set's name or the string 'Default'. You use the reference of 'Default' to specify all raid sets of a particular type. For example, to over-ride the number of parity buffers for a raid 5 named 'FRED', set
          : raid ; setenv RAID5_FRED_PBUFS 64
      To over-ride the number of parity buffers for ALL raid 3's (and set 128 parity buffers), set
          : raid ; setenv RAID3_Default_PBUFS 128
      If you set a default for all raid sets of a particular type, but want ONE of them to be different, then set up a variable for that particular raid set as its value will over-ride the default. In the above example, where all Raid Type 3 will have 128 parity buffers, you could set the variable
          : raid ; setenv RAID3_Dbase_PBUFS 56
      which will allow the raid 3 raid set named 'Dbase' to have 56 parity buffers, but all other raid 3's defined on the RaidRunner will have 128.
  * SEE ALSO: setenv, printenv, rconf, rebuild, internals
3.9.21  EXEC - cause arguments to be executed in place of this shell
  * SYNOPSIS: exec [ arg ... ]
  * DESCRIPTION: exec causes the command specified by the first arg to be executed in place of this shell without creating a new process. Subsequent args are passed to the command specified by the first arg as its arguments. Shell redirection may appear and, if no other arguments are given, causes the shell input/output to be modified.
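The "in place of this shell without creating a new process" behaviour of exec can be illustrated by analogy with the POSIX exec family. The sketch below is Python on a Unix-like host, not husky, and K9 semantics may differ in detail; the helper name pids_before_and_after_exec and the CHILD program are hypothetical and exist only for this illustration. A child prints its process id, replaces itself with a new program, and the new program prints its process id; under exec semantics the two are the same.

```python
import subprocess
import sys

# Program run in a child process: print our pid, then replace ourselves
# (os.execv) with a fresh interpreter that prints *its* pid.
CHILD = r"""
import os, sys
print(os.getpid(), flush=True)  # flush: buffered output is lost across exec
os.execv(sys.executable,
         [sys.executable, "-c", "import os; print(os.getpid())"])
"""


def pids_before_and_after_exec():
    """Return (pid before exec, pid after exec) as observed in the child."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    before, after = result.stdout.split()
    return int(before), int(after)


if __name__ == "__main__":
    before, after = pids_before_and_after_exec()
    print("same process" if before == after else "different process")
```

On a POSIX system the two values match, because exec replaces the program image of the existing process rather than spawning a new one.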
3.9.22  EXIT - exit a K9 process   * SYNOPSIS: exit [string]   * DESCRIPTION: exit has an optional string argument. If the optional argument is given the current K9 process is terminated with the given string as its exit value. (If the string has embedded spaces then the whole string should be a quoted_string). If no argument is given then the shell gets the string associated with the environment variable "status" and returns that string as the exit value. If the environment variable "status" is not found then the "true" exit status (i.e. NIL) is returned.   * SEE ALSO: true, K9exit 3.9.23  EXPR - evaluation of numeric expressions   * SYNOPSIS: expr numeric_expr ...   * DESCRIPTION: expr evaluates each numeric_expr command line argument as a separate numeric expression. Thus a single expression cannot contain unescaped whitespaces or needs to be placed in a quoted string (i.e. between "{" and "}"). Arithmetic is performed on signed integers (currently numbers in the range from -2,147,483,648 to 2,147,483,647). Successful calculations cause no output (to either standard out/error or environment variables). So each useful numeric_expr needs to include an assignment (or op-assignment). Each numeric_expr argument supplied is evaluated in the order given (i.e. left to right) until they all evaluate successfully (returning a true status). If evaluating a numeric_expr fails (usually due to a syntax error) then the expr command fails with "error" as the exit status and the error message is written to the environment variable "error".   * OPERATORS: The precedence of each operator is shown following the description in square brackets. "0" is the highest precedence. Within a single precedence group evaluation is left-to-right except for assignment operators which are right-to-left. Parentheses have higher precedence than all operators and can be used to change the default precedence shown below. UNARY OPERATORS + Does nothing to expression/number to the right. 
    -    Negates expression/number to the right.
    !    Logically negates expression/number to the right.
    ~    Bitwise negates expression/number to the right.
BINARY ARITHMETIC OPERATORS
    *    Multiply enclosing expressions [2]
    /    Integer division of enclosing expressions
    %    Modulus of enclosing expressions.
    +    Add enclosing expressions
    -    Subtract enclosing expressions.
    <<   Shift left expression _left_ by number in right expression. Equivalent to: left * (2 ** right)
    >>   Shift left expression _right_ by number in right expression. Equivalent to: left / (2 ** right)
    &    Bitwise AND of enclosing expressions
    ^    Bitwise exclusive OR of enclosing expressions. [8]
    |    Bitwise OR of enclosing expressions. [9]
BINARY LOGICAL OPERATORS
These logical operators yield the number 1 for a true comparison and 0 for a false comparison. For logical ANDs and ORs their left and right expressions are assumed to be false if 0, otherwise true. Both logical ANDs and ORs evaluate both their left and right expressions in all cases (unlike C's short-circuit evaluation).
    <=   true when left less than or equal to right. [5]
    >=   true when left greater than or equal to right. [5]
    <    true when left less than right. [5]
    >    true when left greater than right. [5]
    ==   true when left equal to right. [6]
    !=   true when left not equal to right. [6]
    &&   logical AND of enclosing expressions [10]
    ||   logical OR of enclosing expressions [11]
ASSIGNMENT OPERATORS
In the following descriptions "n" is an environment variable while "r_exp" is an expression to the right. All assignment operators have the same precedence, which is lower than all other operators. N.B. Multiple assignment operators group right-to-left (i.e. same as the C language).
    =    Assign right expression into environment variable on left.
    *=   n *= r_exp is equivalent to: n = n * r_exp
    /=   n /= r_exp is equivalent to: n = n / r_exp
    %=   n %= r_exp is equivalent to: n = n % r_exp
    +=   n += r_exp is equivalent to: n = n + r_exp
    -=   n -= r_exp is equivalent to: n = n - r_exp
    <<=  n <<= r_exp is equivalent to: n = n << r_exp
    >>=  n >>= r_exp is equivalent to: n = n >> r_exp
    &=   n &= r_exp is equivalent to: n = n & r_exp
    |=   n |= r_exp is equivalent to: n = n | r_exp
  * NUMBERS: All numbers are signed integers in the range stated in the description above. Numbers can be input in base 2 through to base 36. Base 10 is the default base. The default base can be overridden by:
    1. a leading "0" : implies octal or hexadecimal
    2. a number of the form _base_#_num_
    Numbers prefixed with "0" are interpreted as octal. Numbers prefixed with "0x" or "0X" are interpreted as hexadecimal. For numbers using the "#" notation the _base_ must be in the range 2 through to 36 inclusive. For bases greater than 10 the letters "a" through "z" are utilised for the extra "digits". Upper and lower case letters are acceptable. Any single digit that exceeds (or is equal to) the base is considered an error. Only base 10 numbers may have a suffix. See suffix for a list of valid suffixes. Also note that since expr uses signed integers, "1G" is the largest magnitude number that can be represented with the "Gigabyte" suffix (assuming 32 bit signed integers, -2G is invalid due to the order of evaluation).
  * VARIABLES: The only symbolic variables allowed are K9 environment variables. Regardless of whether they are being read or written they should never appear preceded by a "$". Environment variables that did not previously exist and appear as the left argument of an assignment are created. When a non-existent environment variable is read it is interpreted as the value 0.
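The number-input rules under NUMBERS can be sketched as a small parser. This is an illustrative Python model, not the expr implementation: the function name parse_number is hypothetical, suffix handling is left out (suffixes are defined in the separate suffix entry), and the range check assumes the 32-bit signed limit stated in the description. Negative values are not handled here because negation is a unary operator applied afterwards.

```python
import string

# Digit alphabet: "0".."9" then "a".."z", giving digit values 0..35.
DIGITS = string.digits + string.ascii_lowercase
INT_MAX = 2_147_483_647  # assumed 32-bit signed upper bound


def parse_number(text):
    """Parse an unsigned expr-style literal and return its integer value."""
    s = text.lower()  # upper and lower case letters are both acceptable
    if "#" in s:
        # base#num notation: the base itself is written in decimal
        base_str, digits = s.split("#", 1)
        base = int(base_str)
        if not 2 <= base <= 36:
            raise ValueError("base must be in the range 2..36")
    elif s.startswith("0x"):
        base, digits = 16, s[2:]
    elif len(s) > 1 and s.startswith("0"):
        base, digits = 8, s[1:]
    else:
        base, digits = 10, s
    if not digits:
        raise ValueError("no digits in number")
    value = 0
    for ch in digits:
        d = DIGITS.find(ch)
        # A digit equal to or exceeding the base is an error.
        if d < 0 or d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d
    if value > INT_MAX:
        raise ValueError("number out of range")
    return value
```

Under these assumptions parse_number("0377"), parse_number("0xff") and parse_number("16#ff") all yield the same value as parse_number("255"), while parse_number("8#9") is rejected.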
  * EXAMPLES: Some simple examples:
        expr {n = 1 + 2} # create n
        echo $n
        3
        expr {n*=2} # 3 * 2 result back into n
        echo $n
        6
        expr { k = n > 5 } # 6 > 5 is true so create k = 1
        echo $k
        1
  * NOTE: expr is a Husky "built-in" command. See the "Note" section in "set" to see the implications.
  * SEE ALSO: husky, set, suffix, test
3.9.24  FALSE - returns the K9 false status
  * SYNOPSIS: false
  * DESCRIPTION: false does nothing other than return a K9 false status. K9 processes return a pointer to a C string (null terminated array of characters) on termination. If that pointer is NULL then a true exit value is assumed, while all other returned pointer values are interpreted as false (with the string being some explanation of what went wrong). This command returns a pointer to the string "false" as its return value.
  * EXAMPLE: The following script fragment will print "got here" to standard out:
        if false then
            echo impossible
        else
            echo got here
        end
  * SEE ALSO: true
3.9.25  FIFO - bi-directional fifo buffer of fixed size
  * SYNOPSIS:
    + bind -k {fifo size} bind_point
    + cat bind_point
    + bind_point/data
    + bind_point/ctl
  * DESCRIPTION: The fifo file system associates a one level directory with the bind_point in the K9 namespace with a buffer size of size bytes. bind_point/data and bind_point/ctl are the data and control channels for the fifo. Data written to the bind_point/data file is available for reading from the same file on a first-in first-out basis. A write of x bytes to the bind_point/data file will either complete and transfer all the data, or will transfer bytes until the fifo buffer is full and then block until data is removed from the fifo buffer by reading. A read of x bytes from the bind_point/data file will transfer the lesser of the current amount of data in the fifo buffer or x bytes. A read from the bind_point/ctl will return the size of the fifo buffer and the current usage.
The number of opens (# Opens) is the number of processes that currently have the bind_point/data file open.
  * EXAMPLE
        > /buffer
        bind -k {fifo 2048} /buffer
        ls -l /buffer
        /buffer:
        /buffer/ctl                     fifo    2 0x00000001    1 0
        /buffer/data                    fifo    2 0x00000002    1 0
        cat /buffer/ctl
        Max: 2048 Cur: 0, # Opens: 0
        echo hello > /buffer/data
        cat /buffer/ctl
        Max: 2048 Cur: 6, # Opens: 0
        dd if=/buffer/data bs=512 count=1
        hello
        0+1 records in
        0+1 records out
        cat /buffer/ctl
        Max: 2048 Cur: 0, # Opens: 0
  * SEE ALSO: pipe
3.9.26  GET - select one value from list
  * SYNOPSIS: get number [ value ... ]
  * DESCRIPTION: get uses the given number to select one value from the given list. Indexing is origin 0 (e.g. "get 0 aaa bb c" returns "aaa"). If the number is out of range for an index on the given list of values then nothing is returned.
3.9.27  GETIV - get the value of an internal RaidRunner variable
  * SYNOPSIS:
    + getiv
    + getiv name
  * DESCRIPTION: getiv prints the current value of an internal RaidRunner variable or prints a list of all variables. When a variable name is given its current value is printed. If no name is given then all available internal variables are listed.
  * NOTES: As different models of RaidRunners have different internal variables, see your RaidRunner's Hardware Reference manual for a list of variables together with the meaning of their values. These variables are run-time variables and hence revert to their default value whenever the RaidRunner is booted.
  * SEE ALSO: setiv
3.9.28  HELP - print a list of commands and their synopses
  * SYNOPSIS: help or ?
  * DESCRIPTION: help, or the question mark character (?), will print a list of all commands available to the command interpreter. Along with each command, its synopsis is printed.
3.9.29  HUSKY - shell for K9 kernel
  * SYNOPSIS
    + husky [-c command] [ file [ arg ... ] ]
    + hs [-c command] [ file [ arg ... ] ]
  * DESCRIPTION: husky and hs are synonyms.
husky is a command language interpreter that executes commands read from the standard input or from a file. husky is a scaled down model of Unix's Bourne shell (sh). One major difference is that husky has no concept of current working directory. If the "-c" switch is present then the following command is interpreted by husky in a newly thrown shell nested in the current environment. This newly thrown shell exits back to the current environment when the command finishes. Otherwise if arguments are given the first one is assumed to be a file containing husky commands. Again a new shell is thrown to execute these commands. husky script files can access their command line arguments and the 2nd and subsequent arguments to husky (if present) are passed to the file for that purpose. If no arguments are given to husky then commands are read from standard in (and the shell is considered interactive).   * RETURN STATUS: husky places the K9 return status of a process (NIL if ok, otherwise a string explaining the error) in the file "/env/status" An example: dd if=/xx dd: could not open /xx cat /env/status open failed cat /env/status # empty because previous "cat" worked As the file "/env/status" is an environment variable the return status of a command is also available in the variable $status. The exit status of a pipeline is the exit status of the last command in the pipeline.   * SIGNALS If an interactive shell receives an interrupt signal (i.e. K9_SIGINT - usually a control-C on the console) then the shell exits. The "init" process will then start a new instance of the husky shell with all the previously running processes (with the exception of the just killed shell) still running. This allows the user to kill the process that caused the previous shell problems. Alternatively a process that is acci­ dentally run in foreground is effectively put in the background by sending an interrupt signal to the shell. 
Note that this is quite different to Unix shells which would forward the signal onto the foreground process.   * QUOTES, ESCAPING, STRING CONCATENATION, ETC: A quoted_string (as defined in the grammar) commences with a "{" and finishes with the matching "}". The term "matching" implies that all embedded "{" must have a corresponding embedded "}" before the final "}" is said to match the original "{". A quoted_string can be spread across several lines. No command line substitution occurs within quoted_strings. The character for escaping the following character is "\". If a "{" needs to be interpreted literally then it can be represented by "\{". If a string containing spaces (whitespaces) needs to be interpreted as a single token then space (whitespace) can be escaped (i.e. "\ "). If a "\" itself needs to be interpreted literally then it can be represented by "\\". The string concatenation character is "^". This is useful when a token such as "/d4" needs to built up by a script when "/d" is fixed and the "4" is derived from some variable: set n 4 > /d^$n This example would create the file "/d4". The output of another husky command or script can be made available inline by starting the sequence with "`" and finishing it with a "'". For example: echo {ps output follows: } `ps' This prints the string "ps output follows:" followed on the next line by the current output from the command "ps". That output from "ps" would have its embedded newlines replaced by whitespaces.   * COMMAND LINE FILE REDIRECTION:   + Redirection should appear after a command and its arguments in a line to be interpreted by husky. A special case is a line that just contains "> filename" which creates the filename with zero length if it didn't previously exist or truncates to zero length if it did.   + Redirection of standard in to come from a file uses the token "<" with the filename appearing to its right. The default source of standard in is the console.   
+ Redirection of standard out to go to a file uses the token ">" with the filename appearing to its right. The default destination of standard out is the console.   + Redirection of standard error to go to a file uses the token ">[2]" with the filename appearing to its right. The default destination of standard error is the console.   + Redirection of writes from within a command which uses a known file descriptor number (say "n") to go to a file uses the token ">[n]" with the filename appearing to its right.   + Redirection of read from within a command which uses a known file descriptor number (say "n") to come from a file uses the token "<[n]" with the filename appearing to its right.   + Redirection of reads and writes from within a command which uses a known file descriptor number (say "n") to a file uses the token "<> [n]" with the filename appearing to its right. In order to redirect both standard out and standard error to the one file the form " > filename >[2=1]" can be used. This sequence first redirects standard out (i.e. file descriptor 1) to filename and then redirects what is written to file descriptor 2 (i.e. standard error) to file descriptor 1 which is now associated with filename.   * ENVIRONMENT VARIABLES: Each process can access the name it was invoked by via the variable: "arg0" . The command line arguments (excluding the invocation name) can be accessed as a list in the variable: "argv" . The number of elements in the list "argv" is place in "argc". The get command is useful for fetching individual arguments from this list. The pid of the current process can be fetched from the variable: "pid". When a script launches a new process in the background then the child's pid can be accessed from the variable "child". The variable "ContollerId" is set to the RaidRunner controller number husky is running on. Environment variables are a separate "space" for each process. 
Depending on the way a process was created, its initial set of environment variables may be copied from its parent process at the "spawn" point.   * SEE ALSO: intro 3.9.30  HWCONF - print various hardware configuration details   * SYNOPSIS: hwconf [-D] [-M] [-I] [-d [-n]] [-f] [-h] [-i -p c.s.l] [-m] [-p c.s.l] [-s] [-S] [-t] [-T] [-P] [-W]   * DESCRIPTION: hwconf prints details about the RaidRunner hardware and devices attached.   * OPTIONS:   + -h: Print the number of controllers, host interfaces per controller, the number of disk channels per controller, number of ranks of disks and the details memory (in bytes) on each controller. Four memory figures are printed, the first is the total memory in the controller, next is the amount of memory at boot time, next is the amount currently available and lastly is the largest available contiguous area of memory. This is the default option.   + -f: Print the number of fans in the RaidRunner and then the speed for each fan in the system. The speeds values are in revolutions per minute (rpms). The fans in the system are labeled in your hardware specification sheet for your RaidRunner. The first speed printed from this command corresponds to fan number 0 on your specification sheet, the second is for fan 1, and so forth.   + -d: Print out information on all the disk drives on the RaidRunner. For each disk on the RaidRunner, print out - the device name, in the format c.s.l where c is the channel, s is the SCSI ID (or rank) and l is the SCSI LUN of the device, the manufacturer's name (vendor id), the disk's model name (product id), the disk's version id, the disk serial number, the disk geometry - number of cylinders, heads and sectors, and the last block number on the disk and the block size in bytes. the disk revolution count per minute (rpm's), the number of notches/zones available on the drive (if any)   + -n: Print out the disk drive notch/zone tables if available. This is a sub-option to the -d option. 
Not all disks appear to correctly report the notch/zone partition tables. For each notch/zone, the following is printed: the zone number, the zone's starting cylinder, the zone's starting head, the zone's ending cylinder, the zone's ending head, the zone's starting logical block number, the zone's ending logical block number, the zone's number of sectors per track
    + -D: Print out the device names for all disk drives on the system.
    + -I: Initialize back-end NCR SCSI chips. This flag may be used in conjunction with any other option and will be done first. It has an effect only on the first call to hwconf that has not yet used a -d, -D or -I option, or on those chips that have not yet had a -p on the channel associated with that chip.
    + -m: Print out major flash and battery backed-up ram addresses (in hex). Additionally print out the size of the RaidRunner configuration area. Eight (8) addresses are printed in order: RaidRunner configuration area start and end addresses (FLASH RAM), RaidRunner Husky Scripts area start and end addresses (FLASH RAM), RaidRunner Binary Image area start and end addresses (FLASH RAM), RaidRunner Battery Backed-up area start and end addresses. The size of the RaidRunner configuration area (in bytes) is then printed.
    + -p c.s.l: Probe a single device specified by the given channel, SCSI ID (rank) and SCSI LUN provided in the format "c.s.l". The output of this command is the same as the "-d" option but just for the given device. If the device is not present then nothing will be output and the exit status of the command will be 1.
    + -i -p c.s.l: Re-initialize the SCSI device driver specified by the given channel, SCSI ID (rank) and SCSI LUN provided in the format "c.s.l". Typically this command is used when, on a running RaidRunner, a new drive is plugged in, and it will be used prior to the RaidRunner's next reboot.
    + -M: Set the boottime memory.
This option is executed internally by the controller at boot time and has no function (or effect) executed at any other time.   + -s: Print the 12 character serial number of the RaidRunner.   + -S: Issue SCSI spin up commands to all backends as quickly as possible. This option is intended for use at power-on stage only.   + -t: Probe the temperature monitor returning the internal temperature of the RaidRunner in degrees Celsius.   + -T: Print the temperatures being recorded by the hardware monitoring daemon (hwmon).   + -P: For both AC and DC power supplies, print the number of each present and the state of each supply. The state will be printed as ok or flt depending on whether the PSU is working or faulty.   + -W: This option will wait until all possible backends have spun up. It is used in conjunction with   * NOTES : The order of printing the disk information is by SCSI ID (rank), by channel, by SCSI LUN. 3.9.31  HWMON - monitoring daemon for temperature, fans, PSUs.   * SYNOPSIS: hwmon [-t seconds] [-d]   * DESCRIPTION: hwmon is a hardware monitoring daemon. It periodically probes the status of certain elements of a RaidRunner and if an out-of-band occurrence happens, will cause the alarm to sound or light up fault leds as well as saving a message in the system log. Depending on the model of RaidRunner, the elements monitored are temperature, fans and power supplies. When an out-of-band occurrence is found, hwmon will reduce the time between probes to 5 seconds. If a buzzer is the alarm device, then the buzzer will turn on for 5 seconds then off for 5 seconds and repeat this cycle until the buzzer is muted or the occurrence is corrected. If the RaidRunner model supports a buzzer muting switch, then the buzzer will be muted if the switch is pressed during a cycle change as per the previous paragraph. When hwmon recognizes the mute switch it will beep twice. 
    Certain out-of-band occurrences can be considered to be catastrophic, meaning that if the occurrence remains uncorrected, the RaidRunner's hardware is likely to be damaged. Occurrences such as total fan failure, and sustained high temperature along with total or partial fan failure, are considered catastrophic. hwmon has a means of automatically placing the RaidRunner into a "shutdown" or quiescent state where minimal power is consumed (and hence less heat is generated). This is done by the execution of the shutdown command after a period of time where catastrophic out-of-band occurrences are sustained. This process is enabled via the AutoShutdownSecs internal variable. See the internals manual for use of this variable.

    hwmon can be prevented from starting at boot time by creating the global environment variable NoHwmon and setting it to any value. A warning message will be stored in the syslog.
  * OPTIONS:
    + -t seconds: Specify the number of seconds to wait between probes of the hardware elements. If this option is not specified, the default period is 300 seconds.
    + -d: Turn on debugging mode which can produce debugging output.
  * SEE ALSO: hwconf, pstatus, syslogd, shutdown, internals

3.9.32  INTERNALS - Internal variables used by RaidRunner to change dynamics of running kernel

  * DESCRIPTION: Certain run-time features of the RaidRunner can be manipulated by changing internal variables via the setiv command. The table below describes each changeable variable, its effect, its default value and the range of values it can be set to.

    The variables below are run-time features of a RaidRunner and hence are always set to their default values when a RaidRunner boots. Certain variables can be stored as a global environment variable and will override the defaults at boot time. If you create a global environment variable of that variable's name with an appropriate value, its default value will be overridden the next time the RaidRunner is rebooted.
    Note that the values of these variables ARE NOT CHECKED when set in the global environment variable tables and, if incorrectly set, will generate errors at boot until deleted or corrected. In the table below, any variable that can have a value stored as a global environment variable is marked with (GEnv).
  * write_limit: This variable is the maximum number of 512-byte blocks the cache filesystem will buffer for writes. If this limit is reached, all writes to the cache filesystem will be blocked until the cache filesystem has written out (to its backend) enough blocks to reach a low water mark - write_low_tide. This variable cannot be changed if battery backed-up RAM is available as it is tied to the amount of battery backed-up RAM available.

    The value of this variable is calculated when the cache is initialized. Its value is dependent on whether battery backed-up RAM is installed in the RaidRunner. If installed, the number of blocks of data that can be saved into the battery backed-up RAM is calculated. If no battery backed-up RAM is present, its value is set to 75% of the RaidRunner's memory (expressed in a count of 512-byte blocks) then adjusted to reflect the amount of cache requested by configured raid sets.

    When write_limit is changed, both write_high_tide and write_low_tide are automatically changed to their default values (a function of the value of write_limit).
  * write_high_tide: This variable is a high water mark for the number of written-to 512-byte blocks in the cache. When the number of data blocks exceeds this value, to prevent the cache filesystem from blocking its front end, the cache flushing mechanism continually flushes the cache buffer until the number of unwritten (to the backend) cache buffers is below the low water mark (write_low_tide). This value defaults to 75% of write_limit. This variable can have values ranging from write_limit down to write_low_tide. It is recommended that this variable not be changed.
  * write_low_tide: This variable is a low water mark for when the cache flushing mechanism is continually flushing data to its backend. Once the number of written-to cache blocks yet to be flushed equals or is less than this value, the sustained flushing is stopped. This value defaults to 25% of write_limit. This variable can have values ranging from write_high_tide-1 down to zero (0). It is recommended that this variable not be changed.
  * cache_nflush: This variable is the number of cache buffers (not 512-byte data blocks) that the cache flushing mechanism will attempt to write out in one flush cycle. Adjusting this value may improve performance on writes depending on the size of the cache buffers and type of disk drives used in the raid set backends. The default value is 128. Its value can range from 2 to 128.
  * cache_nread: This variable is the number of cache buffers (not 512-byte data blocks) that the cache reading mechanism will attempt to read out in one read cycle. Adjusting this value may improve performance on reads depending on the size of the cache buffers and type of disk drives used in the raid set backends. The default value is 128. Its value can range from 2 to 128.
  * cache_wlimit: This variable is the number of cache buffers (not 512-byte data blocks) that the cache flushing mechanism will attempt to coalesce into a single sequential write. It differs from cache_nflush in that cache_nflush is the total number of cache buffers that can be written in a single cache flush cycle, and these buffers can be non-sequential, whereas cache_wlimit is a limit on the number of sequential cache buffers that can be written with one write. Adjusting this value may improve performance on writes depending on the size of the cache buffers and type of disk drives used in the raid set backends. The default value is 128. Its value can range from 2 to 128.
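
The interaction between write_limit, write_high_tide and write_low_tide described above is a simple hysteresis. The Python sketch below is a rough model only: the class and its methods are hypothetical, it uses the documented 75% and 25% defaults, and it ignores the periodic flush timer and the per-cycle buffer limits.

```python
# Illustrative model only; names mirror the internal variables described above,
# but the class and its methods are hypothetical.
class WriteCacheModel:
    def __init__(self, write_limit):
        self.write_limit = write_limit
        self.write_high_tide = write_limit * 75 // 100  # default: 75% of write_limit
        self.write_low_tide = write_limit * 25 // 100   # default: 25% of write_limit
        self.dirty = 0          # written-to 512-byte blocks not yet flushed
        self.flushing = False   # True while sustained flushing is active

    def write(self, nblocks):
        """Buffer nblocks; return False if the write would have to block."""
        if self.dirty + nblocks > self.write_limit:
            return False
        self.dirty += nblocks
        if self.dirty > self.write_high_tide:
            self.flushing = True   # high water mark exceeded
        return True

    def flush(self, nblocks):
        """Record that nblocks were written out to the backend."""
        self.dirty = max(0, self.dirty - nblocks)
        if self.dirty <= self.write_low_tide:
            self.flushing = False  # low water mark reached
```

With write_limit set to 1000 blocks, sustained flushing starts once more than 750 blocks are dirty and stops only when 250 or fewer remain, so the cache does not toggle in and out of flushing around a single threshold.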
  * cache_fperiod (GEnv): By default, the cache flushes any data to be written every 1000 milliseconds (unless it is forced to flush earlier because the cache is getting full, in which case it flushes the cache and resets the timer). You can vary this flushing period by setting this variable. If you have a large number of sustained reads and minimal writes, you may want to delay the writes out of cache to the backends as long as possible. Note that by setting this to a high value, you run the risk of losing what you have written. The default value is 1000 milliseconds (i.e. 1 second). Its value can range from 500ms to 300000ms.
  * scsi_write_thru (GEnv): By default all writes (from a host) are buffered in the RaidRunner's cache and are flushed to the backend disks periodically. When battery backed-up RAM is available, this results in the most efficient write throughput. If no battery backed-up RAM is available, or you do not want to depend on writes being saved in battery backed-up RAM in the event of a power failure, you can force the RaidRunner to write data straight thru to the backends prior to returning an OK status to the host. This essentially provides a write-thru cache. The default value of this variable is 0 - write-thru mode is DISABLED. The values this variable can take are
    + 0 - DISABLE write-thru mode, or
    + 1 - ENABLE write-thru mode.
  * scsi_write_fua (GEnv): This variable affects what is done when the FUA (Force Unit Access) bit is set on a SCSI WRITE-10 command. When this variable is enabled and a SCSI WRITE-10 command with the FUA bit set is processed, the data is written directly thru the cache to the backend disks. If the variable is disabled, then the setting of the FUA bit on SCSI WRITE-10 commands is ignored. The default value for this variable is disabled (0) if battery backed-up RAM is present, or enabled (1) if battery backed-up RAM is NOT present.
    The values this variable can take are
    + 0 - IGNORE FUA bit on SCSI WRITE-10 commands, or
    + 1 - ACT on FUA bit on SCSI WRITE-10 commands.
  * scsi_ierror (GEnv): This variable controls what is done when the RaidRunner receives an Initiator Detected Error message on a SCSI host channel. If set (1), cause a Check Condition. If NOT set (0), follow the SCSI-2 standard and re-transmit the Data In / Out phase. The default value is 0. The values this variable can take are
    + 0 - follow SCSI-2 standard
    + 1 - ignore the SCSI-2 standard and cause a Check Condition.
  * scsi_sol_reboot (GEnv): Determines whether to auto-detect a Solaris reboot and then clear any wide mode negotiations. If set (1), detect a Solaris reboot and clear wide mode. If NOT set (0), follow the SCSI-2 standard and do not clear wide mode. The default value is 0. The values this variable can take are
    + 0 - follow SCSI-2 standard
    + 1 - ignore the SCSI-2 standard and clear wide mode.
  * scsi_hreset (GEnv): Determines whether to issue a SCSI bus reset on host ports after power-on. If set (1), then a SCSI bus reset is done on the host port when starting the first smon/stargd process on that port. If NOT set (0), nothing is done. The default value is 0. The values this variable can take are
    + 0 - don't issue SCSI bus resets on power-on.
    + 1 - issue SCSI bus resets on power-on when the first smon/stargd process is started.
  * scsi_full_log (GEnv): Determines whether or not stargd reports, via syslog, a Reset Check condition on Read, Write, Test Unit Ready and Start Stop commands. This reset check condition is always set when a RaidRunner boots or the raid detects a SCSI bus reset. Note that this variable only suppresses the logging of this Check condition into syslog; it does not affect the response to the host of this or any other Check condition. If set (1), then all stargd detected reset Check condition error messages are logged. If NOT set (0), these messages are suppressed. The default value is 0.
    The values this variable can take are
    + 0 - suppress logging these messages
    + 1 - log all messages.
  * scsi_ms_badpage (GEnv): Determines whether or not stargd reports, via syslog, that it has received a non-supported page number in a MODE SENSE or MODE SELECT command from a host. Note that stargd will issue the appropriate Check condition to the host ("Invalid Field in CDB") irrespective of the value of this variable. If set (1), then all stargd detected non-supported page numbers in MODE SENSE and MODE SELECT commands will be logged. If NOT set (0), these messages are suppressed. The default value is 0. The values this variable can take are
    + 0 - suppress logging these messages
    + 1 - log all messages.
  * scsi_bechng (GEnv): Determines whether or not the raid reports backend device parameter change errors. In a multi-controller environment, backends are probed and some of their parameters are changed by a booting controller. This will generate parameter change mode sense errors. If cleared (0), then all parameter change errors will NOT be logged. If set (1), these messages are logged like any other backend error. The default value is 0. The values this variable can take are
    + 0 - suppress logging these messages
    + 1 - log all messages.
  * scsi_dnotch (GEnv): Some disk drives take an inordinate amount of time to perform mode select commands. One set of information a RaidRunner will obtain from a device backend is the disk notch pages (if present). As this is for information only, you can reduce the boot time of a RaidRunner by requesting that disk notches are not obtained. If cleared (0), backend disk notch information is not probed for. If set (1), then backend disk notch information is probed for. The default value is 1.
    The values this variable can take are:
    + 0 - don't probe for notch pages
    + 1 - probe for notch pages
  * scsi_rw_retries (GEnv): Specify the number of read or write retries to perform on a device backend before effecting an error on the given operation. Note that ALL retries are reported via syslog. The default value is 3. Its value can range from 1 to 9.
  * scsi_errpage_r (GEnv): Specify the number of internal read retries that a disk backend is to perform before reporting an error (to the raid). Setting this variable causes the Read Retry Count field in the Read-Write Error Recovery mode sense page to be set. A value of -1 will cause the drive's default to be used. The default value is -1. Its value can be -1 (use the disk's default) or range from 0 to 255.
  * scsi_errpage_w (GEnv): Specify the number of internal write retries that a disk backend is to perform before reporting an error (to the raid). Setting this variable causes the Write Retry Count field in the Read-Write Error Recovery mode sense page to be set. A value of -1 will cause the drive's default to be used. The default value is -1. Its value can be -1 (use the disk's default) or range from 0 to 255.
  * BackFrank: Specify the SCSI-ID of the first rank of backend disks on a RaidRunner. This variable should never be changed and is for informative purposes only. The default value is dependent on the model of RaidRunner being run. The values this variable can take are
    + 0 - the first rank SCSI-ID will be 0
    + 1 - the first rank SCSI-ID will be 1
  * raid_drainwait (GEnv): Specify the number of milliseconds a raid set is to delay before draining all backend I/Os when a backend fails. Setting this variable to a lower value will speed up the commencement of any error recovery procedures that would be performed on a raid set when a backend fails. The default value is 500 milliseconds. Its value can range from 50 to 10000 milliseconds.
  * EnclosureType: Specify the enclosure type a raid controller is running within.
    This variable should never be changed and is for informative purposes only. The default value is dependent on the model of RaidRunner being run. The values this variable can take are integers starting from 0.
  * fmt_idisc_tmo (GEnv): Specify the SCSI command timeout (in milliseconds) when a SCSI FORMAT command is issued on a backend. Disk drives take different amounts of time to perform a SCSI FORMAT command and hence a timeout is required to be set when the command is issued. As certain drives may take longer to format than the default timeout, you can change it. The default value is 720000 milliseconds. Its value can range from 200000 to 1440000 milliseconds.
  * AutoShutdownSecs (GEnv): Specify the number of seconds the RaidRunner should monitor catastrophic hardware failures before deciding to automatically shut down. A catastrophic failure is one which will cause damage to the RaidRunner's hardware if not fixed immediately. Failures like all fans failing would be considered catastrophic. A value of 0 seconds (the default) will disable this feature; that is, with the exception of logging the errors, no action will occur. See shutdown and hwmon for further details. The default value is 0 seconds. Its value can range from 20 to 125 seconds.
  * SEE ALSO: setiv, getiv, syslog, setenv, printenv, hwmon, shutdown

3.9.33  KILL - send a signal to the nominated process

  * SYNOPSIS: kill [-sig_name] pid
  * DESCRIPTION: kill sends a signal to the process nominated by pid. If the pid is a positive number then only the nominated process is signaled. If the pid is a negative number then the signal is sent to all processes in the same process group as the process with the id of -pid. The switch is optional and if not given a SIGTERM (software termination signal) is sent. If the sig_name switch is given then it should be one of the following (lower case) abbreviations. Only the first 3 letters need to be given for the signal name to be recognized.
    Following each abbreviation is a brief explanation and the signal number in brackets:

        null  - unused signal [0]
        hup   - hangup [1]
        int   - interrupt (rubout) [2]
        quit  - quit (ASCII FS) [3]
        kill  - kill (cannot be caught or ignored) [4]
        pipe  - write on a pipe with no one to read it [5]
        alrm  - alarm clock [6]
        term  - software termination signal [7]
        cld   - child process has changed state [8]
        nomem - could not obtain memory (from heap) [9]

    You cannot kill processes whose process id is between 0 and 5 inclusive. These are considered sacrosanct - hyena, init and console reader/writers.
  * SEE ALSO: K9kill

3.9.34  LED - turn on/off LEDs on RaidRunner

  * SYNOPSIS:
    + led
    + led led_id led_function
  * DESCRIPTION: led uses the given led_id to identify the LED to manipulate based on the led_function. When no arguments are given, an internal LED register is printed along with the current function that the onboard LEDs, led1 and led2, are tracing. If an undefined led_id is given, the led command silently does nothing and returns NULL. If an incorrect number of arguments or an invalid led_function is given, a usage message is printed.
    Depending on the RaidRunner model the led_id can be one of
    + led1 - LED1 on the RaidRunner controller itself
    + led2 - LED2 on the RaidRunner controller itself
    + Dc.s.l - Device on channel c, scsi id s, scsi lun l
    + status - the status LED on the RaidRunner
    + io - the io LED on the RaidRunner

    and led_function can be one of
    + on - turn on the given LED
    + off - turn off the given LED
    + ok - set the given LED to the defined OK state
    + faulty - set the given LED to the defined FAULTY state
    + warning - set the given LED to the defined WARNING state
    + rebuild - set the given LED to the defined REBUILD state
    + tprocsw - set the given LED to trace kernel process switching
    + tparity - set the given LED to trace I/O parity generation
    + tdisconn - set the given LED to trace host interface disconnect activity
    + pid - set the given LED to trace the process pid as it runs

    Different models of RaidRunner differ in the number of LEDs and their functionality. Depending on the type of LED, the ok, faulty, warning and rebuild functions behave differently. See your RaidRunner's Hardware Reference manual to see what LEDs exist and what the different functions do.
  * NOTES: Tracing activities can only occur on the `onboard` LEDs (LED1, LED2).
  * SEE ALSO: lflash

3.9.35  LFLASH - flash a LED on RaidRunner

  * SYNOPSIS: lflash led_id period
  * DESCRIPTION: lflash uses the given led_id to identify the LED to flash every period seconds. If an undefined led_id is given, the command silently does nothing and returns NULL. Depending on the RaidRunner model the led_id can be one of:
    + led1 - LED1 on the RaidRunner controller itself
    + led2 - LED2 on the RaidRunner controller itself
    + Dc.s.l - Device on channel c, scsi id s, scsi lun l
    + status - the status LED on the RaidRunner
    + io - the io LED on the RaidRunner
  * NOTE: The number of seconds must be greater than or equal to 2.
  * SEE ALSO: led

3.9.36  LINE - copies one line of standard input to standard output

  * SYNOPSIS: line
  * DESCRIPTION: line accomplishes the one line copy by reading up to a newline character followed by a single K9write.
  * SEE ALSO: K9read, K9write

3.9.37  LLENGTH - return the number of elements in the given list

  * SYNOPSIS: llength list
  * DESCRIPTION: llength returns the number of elements in a given list.
  * EXAMPLES: Some simple examples:

        set list D1 D2 D3 D4 D5              # create the list
        set len `llength $list'              # get its length
        echo $len
        5
        set list {D1 D2 D3 D4 D5} {D6 D7}    # create the list
        set len `llength $list'              # get its length
        echo $len
        2
        set list {}                          # create an empty list
        set len `llength $list'              # get its length
        echo $len
        0

3.9.38  LOG - like zero with additional logging of accesses

  * SYNOPSIS: bind -k {log fd error_rate tag} bind_point
  * DESCRIPTION: log is a special file that, when written to, is an infinite sink of data (i.e. anything can be written to it and it will be disposed of quickly). When log is read it is an infinite source of zeros (i.e. the byte value 0). The log file will appear in the K9 namespace at the bind_point. Additionally, ASCII log data is written to the file associated with file descriptor fd. error_rate should be a number between 0 and 100 and is the percentage of errors (randomly distributed) that will be reported (as an EIO error) to the caller. Each line written to fd will have tag appended to it.

    There is one line output to fd for each IO operation on the log special file. The first character output is "R" or "W" indicating a read or write. The second character is blank if no error was reported and "*" if one was reported. Next (after a white space) is a (64 bit integer) offset into the file of the start of the operation, followed by the size (in bytes) of that operation. The line finishes with the tag.
  * EXAMPLE: Bind a log special file at "/dev/log" that writes log information to standard error.
    Each line written to standard error has the tag string "scsi" appended to it. Approximately 30% of reads and writes (i.e. randomly distributed) return an EIO error to the caller. This is done as follows:

        bind "log 2 30 scsi" /dev/log
        dd if=/dev/zero of=/dev/log count=5 bs=512
        W  0000000000 512        scsi
        W  0000000200 512        scsi
        W  0000000400 512        scsi
        W* 0000000600 512        scsi
        Write failed.
        4+0 records in
        3+0 records out

  * SEE ALSO: zero

3.9.39  LRANGE - extract a range of elements from the given list

  * SYNOPSIS: lrange first last list
  * DESCRIPTION: lrange returns a list consisting of elements first through last of list. 0 refers to the first element in the list. If first is greater than last then the list is extracted in reverse order.
  * EXAMPLES: Some simple examples:

        set list D1 D2 D3 D4 D5              # create the list
        set subl `lrange 0 3 $list'          # extract from indices 0 to 3
        echo $subl
        D1 D2 D3 D4
        set subl `lrange 3 1 $list'          # extract from indices 3 to 1
        echo $subl
        D4 D3 D2
        set subl `lrange 4 4 $list'          # extract from indices 4 to 4
        echo $subl                           # equivalent to get 4 $list
        D5
        set subl `lrange 3 100 $list'
        echo $subl
        D4 D5

3.9.40  LS - list the files in a directory

  * SYNOPSIS: ls [ -l ] [ directory... ]
  * DESCRIPTION: ls lists the files in the given directory on standard out. If no directory is given then the root directory (i.e. "/") is listed. Each file name contained in a directory is put on a separate line. Each listing has a lead-in line stating which directory is being shown. If there is more than one directory then they are listed sequentially, separated by a blank line. If the "-l" switch is given then every listed file has data such as its length and the file system it belongs to shown on the same line as its name. See the stat command for more information.

    ls is not an inbuilt command but a husky script which utilizes cat and stat. The script source can be found in the file "/bin/ls".
  * SEE ALSO: cat, stat

3.9.41  LSEARCH - find a pattern in a list

  * SYNOPSIS: lsearch pattern list
  * DESCRIPTION: lsearch returns the index of the first element in list that matches pattern, or -1 if none matches. 0 refers to the first element in the list.
  * EXAMPLES: Some simple examples:

        set list D1 D2 D3 D4 D5              # create the list
        set idx `lsearch D4 $list'           # get index of D4 in list
        echo $idx
        3
        set idx `lsearch D1 $list'           # get index of D1 in list
        echo $idx
        0
        set idx `lsearch D8 $list'           # get index of D8 in list
        echo $idx
        -1

3.9.42  LSUBSTR - replace a character in all elements of a list

  * SYNOPSIS: lsubstr find_char replacement_char list
  * DESCRIPTION: lsubstr returns a list replacing every find_char character found in any element of the list with the replacement_char character. replacement_char can be NULL, which effectively deletes all find_char characters in the list.
  * EXAMPLES: Some simple examples:

        set list D1 D2 D3 D4 D5              # create the list
        set subl `lsubstr D x $list'         # replace all D's with x's
        echo $subl
        x1 x2 x3 x4 x5
        set subl `lsubstr D {} $list'        # delete all D's
        echo $subl
        1 2 3 4 5
        set list -L -16                      # create a list with embedded braces
        set subl `lsubstr {} $list'          # delete all open braces
        set subl `lsubstr {} $subl'          # delete all close braces
        echo $subl
        -L 16

3.9.43  MEM - memory mapped file (system)

  * SYNOPSIS: bind -k {mem first last [ r ]} bind_point
  * DESCRIPTION: mem allows machine memory to be accessed as a single K9 file (rather than a file system). The host system's memory is used starting at the first memory location up to and including the last memory location. Both first and last need to be given in hexadecimal. If successful, the mem file will appear in the K9 namespace at the bind_point. The stat command will show it as a normal file with the appropriate size (i.e. last - first + 1). If the optional "r" is given then only read-only access to the file is permitted.
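
As a worked example of the size rule (last - first + 1, with both bounds inclusive and given in hexadecimal), here is a small Python sketch. The helper name and the addresses are hypothetical and are not taken from any RaidRunner memory map.

```python
def mem_file_size(first_hex, last_hex):
    """Size in bytes of a mem file spanning first..last inclusive (hex strings)."""
    first = int(first_hex, 16)
    last = int(last_hex, 16)
    if last < first:
        raise ValueError("last must not be below first")
    return last - first + 1


# Hypothetical bounds: a region from 1000 to 1fff (hex) is 4096 bytes long.
size = mem_file_size("1000", "1fff")
```

Because the last location is included, a region whose first and last addresses are equal still has a size of one byte.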
    In a target environment mem can usefully associate battery backed-up RAM (or ROM) with the K9 namespace. In a Unix environment it is of limited use (see unixfd instead). In a DOS environment it may be useful to access memory directly (IO space), but for accessing the DOS console see doscon.

    When mem is associated with the partition of Flash RAM that stores the husky scripts, which is stored compressed, reading from that page will automatically decompress and return the data as it is read. When mem is associated with the writable partitions of Flash RAM (configuration partition, husky script partition and main binary partition), a write to the start of any partition will erase that partition.
  * SEE ALSO: ram
  * BUGS: Only a single file rather than a file system can be bound.

3.9.44  MDEBUG - exercise and display statistics about memory allocation

  * SYNOPSIS: mdebug [off|on|trace|p|m size|f ptr|c nel elsize|r ptr size]
  * DESCRIPTION: mdebug can be used to directly allocate and free memory. mdebug will also print (to standard output) information about the current state of memory allocation. Without any given options a brief five line summary of memory usage is printed, e.g.

        : raid; mdebug
        Mdebug is off
        nreq-nfree=87096-82951=4145(13905745)
        size=15956672/16150000
        waste=1%/2%
        list=4251/8396
        : raid;

    The first line indicates the debug mode, either off, on or trace. The second line indicates the number of times a request for memory is made (to Mmalloc() or Mcalloc() and related functions) and the number of times the memory allocator is called to free memory (via Mfree()). The difference between these first two numbers is the total number of currently allocated blocks of memory, with the number between the '(' and ')' being the total memory requested. Note that the amount of memory actually assigned may be more than requested. The third line indicates the amount of memory being managed. The second number is the total memory managed (i.e.
    left over after loading the statically allocated text, data and bss space). The first number is that left over after various memory allocation tables have been subtracted from that aforementioned number. The fourth line is the total amount of extra memory assigned to requests in excess of the actual requested memory as compared with the totals on line 3. The fifth line relates to the list of currently allocated memory. The first number is the number of free entries left and the second is the maximum table size. Note that the number of currently allocated blocks (third number on line 2) when added to the first number on line 5 gives the second number on line 5. In the example above, 4145 + 4251 = 8396.
  * OPTIONS:
    + p: Prints the above mentioned five line summary and then the free list.
    + P: Prints all the above plus dumps the list of currently allocated memory.
    + PP: Prints all the above plus the free bitmap.

    The above three options can generate copious output and require a detailed knowledge of the source to understand their meaning.
    + off: Turns off memory allocation debugging. This is the default condition after booting.
    + on: Turns on memory allocation assertion checking.
    + trace: Turns on memory allocation assertion checking and traces every memory allocation / deallocation.
    + m: Uses Mmalloc() to allocate a block of memory of size bytes.
    + f: Uses Mfree() to de-allocate a block of memory addressed by ptr.
    + c: Uses Mcalloc() to allocate a contiguous block of memory consisting of nel elements each of elsize bytes.
    + r: Uses Mrealloc() to re-allocate a block of previously allocated memory, ptr, changing the allocated size to be size bytes.
  * SEE ALSO: Unix man pages on malloc()

3.9.45  MKDIR - create directory (or directories)

  * SYNOPSIS: mkdir [ directory_name ... ]
  * DESCRIPTION: mkdir creates the given directory (or directories).
    If all the given directories can be created then NIL is returned as the status; otherwise the first directory that could not be created is returned (and this command will continue trying to create directories until the list is exhausted). A directory cannot be created with a file name that previously existed in the enclosing directory.

3.9.46  MKDISKFS - script to create a disk filesystem

  * SYNOPSIS: mkdiskfs disk_directory_root disk_name
  * DESCRIPTION: mkdiskfs is a husky script which is used to perform all the necessary commands to create a disk filesystem given the root of the disk file system and the name of the disk.
  * OPTIONS:
    + disk_directory_root: Specify the directory root under which the disk filesystems are bound. This is typically /dev/hd.
    + disk_name: Specify the name of the disk in the format Dc.s.l where c is the channel, s is the scsi id (or rank) and l is the scsi lun of the disk.

    After parsing its arguments mkdiskfs creates the disk filesystem's bind point and binds in the disk at that point.
  * SEE ALSO: rconf, scsihdfs

3.9.47  MKHOSTFS - script to create a host port filesystem

  * SYNOPSIS: mkhostfs controller_number host_port host_bus_directory
  * DESCRIPTION: mkhostfs is a husky script which is used to perform all the necessary commands to create a host port filesystem on the given RaidRunner controller given the root of the host port file systems and the host port number.
  * OPTIONS:
    + controller_number: Specify the controller on which the host port filesystem is to be created.
    + host_port: Specify the host port number to create the filesystem for.
    + host_bus_directory: Specify the directory root under which host filesystems are bound. This is typically /dev/hostbus.

    After parsing its arguments mkhostfs finds out what SCSI ID the host port is to present (see hconf) and then binds in the host filesystem.
  * SEE ALSO: hconf, scsihpfs

3.9.48  MKRAID - script to create a raid given a line of output of rconf

  * SYNOPSIS: mkraid `rconf -list RaidSetName'
  * DESCRIPTION: mkraid is a husky script which is used to perform all the necessary commands to create and enable host access to a given Raid Set. The argument to mkraid is a line of output from a rconf -list command.

    After parsing its arguments mkraid checks to see if a reconstruction was being performed when the RaidRunner was last operating, and if so, notes this. It then creates the raid filesystem (see mkraidfs) and adds a cache frontend to the raid filesystem. It then creates the required host filesystems (see mkhostfs) and finally, if a reconstruction had been taking place when the RaidRunner was last operating, it restarts the reconstruction.
  * NOTE: This husky script DOES NOT enable target access (stargd) to the raid set it creates.
  * SEE ALSO: rconf, mkraidfs, mkhostfs

3.9.49  MKRAIDFS - script to create a raid filesystem

  * SYNOPSIS: mkraidfs -r raidtype -n raidname -b backends [-c chunk] [-i iomode] [-q qlen] [-v] [-C capacity] [-S]
  * DESCRIPTION: mkraidfs is a husky script which is used to perform all the necessary commands to create a Raid filesystem.
  * OPTIONS:
    + -r raidtype: Specify the raid type as raidtype for the raid set. Must be 0, 1, 3 or 5.
    + -n raidname: Specify the name of the raid set as raidname.
    + -b backends: Specify the comma separated list of the raid set's backends in the format used by rconf.
    + -c iosize: Optionally specify the IOSIZE (in bytes) of the raid set.
    + -i iomode: Optionally specify the raid set's iomode - read-write, read-only, write-only.
    + -q qlen: Optionally specify the raid set's queue length for each backend.
    + -v: Enable verbose mode which prints out the main actions (binding, engage commands) as they are performed.
    + -C capacity: Optionally specify the raid set's size in 512-byte blocks.
    + -S: Optionally specify that spares pool access is required should a backend fail.

    After parsing its arguments mkraidfs creates the Raid Set's backend filesystems (typically disks, see mkdiskfs), taking care of failed backends. It then binds in the raid filesystem and engages the backends into the filesystem. If spares access is requested, it enables the autorepair feature of the raid set.

  * SEE ALSO: rconf, mkraidfs, mkhostfs, mkdiskfs, raid[0135]fs

3.9.50  MKSMON - script to start the scsi monitor daemon smon

  * SYNOPSIS: mksmon controllerno hostport scsi_lun protocol_list

  * DESCRIPTION: mksmon is a husky script which performs all the commands necessary to start the scsi monitor daemon smon, given the controller number, hostport, scsi lun, and the block protocol list. Typically mksmon is run with its arguments taken from the output of an mconf -list command.

  * OPTIONS:

    + controllerno: Specify the controller on which the scsi monitor daemon is to be run.

    + hostport: Specify the host port through which the scsi monitor daemon communicates.

    + scsi_lun: Specify the SCSI LUN the scsi monitor daemon is to respond to.

    + protocol_list: Specify the comma separated block protocol list the scsi monitor daemon is to implement.

    After parsing its arguments mksmon checks to see if it is already running and, if so, issues a message and exits. Otherwise, it creates the host filesystem (mkhostfs), creates a memory file and a set of fifos for smon to use and finally starts smon.

  * SEE ALSO: smon, mconf, mkhostfs, fifofs

3.9.51  MKSTARGD - script to initialize a scsi target daemon for a given raid set

  * SYNOPSIS: mkstargd `rconf -list raidname'

  * DESCRIPTION: mkstargd is a husky script which performs all the commands necessary to start and initialize the scsi target daemon stargd for a given raid set. Typically mkstargd is run with its arguments taken from the output of an rconf -list command.
    After parsing its arguments mkstargd checks to see if it is already running and, if so, issues a message and exits. Otherwise, it creates the host filesystem (mkhostfs), then starts the scsi target daemon (stargd) for the given raid set. stargd is started in a mode that responds to non-medium access SCSI commands and reports a state of "spinning up". Typically stargds are started as soon as possible but do not allow medium access commands until the underlying raid sets have been created; they are then flagged as "spun up" (via the mstargd -o command), which allows medium access.

  * SEE ALSO: stargd, rconf, mkhostfs

3.9.52  MSTARGD - monitor for stargd

  * SYNOPSIS: mstargd [-a] [-d level] [-h] [-hr] [-hw] [-l] [-m] [-n] [-o offset] [-R] [-s] [-t] [-v] [-W] [-z] [-Z spinstate] [-U] [-irgap nblks] pid

  * DESCRIPTION: mstargd modifies the state of, or prints out information about, the stargd daemon with the given pid. If such a process exists and no optional switches are given then the current debug level of that stargd daemon is printed to standard out.

    mstargd works by looking for a file named "/mon/stargd/pid". If it is not found, or the pid does not represent an existing process, then mstargd exits with an appropriate error message. If it is found then it is assumed to contain a reference into the associated stargd process's state (and statistics) structure. The "reference" for target hardware such as the raid (without memory management) is a memory address. The "reference" for Unix machines could be a shared memory identifier or a memory address (depending on how K9 processes are mapped to Unix processes).

    mstargd performs its monitoring (or modifying) task then exits immediately. A sanity check is performed on the associated stargd process's state (and statistics) structure before it is used. Operations that take a little time (e.g. "-h", "-s" and "-t") take an internal copy of this state structure.
    mstargd is designed to have no detrimental effect on a running stargd daemon. Note that the 3 modifying operations (i.e. "-o", "-d" and "-z") have no adverse effect either.

  * OPTIONS:

    + -a: This option has the same effect as giving mstargd the options "-h -l -m -s -t".

    + -d level: This option will change the debug level of the nominated stargd daemon to the given level. The debug levels are:

      o 0 = no debug messages

      o 1 = debug messages on all non-read/write commands

      o 2 = debug messages on all commands

    + -h: This option will output a histogram for reads and a separate one for writes. The histogram currently consists of a header line followed by multiple lines. Each line has the number of blocks in its 1st column, the number of invocations in the 2nd column and the cumulative time in the SCSI data phase in the 3rd column. Only lines with non-zero invocations are output. Block number 257 (if present) will be the last line and records all commands requesting 257 or more blocks.

    + -hr: This option only prints out the histogram for reads.

    + -hw: This option only prints out the histogram for writes.

    + -l: This option prints out the stargd read lookahead statistics.

    + -m: This option prints out the moniker (i.e. name) of the raid set indicated by the given pid.

    + -n: This option toggles the collection of statistics by the indicated stargd process.

    + -o offset: This option informs the daemon that reads and writes into its store are to be offset by the given number of blocks. The default value for offset is 0. The given offset must be in the range 0 to (2**32 - 1). The typical block size is 512 bytes. To simplify writing out large numbers certain suffixes can be used, see suffix. Additionally, this option informs the daemon that its store is now available for access by setting its spin state to 1. See the "-Z" option below.

    + -R: This option sets the write-protect flag. Any write command subsequently issued to the stargd process (as indicated by the given pid) will return a check condition with the sense key set to "DATA PROTECT" and the additional sense key set to "WRITE PROTECT".

    + -s: This option prints the current state of the SCSI command state machine.

    + -t: This option prints a row each for read(6), read(10), write(6), write(10) and others. These 5 rows represent a categorization of all incoming SCSI commands. Each row contains the number of invocations and the number of errors detected. Errors are divided into 2 categories: type 1 for situations when a "Check Condition" status is returned, and type 2 when some other failed status is returned (e.g. "Command terminated" or "Reservation conflict").

    + -v: This option prints out the histogram (-h, -hw, -hr) and SCSI command summary (-t) in vector (or line) form.

    + -U: This option clears any SCSI-2 reservations set on the scsi target specified by the given pid. WARNING: this clearance is "controller system wide".

    + -W: This option clears the write-protect flag, which enables write commands issued to the stargd process (as indicated by the given pid) to write data.

    + -z: This option zeroes the internal tables used by the histograms.

    + -irgap nblks: Specify the inter-read gap, nblks (in blocks). When sequential reads arrive from a host there may be a small gap between successive reads. Normally the lookahead algorithm will ignore these gaps provided they are no larger than the average length of the group of sequential reads that have occurred. By specifying this value, you can increase this gap.

    + -Z spinstate: This option will change the spin state of the nominated stargd daemon to the given spinstate.
      The spin states are (State, Description, ASC, ASCQ):

      o 0, LOGICAL UNIT IS IN PROCESS OF BECOMING READY, 0x04, 0x01

      o 1, Logical unit is ready - medium access commands allowed, -, -

      o 2, LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED, 0x04, 0x03

      o 3, LOGICAL UNIT HAS NOT SELF-CONFIGURED YET, 0x3e, 0x00

      o 4, LOGICAL UNIT HAS FAILED SELF-CONFIGURATION, 0x4c, 0x00

      stargd's spin state is used to describe the condition of the target whilst it is NOT READY. When stargd is not ready to accept SCSI-2 medium access commands it returns a CHECK CONDITION status to all medium access commands, sets the mode sense key to NOT READY and sets the additional mode sense code and code qualifier to values depending on the spin state value.

      Typically, when a RaidRunner boots it requires time to create the configured Raid Sets, although it needs to start the scsi target daemons (stargd) as soon as possible. It therefore starts all the required stargds and sets their spin state to 0. Once the raid sets have been built and are linked to their stargds, the spin state is set to 1, meaning the stargd will allow and process SCSI-2 medium access commands. (See the "-o" option above.) The additional spin states of 2, 3 and 4 can be used by systems with intelligent SCSI drivers in high availability environments.

  * SEE ALSO: stargd

3.9.53  NICE - Change the K9 run-queue priority of a K9 process

  * SYNOPSIS: nice pid priority

  * DESCRIPTION: nice will change the run-queue priority of the process given by pid to priority.

  * OPTIONS:

    + pid: This is the process identifier of the K9 process whose run-queue priority is to change.

    + priority: This is the priority to set. Priorities range from 0 (lowest) to 9 (highest).

  * SEE ALSO: K9setpriority

3.9.54  NULL - file to throw away output in

  * SYNOPSIS: bind -k null bind_point

  * DESCRIPTION: null is a special file that when written to is an infinite sink of data (i.e. anything can be written to it and it will be disposed of quickly).
    When null is read it is an infinite source of end-of-files. The null file will appear in the K9 namespace at the bind_point.

  * EXAMPLE: Husky installs a null special file as follows:

      bind null /dev/null

  * SEE ALSO: zero, log

3.9.55  PARACC - display information about hardware parity accelerator

  * SYNOPSIS: paracc

  * DESCRIPTION: paracc will print (to standard output) information about the hardware parity accelerator, if installed. The main output line of interest to all except those debugging the RaidRunner is the first line, which displays, in bytes, the size of the memory on the hardware parity accelerator. For example:

      Parity Memory available : 1048576
      PAccLock@0xBF368{own=-1,cnt=1,pvt=0x0,nwait=0,name="PAc"}
      Request failures: 0, Max Usage: 2, Alloc: 1, Free: 31
      have_paracc is 1.
      Req Fails: 0 0 0 0 0 0 0 0 0 0

    All other lines are only informative for debugging purposes. If there is no accelerator present, then the Parity Memory available will be 0.

3.9.56  PEDIT - Display/modify SCSI backend Mode Parameters Pages

  * SYNOPSIS:

    + pedit page_code c.s.l

    + pedit page_code c.s.l byte_modifier_list

  * DESCRIPTION: pedit will report the SCSI mode parameters for a given page (page_code) on a given SCSI backend device c.s.l and/or allow you to change those mode parameters.

    In its first form, pedit, for a given page code, will print five lines. The first is a header for ease of reading, the second will be the DEFAULT mode parameters, the third will be the CHANGEABLE bitmask values, the fourth will be the CURRENT mode parameters and the last will be the SAVED mode parameters.

    In its second form, pedit, for a given page code and device, will print the page codes but will also apply the byte_modifier_list to either the CURRENT or SAVED mode parameters.
    The supported SCSI pages are:

      0x1  ERROR RECOVERY page
      0x2  DISCONNECT page
      0x3  FORMAT page
      0x4  GEOMETRY page
      0x8  CACHE page
      0xc  NOTCH page
      0xa  CONTROL page

  * OPTIONS:

    + page_code: Specify the SCSI Page Code in hex.

    + c.s.l: Identify the disk device to select by specifying its channel, SCSI ID (rank) and SCSI LUN in the format "c.s.l".

    + byte_modifier_list: The byte_modifier_list is of the form C|Sbyte_no:set_val:clr_val,byte_no:set_val:clr_val,... where the C or S prefix specifies whether you want to change the CURRENT or SAVED mode pages respectively. This prefix is followed by a comma separated list of byte modifiers in the form byte_no:set_val:clr_val where byte_no is the byte number to change, set_val is a mask of which bits within that byte to SET (to 1) and clr_val is a mask of which bits within that byte to CLEAR (to 0).

  * SEE ALSO: The mode sense page references in the relevant product manual for the disks used in the RAID.

3.9.57  PIPE - two way interprocess communication

  * SYNOPSIS:

    + bind -k pipe bind_point

    + cat bind_point

    + bind_point/data

    + bind_point/ctl

    + bind_point/data1

    + bind_point/ctl1

  * DESCRIPTION: The pipe file system associates a one level directory with the bind_point in the K9 namespace. This device allocates two streams which are joined at the device end. bind_point/data and bind_point/ctl are the data and control channels for one stream while bind_point/data1 and bind_point/ctl1 are the data and control channels for the other stream. Data written to one channel is available for reading at the other. Write boundaries are preserved: each read terminates when the read buffer is full or after reading the last byte of a write, whichever comes first.

3.9.58  PRANKS - print or set the accessible backend ranks for the current controller

  * SYNOPSIS:

    + pranks

    + pranks rank1 [rank2 ... rankn]

  * DESCRIPTION: pranks, without any arguments, will print which backend ranks of devices can be accessed by the controller on which the command is executed. If a rank is not accessible, then a "-1" is printed in place of the rank number (i.e. the scsi id of that rank).

    If you wish to set which rank ids a controller is allowed to access then execute this command with those rank ids as its arguments. When you execute this command to set access to certain (or all) ranks, the access restrictions (if any) are effective immediately. Additionally, the GLOBAL environment variable BackendRanks is either defined or modified, so when you next boot the RaidRunner the settings you have just created will be applied again automatically.

  * EXAMPLE: Assume you have a RaidRunner system with four ranks of backend devices (rank 1, 2, 3 and 4) but you want to restrict the controller's access to the first two as you don't have any devices installed in the third and fourth ranks. You would execute

      pranks 1 2

  * SEE ALSO: environ, sranks

3.9.59  PRINTENV - print one or all GLOBAL environment variables

  * SYNOPSIS:

    + printenv

    + printenv name [name ...]

  * DESCRIPTION: printenv prints the value (or list of values) associated with the GLOBAL environment variable name. If multiple GLOBAL environment variables are given, each value is printed on a line of its own. When a GLOBAL environment variable is a list of values, each value is printed on a line of its own.

    When printenv is called with no arguments, all GLOBAL environment variables and their associated value(s) are printed, one per line. If a value is a list then each element of the list is separated with the vertical bar (|) character.

  * NOTE: A GLOBAL environment variable is one which is stored in a non volatile area on the RaidRunner and hence is available between successive power cycles or reboots. These variables ARE NOT the same as husky environment variables.
    The non volatile area is co-located with the RaidRunner Configuration area. If the given variable name is not a GLOBAL environment variable nothing is printed and no error status is set.

  * SEE ALSO: setenv, unsetenv, rconf

PROC - the process file system (device)

  * SYNOPSIS:

    + bind -k proc bind_point

    + cat bind_point

    + cat bind_point/0

    + cat bind_point/1

    + ...

    + cat bind_point/N

    + signal

    + sigpgrp

    + status

  * DESCRIPTION: The proc device creates a two level directory below its bind_point. The first level entries are the numbers of the processes that are currently known to K9. The second level entries are the filenames "signal", "sigpgrp" and "status". It is an error to read from the second level directory files "signal" and "sigpgrp". When "status" is read it returns an ASCII string containing the following fields:

    + process name

    + process identifier (i.e. its pid)

    + this process's parent's pid

    + pid of process group leader

    + process state

    + process priority

    + maximum stack utilization as % of available stack

    + milliseconds of CPU time registered

    + semaphore id (if any) this process is waiting on

    These fields have an appropriate number of spaces between them so they look "reasonable" when output by ps. The process states are listed below:

    + I interrupted

    + R currently running (or has yielded control)

    + S stirred (signaled while waiting)

    + W waiting on a semaphore

    + Z terminated and parent not waiting

    It is an error to write to the second level directory file "status". A signal number or signal name (see kill for a list of signal names - which cannot be abbreviated in this case) may be written to the file "signal". This action will send a signal to the associated process. If a signal number or name is written to the file "sigpgrp" then all the processes in the process group to which this process belongs will receive the signal.

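The "status" fields listed above can be split mechanically. The following is a minimal Python sketch, not part of the RaidRunner software: it assumes the fields are separated by whitespace, that the process name itself contains no spaces, and that the semaphore id is simply absent when the process is not waiting. The names ProcStatus and parse_status are hypothetical.

```python
from typing import NamedTuple, Optional

# State letters as documented for the proc device.
STATES = {
    "I": "interrupted",
    "R": "currently running (or has yielded control)",
    "S": "stirred (signaled while waiting)",
    "W": "waiting on a semaphore",
    "Z": "terminated and parent not waiting",
}


class ProcStatus(NamedTuple):
    name: str
    pid: int
    ppid: int
    pgrp: int
    state: str
    priority: int
    stack_pct: int
    cpu_ms: int
    semaphore: Optional[str]  # hex id as text, or None if not present


def parse_status(text: str) -> ProcStatus:
    """Split one status string into its documented fields.

    Assumption: whitespace-separated fields in the documented order;
    any extra trailing fields (such as a semaphore name) are ignored.
    """
    fields = text.split()
    if len(fields) < 8:
        raise ValueError(f"too few fields in status string: {text!r}")
    name, pid, ppid, pgrp, state, prio, stack, cpu = fields[:8]
    if state not in STATES:
        raise ValueError(f"unknown process state: {state!r}")
    semaphore = fields[8] if len(fields) > 8 else None
    return ProcStatus(name, int(pid), int(ppid), int(pgrp), state,
                      int(prio), int(stack), int(cpu), semaphore)


if __name__ == "__main__":
    st = parse_status("init 1 0 1 W 0 9 90 8009b1a8")
    print(st.name, st.pid, STATES[st.state], st.semaphore)
```

Such a parser would be run on a host that has captured the text of bind_point/N/status; it does not talk to the RaidRunner itself.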
  * SEE ALSO: ps, kill

3.9.60  PS - report process status

  * SYNOPSIS: ps

  * DESCRIPTION: ps prints information about all running K9 processes on standard out. The information output includes the process name, its process identification number (PID), its parent's PID, process group, process state, the maximum percentage utilization of its stack and the milliseconds of CPU time it has used. The process states are listed below:

    + I interrupted

    + R currently running (or has yielded control)

    + S stirred (signaled while waiting)

    + W waiting on a semaphore

    + Z terminated and parent not waiting

    Currently ps is implemented as a K9 Husky script (rather than a built in command). The script source can be found in the file "/bin/ps". The script utilizes the file system proc.

  * EXAMPLE:

      : raid; ps
      NAME____________________PID__PPID__PGRP_S_P_ST%_TIME(ms)__SEMAPHORE+name
      hyena                     0     0     0 R 9  18 385930    deadbeef
      init                      1     0     1 W 0   9 90        8009b1a8   pau
      SCN2681_reader            4     1     4 W 0   0 0         800702a4   2rd
      SCN2681_writer            5     1     5 W 0   0 0         8007029c   2wr
      SCN2681_putter            6     1     6 W 0   0 0         800702ac   2tp
      DIO_R_drive3_q0         391     1   391 W 0   4 40120     8021a828   Ard
      DIO_R_drive0_q0         397     1   397 W 0   4 13420     8007ac64   Ard
      DIO_R_drive1_q0         404     1   404 W 0   5 25570     8007b224   Ard
      husky                    28     1     1 W 0  10 50        8013a138   pau
      cache_flusher           424     1   424 W 0  23 17700     8030c2c4   Cfr
      CIO_R_q0                426     1   426 W 0  96 2320      8030d6f4   Ard
      CIO_R_q1                427   426   426 W 0  96 2420      8030d6f4   Ard
      CIO_R_q2                428   426   426 W 0  96 2410      8030d6f4   Ard
      CIO_R_q3                429   426   426 W 0  96 2430      8030d6f4   Ard
      CIO_R_q4                430   426   426 W 0  96 2240      8030d6f4   Ard
      CIO_R_q5                431   426   426 W 0  96 2130      80c37540   Ard
      CIO_R_q6                432   426   426 W 0  96 2300      8030d6f4   Ard
      CIO_R_q7                433   426   426 W 0  96 2180      8030d6f4   Ard
      smon                     65     1     1 W 0   5 30        8008d5e4   Nsl
      DIO_R_drive2_q0         326     1   326 W 0   5 27680     8007b7e4   Ard
      /bin/ps                 871    28     1 W 0   8 40        80cfd020   pau
      stargd                  107     1     1 R 0  48 23990     8007a648   Nsl
      starg_107_L_R           119   107   119 W 0   0 0         8018c608   pau

    The fields are: process name, process identifier (i.e. its pid), this process's parent's pid, pid of process group leader, process state, process priority, maximum stack utilization as % of available stack, milliseconds of CPU time registered, and the semaphore id (if any) this process is waiting on along with the internal name of the semaphore. If a process is waiting on a semaphore then the last number is the address of the semaphore it is waiting on.

  * SEE ALSO: proc

3.9.61  PSCSIRES - print SCSI-2 reservation table for all or specific monikers

  * SYNOPSIS:

    + pscsires

    + pscsires moniker

  * DESCRIPTION: pscsires looks up the Global SCSI-2 Reservation table for a given moniker (see smon or stargd) and prints its current SCSI-2 reservation state. When no moniker is given, all entries in the Global SCSI-2 Reservation table are printed.

    For each table entry to be displayed, the moniker and four integers are printed - the controller number, host port number, reserved scsi id and reservor scsi id. The combination of controller number, host port number and reservor scsi id uniquely identifies which host system issued the SCSI-2 Reserve command. The reserved scsi id field is used when a Third-Party reservation has been requested by the host system identified by the other three integers. If all four integers are -1, then no scsi target daemon (smon or stargd) on any controller or host port has reserved the moniker.

  * SEE ALSO: smon, stargd, mstargd, SCSI-2 Reserve and Release Command Documentation

3.9.62  PSTATUS - print the values of hardware status registers

  * SYNOPSIS: pstatus

  * DESCRIPTION: pstatus will print the names and values of various hardware status registers. Each value is printed on a line of its own. Typical hardware status registers are:

    + BCDSW_0, BCDSW_1: Values of BCD host port SCSI ID selector switches

    + FANS: Value of Fan status register

    + AC_PWR: Value of AC Power supply status

    + DC_PWR: Value of DC Power supply status

  * NOTE: Not all RaidRunner models support the same status registers, so consult your RaidRunner model's hardware reference manual to see which are supported and what their values imply.

3.9.63  RAIDACTION - script to gather/reset stats or stop/start a raid set's stargd

  * SYNOPSIS: raidaction raidname up|down|getstats|getastats|zerostats

  * DESCRIPTION: raidaction is a husky script which is used either to start or stop a raid set's scsi target daemon(s) (stargd) or to gather/reset statistics about the raid set.

  * OPTIONS:

    + raidname: Specify the raid set to perform the action on.

    + up: Start all the scsi target daemons associated with this raid set.

    + down: Stop all the scsi target daemons associated with this raid set.

    + getstats: Print the current statistics stored for the given raid set. The first line of output is prefixed with the string "RAIDSET:" and is the output of the stats command with arguments -r raidname -g. The second line of output is prefixed with the string "CACHE:" and is the output of the stats command with arguments -c raidname -g. The next line(s) are prefixed with the string "STARG: c.h.l" and are the output of the mstargd command with arguments -d 0 -v -h -H stargd_pid, where stargd_pid is the process id of the scsi target daemon on controller c, host port h with scsi lun l. A line of this type is printed for each scsi target daemon belonging to this raid set.

    + getastats: Print the current statistics stored for the given raid set, then zero all the stored statistics. By zeroing the stored statistics one can, through repeated timed calls to this command, form an average based on the gathered statistics. The output is the same as for the getstats option.

    + zerostats: Zero all statistics stored for the given raid set.

  * SEE ALSO: stats, mstargd

3.9.64  RAID0 - raid 0 device

  * SYNOPSIS:

      bind -k {raid0 nbackends} bind_point
      echo moniker name=raid_set_name > bind_point/ctl
      echo engage drive=driveNum qlen=queueLen fd=aFdNum blksize=blockSize name=backendname <>[aFdNum] backEnd > bind_point/ctl
      echo disengage drive=driveNum > bind_point/ctl
      echo access drive=driveNum read-write > bind_point/ctl
      echo access drive=driveNum read-only > bind_point/ctl
      echo access drive=driveNum write-only > bind_point/ctl
      echo access drive=driveNum offline > bind_point/ctl
      cat bind_point
      ctl data repair stats

  * DESCRIPTION: raid0 implements a raid 0 device. It has 1 "frontend" (i.e. bind_point/data) and typically multiple "backends" (i.e. one defined by each "engage" message with a new drive number). To associate an internal name (or moniker) with the raid device, send the message "moniker name=internal_name" to the device's control file, bind_point/ctl.

    This implementation of raid 0 uses nbackends files in its backend. Read and write operations to the frontend (i.e. bind_point/data) must be in integral units of blockSize. Each write of blockSize bytes is written on 1 backend file. The backend "files" referred to here will typically be disks.

    The name argument associates the given backendname string with the appropriate backend. This string will be used in reporting errors on the running raid.

    The queueLen argument must be 1 or greater and sets the maximum number of requests that can be put in a queue associated with each backend file. A daemon called async_io is spawned for each backend file to service this queue.
    Each backend file first needs to be identified to the raid0 device via the "engage" string sent to bind_point/ctl. If required, a file can have its association with this device terminated with a "disengage" string. Once a backend file is engaged its access level can be varied between "read-write", "read-only", "write-only" and "offline" as required. The default is "offline", so in most initialization situations an "access read-write" string needs to be sent to this device.

    When the file bind_point/ctl is read, a line is output for every engaged backend file indicating its access status (e.g. "drive 3: engaged, read-write"). Backend files that have been disengaged and not re-engaged also output a line (e.g. "drive 5: disengaged").

    When the file bind_point/stats is read, a line is output which shows the cumulative number of reads and writes performed (including failures) for each backend of the raid device. The format of this line is

      D0 r0_cnt r0_fails w0_cnt w0_fails; D1 r1_cnt r1_fails w1_cnt w1_fails; ...

    which indicates that backend 0 (typically drive 0) has made r0_cnt reads, w0_cnt writes, r0_fails read failures and w0_fails write failures, that backend 1 (drive 1) has made r1_cnt reads, w1_cnt writes, r1_fails read failures and w1_fails write failures, and so forth for each backend in the raid set. If the string "zerostats" is written to the file bind_point/stats then all cumulative read and write counts for each backend of the raid set are zeroed.

  * EXAMPLE:

      > /raid0
      bind -k {raid0 6} /raid0
      echo moniker name=R_0 > /raid0/ctl
      echo engage drive=0 qlen=8 fd=7 blksize=8192 name=D0 <>[7] /d0/data > /raid0/ctl
      echo access drive=0 read-write > /raid0/ctl
      ...
      echo engage drive=5 qlen=8 fd=7 blksize=8192 name=D5 <>[7] /d5/data > /raid0/ctl
      echo access drive=5 read-write > /raid0/ctl

    This example creates the file "/raid0" as a bind point and then binds the raid0 device on it. The first echo command establishes the internal raid device name as R_0.
    The subsequent echo commands are shown in pairs for each backend file: one sending an "engage" string and the other sending an "access" string to the file "/raid0/ctl". Each "engage" string associates a backend file (via file descriptor 7) with a block size of 8192 bytes and a maximum queue length of 8. The following "access" string adjusts the access level of the backend file from "offline" (the default) to "read-write". This is a six disk raid set.

  * NOTES: The size of the resultant raid set will be the size of the smallest backend multiplied by the number of data backends, adjusted downwards to be a multiple of the raid set's block size (blockSize).

  * SEE ALSO: raid1, raid3, raid4, raid5

3.9.65  RAID1 - raid 1 device

  * SYNOPSIS:

      bind -k raid1 bind_point
      echo moniker name=raid_set_name > bind_point/ctl
      echo engage drive=driveNum qlen=queueLen fd=aFdNum blksize=blockSize name=backendname <>[aFdNum] backEnd > bind_point/ctl
      echo disengage drive=driveNum > bind_point/ctl
      echo access drive=driveNum read-write > bind_point/ctl
      echo access drive=driveNum read-only > bind_point/ctl
      echo access drive=driveNum write-only > bind_point/ctl
      echo access drive=driveNum offline > bind_point/ctl
      cat bind_point
      ctl data repair stats

  * DESCRIPTION: raid1 implements a raid 1 device. Raid 1 is also known as "mirroring". It has 1 "frontend" (i.e. bind_point/data) and 2 "backends" (i.e. one defined by each "engage" message with a new drive number). To associate an internal name (or moniker) with the raid device, send the message "moniker name=internal_name" to the device's control file, bind_point/ctl.

    Read and write operations to the frontend (i.e. bind_point/data) must be in integral units of blockSize. Each write of blockSize bytes is written on both backend files. A read of blockSize bytes needs only to read 1 backend file (unless there is a problem). The backend file chosen to do the read is the one calculated to have its heads closer to the required block.
    The backend "files" referred to here will typically be disks. The "logical" block size is currently 512 bytes and the given blockSize must be a power of 2 times 512 (i.e. 2**n * 512 bytes). If, for example, the blockSize was 8 KB then a write of 8 KB would cause both backend files to have that 8 KB written to them. An 8 KB read would cause the file calculated to have its "heads" closer to be read. If this file was marked "offline" or "write-only", or reported an IO error, then the other file would be read.

    The queueLen argument must be 1 or greater and sets the maximum number of requests that can be put in a queue associated with each backend file. A daemon called async_io is spawned for each backend file to service this queue.

    The name argument associates the given backendname string with the appropriate backend. This string will be used in reporting errors on the running raid.

    Each backend file first needs to be identified to the raid1 device via the "engage" string sent to bind_point/ctl. If required, a file can have its association with this device terminated with a "disengage" string. Once a backend file is engaged its access level can be varied between "read-write", "read-only", "write-only" and "offline" as required. The default is "offline", so in most initialization situations an "access read-write" string needs to be sent to this device.

    When the file bind_point/ctl is read, a line is output for every engaged backend file indicating its access status (e.g. "drive 3: engaged, read-write"). Backend files that have been disengaged and not re-engaged also output a line (e.g. "drive 5: disengaged").

    When the file bind_point/stats is read, a line is output which shows the cumulative number of reads and writes performed (including failures) for each backend of the raid device.
    The format of this line is:

      D0 r0_cnt r0_fails w0_cnt w0_fails; D1 r1_cnt r1_fails w1_cnt w1_fails;

    which indicates that backend 0 (typically drive 0) has made r0_cnt reads, w0_cnt writes, r0_fails read failures and w0_fails write failures and that backend 1 (drive 1) has made r1_cnt reads, w1_cnt writes, r1_fails read failures and w1_fails write failures. If the string "zerostats" is written to the file bind_point/stats then all cumulative read and write counts for each backend of the raid set are zeroed.

  * EXAMPLE:

      > /raid1
      bind -k raid1 /raid1
      echo moniker name=R_1 > /raid1/ctl
      echo engage drive=0 qlen=8 fd=7 blksize=8192 name=D1 <>[7] /d0/data > /raid1/ctl
      echo access drive=0 read-write > /raid1/ctl
      echo engage drive=1 qlen=8 fd=7 blksize=8192 name=D5 <>[7] /d5/data > /raid1/ctl
      echo access drive=1 read-write > /raid1/ctl

    This example creates the file "/raid1" as a bind point and then binds the raid1 device on it. The first echo command establishes the internal raid device name as R_1. The subsequent echo commands are shown in pairs for both backend files: one sending an "engage" string and the other sending an "access" string to the file "/raid1/ctl". Each "engage" string associates a backend file (via file descriptor 7) with a block size of 8192 bytes and a maximum queue length of 8. The following "access" string adjusts the access level of the backend file from "offline" (the default) to "read-write".

  * NOTES: The size of the resultant raid set will be the size of the smallest backend multiplied by the number of data backends (here just 1 as this is a mirror), adjusted downwards to be a multiple of the raid set's block size (blockSize).

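The sizing rule in the note above can be worked through numerically. The Python sketch below is illustrative only: it assumes all sizes are in bytes and reads the rule as rounding the total (smallest backend times data backends) down to a block size multiple. The function name raid_set_size is hypothetical and is not a RaidRunner command.

```python
def raid_set_size(backend_sizes, data_backends, block_size):
    """Estimate the usable size of a raid set, in bytes.

    backend_sizes : sizes of all backends, in bytes
    data_backends : number of backends holding data
                    (1 for a raid 1 mirror, nbackends for raid 0,
                    nbackends - 1 for raid 3)
    block_size    : the raid set's blockSize, in bytes

    Assumption: the product is rounded down as a whole, which is one
    reading of the note; the controller may round differently.
    """
    if data_backends < 1 or block_size < 1 or not backend_sizes:
        raise ValueError("need at least one backend and positive sizes")
    raw = min(backend_sizes) * data_backends
    return raw - (raw % block_size)


if __name__ == "__main__":
    # A mirror of two slightly different disks with an 8192 byte blockSize.
    size = raid_set_size([1_000_000_000, 1_000_500_000], 1, 8192)
    print(size, size % 8192)
```

The larger disk contributes nothing beyond the size of the smaller one, which is why mirrors are normally built from equal sized backends.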
  * SEE ALSO: raid0, raid3, raid4, raid5

3.9.66  RAID3 - raid 3 device

  * SYNOPSIS:

    bind -k {raid3 nbackends} bind_point
    echo moniker name=raid_set_name > bind_point/ctl
    echo engage drive=driveNum qlen=queueLen fd=aFdNum blksize=blockSize name=backendname <>[aFdNum] backEnd > bind_point/ctl
    echo disengage drive=driveNum > bind_point/ctl
    echo access drive=driveNum read-write > bind_point/ctl
    echo access drive=driveNum read-only > bind_point/ctl
    echo access drive=driveNum write-only > bind_point/ctl
    echo access drive=driveNum offline > bind_point/ctl
    cat bind_point ctl data repair stats

  * DESCRIPTION: raid3 implements a raid 3 device. It has 1 "frontend" (i.e. bind_point/data) and typically multiple "backends" (i.e. one defined by each "engage" message with a new drive number). To associate an internal name (or moniker) with the raid device, send the message "moniker name=internal_name" to the device's control file, bind_point/ctl. This implementation of raid 3 uses at least 3 files in its backend.

Read and write operations to the frontend (i.e. bind_point/data) must be in integral units of blockSize. Each write of blockSize bytes is striped (i.e. divided evenly) across (nbackends - 1) files with the "parity" on the other file. Subsequent writes will NOT rotate the file being used to store parity. [This rotation is a slight extension of the original raid 3 definition.]

The backend "files" referred to here will typically be disks. The "logical" block size is currently 512 bytes and the given blockSize must be an integral multiple of (nbackends - 1) * 512. If, for example, the blockSize was 8 Kb and there were 5 backends then a write of 8 Kb would cause 4 backend files to have 2 Kb written on them and the other backend file to have 2 Kb of parity written on it. An 8 Kb read would cause the 4 files known to hold the data (as distinct from the parity) to be read.
If any one of these files was marked "offline", "write-only" or reported an IO error then the 5th file containing the parity would be read and the 8 Kb block reconstructed.

The queueLen argument must be 1 or greater and sets the maximum number of requests that can be put in a queue associated with each backend file. A daemon called async_io is spawned for each backend file to service this queue. The name argument associates the given backendname string with the appropriate backend. This string will be used in reporting errors on the running raid.

Each backend file first needs to be identified to the raid3 device via the "engage" string sent to bind_point/ctl. If required, a file can have its association with this device terminated with a "disengage" string. Once a backend file is engaged its access level can be varied between "read-write", "read-only", "write-only" and "offline" as required. The default is "offline", so in most initialization situations an "access read-write" string needs to be sent to this device.

When the file bind_point/ctl is read, a line is output for every engaged backend file indicating its access status (e.g. "drive 3: engaged, read-write"). Backend files that have been disengaged and not re-engaged also output a line (e.g. "drive 5: disengaged").

When the file bind_point/stats is read, a line is output which shows the cumulative number of reads and writes performed (including failures) for each backend of the raid device. The format of this line is

    D0 r0_cnt r0_fails w0_cnt w0_fails; D1 r1_cnt r1_fails w1_cnt w1_fails; ...

which indicates that backend 0 (typically drive 0) has made r0_cnt reads, w0_cnt writes, r0_fails read failures and w0_fails write failures and that backend 1 (drive 1) has made r1_cnt reads, w1_cnt writes, r1_fails read failures and w1_fails write failures and so forth for each backend in the raid set.
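As a hedged illustration of the stats line format just described, the following Python sketch splits such a line into per-backend counters. The names BackendStats and parse_stats_line are hypothetical, the sample numbers are invented, and the exact spacing of the real output is an assumption based only on the format shown above.

```python
from typing import Dict, NamedTuple


class BackendStats(NamedTuple):
    reads: int
    read_fails: int
    writes: int
    write_fails: int


def parse_stats_line(line: str) -> Dict[int, BackendStats]:
    """Parse 'D0 r_cnt r_fails w_cnt w_fails; D1 ...;' into a dict
    keyed by backend number. The layout is assumed from the manual."""
    stats = {}
    for field in line.split(";"):
        parts = field.split()
        if not parts:
            continue  # empty piece after the final ';'
        if len(parts) != 5 or not parts[0].startswith("D"):
            raise ValueError(f"unexpected stats field: {field!r}")
        drive = int(parts[0][1:])
        reads, read_fails, writes, write_fails = (int(p) for p in parts[1:])
        stats[drive] = BackendStats(reads, read_fails, writes, write_fails)
    return stats


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    for drive, s in parse_stats_line("D0 120 0 340 1; D1 118 2 340 0;").items():
        print(drive, s)
```

A script on a host could use something like this to spot a backend whose failure counters are climbing.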
If the string "zerostats" is written to the file bind_point/stats then all cumulative read and write counts for each backend of the raid set are zeroed.

  * EXAMPLE:

    > /raid3
    bind -k {raid3 5} /raid3
    echo moniker name=R_3 > /raid3/ctl
    echo engage drive=0 qlen=8 fd=7 blksize=8192 name=D0 <>[7] /d0/data > /raid3/ctl
    echo access drive=0 read-write > /raid3/ctl
    ...
    echo engage drive=5 qlen=8 fd=7 blksize=8192 name=D5 <>[7] /d5/data > /raid3/ctl
    echo access drive=5 read-write > /raid3/ctl

This example creates the file "/raid3" as a bind point and then binds the raid3 device on it. The first echo command establishes the internal raid device name as R_3. The subsequent echo commands are shown in pairs for each backend file: one sending an "engage" string and the other sending an "access" string to the file "/raid3/ctl". Each "engage" string associates a backend file (via file descriptor 7) with a block size of 8192 bytes and a maximum queue length of 8. The following "access" string adjusts the access level of the backend file from "offline" (the default) to "read-write". This is a 6 disk raid set.

  * NOTES: The size of the resultant raid set will be the size of the smallest backend multiplied by the number of data backends, rounded down to a multiple of the raid set's block size (blockSize).

  * SEE ALSO: raid0, raid1, raid4, raid5

3.9.67  RAID4 - raid 4 device

  * SYNOPSIS:

    bind -k {raid4 nbackends} bind_point
    echo moniker name=raid_set_name > bind_point/ctl
    echo engage drive=driveNum qlen=queueLen fd=aFdNum blksize=blockSize name=backendname <>[aFdNum] backEnd > bind_point/ctl
    echo disengage drive=driveNum > bind_point/ctl
    echo access drive=driveNum read-write > bind_point/ctl
    echo access drive=driveNum read-only > bind_point/ctl
    echo access drive=driveNum write-only > bind_point/ctl
    echo access drive=driveNum offline > bind_point/ctl
    cat bind_point ctl data repair stats

  * DESCRIPTION: raid4 implements a raid 4 device. It has 1 "frontend" (i.e.
bind_point/data) and typically multiple "backends" (i.e. one defined by each "engage" message with a new drive number). To associate an internal name (or moniker) with the raid device, send the message "moniker name=internal_name" to the device's control file, bind_point/ctl. This implementation of raid 4 uses at least 3 files in its backend.

Read and write operations to the frontend (i.e. bind_point/data) must be in integral units of blockSize. Each write of blockSize bytes is written on 1 backend file. Its neighboring (nbackends - 2) files need to be read at the same offset to calculate a new parity block, which then needs to be rewritten. The nbackends blocks at the same offset on the nbackends backend files are called a slice. The parity block is, as in raid3, fixed as the last backend. A read of blockSize bytes needs only to read 1 backend file (unless there is a problem).

The backend "files" referred to here will typically be disks. The "logical" block size is currently 512 bytes and the given blockSize must be an integral multiple of (nbackends - 1) * 512. If, for example, the blockSize was 8 Kb then a write of 8 Kb would cause 1 backend file to have that 8 Kb written to it with the other (nbackends - 2) non-parity files in that slice having 8 Kb read from them in order to generate a new 8 Kb parity block which is then written to the parity file in this slice. An 8 Kb read would cause the file known to hold the data (as distinct from the parity) to be read. If this file was marked "offline", "write-only" or reported an IO error then the other (nbackends - 1) files in the slice (i.e. (nbackends - 2) data and 1 parity) would be read and the 8 Kb block reconstructed.

The queueLen argument must be 1 or greater and sets the maximum number of requests that can be put in a queue associated with each backend file. A daemon called async_io is spawned for each backend file to service this queue.
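The parity update and block reconstruction described above for a slice can be sketched in Python. The manual does not state how parity is computed; this sketch assumes the conventional byte-wise XOR, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed parity function)."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks in a slice must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_block(slice_data, index, new_block):
    """Store new_block at position index in the slice's data blocks and
    return the parity recomputed from all data blocks in the slice."""
    slice_data[index] = new_block
    return xor_blocks(slice_data)


def reconstruct(slice_data, parity, missing):
    """Rebuild the data block at position missing from the other data
    blocks in the slice plus the parity block."""
    others = [b for i, b in enumerate(slice_data) if i != missing]
    return xor_blocks(others + [parity])


if __name__ == "__main__":
    data = [bytes([1, 2]), bytes([4, 8]), bytes([16, 32])]  # toy 2-byte blocks
    parity = xor_blocks(data)
    print(reconstruct(data, parity, 1) == data[1])
```

The write path mirrors the text: the written block plus its (nbackends - 2) neighbours determine the new parity, and a failed read is served from the remaining (nbackends - 1) blocks.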
The name argument associates the given backendname string with the appropriate backend. This string will be used in reporting errors on the running raid.

Each backend file first needs to be identified to the raid4 device via the "engage" string sent to bind_point/ctl. If required, a file can have its association with this device terminated with a "disengage" string. Once a backend file is engaged its access level can be varied between "read-write", "read-only", "write-only" and "offline" as required. The default is "offline", so in most initialization situations an "access read-write" string needs to be sent to this device.

When the file bind_point/ctl is read, a line is output for every engaged backend file indicating its access status (e.g. "drive 3: engaged, read-write"). Backend files that have been disengaged and not re-engaged also output a line (e.g. "drive 5: disengaged").

When the file bind_point/stats is read, a line is output which shows the cumulative number of reads and writes performed (including failures) for each backend of the raid device. The format of this line is

    D0 r0_cnt r0_fails w0_cnt w0_fails; D1 r1_cnt r1_fails w1_cnt w1_fails; ...

which indicates that backend 0 (typically drive 0) has made r0_cnt reads, w0_cnt writes, r0_fails read failures and w0_fails write failures and that backend 1 (drive 1) has made r1_cnt reads, w1_cnt writes, r1_fails read failures and w1_fails write failures and so forth for each backend in the raid set. If the string "zerostats" is written to the file bind_point/stats then all cumulative read and write counts for each backend of the raid set are zeroed.

  * EXAMPLE:

    > /raid4
    bind -k {raid4 5} /raid4
    echo moniker name=R_4 > /raid4/ctl
    echo engage drive=0 qlen=8 fd=7 blksize=8192 name=D0 <>[7] /d0/data > /raid4/ctl
    echo access drive=0 read-write > /raid4/ctl
    ...
    echo engage drive=5 qlen=8 fd=7 blksize=8192 name=D5 <>[7] /d5/data > /raid4/ctl
    echo access drive=5 read-write > /raid4/ctl

This example creates the file "/raid4" as a bind point and then binds the raid4 device on it. The first echo command establishes the internal raid device name as R_4. The subsequent echo commands are shown in pairs for each backend file: one sending an "engage" string and the other sending an "access" string to the file "/raid4/ctl". Each "engage" string associates a backend file (via file descriptor 7) with a block size of 8192 bytes and a maximum queue length of 8. The following "access" string adjusts the access level of the backend file from "offline" (the default) to "read-write". This is a six disk raid set.

  * NOTES: The size of the resultant raid set will be the size of the smallest backend multiplied by the number of data backends, rounded down to a multiple of the raid set's block size (blockSize).

  * SEE ALSO: raid0, raid1, raid3, raid5

3.9.68  RAID5 - raid 5 device

  * SYNOPSIS:

    bind -k {raid5 nbackends} bind_point
    echo moniker name=raid_set_name > bind_point/ctl
    echo engage drive=driveNum qlen=queueLen fd=aFdNum blksize=blockSize name=backendname <>[aFdNum] backEnd > bind_point/ctl
    echo disengage drive=driveNum > bind_point/ctl
    echo access drive=driveNum read-write > bind_point/ctl
    echo access drive=driveNum read-only > bind_point/ctl
    echo access drive=driveNum write-only > bind_point/ctl
    echo access drive=driveNum offline > bind_point/ctl
    cat bind_point ctl data repair stats

  * DESCRIPTION: raid5 implements a raid 5 device. It has 1 "frontend" (i.e. bind_point/data) and typically multiple "backends" (i.e. one defined by each "engage" message with a new drive number). To associate an internal name (or moniker) with the raid device, send the message "moniker name=internal_name" to the device's control file, bind_point/ctl. This implementation of raid 5 uses at least 3 files in its backend.
Read and write operations to the frontend (i.e. bind_point/data) must be in integral units of blockSize. Each write of blockSize bytes is written on 1 backend file. Its neighboring (nbackends - 2) files need to be read at the same offset to calculate a new parity block, which then needs to be rewritten. The nbackends blocks at the same offset on the nbackends backend files are called a slice. The parity block is rotated from one slice to the next. A read of blockSize bytes needs only to read 1 backend file (unless there is a problem).

The backend "files" referred to here will typically be disks. The "logical" block size is currently 512 bytes and the given blockSize must be an integral multiple of (nbackends - 1) * 512. If, for example, the blockSize was 8 Kb then a write of 8 Kb would cause 1 backend file to have that 8 Kb written to it with the other (nbackends - 2) non-parity files in that slice having 8 Kb read from them in order to generate a new 8 Kb parity block which is then written to the parity file in this slice. An 8 Kb read would cause the file known to hold the data (as distinct from the parity) to be read. If this file was marked "offline", "write-only" or reported an IO error then the other (nbackends - 1) files in the slice (i.e. (nbackends - 2) data and 1 parity) would be read and the 8 Kb block reconstructed.

The queueLen argument must be 1 or greater and sets the maximum number of requests that can be put in a queue associated with each backend file. A daemon called async_io is spawned for each backend file to service this queue. The name argument associates the given backendname string with the appropriate backend. This string will be used in reporting errors on the running raid.

Each backend file first needs to be identified to the raid5 device via the "engage" string sent to bind_point/ctl. If required, a file can have its association with this device terminated with a "disengage" string.
Once a backend file is engaged its access level can be varied between "read-write", "read-only", "write-only" and "offline" as required. The default is "offline", so in most initialization situations an "access read-write" string needs to be sent to this device.

When the file bind_point/ctl is read, a line is output for every engaged backend file indicating its access status (e.g. "drive 3: engaged, read-write"). Backend files that have been disengaged and not re-engaged also output a line (e.g. "drive 5: disengaged").

When the file bind_point/stats is read, a line is output which shows the cumulative number of reads and writes performed (including failures) for each backend of the raid device. The format of this line is

    D0 r0_cnt r0_fails w0_cnt w0_fails; D1 r1_cnt r1_fails w1_cnt w1_fails; ...

which indicates that backend 0 (typically drive 0) has made r0_cnt reads, w0_cnt writes, r0_fails read failures and w0_fails write failures and that backend 1 (drive 1) has made r1_cnt reads, w1_cnt writes, r1_fails read failures and w1_fails write failures and so forth for each backend in the raid set. If the string "zerostats" is written to the file bind_point/stats then all cumulative read and write counts for each backend of the raid set are zeroed.

  * EXAMPLE:

    > /raid5
    bind -k {raid5 5} /raid5
    echo moniker name=R_5 > /raid5/ctl
    echo engage drive=0 qlen=8 fd=7 blksize=8192 name=D0 <>[7] /d0/data > /raid5/ctl
    echo access drive=0 read-write > /raid5/ctl
    ...
    echo engage drive=5 qlen=8 fd=7 blksize=8192 name=D1 <>[7] /d5/data > /raid5/ctl
    echo access drive=5 read-write > /raid5/ctl

This example creates the file "/raid5" as a bind point and then binds the raid5 device on it. The first echo command establishes the internal raid device name as R_5. The subsequent echo commands are shown in pairs for each backend file: one sending an "engage" string and the other sending an "access" string to the file "/raid5/ctl".
Each "engage" string associates a backend file (via file descriptor 7) with a block size of 8192 bytes and a maximum queue length of 8. The following "access" string adjusts the access level of the backend file from "offline" (the default) to "read-write". This is a six disk raid set.

  * NOTES: The size of the resultant raid set will be the size of the smallest backend multiplied by the number of data backends, rounded down to a multiple of the raid set's block size (blockSize).

  * SEE ALSO: raid0, raid1, raid3, raid4

3.9.69  RAM - ram based file system

  * SYNOPSIS:

    bind -k ram bind_point

  * DESCRIPTION: The ram file system uses memory on the target system's heap to support a hierarchical file system. Typically ram is the root file system in a husky environment. Unlike a normal file system, ram lacks persistence. Therefore (assuming the heap resides in _non_ battery backed up RAM) when the target system loses power, the contents of the ram file system are lost. This is similar to the way the "/tmp" file system works in Unix.

  * SEE ALSO: mem

3.9.70  RANDIO - simulate random reads and writes

  * SYNOPSIS:

    randio -d device -n nop [-b cnt] [-f fill_ch] [-i init_ch] [-o offset] [-s size] [-G grp_size] [-S seed] [-X xfer_size] [-M rdwr|rdonly] [-T seq|ran] [-vBDON]

  * DESCRIPTION: randio defines a range on the device, from the start of the device to its end or, if size is specified, for a length of size 512-byte BLOCKS, in which to perform nop random write then read operations. All reads and writes are in multiples of 512 byte blocks.

First, every block in the given range is initialized with a specific pattern. Then for each operation, a random location in the range is chosen and a random number of blocks are written with a specific data pattern. Once nop write operations have occurred, nop reads are performed at the appropriate locations and lengths to verify the previously written data. Lastly, all data on the device is verified.
That is, any unwritten block should have only the initialization pattern and any block that was written to should have the appropriate data pattern.

When a write of a given length occurs, each 512 byte block within the write has a header containing the operation number, the start of the write location, the length of the write and the data fill pattern. The rest of each 512 byte block is initialized with a fill pattern.

The output of this command provides the I/O transfer rates in Megabytes (MB) and million bytes (Mb) for the three phases of I/O, that is sequential writes, random writes then reads, and finally sequential reads. The size and offset values may have a suffix.

  * OPTIONS:

  + -b cnt: Restrict all read and write operations to be multiples of cnt x 512 blocks. The default is 1.

  + -f fill_ch: The data fill pattern used in writes should be fill_ch. fill_ch must be specified as a hex number. The default is 0x7F.

  + -i init_ch: The fill pattern used when initialising the device should be init_ch. init_ch must be specified as a hex number. The default is 0x00.

  + -o offset: All reads and writes are to be performed offset 512-byte BLOCKS into the device. All reported values will be relative to this offset. The default is 0, i.e. the start of the device.

  + -s size: The size, in 512-byte BLOCKS, of the range upon which we perform the io.

  + -v: Run in verbose mode, printing out initialization information and the final verification of the whole specified range on the device. If a second -v is given then details of individual read and write operations are printed.

  + -B: Align all random write locations to be a multiple of cnt x 512, where cnt is specified by the -b option. This is useful when writing to raw raid devices which require both a fixed write size multiple and an aligned "start of write" address.

  + -G grp_siz: Normally, randio performs nop writes then nop reads.
The -G option can change this to perform grp_siz writes then grp_siz reads, looping until nop operations have occurred. The default "group size" is nop operations. If grp_siz is negative, then a random group size is chosen.

  + -M rdwr|rdonly: By default, a destructive read-write test is performed (-M rdwr). To run randio in a non-destructive way, specify this option with the sub-option of rdonly. This will result in randio only performing reads.

  + -N: By default, data read is always compared against what was written. By specifying this option, the comparison will not be made. This option is useful when you are only concerned with performing random reads and writes and not with whether the data is corrupted.

  + -O: By default, the random write locations and sizes are chosen so that they may overlap. By specifying this option, overlapped writes can NOT occur. Be careful when using this option if your device is small.

  + -S seed: Specify a seed for the random number generators.

  + -T ran|seq: By default, randio performs a sequence of sequential writes, random writes then reads and finally sequential reads. To have only the sequential activity performed, use the -T seq option. The default is -T ran.

  + -X xfer_size: Restrict the maximum write size to be xfer_size 512-byte BLOCKS. The default is 256 blocks (128K bytes).
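The random write and final verification phases that randio performs can be pictured with a small in-memory Python simulation. This is only a sketch of the idea and not randio itself: it omits the per-block header, works on a bytearray instead of a device, and all function names are hypothetical. The default patterns 0x00 and 0x7F are taken from the -i and -f options.

```python
import random

BLOCK = 512  # randio works in 512-byte blocks


def random_writes(device, nblocks, nop, max_xfer, fill_ch, rng):
    """Perform nop random writes of 1..max_xfer blocks filled with fill_ch.
    Returns the set of block numbers that were written."""
    written = set()
    for _ in range(nop):
        length = rng.randint(1, min(max_xfer, nblocks))
        start = rng.randint(0, nblocks - length)
        device[start * BLOCK:(start + length) * BLOCK] = bytes([fill_ch]) * (length * BLOCK)
        written.update(range(start, start + length))
    return written


def verify(device, nblocks, written, init_ch, fill_ch):
    """Final pass: unwritten blocks must hold init_ch, written blocks
    fill_ch. Returns the list of block numbers that do not match."""
    bad = []
    for blk in range(nblocks):
        want = fill_ch if blk in written else init_ch
        if device[blk * BLOCK:(blk + 1) * BLOCK] != bytes([want]) * BLOCK:
            bad.append(blk)
    return bad


if __name__ == "__main__":
    nblocks = 64
    device = bytearray([0x00]) * (nblocks * BLOCK)   # initialization pass
    written = random_writes(device, nblocks, 10, 8, 0x7F, random.Random(1))
    print("mismatched blocks:", verify(device, nblocks, written, 0x00, 0x7F))
```

Passing a seeded random.Random plays the same role as the -S option: the same seed reproduces the same sequence of locations and lengths.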
  * SEE ALSO: speedtst, suffix

3.9.71  RCONF, SPOOL, HCONF, MCONF, CORRUPT-CONFIG - raid configuration and spares management

  * SYNOPSIS:

  + rconf -add -type raidtype -name raidname -size raidsize -iomode read-write|read-only|write-only -iosize raidchunksize -hostif mflag controller host_port scsi_lun -backendsize backends_size -backends backend1,backend2,...,backendn [-boot auto|manual] [-cache raidcachesize] [-state active|inactive] [-usespares on|off] [-usezoneio on|off] [-qlen nqueues] [-stargs {xstargd_args}] [-hostif mflag controller host_port scsi_lun]
  + rconf -delete raidname
  + rconf -list [raidname] [-v]
  + rconf [-v]
  + rconf -mkfaulty -name raidname -drive drive_no
  + rconf -modify -name raidname [-iomode read-write|read-only|write-only] [-newname newraidname] [-hostif mflag controller host_port scsi_lun] [-boot auto|manual] [-cache raidcachesize] [-state active|inactive] [-usespares on|off] [-usezoneio on|off] [-qlen nqueues] [-stargs {xstargd_args}]
  + rconf -rebuild -name raidname -drive drive_no [-spare spare_backend]
  + rconf -repair -name raidname -drive drive_no -action start|finish [-depth value]
  + rconf -replace -name raidname -backend backendn -action start|finish [-depth value]
  + rconf -unrepair -name raidname -backend backendn -drive drive_no -depth value
  + rconf -init
  + rconf -fullinit
  + rconf -check
  + rconf -maxiochunk
  + rconf -validate
  + rconf -syslog
  + rconf -size
  + spool -add -name backend -backendsize backends_size -type hot|warm [-ctlr controller_number]
  + spool -delete -name backend
  + spool -list
  + spool
  + hconf -delete controller hostport
  + hconf -list [controller hostport]
  + hconf
  + hconf -modify controller hostport scsi_id
  + mconf -add controller hostport scsi_lun [blkprotocol]
  + mconf -delete controller hostport scsi_lun
  + mconf -sml
  + mconf -smldelete controller hostport
  + mconf -list [controller]
  + mconf [-C] [-c] corrupt-config

  * DESCRIPTION: rconf, hconf, mconf and
spool manipulate raid set definitions, host port connections and the spares pool on a RaidRunner. All information is stored in the RaidRunner configuration area, which is stored in the file /dev/flash/conf and duplicated on every disk (/dev/hd/.../rconfig) on the RaidRunner. The corrupt-config command will corrupt the configuration areas and immediately reboot the RaidRunner.

  + Raid Sets: A Raid Set has the following attributes:

  o Type: the Raid Type - either 0 (striped backends), 1 (mirrored backends), 3 (striped backends with parity on one backend), 5 (striped backends with parity spread across all backends).

  o Name: the Raid Set's unique name, used to make identification and access easier. A Raid Set's name must start with an alphabetic character and then have alphanumeric characters elsewhere. The maximum length of the name is 32 characters.

  o IOmode: the mode of access to the Raid Set. Access modes can be either read-write, read-only or write-only.

  o Size: the size (in 512 byte blocks) of the raid set if it's to be less than the calculated size. Normally a raid set's size is a function of the size of the raid set's backends (i.e. type 0 - sum of backends, 1 - size of one backend, 3/5 - sum of backends less one). The Size must be a multiple of the IOsize.

  o IOsize: the size (in bytes) of io for read and write operations to the raid set. This is commonly called the chunksize.

  o Cachesize: the size (in bytes) of any cache that will front-end the raid set. The cache size must be a minimum of 256K bytes (quarter megabyte). The cache size must also be a multiple of the raid set's IOsize.

  o Host Interface(s): the interfaces to use for access by a host. A raid set can be either single-ported or multi-ported. That is to say, a raid set can present different scsi disks all directed at itself. Naturally, the host(s) that access these disks must co-operate to ensure data integrity.
Each host interface is a quartet: a master/slave access flag, the controller number, the host port on that controller and lastly the SCSI LUN that the raid set should propagate on the host port. The master/slave access flag is used at boot time to configure the Host Interface's SCSI target daemon into either a full access (and hence spun up) master SCSI target or a minimal access (and hence spun down) slave SCSI target. See details on the -Z option on the mstargd command. This master/slave concept can be used by co-operative hosts to share access to a single raid set via multiple SCSI target daemons. A raid set can only be multi-ported within a controller.

  o Backends: the list of the raid set's backends. Backends can either be disks or other raid sets.

  o Backendsize: the size (in 512 byte blocks) of the raid set backends.

  o Bootmode: the boot mode of the raid set. It can either be autoboot or manual. For the former, when the RaidRunner boots, the raid set is automatically made available on the host interface specified; in the latter case, it must be manually made available.

  o State: this is the State of the raid set. The state of a raid set can either be active or inactive. If active, then the raid set's data is available for access. If inactive, then data on the raid set is not available and hence the raid set is in a quiescent state.

  o ZoneioUsage: this is a flag to indicate to the cache filesystem that all I/O to this raid set be optimized for backend reads and writes based on the backends' zone (notch) pages.

  o SparesUsage: this is a flag to indicate whether the raid set is to use spares. Its value can be either on or off.

  o QueueLen: this is the maximum number of requests that can be put in an io queue associated with each backend of the raid set. Its value must be 1 or greater.
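Several of the attribute rules listed above are mechanical enough to check in a script before calling rconf. The following Python sketch is an illustration under assumptions, not part of the RaidRunner tools: the function names are hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and all backends are assumed to be the same size.

```python
import re

MIN_CACHE = 256 * 1024  # minimum cache size in bytes (256K)
# Starts with a letter, alphanumeric elsewhere, at most 32 characters.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,31}\Z")


def default_size(raid_type, backend_size, nbackends):
    """Calculated raid set size in 512-byte blocks for equal-size backends."""
    if raid_type == 0:
        return backend_size * nbackends          # sum of backends
    if raid_type == 1:
        return backend_size                      # size of one backend
    if raid_type in (3, 5):
        return backend_size * (nbackends - 1)    # sum of backends less one
    raise ValueError(f"unsupported raid type: {raid_type}")


def check_raid_set(name, io_size, cache_size=None, qlen=1):
    """Return a list of rule violations (empty when all checks pass)."""
    problems = []
    if not NAME_RE.match(name):
        problems.append("name must start with a letter, be alphanumeric, max 32 chars")
    if io_size <= 0:
        problems.append("IOsize must be positive")
    if cache_size is not None:
        if cache_size < MIN_CACHE:
            problems.append("cache must be at least 256K bytes")
        if io_size > 0 and cache_size % io_size != 0:
            problems.append("cache must be a multiple of IOsize")
    if qlen < 1:
        problems.append("queue length must be 1 or greater")
    return problems


if __name__ == "__main__":
    print(check_raid_set("R5", 8192, cache_size=256 * 1024, qlen=8))
    print(default_size(5, 1000, 5))
```

An empty list from check_raid_set means none of the checked rules is violated; it says nothing about rules the sketch does not cover, such as the Size attribute or host interfaces.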
  + Spares Pool: A spares pool can be established on a RaidRunner which provides a pool of disks for use by running raid sets which suffer a backend (disk) failure. Each disk in the spool may be either hot or warm. Hot disks are those that are immediately available, i.e. spun up. Warm disks are those that require spinning-up prior to use. As different raid sets may use different backend (disk) sizes, the spares pool must know the size of each disk in the pool. If the RaidRunner has more than one controller, then the usage of a spare can be restricted to a specific controller. Given a spare is available to a given controller, then the spare is allocated (rconf -repair command) based on the following priority - hot spares of the same size on the same rank, hot spares of the same size on another rank, warm spares of the same size on the same rank, warm spares of the same size on another rank, hot spares of a larger size on the same rank, etc.

  + The rank is the SCSI ID of the device.

  + Host Port SCSI ID Assignment: On each controller in a RaidRunner there are a number of Host Ports through which data on the RaidRunner is accessed. As this host port is effectively a SCSI device, it has to have a SCSI ID assigned to it. There are two methods to assign a SCSI ID to a host port. The first uses a physical selector switch (usually providing selections from 0 to 16) and the second is to internally configure the host port's SCSI ID in the non-volatile raid configuration area - the hconf configuration command performs this assignment.

  + SCSI Monitor: On each controller in a RaidRunner one or more Monitors (smon) can be set up to run on a specified host port. This monitor allows a program on a host system to send RaidRunner commands, which will be executed in a husky subshell, as well as to transfer files to and from the RaidRunner.
This monitor simulates a 32K "disk" on a given SCSI ID (as set up by the hconf command or selector switch) and SCSI LUN (which is set up using the mconf command). Execution of commands and file transfers are effected by reading from and writing to specific block locations on the "disk". The reads and writes are to follow a specific protocol based on the block locations and order of the reads and writes. See smon for details of this protocol. As different systems use specific block numbers for their own internal use (e.g. block number 0 is typically used for disk labelling), writes to any block not used by the protocol are preserved for subsequent reads. Up to 16 blocks are preserved in the raid configuration area and are re-read by smon at boot time so data is preserved. If more than 16 blocks are written to, then this data is silently discarded. The block numbers used in the smon protocol can be changed to suit different systems which may use the default protocol block numbers. By default, the SCSI monitor will always start at boot time, even if the raid configuration area has just been initialized or is corrupt. RaidRunner model specific defaults for the host port, SCSI ID and SCSI LUN for the monitor are compiled into the RaidRunner binary. To prevent the scsi monitor from automatically executing at boot time, the monitor will have to be specifically deleted from the controller using the mconf -delete option.

  + Manual Backend Reconstruction: Raid types 1, 3 and 5 are resistant to a single backend device failure. That is, the raid set's data is still available even if one of its backends fails. Should a subsequent failure occur on a different backend then the raid set's data will be inaccessible. When a backend device does fail, the RaidRunner disengages that device from the raid set.
To replace a failed backend of a type 1, 3 or 5 raid set, first physically correct the failure (e.g. replace and test the faulty disk), then engage the backend in write-only mode and read the entire raid set from the repair entry point. This will result in the reconstruction of the raid set's data and parity (if appropriate) onto the replaced backend. We engage the backend in write-only access mode to prevent the raid set from reading from this drive during the reconstruction process, as data on this backend is incomplete. Once we have reconstructed the data on the replaced backend (i.e. the read has completed), we then set the backend's access mode to be the access mode of the raid set and thus we now have a "complete" raid set. This reconstruction of data is passive, in that the raid set is still "running" during the reconstruction, although it will be running slowly.

  + Automatic Backend Reconstruction: In the case of either a type 1, 3 or 5 raid set, a single backend device failure can result in the automatic replacement of the failed backend and subsequent reconstruction of the raid set. This is accomplished by specifying a husky script to execute when a backend failure occurs - see autorepair. When a backend failure occurs, the backend is disengaged from the raid set, and the husky script, if specified, is executed. The script typically allocates a disk from the spares pool, engages it to the raid set in write-only access mode, then reads the entire raid set (from the repair entry point), which reconstructs the raid set incorporating the new disk. Once complete, the disk's access mode is set to the access mode of the raid set.

  + If, during the reconstruction phase (read of whole raid set), the newly engaged backend fails, the raid set's data is still available. All the RaidRunner will do is disengage the backend and execute the script again. It will continue to do this until no more spares are available.
Alternatively, if another backend fails during the reconstruction, the raid set will fail.   + Configuration Areas: Typically the RaidRunner stores multiple copies of the raid configuration area. Copies are stored on all disks that can be written to (in /dev/hd/c.s.l/rconfig) and in a section of flash ram (/dev/flash/conf) on the RaidRunner controller board itself. This is done for redundancy. Whenever the raid configuration area is updated, copies are written to each disk (in chip, scsi id (rank), lun order) then lastly to flashram. When the raid configuration is read, the RaidRunner groups all configuration sources with identical configurations, then works out the most "correct" configuration based on the following rules -
  1. If all sources are unreadable or corrupt then we use the compiled default scsi host id's and scsi monitor configurations.
  2. If all sources are the same (i.e. one group) then this is the normal configuration.
  3. If we have two groups of configurations, then the group with the highest number of identical areas is used. If both groups have the same number of identical areas then we pick the group with the highest revision. If both groups have the same number of areas and the same revision, the first group is chosen.
  4. If we have three groups we choose the group with the highest number of identical areas. If two groups (or all three) meet this criterion we use the defined master configuration area - FLASH RAM.
  5. If we have more than three groups, we use the master configuration area - FLASH RAM.
The rconf -check option is used to re-read all configuration areas as per the above rules, and if there are differences (more than 1 group) it selects the most "correct" configuration and re-writes it out to all areas.
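The selection rules above can be sketched in a few lines of Python. This is only an illustration of the rules as written, not the RaidRunner's actual code: the function name choose_config, the (name, revision, payload) tuple layout and the source name "flash" for the master area are all hypothetical. The source text does not say what happens when rule 4 or 5 applies and flashram itself is unreadable; the sketch assumes a fall back to the compiled defaults in that case.

```python
def choose_config(sources, master="flash"):
    """Pick the most "correct" configuration from several stored copies.

    sources: list of (name, revision, payload) in read order; payload is
             None when that copy is unreadable or corrupt (hypothetical layout).
    Returns the chosen (revision, payload) pair, or None meaning
    "use the compiled defaults".
    """
    groups = {}  # (revision, payload) -> names of sources holding that copy
    for name, revision, payload in sources:
        if payload is None:
            continue
        groups.setdefault((revision, payload), []).append(name)

    if not groups:                      # rule 1: nothing readable
        return None

    keys = list(groups)                 # groups in first-seen order

    def master_group():
        # Assumption: if the master copy is unreadable, use the defaults.
        for key in keys:
            if master in groups[key]:
                return key
        return None

    if len(keys) == 1:                  # rule 2: all copies agree
        return keys[0]

    if len(keys) == 2:                  # rule 3: size, then revision, then first
        a, b = keys
        if len(groups[a]) != len(groups[b]):
            return a if len(groups[a]) > len(groups[b]) else b
        if a[0] != b[0]:
            return a if a[0] > b[0] else b
        return a

    if len(keys) == 3:                  # rule 4: clear majority, else master
        top = max(len(names) for names in groups.values())
        winners = [k for k in keys if len(groups[k]) == top]
        return winners[0] if len(winners) == 1 else master_group()

    return master_group()               # rule 5: more than three groups
```

For instance, with two disks holding revision 3 and a third disk plus flashram holding revision 4, the two groups are the same size, so the higher revision wins.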
When the RaidRunner initially boots it typically only has access to the flashram (as the disks have not been bound into the filesystem yet), so it uses the configuration stored in flashram to start (in a spun-down state) the scsi monitor(s) and scsi target daemons (stargd's). It then binds in all disks and executes an rconf -check command, and if it had to re-write the configuration AND flashram was not in the "correct" group it may reboot the RaidRunner (as we can now assume that the flashram area is correct).   * OPTIONS - all commands   + -C: This option checks to see if any significant configuration change has been made since boot time to any raid set, host port, scsi monitor or spares pool configuration. An appropriate message is printed and a return status ($status) of 1 is returned if a change has occurred, else 0 for no change. Any modification, addition or deletion is considered to be a significant change. The only NON-significant change is the changing of state of a raid set.   + -c: This option checks to see if any significant configuration change has been made since boot time to the relevant configuration area depending on the command invoked - rconf - raid sets, hconf - host ports (if not using physical selectors), etc. The return status is as per the -C option.   * OPTIONS - mconf   + Specifying no options is the same as specifying the -list option.   + -delete: Delete the scsi monitor for the given controller number, host port and SCSI LUN. This option takes three values: the controller number, the host port and the SCSI LUN. The deletion of a monitor will not take effect until the next boot of the RaidRunner.   + -list: Print the controller/hostport assigned monitor SCSI LUN. If the optional controller number argument is given then print out the controller, hostport and assigned monitor SCSI LUNs for that controller. If a controller has no assigned monitor SCSI LUN, then a minus (-) character will be printed in place of the SCSI LUN.
If an error occurs whilst opening or reading the RaidRunner configuration area, a RaidRunner model specific default will be listed.   + -add: Set the monitor SCSI LUN on the given controller's host port. This option takes three mandatory values and one optional value. The first is the controller, the second the host port on that controller and the last is the SCSI LUN to assign the monitor. If the host port has not been assigned a SCSI ID, then the monitor will not be executed at boot. The optional value is a comma separated list of nine (9) numbers which change the default block protocol block numbers used by the SCSI monitor. See smon for details.   + -sml: Print each scsi monitor's stored label information. The information is printed as follows - controller number, hostport number, number of stored blocks, followed by a colon, then the block addresses stored. Block number 0's at the end of this list mean that those available blocks have not been written to.   * OPTIONS - hconf   + Specifying no options is the same as specifying the -list option.   + -delete: If the controller does not have a physical SCSI ID selector switch, specifying this option will delete the controller/hostport assigned SCSI ID. This option takes two values, the controller number and host port. Once an assigned scsi id has been deleted, no program (eg scsi monitor or raid set) can appear on this host port.   + -list: Print the controller/hostport assigned SCSI ID. If the optional controller and hostport arguments are given then print out the controller, hostport and assigned scsi id for that pair. If no arguments are given, then for each host port on each controller in the RaidRunner, print out the controller number, host port and SCSI ID assigned to that host port. If the controller does not have host port SCSI ID selector switches and a host port has not had a SCSI ID assigned to it, a minus (-) character will be printed in place of the SCSI ID.
For example (of a RaidRunner with two controllers, each of which has two host ports)

    0 0 2
    0 1 3
    1 0 -
    1 1 6

This shows that on the first controller (0), the first host port (0) has been assigned SCSI ID 2 and the second port (1) has been assigned SCSI ID 3. The second controller (1) has NO SCSI ID assigned to its first host port, and SCSI ID 6 assigned to its second port. If an error occurs whilst opening or reading the RaidRunner configuration area, a RaidRunner model specific default will be listed.   + -modify: If the controller does not have a physical SCSI ID selector switch, specifying this option will set the SCSI ID on the given controller's host port. This option takes three values. The first is the controller, the second the host port on that controller and the last is the SCSI ID to assign. You cannot set a host port's SCSI ID if any raid set which uses that port is active.   * OPTIONS - rconf   + Any size sub-options (i.e. raidsize, cache, iosize, backendsize) can use the suffix values. Specifying no options is the same as specifying the -list option.   + -add: Add a raid set on the RaidRunner. Sub-options are -   + -type: Specify the raid type. This option takes values of either 0, 1, 3 or 5.   + -name: Specify the raid name. A name must be alphanumeric, with a maximum of 32 characters, and must start with an alphabetic character.   + -iomode: Specify the raid set's io mode. This option takes values of either RW, RD or WR for read-write, read-only or write-only respectively.   + -size: Specify the raid set's size. This is the size of the raid set in 512 byte blocks. It must be a multiple of the -iosize value. If it is not then it will be automatically adjusted to be so and a message (not error) will be printed. In the case of raid type 3 the size must be a multiple of the -iosize value multiplied by the number of data backends in the raid set.   + -iosize: Specify the raid set's chunksize in bytes.
All reads and writes on the raid set will be in multiples of this size. When cache is used, this is also the size of the cache buffers created (for Raid type 3 the cache buffer size is the number of data backends times this value). The RaidRunner establishes a maximum number of 512-byte blocks that can be written to at any instant - see the write_limit internal variable (internals). Accordingly iosize is limited to ensure that at least two of these buffers will fit into this "writable" portion of cache.   + -hostif: Specify the raid set's host interface(s). The host interface is the device through which the raid set's data is externally accessed. A host interface is defined by the quartet - master/slave flag (mflag), controller, hostport and scsi lun. Where   o mflag: is a target access flag - master (M): full access, slave (S): limited access   o controller: is the controller on which the raid set is to run   o hostport: is the host port to which a raid set's IO is to be directed   o scsilun: is the scsi lun that the raid set propagates out the hostport The master/slave flag is either M for master, or S for slave mode. Multiple host interfaces are added by repeating this option.   + -backendsize: Specify the size of the raid set's backends. This is the size, in 512 byte blocks, of the raid set's backends. This value is used when searching for backends in the spares pool.   + -backends: Specify the raid set's backend devices in a comma separated list. A backend device of a raid set can be a disk or the frontend of another raid set. If it is a disk then it will have the format Dc.s.l where c is the channel, s is the scsi id (or rank) and l is the scsi lun of the disk. If it is a raid set, then it will have the format Rraidsetname where raidsetname is the name of the raid set.   + -boot: Specify the raid set's boot mode. Values can either be auto or manual for autobooting or manual booting raids. This option is optional and the default bootmode is auto.   
+ -cache: Specify the size, in bytes, of cache that should front-end the raid set. The default size is 0, which means that no cache will be used. The cache size must be a multiple of the raid set's iosize.   + -stargs: Specify additional arguments to the stargd process when it is created for the given raid set. The arguments are to be enclosed in braces - {}, eg {-L 16 -I 512}   + -state: Specify the initial state of the raid set. Values can either be active or inactive. This option is optional and the default state is inactive.   + -usezoneio: Set the zone io usage flag to either on or off. If on, then the cache filesystem will fragment write and read requests to suit the zone (notch) partitions on the backend disks.   + -usespares: Set the spares usage flag to either on or off. If on, then a running raid set (1, 3 or 5) will, on an I/O error, attempt automatic spares allocation and reconstruction.   + -qlen: Specify the maximum number of io requests that can be queued to a backend. The minimum is 1 and the default is 8.   + -delete: Delete a specified raid set from the RaidRunner. The raid set must be inactive.   + raidsetname: Specify the name of the raid set to delete.   + -list: Print raid set configuration. If a raidsetname is not specified, all raid set configurations are printed, one per line, in the following form (if verbose mode (-v) is not set) - name type flags size iosize cachesize iomode qlen nhostifs hostifs backendsize backends [optional backend status]. Here flags is a comma separated list drawn from "Active", "Inactive", "Used", "Unused", "Autoboot", "Manboot" and "Usespares". When a raid set is first flagged to be active, the "Used" flag is permanently set. This knowledge that a raid set has been used at some time is used by other programs and utilities on the RaidRunner.
nhostifs is the number of host interfaces to present. hostifs is a comma separated list of host interface quartets of master/slave flag, controller, host port and scsi lun (each separated by a period). For example M0.0.4,S0.1.0 means two host interfaces - both on controller 0, one on port 0 with a scsi LUN of 4 and the other on port 1 with a scsi LUN of 0. The first is to be a master target which provides full access by the host, and the second is to be a slave which will advise on any host access that the target is spun down. The backends are a comma separated list of backends. If a backend has failed then a "-" is suffixed to it, and if a spare has replaced it then the spare backend is appended after the "-". If multiple spare backends have been used (i.e. the spares failed) then each failed backend will be suffixed with a (-). For example, D0.2.0-,D1.2.0,D2.2.0,D3.2.0,D4.2.0 means that the backend D0.2.0 has failed and does not have a spare and the other backends are D1.2.0,D2.2.0,D3.2.0,D4.2.0. D0.2.0-D5.2.0,D1.2.0,D2.2.0,D3.2.0,D4.2.0 means that the backend D0.2.0 has failed and the spare backend, D5.2.0, has replaced it and the other backends are D1.2.0,D2.2.0,D3.2.0,D4.2.0. D0.2.0-D5.2.0-,D1.2.0,D2.2.0,D3.2.0,D4.2.0 means that the backend D0.2.0 has failed and the spare backend, D5.2.0, which replaced it has also failed and the other backends are D1.2.0,D2.2.0,D3.2.0,D4.2.0.   + raidsetname: Print the configuration for only the specified raid set.   + -modify: This command allows one to change the raid set's state, name, bootmode, iomode, host interface, spares usage, extra stargd args, cachesize or QueueLen. When specifying a state change, no other modifications are allowed. To modify any of the preceding raid set attributes (except state), the raid set must be inactive. Sub-options are   + -name: The name of the raid set to modify   + -newname: The new name for the raid set. This new name must still be unique amongst the other defined raid sets.   + -boot: Change the raid set's boot mode.
Values can either be auto or manual for autobooting or manual booting raids.   + -cache: Change the size, in bytes, of cache that should front-end the raid.   + -hostif: Change the raid set's host interface(s). The host interface is the device through which the raid set's data is externally accessed. A host interface is defined by the quartet - master/slave flag (mflag), controller, hostport and scsi lun. Where mflag is a target access flag - master (M): full access, slave (S): limited access; controller is the controller on which the raid set is to run; hostport is the host port to which a raid set's IO is to be directed; scsilun is the scsi lun that the raid set propagates out the hostport. The master/slave flag is either M for master, or S for slave mode. Multiple host interfaces are added by repeating this option. Note you must specify all host interfaces required in the one command. Thus to delete a host interface you modify without that host interface option.   + -iomode: Change the raid set's io mode. This option takes values of either RW, RD or WR for read-write, read-only or write-only respectively.   + -qlen: Change the maximum number of io requests that can be queued to a backend.   + -stargs: Specify additional arguments to the stargd process when it is created for the given raid set. The arguments are to be enclosed in braces - {}, eg {-L 16 -I 512}   + -state: Change state of the raid set. Values can either be active or inactive.   + -usespares: Set the spares usage flag to either on or off.   + -rebuild: The rebuild option is used when a stored raid set configuration has been corrupted and that raid set had faulty backends optionally supplemented with spares. Normally, if the stored raid set configuration was corrupted in some way and it did not have any faulty drives then one would just delete the raid set and then add it. If any of the backends were faulty then we need to store this information in the newly re-created raid set configuration.
The rebuild option does this. If the faulty backend did not use any spares, then the -rebuild option will just mark that backend faulty. If it did use spares, then repeated invocations of this command (with the -spare sub-option) will add spares as appropriate, marking the backend faulty and any previously configured spares faulty also. The last invocation with the -spare sub-option will add the spare as in use but not faulty. If the last spare was also faulty, then a final invocation without the -spare sub-option will mark that last spare faulty.   + -name: The name of the raid set.   + -drive: Specify the backend to rebuild. The backend specification takes the form of the index of the backend in the list of backends. The first backend in a raid set has the index 0.   + -spare: Optionally specify the spare device which is to be configured onto a backend. This device should already be in the spares pool and be unused. If no spares have been assigned to the backend, then the backend is marked faulty and the spare is added and marked as in use. If spares have been assigned to the backend, then the last spare is marked faulty and the additional spare is added and marked as in use.   + -mkfaulty: The mkfaulty option is used to mark a particular backend as faulty. When a backend of a raid set which does not use spares fails, we need to record this fact in the RaidRunner configuration. This command's sub-options are the same (and have the same effect) as the rebuild command's sub-options excluding the -spare sub-option.   + -repair: Reconfigure a raid set to replace a backend with a spare backend allocated from the spares pool. When the -action option is start then the depth of the new spare and the spare device name is printed. The depth is the number of spares allocated to that backend. When the first spare is allocated, then the depth will be 1. When the second is allocated then the depth is 2 and so on.
The depth is used when re-engaging a spare which has just been reconstructed. If the depth is different after the reconstruction has occurred then we know that the spare has failed and another spare has been allocated, so we won't attempt the re-engage.   + -name: The name of the raid set.   + -drive: Specify the backend to replace. The backend specification takes the form of the index of the backend in the list of backends. The first backend in a raid set has the index 0.   + -action: Specify the repair action. Values are either start or finished. Normally, when a spare is allocated, the next steps are to reconstruct the data on that spare then re-engage the spare. If a failure occurs on the spare during reconstruction, then the spare should not be re-engaged. Thus, a raid set should know that a reconstruction is in process. Once the reconstruction is complete then the reconstruction knowledge is cleared. The start value will print out the depth of the newly allocated spare and the spare's device name. If no spares are available an error will be printed and only the depth of the last spare for the backend will be printed. The finished action checks the current depth of spares against a given depth and clears the reconstruction flag if the depth is the same (that is, the spare reconstructed correctly).   + -depth: Specify the depth of the spare device when performing a finished action. A check is made to ensure the current depth of the backend is the same as the specified one. This essentially checks to see if another spare has been allocated on this backend during the reconstruction.   + -replace: This command is used when we have physically repaired (or replaced) the original failed backend and we wish to re-integrate it back into the raid set. This is done by deallocating the current running spare and reconstructing the raid on the replaced original backend.
This command is similar to the -repair command except that the working spare is deallocated from the backend and returned to the spares pool PRIOR to the reconstruction. This way, if the newly replaced drive fails, we have a spare to reallocate.   + -name: The name of the raid set.   + -backend: The name of the backend device that is being re-integrated.   + -action: Specify the replacement action. Values are either start or finished. When the start action is specified, the last spare on the backend is returned to the spares pool if it is in a working state. Then the current depth of spares on the backend and the backend's index into the list of backends (the raid set's drive number) is printed out. If a failure occurs on the backend during reconstruction, then the backend should not be re-engaged. Thus, a raid set should know that a reconstruction is in process. Once the reconstruction is complete then the reconstruction knowledge is cleared. The finished action checks the current depth of spares against the given depth and clears the reconstruction flag if the depth is the same (that is, the spare reconstructed correctly) and then releases any spares back to the spares pool (all these spares will be faulty). If the depth is different then the reconstruction has failed and an error message is printed and the spares are left alone.   + -depth: Specify the depth of the spare devices when performing a finished action. A check is made to ensure the current depth of spares of the backend is the same as the specified one. This essentially checks to see if a spare has been allocated on this backend during the reconstruction.   + -unrepair: Typically, when we repair a raid set using spares, we modify the raid set's configuration to indicate the faulty drive, the allocation of a spare and its state of "under construction". We then perform the reconstruction, and on success we re-modify the configuration to clear the "under construction" state.
That is, a sequence of commands like - rconf -repair .. -action start ... rebuild the raid set rconf -repair .. -action finish ... Now, if the spare we allocated is faulty or the reconstruction fails, we need to turn off the "under construction" state, deallocate the spare and re-mark the original failing drive as faulty (as well as the spare drive if it was faulty). The -unrepair option does this. The options are similar to the -repair options with the addition of the name of the originally faulty backend.   + -name: The name of the raid set.   + -drive: Specify the backend to unrepair. The backend specification takes the form of the index of the backend in the list of backends. The first backend in a raid set has the index 0.   + -backend: The name of the backend device that is being un-repaired.   + -depth: Specify the depth of the spare device as allocated in the original rconf -repair -action start command. This essentially checks to see if another spare has been allocated on this backend during the failed reconstruction or spare testing.   + -init: Initialize the RaidRunner configuration area. With this option all raid set, host port, scsi monitor and spares pool configurations are initialized. The GLOBAL environment variables, and scsi monitor (smon) data blocks which are located in the Raid configuration area, will NOT be deleted. To enable at least one scsi monitor to come up, the scsi monitor lun and its associated host port scsi id are initialized to a RaidRunner model specific default. Use this option with caution.   + -fullinit: Initialize the RaidRunner configuration area. This option performs the same function as the -init option and additionally clears all GLOBAL environment variables and the scsi monitor (smon) data blocks.   + -check: Cause the multiple configuration areas to be all re-read and, if different, re-written as per the raid configuration area multiple source rules.
If more than one raid configuration area differs from the others AND the "correct" area does not include flashram, then rconf returns a non-zero exit status. If all areas are the same or all areas are unreadable (or corrupt) then rconf returns a zero exit status.   + -validate: Perform a consistency check on the current raid sets, spares etc and print out any inconsistencies. If any inconsistencies are found, messages will be printed and the return status of rconf will be 1. If no inconsistencies are present, then nothing will be printed and the return status will be NULL.   + -size: Print the number of bytes actually used in the raid configuration store.   + -syslog: Print any syslog entries stored in the configuration area. Note that these entries will be deleted on an rconf -init or -fullinit.   + -maxiochunk: Search through all raid sets currently configured and print, in 512-byte blocks, the maximum IO Chunk size of all raid sets.   + -v: Set the verbose option. This option only affects the listing of Raid Sets. It prints additional detail when listing the Raid Sets.   * OPTIONS - spool Specifying no options is the same as specifying the -list option.   + -add: Add a disk to the spares pool of the RaidRunner. Sub-options are -   + -name: the name of the disk. If the disk is already used by a raid set or is already in the spares pool, an error will be printed.   + -backendsize: the size (in 512 byte blocks) of the added disk. This value can be in the form specified by suffix.   + -type: the type of the spare. Spare devices can either be hot (spun up) or warm (spun down).   + -delete: delete the specified spare from the spares pool. A check is made to see that the spare is not in use.   + -list: Print out the current state of each spare in the pool. The format is Backend BackendSize hot|warm controller_no|Any Used|Unused Faulty|Working comments   + Configuration Corruption: Under certain conditions, it is necessary to corrupt the RaidRunner configuration area.
An example of this is a RaidRunner that has been configured in such a way that all memory is consumed and the RaidRunner will not respond to any commands either on the console or via the scsi monitor. In this state there is not enough memory to delete raid sets or even to execute the reboot process (or any other process, for that matter). What is needed is a command which corrupts all the RaidRunner configuration areas (and hence prevents the automatic creation of raid sets at reboot - i.e. the consumer of memory) and then immediately reboots without using any additional memory. This allows the user to reconfigure the RaidRunner when it has re-booted. The corrupt-config command performs this task.   * RETURN VALUES: If an error occurs, the $status environment variable is set to 1, else null. When the configuration change options (-C or -c) are given, the return value ($status) will be 1 if a configuration has changed since boot, else 0.   * BUGS: Currently, if you duplicate sub-options within a command, the last occurrence will be the value set. For example if the command rconf -modify -name F -newname B -newname C -cache 1M -cache 2M is executed then the newname will be "C" and the cache size set to 2M (2 megabytes).   * SEE ALSO: husky, autorepair, replace, suffix, setenv, unsetenv, printenv, smon 3.9.72  REBOOT - exit K9 on target hardware + return to monitor   * SYNOPSIS: reboot [-i] [-s] rboot [-i] [-s] rbootimed   * DESCRIPTION: reboot abruptly leaves the K9 kernel (on target hardware) and either re-enters the underlying hardware monitor or re-initializes the K9 kernel. Aside from flushing the write cache and initialising the battery backed-up ram no cleanup is performed on K9 processes (no signals, no nothing). If no options are given then the reboot will NOT do any memory tests prior to re-initialising the K9 kernel. If the -s option is given then memory tests will be performed prior to re-initialising the K9 kernel.
During these memory tests, a spinning - is displayed on the RaidRunner's console. This indicates the memory test is executing. If you press the space key during this test, then you will drop into the underlying RaidRunner monitor. If the -i option is given then all I/O to the back-end drives will be inhibited. That is, no spun-down drives will be started and no attempt to flush the cache is performed. rboot is an alias for reboot. rbootimed provides a special entry-point into the reboot process and is equivalent to calling reboot with the -i option. Unlike reboot, which is executed as a sub-process, rbootimed is interpreted via husky and immediately commences the reboot process and hence does not use any extra memory. This is useful when the RaidRunner has run out of memory and you need to reboot the RaidRunner into single user state. On simulation platforms such as Unix this command is not implemented (but typically the Unix "interrupt" signal has a similar effect, killing the Unix process containing the K9 simulation). 3.9.73  REBUILD - raid set reconstruction utility   * SYNOPSIS:   + rebuild -d file -s size -b bufsize -D dno -R rname -n nbends -r rtype [-p pri] [-v] [-S sleep_period]   + rebuild -d file -l [-v]   + rebuild -L [-v]   * DESCRIPTION: To reconstruct data onto a newly incorporated drive in a raid set, the raid set's repair entry point must be completely read. Additionally the read must be in multiples of the raid set's bufsize and the last read must align to the end of the raid set (i.e. it cannot attempt to read past the end of the raid set). The rebuild utility performs this reconstruction. When rebuild starts, it allocates an internal buffer whose size is maximized to the available memory and which is a multiple of the given buffer size. The raid set's repair entry point is read via these large buffers and the last read is guaranteed to align to the end of the raid set.
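The read pattern described above can be pictured with a short Python sketch. It is not rebuild's implementation: the function name plan_reads and its arguments are hypothetical, all quantities are in 512-byte blocks, and the sketch assumes the raid set size is already a multiple of bufsize (which is what the rconf -size rule arranges).

```python
def plan_reads(size, bufsize, avail):
    """Return (offset, length) pairs covering a raid set of `size` blocks.

    size    - raid set size in 512-byte blocks
    bufsize - raid set IO chunk size in 512-byte blocks
    avail   - blocks of memory available for the read buffer (assumed input)
    """
    if size % bufsize:
        raise ValueError("size must be a multiple of bufsize")
    # Largest buffer that fits in memory and is a multiple of bufsize.
    buf = (avail // bufsize) * bufsize
    if buf == 0:
        raise ValueError("not enough memory for a single chunk")
    reads = []
    offset = 0
    while offset < size:
        # The final read is shortened so it ends exactly at the raid set's end.
        length = min(buf, size - offset)
        reads.append((offset, length))
        offset += length
    return reads
```

Every read length is a multiple of bufsize, and the final (possibly shorter) read stops at the last block rather than running past the end of the raid set.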
As rebuild reads from the raid set's repair entry point, a check is made to see if the backend that is being rebuilt is still engaged to the raid set, and if the backend has failed (disengaged from raid set) rebuild prints a message and returns an exit status of 2. If the raid set fails during reconstruction (a second backend fails) then rebuild prints a message and returns an exit status of 1. Rebuilds, by default, consume large amounts of RaidRunner resources which would result in a large reduction in RaidRunner I/O throughput (to the host). A priority can be given to the rebuild program to reduce this demand on resources. This priority ranges between 0 (use minimal resources) and 9 (use maximum resources available). Naturally, specifying a low priority increases the time needed to complete a rebuild. The current state of any completed or "in-progress" rebuild is recorded for informational use. Up to 10 of these records are kept at any one time. If a new rebuild is requested and we have already used all 10 records, we find the first "inactive" (or completed) record and re-use that one. If all 10 records are marked "active" (which means that we currently have 10 rebuilds in progress), then no record of the new rebuild will be kept. These records are initialized at boot time.   * OPTIONS   + -d file: Specify the file to read. This is typically the repair entry point of the raid set.   + -s size: Specify the raid set's size, in 512-byte blocks.   + -b bufsize: Specify the raid set's IO chunksize - bufsize, in 512-byte blocks. In the case of a raid type 3, this should be the number of data drives times the size of the IO chunksize.   + -n nbends: Specify the number of backends in the raid set. This is needed along with the bufsize option to ensure that the raid set's repair device is read in multiples of the raid set's stripe size.   + -D dno: Specify the backend index (within the raid set) that is being rebuilt.   
+ -R rname: Specify the name of the raid set that is being rebuilt.   + -r rtype: Specify the raid type of the raid set being rebuilt. Must be 1, 3 or 5.   + -p pri: Specify the priority at which the rebuild is to be run. The priority, pri, must be a number between 0 (lowest) and 9 (highest). When rebuild is executed by automatic or manual scripts an environment variable, RebuildPri, is checked to see at what priority a raid set should be re-built. If this variable is not set, then the default priority is 5. See environ for details.   + -S sleep_period: Specify a number of milliseconds to sleep between each reconstruct access of the raidset. The period, sleep_period, must be a number between 0 (no sleep) and 60000 (60 seconds). When rebuild is executed by automatic or manual scripts an environment variable, RebuildPri, is checked to see what sleep period a raid set should use. If this variable is not set, then the default period for the given priority is used. See environ for details.   + -v: Set verbose mode. When rebuilding, if verbose mode is set, the current number of reads and total expected number of reads is printed for every 10 percent of the reconstruction completed. When printing the state of a specific rebuild or of all rebuilds, verbose mode will produce a long listing of the rebuild status.   + -l: Print the current state of the specified rebuilding (-d) file.   
    By default the following colon-separated fields are printed:
  o the rebuild file (entry point to the raid set repair device)
  o the size of the file (raid set) in 512-byte blocks
  o the IO chunksize of the file (raid set) in 512-byte blocks
  o the size of the buffer (in 512-byte blocks) being used for reading
  o the total number of reads needed to read the whole file
  o the current number of reads performed so far
  o the priority of the rebuild
  o the number of milliseconds to sleep between each access
  o the state of the rebuild - either "active" or "complete"
  o the error message that this rebuild failed on - set to "none" if the rebuild completed successfully.
    When verbose mode (-v) is set, the above is printed with appropriate labels.
  + -L: Print the rebuild status of ALL rebuilds in progress or completed (in the same form as for the -l option).
  * PRIORITY: The speed at which a rebuild occurs, and the impact it has on the system, is determined by the priority and the optional sleep period between accesses. When rebuild allocates its buffer, it limits the buffer's size to the maximum available contiguous memory segment or 2 Megabytes, whichever is the lesser. The memory is then scaled by (pri + 1) / 10. That is, if you specify a priority of 0 the buffer will be one tenth of the maximum buffer size, and if you specify a priority of 4 it will be half. The resulting buffer size is then trimmed to ensure it is a multiple of the raid set's stripe size. Between each read/write access of the raid set's repair device, rebuild may sleep for a given number of milliseconds to reduce the overhead of the rebuild process. This sleep period may be given on the command line. If not, the priority value is used to work out the period. The table below specifies which period is associated with which priority.
    0: 2000 milliseconds (2.00 seconds)
    1: 1750 milliseconds (1.75 seconds)
    2: 1500 milliseconds (1.50 seconds)
    3: 1250 milliseconds (1.25 seconds)
    4: 1000 milliseconds (1.00 seconds)
    5:  750 milliseconds (0.75 seconds)
    6:  500 milliseconds (0.50 seconds)
    7:  250 milliseconds (0.25 seconds)
    8:  100 milliseconds (0.10 seconds)
    9:    0 milliseconds (no sleep)
  * EXIT STATUS: The following exit values are returned:
  + 0: Successful completion.
  + 1: The raid set failed reconstruction. This means that the raid set has failed.
  + 2: The raid set's backend that is being rebuilt has failed. This means that the raid set is still valid.
  * SEE ALSO: raid1, raid3, raid5, repair, replace, environ

3.9.74  REPAIR - script to allocate a spare to a raid set's failed backend
  * SYNOPSIS: repair raidsetname backend
  * DESCRIPTION: repair is a husky script which is used to allocate a spare drive to a failed backend of a raid set. This is typically done when a raid set has a failed backend, no spares were available at the time of failure, and you now want to allocate a spare (as opposed to fixing the failed backend). After parsing its arguments, repair gets a spare backend (of the appropriate type) from the spares pool. The spare drive is then engaged in write-only access mode in the raid set and a reconstruct of the raid occurs (a read of the whole raid set). This read is from the raid file system repair entrypoint. Reading from this entrypoint causes a read of a block immediately followed by a write of that block. The read/write sequence is atomic (i.e. it is not interruptible). On successful completion of the reconstruction, the spare is engaged in the correct iomode for the raid set. The process that reads the repair entrypoint is rebuild. This device reconstruction will take anywhere from 10 minutes to one and a half hours, depending on both the size and speed of the backends and the amount of activity the host is generating.
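Since repair hands the reconstruction over to rebuild, the time it takes depends largely on the rebuild priority. The PRIORITY rules of rebuild (buffer scaled by (pri + 1) / 10, default sleep period taken from the table) can be sketched as follows. This is only an illustration in Python, not RaidRunner code: the function names are hypothetical, sizes are in bytes, and "trimmed" is assumed to mean rounding down to a stripe multiple.

```python
# Illustrative model of the rebuild PRIORITY rules; not RaidRunner source code.
MAX_BUFFER_BYTES = 2 * 1024 * 1024  # the 2 Megabyte ceiling stated in the text

# Default sleep period in milliseconds for priorities 0..9 (from the table).
DEFAULT_SLEEP_MS = [2000, 1750, 1500, 1250, 1000, 750, 500, 250, 100, 0]


def rebuild_buffer_bytes(pri, largest_free_segment, stripe_bytes):
    """Buffer size for a given priority (hypothetical helper)."""
    if not 0 <= pri <= 9:
        raise ValueError("priority must be between 0 and 9")
    # Limit to the largest contiguous segment or 2 Megabytes, whichever is less.
    limit = min(largest_free_segment, MAX_BUFFER_BYTES)
    # Scale by (pri + 1) / 10.
    scaled = limit * (pri + 1) // 10
    # Assumption: trimming rounds DOWN to a multiple of the stripe size.
    return scaled - (scaled % stripe_bytes)


def rebuild_sleep_ms(pri, sleep_period=None):
    """Sleep between accesses: explicit -S value if given, else the table."""
    if not 0 <= pri <= 9:
        raise ValueError("priority must be between 0 and 9")
    if sleep_period is not None:
        if not 0 <= sleep_period <= 60000:
            raise ValueError("sleep period must be between 0 and 60000 ms")
        return sleep_period
    return DEFAULT_SLEEP_MS[pri]
```

For instance, with a 64 Kbyte stripe and plenty of free memory, priority 4 gives a 1 Megabyte buffer and a default sleep of 1000 milliseconds.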
During device reconstruction, pairs of numbers are printed for each 10% of data reconstructed. The two numbers in a pair are separated by a slash character, the first being the number of blocks reconstructed so far and the second being the number of blocks to be reconstructed. Further status about the rebuild can be obtained by running rebuild. Checks are made to ensure that the raid set is running, that the spare works, and that the spare does not fail during reconstruction.
  * OPTIONS
  + raidsetname: The name of the raid set.
  + backend: The name of the failed backend which is being replaced by a spare.
  * SEE ALSO: rconf, rebuild

3.9.75  REPLACE - script to restore a backend in a raid set
  * SYNOPSIS: replace raidsetname backend
  * DESCRIPTION: replace is a husky script which is used to configure a physically repaired backend back into a type 3 or 5 raid set. After parsing its arguments, replace does a quick read/write test of the specified backend to ensure it is working. If a working spare is currently running in place of the backend, replace disengages it and returns it to the spares pool. The backend drive is then engaged in write-only access mode in the raid set and a reconstruct of the raid occurs (a read of the whole raid set). This read is from the raid file system repair entrypoint. Reading from this entrypoint causes a read of a block immediately followed by a write of that block. The read/write sequence is atomic (i.e. it is not interruptible). On successful completion of the reconstruction, the backend is engaged in the correct iomode for the raid set, and any other faulty spare backends that were associated with that backend are returned to the spares pool. The process that reads the repair entrypoint is rebuild. This device reconstruction will take anywhere from 10 minutes to one and a half hours, depending on both the size and speed of the backends and the amount of activity the host is generating.
During device reconstruction, pairs of numbers are printed for each 10% of data reconstructed. The two numbers in a pair are separated by a slash character, the first being the number of blocks reconstructed so far and the second being the number of blocks to be reconstructed. Further status about the rebuild can be obtained by running rebuild. Checks are made to ensure that the raid set is running, that the backend works, and that the backend does not fail during reconstruction.
  * OPTIONS
  + raidsetname: The name of the raid set.
  + backend: The name of the backend which is being restored.
  * SEE ALSO: rconf, rebuild

3.9.76  RM - remove the file (or files)
  * SYNOPSIS: rm [-f] [ file ... ]
  * DESCRIPTION: rm removes the named file (or files). [A directory is considered to be a file.] If all the given files can be removed then NIL (i.e. true) is returned as the status; otherwise the first file name that could not be removed is returned (and the command continues trying to remove files until the list is exhausted). If the -f option is given, files that could not be removed are ignored and NIL (i.e. true) is returned as the status.
  * BUGS: Using rm on a non-empty directory orphans the files in that directory.

3.9.77  RMON - Power-On Diagnostics and Bootstrap
  * SYNOPSIS
    v1.0
    DRAM: 010 Mb
    Batt: B0000008-B007FFFC
    PWROK 1:BAD 2:OK
    0:720seE 1:720seE 2:720seE 3:720seE 4:720seE 5:720seE 6:770B 7:770B
    S/N: ABC-12345 B -
  * DESCRIPTION: rmon normally performs power-on diagnostics and then bootstraps the main raid code. If a space is typed before the count-down has finished, rmon aborts the power-on diagnostics and prompts for a command to be entered from the console serial port. The first line printed by rmon is its version number. The second line is the size (in hexadecimal) of DRAM. The next line indicates the type and usage of Battery-Backup SRAM found.
This can be a variety of values. If two hexadecimal values are printed, they give the inclusive range of addresses over which the power-on diagnostics will test Battery Backed-up SRAM. If only one value is printed, it indicates the size of Battery Backed-up SRAM, but no power-on diagnostics will be performed upon it. All possible values are:

    Batt: 00000000           No Battery Backed-up SRAM present.
    Batt: B0000000-B007FFFC  512Kb SRAM, B0000004==0 (No data).
    Batt: B0000008-B007FFFC  512Kb SRAM, B0000004==1 (No data).
    Batt: B00XXXXX-B007FFFC  512Kb SRAM, B0000004==2 (Saved data,
                             B0000000==B00XXXXX, where B00XXXXX is
                             start of unused data).
    Batt: B0080000           512Kb SRAM present, but the contents of
                             SRAM is not one of the above, or could
                             be B0000004==2 and B0000000==B0080000.
                             No pon testing done on it.
    Batt: B0000000-B03FFFFC  4Mb SRAM, B0000004==0 (No data).
    Batt: B0000008-B03FFFFC  4Mb SRAM, B0000004==1 (No data).
    Batt: B0XXXXXX-B03FFFFC  4Mb SRAM, B0000004==2 (Saved data,
                             B0000000==B0XXXXXX, where B0XXXXXX is
                             start of unused data).
    Batt: B0400000           4Mb SRAM present, but the contents of
                             SRAM is not one of the above, or could
                             be B0000004==2 and B0000000==B0400000.
                             No pon testing done on it.

Note the case where location B0000004 has a value of 2 (saved data present), but where B0000000 (the start of unused data) points to a word address just past the last byte of Battery Backed-up SRAM. This case causes a single value (say B0080000, for 512Kb of Battery Backed-up SRAM) to be printed.
This case does not indicate that any contents of Battery Backed-up SRAM are deemed incorrect, only that Battery Backed-up SRAM is full of data and that no power-on diagnostics will be performed upon it. The DRAM memory sizing algorithm caters for between 8Mb and 256Mb of DRAM, and only 30 of the 32 bits in each word of a DRAM set need to work properly for the sizing to function properly. The DRAM consists of two banks of memory, A and B, each of two slots. When a bank is used, both slots of that bank must be occupied with Simms of the same type and size. If only one bank is to be used, it must be Bank A. The Simms may be single or double sided. Placing two double sided Simms (which, as mentioned above, must both be the same size) into Bank A is equivalent to populating all four slots with single sided Simms of exactly half the size of the double sided Simms. Bank B cannot contain any Simms in such a case. The two banks can use single sided Simms of different sizes, but the first bank must have the larger Simms.

    DRAM Size     DRAM last address
    Hex   Dec     Cached     Uncached
    008   8       7FFFFF     A07FFFFF
    010   16      FFFFFF     A0FFFFFF
    020   32      1FFFFFF    A1FFFFFF
    040   64      3FFFFFF    A3FFFFFF
    080   128     7FFFFFF    A7FFFFFF
    100   256     FFFFFFF    AFFFFFFF

The fourth line is the power supply status. This takes one of two forms. For single power supply systems, the power supply summary status is printed as PWROK (good) or PFAIL (not good). For multi-power supply systems, the summary status is followed by the individual status of each power supply. PFAIL is printed if any of the power supplies is faulty. At this point, on non-pseudo-static DRAM systems, the monitor will hang if the summary status is PFAIL.
A pseudo-static DRAM system will never print this line or any of the preceding lines (i.e. version line, DRAM size or Battery Backed-up SRAM size) and will always hang immediately. The fifth line is a summary of the SCSI chips used in the system. Each item in this report consists of the index number for that chip (from 0 to 7), followed by a ':' and then a string of text indicating the model. The model consists of the chip type and revision level. A summary of the known part numbers against the model is:

                              Bits 7 to 4 of
    Model     Part No.        MACNTL   CTEST3
    720B      609-0391071     0000     0001
    720C      609-0391324     0000     0010
    720E      609-0391955     0000     0100
    720seE    609-0391949     0001     0100
    770A      609-0392179     0010     0000
    770B      609-0392393     0010     0001

There may be a sixth line indicating the serial number of the unit. If the last 12 bytes of the FlashRam boot partition are left in an unblown state (all ones), this line will not appear. Otherwise these last 12 bytes will be printed. These 12 bytes may include trailing blanks. The four bytes of FlashRam preceding the serial number contain 26 bits representing the revision level of the board. If any of these bits are cleared and there is a serial number present, the revision level from A to Z will be printed. After printing the above configuration information, rmon performs a Knaizuk-Hartmann memory test upon the DRAM and the Battery Backed-up SRAM (if any was found). This memory test does a quick, non-exhaustive check for stuck-at faults in both the address lines and the data locations. The test is disabled under the following conditions:
  + If the upper 17 bits of the first location contain the hex bit pattern 0x3C1A0000 and there are no parity errors when all of the rest of DRAM is then read. In this case the code starting at the first location is subsequently executed.
  + If no DRAM is found.
The monitor will not autoboot the main raid code, but will print a monitor prompt and wait for a monitor command (see below). If a space is typed, any memory tests that have been started are aborted, and the monitor prints a prompt and waits for a monitor command (see below). Under all other conditions the main raid code is read from FlashRam and started. Whilst the memory test is in progress, a sequence of -, /, | and \ characters is printed on the console, giving the appearance of a rotating bar. On-board LED 1 is flashed every 10ms and LED 2 changes state between each phase of the test. As noted above, these tests can be aborted by typing a space on the console. When rmon is in command mode a variety of commands may be issued. These mostly relate to the diagnostics that can be performed, or to how flash RAM may be upgraded with new firmware.
  * GENERAL COMMANDS: All values and addresses used in the monitor must be in hexadecimal. Commands must start at the beginning of a line and are case sensitive. Commands must be separated by semi-colons ';' or a newline. If a command syntax error occurs, that command's usage line is printed. If a command (e.g. command_name) is not recognized, the message unknown command: command_name is printed. Note that any command that attempts to modify regions of memory occupied by the monitor will cause unpredictable results. Currently the monitor uses CPU addresses 801E0000 to 801FFFFF inclusive.
  + ?: This command prints a summary of the more generally useful commands available in the monitor.
  + { some comment }: Ignores all text from the leading open-brace to the first closing-brace. This allows you to comment your monitor scripts (sets of monitor commands) prior to downloading. Please NOTE that comments cannot be nested; that is, there can only be one opening-brace and one closing-brace on a line.
  + * count commands ...: Repeats all commands up to the newline count times.
  + ( commands ): Executes the sequence of commands from the opening parenthesis to the closing parenthesis (or newline) as though they were on a line by themselves. For example, to nest repeat (*) commands one could enter the line
        * 20 (* 8 pb 5F 0); (* 10 pb 6F 1)
    which will put the byte 0 into memory location 5F eight times, then put the byte 1 into memory location 6F ten times, and repeat both these put byte commands 20 times. Thus location 5F will have the byte value 0 written into it a total of 160 times and location 6F will have the byte value 1 written into it a total of 200 times.
  + b: Boots the main raid code.
  + ba: Auto-boots the main raid code, as if from power up.
  + g: Go to (jump to) a code address and start execution at that address.
  + l: Download a program or data over the serial port. The data is expected in Motorola Packed S format. If an error occurs during download, the message "Unrecoverable error in S-format loader" will be printed.
  + lg: Download a program over the serial port and commence execution of the program. The program data is expected in Motorola Packed S format. After the program has been downloaded successfully, execution commences at the load address (specified within the downloaded file). If an error occurs during download, the message "Unrecoverable error in S-format loader" will be printed and the program will not be executed.
  * MEMORY COMMANDS: The following commands are used to display, change or test memory. Most of these commands have three forms depending on the size of memory access: a byte form (8-bit), a short word form (16-bit) and a long word form (32-bit). When using the short or long word form, any addresses used must be short word (16-bit) or long word (32-bit) aligned. If an unaligned address is given, the error message cmd: bad address alignment is printed. To differentiate between the three forms of memory access, the suffixes b, w and l are used for byte, short word and long word respectively.
  + db|dw|dl start_address [end_address]: These commands display the contents of the specified memory location or memory range. Memory can be displayed in byte, short word or long word sizes using the db, dw and dl commands respectively. If no ending address is given, only the contents of the first address are displayed. Each line of output from these commands has the format address: contents contents .... contents, where up to 16 bytes are printed on each line.
    For example
        {rmon} db 60 7F
        00000060: A0 9F 9E 9D 9C 9B 9A 99 98 97 96 95 94 93 92 91
        00000070: 90 8F 8E 8D 8C 8B 8A 89 88 87 86 85 84 83 82 81
        {rmon} dw 60 7F
        00000060: FFA0 FF9E FF9C FF9A FF98 FF96 FF94 FF92
        00000070: FF90 FF8E FF8C FF8A FF88 FF86 FF84 FF82
        {rmon} dl 60 7F
        00000060: FFFFFFA0 FFFFFF9C FFFFFF98 FFFFFF94
        00000070: FFFFFF90 FFFFFF8C FFFFFF88 FFFFFF84
  + pb|pw|pl address value ....: These commands put the given value(s) into the specified memory location. Memory can be addressed in byte, short word or long word sizes using the pb, pw and pl commands respectively. If more than one value is given, the first value is put into the specified memory location, the next value is put into the next aligned memory location, and so forth. For example, the commands
        {rmon} pb 60 1 2
        {rmon} pw 70 04F2 002F
        {rmon} pl 80 8000FF04 8000FF0F 80FFFFFF
    will put the value 01 into location 00000060 and 02 into location 00000061; put the value 04F2 into location 00000070 and 002F into location 00000072; and put the value 8000FF04 into location 00000080, 8000FF0F into location 00000084 and 80FFFFFF into location 00000088 respectively.
  + fb|fw|fl start_address end_address value: These commands fill the memory given by the start and end addresses with the given value. Memory can be addressed in byte, short word or long word sizes using the fb, fw and fl commands respectively. The start and end addresses must be aligned to the given memory size. For example, the commands
        {rmon} fb 60 70 8F
        {rmon} fw 60 70 FF8F
        {rmon} fl 80 90 00FFFF11
    will put the value 8F into all bytes (inclusive) between locations 00000060 and 00000070, put the value FF8F into all short words (inclusive) between locations 00000060 and 00000070, and put the value 00FFFF11 into all long words (inclusive) between locations 00000080 and 00000090.
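The address stepping used by the put (pb, pw, pl) and fill (fb, fw, fl) commands above can be illustrated with a short Python sketch. It only models which addresses are touched, based on the examples in the text; the function names are hypothetical and nothing here talks to a real monitor.

```python
# Illustrative model of rmon put/fill addressing; not part of the monitor.
WIDTHS = {"b": 1, "w": 2, "l": 4}  # byte, short word, long word sizes in bytes


def _check_aligned(address, width):
    if address % width:
        # Mirrors the monitor's "bad address alignment" error.
        raise ValueError("bad address alignment")


def put_addresses(suffix, address, values):
    """Pair each value with the location a p<suffix> command would write."""
    width = WIDTHS[suffix]
    _check_aligned(address, width)
    return [(address + i * width, value) for i, value in enumerate(values)]


def fill_addresses(suffix, start, end):
    """List the locations an f<suffix> command would fill (end is inclusive)."""
    width = WIDTHS[suffix]
    _check_aligned(start, width)
    _check_aligned(end, width)
    return list(range(start, end + 1, width))


if __name__ == "__main__":
    for addr, value in put_addresses("l", 0x80, [0x8000FF04, 0x8000FF0F, 0x80FFFFFF]):
        print(f"{addr:08X} <- {value:08X}")
```

Run as a script, this prints the three locations 00000080, 00000084 and 00000088 named in the pl example.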
  + sb|sw|sl start_address end_address value: These commands search the memory given by the start and end addresses for the first occurrence of the given value and print the address where the value was found, along with the value. If the given value is not found, nothing is printed. Memory can be addressed in byte, short word or long word sizes using the sb, sw and sl commands respectively.
  + cb|cw|cl start_address end_address value: These commands compare successive memory locations given by the start and end addresses with the given value. If all the memory matches, the end_address and its contents are printed; otherwise the address of the first location that differs from the given value is printed along with the differing value. Memory can be addressed in byte, short word or long word sizes using the cb, cw and cl commands respectively.
  + tb|tw|tl start_address end_address seed: These commands generate pseudo-random values and store them in successive memory locations given by the start and end addresses. When all the memory has been filled, the memory is read back and its contents are compared with what was written. Memory can be addressed in byte, short word or long word sizes using the tb, tw and tl commands respectively. The given seed initializes the pseudo-random number generator. If successive executions of this command use the same seed, the random values will always be the same. While the random values are being written, a dot ('.') is printed for each location. While the read is being performed, a dot ('.') is printed if the data equals what was written, otherwise an 'X' is printed. For example
        {rmon} tw 60 70 7B
        Writing:
        Checking:
        {rmon}
  + m start_address end_address mask: This command runs low level memory tests on the given memory range. The start and end addresses must be long word aligned. If the test passes the message Passed. will be printed, and if it fails the message Failed.
will be printed. The given mask determines which tests are run. The table below shows what each mask does. The tests essentially write to memory and then read from memory, comparing what was read with what was written. The first twelve tests below first write to all the memory specified and then read all the memory specified; the next twelve tests do an immediate read after each write. The masks can be or'd together to run multiple tests.

    0x0001     Fill each byte with 0x00
    0x0002     Fill each byte with 0xFF
    0x0004     Fill alternate bytes with 0x55 then 0xAA
    0x0008     Fill each byte with the two's complement of its address
    0x0010     Fill each word with 0x0000
    0x0020     Fill each word with 0xFFFF
    0x0040     Fill alternate words with 0x5555 then 0xAAAA
    0x0080     Fill each word with the two's complement of its address
    0x0100     Fill each long word with 0x00000000
    0x0200     Fill each long word with 0xFFFFFFFF
    0x0400     Fill alternate long words with 0x55555555 then 0xAAAAAAAA
    0x0800     Fill each long word with the two's complement of its address
    0x0001000  Fill each byte with 0x00
    0x0002000  Fill each byte with 0xFF
    0x0004000  Fill alternate bytes with 0x55 then 0xAA
    0x0008000  Fill each byte with the two's complement of its address
    0x0010000  Fill each word with 0x0000
    0x0020000  Fill each word with 0xFFFF
    0x0040000  Fill alternate words with 0x5555 then 0xAAAA
    0x0080000  Fill each word with the two's complement of its address
    0x0100000  Fill each long word with 0x00000000
    0x0200000  Fill each long word with 0xFFFFFFFF
    0x0400000  Fill alternate long words with 0x55555555 then 0xAAAAAAAA
    0x0800000  Fill each long word with the two's complement of its address
    0x1000000  Fill alternate long words with 0x00000000 then 0xFFFFFFFF

  + r start_address end_address read_size: This command generates and prints a 32-bit Cyclic-Redundancy-Check value for the block of memory extending from start_address to end_address, using reads of the size specified by read_size, which can be 1, 2 or 4 for byte, word
or long word reads respectively. The start and end addresses must be long word aligned.
  * FLASH MEMORY COMMANDS: The Flash RAM used is AMD's Am29F040, a 4Mb (512k x 8bit) device that is comprised of 8 equal-sized sectors of 64Kb each. The board may potentially have a second Flash RAM chip. The total storage, regardless of the number of Flash RAM chips, is divided into 4 regions. The first 3 are each 1 (64Kb) sector and contain the Bootstrap, Raid Configuration and Husky Scripts at the CPU addresses BFC00000, BFC10000 and BFC20000 respectively. The Main region contains 5 64Kb sectors (starting at BFC30000) and optionally all 8 64Kb sectors of the second Flash RAM chip (starting at BFC80000). If this second chip is present there is a total of 13 64Kb sectors for the main raid code. The Manufacturer ID and Device ID for the AMD Am29F040 are 01 and A4 respectively. Thus the text "Flash: 01A4" is printed whenever the E command or the W command accesses this device. Attempting to erase or write to the Boot sector will always cause an "Illegal Flash RAM address range" error.
  + E sector_address: An entire region can be bulk erased with the 'E' command by specifying an address in that region. In the main region, the command erases all sectors starting with the sector containing the nominated address and continuing to the end of that region, but without crossing a chip boundary. Thus the 'E BFC40000' command will erase the last 4 sectors of the first chip. The 'E BFC30000; E BFC80000' commands will erase the last 5 sectors of the first chip and then erase the entire second Flash RAM chip (if it is present). Attempting to erase a non-existent second chip cannot report an error; however, a negative number is printed to indicate other errors, as described in the following table.

    Error  Fault Description
    -1     Illegal Flash RAM address range.
    -2     A sector within the given range is Write Protected.
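The sector arithmetic behind the E command can be sketched in Python as below. This is only a model built from the layout described above (64Kb sectors, regions at BFC00000, BFC10000, BFC20000, a main region from BFC30000, and an optional second chip from BFC80000); the function name is hypothetical, the second chip is assumed to be present, and write protection is not modelled.

```python
# Illustrative model of which flash sectors "E sector_address" would erase.
SECTOR = 0x10000          # 64Kb per sector
BOOT_BASE = 0xBFC00000    # Bootstrap region (never erasable)
MAIN_BASE = 0xBFC30000    # start of the main region on the first chip
CHIP2_BASE = 0xBFC80000   # start of the optional second chip
CHIP2_END = CHIP2_BASE + 8 * SECTOR


def erase_sectors(address):
    """Return the base addresses of the sectors erased (hypothetical helper)."""
    if not BOOT_BASE + SECTOR <= address < CHIP2_END:
        # Boot sector or outside the flash: error -1 in the table above.
        raise ValueError("Illegal Flash RAM address range")
    first = address - (address % SECTOR)
    if first < MAIN_BASE:
        # Raid Configuration and Husky Scripts are single-sector regions.
        return [first]
    # Main region: erase to the end of the region without crossing a chip.
    stop = CHIP2_BASE if first < CHIP2_BASE else CHIP2_END
    return list(range(first, stop, SECTOR))


if __name__ == "__main__":
    print([f"{s:08X}" for s in erase_sectors(0xBFC40000)])
```

The script form prints the four sector bases BFC40000 through BFC70000, matching the 'E BFC40000' case described in the text.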
  + W sector_address source_address byte_count: This command copies byte_count bytes of data from source_address to EEPROM locations starting at sector_address. The number of bytes copied or, on error, a negative number will be printed. The negative error numbers are described in the table below. Optionally, a W command may appear without any arguments. In this case the sector_address is that specified by a previous E command, and the source_address and byte_count are those given to and printed by, respectively, a previous T command (see below). This command, unlike the E command, will modify the contents of the second Flash RAM chip if sector_address plus byte_count is equal to or greater than the CPU address BFC80000. An attempt to write with a byte_count of 0 is not an error, and causes only the Manufacturer and Device ID to be printed.

    Error  Fault Description
    -1     Illegal Flash RAM address range
    -2     A sector within the address range is Write Protected
    -3     A Byte Program command failed

  * NETWORK COMMANDS: The second RS232 port on the RaidRunner can be used to download programs and data into memory from a host. This port can run the TCP/IP SLIP protocol, and the Trivial File Transfer Protocol (TFTP) is available for file transfer. Three variables are available to effect download file transfers from a remote host. These are the remote host IP address, the local IP address (of the RaidRunner) and the name of the file on the remote host that is to be downloaded. These variables have default values of C02BC60E (192.43.198.14), C02BC6FD (192.43.198.253) and /usr/raid/lib/raid.bin, which are the remote host IP address, the local IP address and the remote host's file to download respectively. These values are dynamic in that, if you change them, they revert to their defaults at power-on or whenever the boot monitor is started. The following commands manage these variables and file transfers.
  + A [remote_ip_address [local_ip_address]]: With no arguments, this command prints, in hexadecimal, the dynamic remote and local IP addresses used by TFTP and SLIP. To change the remote IP address, execute this command with only one argument - the IP address (in hexadecimal) of the remote host. To change the local IP address (of the RaidRunner), execute this command with two arguments, the first being the IP address of the remote host and the second being the IP address of the RaidRunner. Remember that both addresses must be in hexadecimal.
  + F [filename]: With no arguments, this command prints the name of the file to download from the remote host. To change the filename, specify it as the argument to this command. Remember that the permissions on this file on the remote machine must be set to allow remote TFTP access by the RaidRunner.
  + P: This command pings (sends an ICMP ECHO request packet and times the response) the remote internet host. If the remote host responds, the message Sending echo request... Got response after 0x0000nn ms will be printed, where nn is the number of milliseconds it took for the remote host to respond. If the remote host does not respond within 10 seconds, the message Sending echo request... timeout after 10 seconds will be printed.
  + T start_address end_address: This command transfers the stored filename from the remote host into a block of memory starting at start_address and limited to end_address. The number of bytes transferred will be either the size of the remote file or the value computed by end_address - start_address + 1, whichever is the lesser. Whilst the data is being transferred, a dot ('.') is printed for each packet of data transferred. At the end of a successful transfer, the number of bytes transferred is printed. If the transfer fails, an appropriate error message is printed.
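Because the A command only accepts addresses in hexadecimal, it can help to convert dotted-quad addresses on the host before typing them at the monitor. The following Python sketch uses only the standard ipaddress module; the two function names are hypothetical helpers, not RaidRunner tools.

```python
# Convert between dotted-quad IPv4 addresses and the 8-digit hexadecimal
# form expected by the rmon A command. Host-side helper only.
import ipaddress


def ip_to_rmon_hex(dotted):
    """'192.43.198.14' -> 'C02BC60E'"""
    return format(int(ipaddress.IPv4Address(dotted)), "08X")


def rmon_hex_to_ip(hex_word):
    """'C02BC6FD' -> '192.43.198.253'"""
    return str(ipaddress.IPv4Address(int(hex_word, 16)))


if __name__ == "__main__":
    remote, local = "192.43.198.14", "192.43.198.253"
    # Build the argument string for the A command from the default addresses.
    print("A", ip_to_rmon_hex(remote), ip_to_rmon_hex(local))
```

With the default addresses quoted earlier, the script prints `A C02BC60E C02BC6FD`.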
The following two examples show the transfer of the raid binary and the transfer of the RaidRunner's /bin directory from a remote host. The remote host's IP address is 192.43.198.101 (C02BC665), the RaidRunner's IP address is 192.43.198.200 (C02BC6C8) and the two files concerned are /usr/raid/lib/raid.bin and /usr/raid/lib/raid.rc (raid binary and /bin directory respectively). The raid.bin file is written into the bank of flash ram starting at address bfc10000, and the raid.rc file is written into the two consecutive banks of flash ram at addresses bfc08000 and bfc0c000.
        {rmon} {first transfer the raid binary - /usr/raid/lib/raid.bin}
        {rmon} F /usr/raid/lib/raid.bin
        {rmon} A C02BC665 C02BC6C8
        {rmon} P
        Sending echo request... Got response after 0x000013 ms
        {rmon} T 80300000 803fffff
        {rmon} E bfc10000;  {erase the Flash EEPROM address for raid.bin to be stored}
        ** 70000
        {rmon} W bfc10000 80300000 5BDA0;  {copy downloaded raid.bin into flash}
        5BDA0
        {rmon}
        {rmon} {now transfer the raid /bin directory - /usr/raid/lib/raid.rc}
        {rmon} F /usr/raid/lib/raid.rc
        {rmon} A C02BC665 C02BC6C8
        {rmon} P
        Sending echo request... Got response after 0x000013 ms
        {rmon} T 80300000 803fffff
        {rmon} E bfc08000;  {erase the two consecutive blocks of flash ram where raid.rc}
        4000
        {rmon} E bfc0c000; {is to be stored}
        4000
        {rmon} W bfc08000 80300000 4000; {copy the first 0x4000 bytes}
        4000
        {rmon} W bfc0c000 80304000 4000; {copy the next 0x4000 bytes}
        4000
        {rmon}
  + Sno ABC-12345: Prints / sets the serial number. Without an argument the current serial number is printed; if no serial number has been set, nothing is printed. If there are any trailing characters after the 'Sno' command, these characters are set permanently as the serial number. A serial number may only be set once; thereafter any attempt to set another one is ignored and the current serial number is printed.
If the serial number has to be re-set, the flashram needs to be reblown as if it were brand new. Note that when setting the serial number for the first time, the exact format of the command is very important. There must be exactly one space after the command 'Sno', after which the next 12 printable characters are taken as the serial number. If there are fewer than 12 characters (before the carriage return), blank characters are padded on the right-hand side. It is suggested that a serial number be no longer than 10 characters and consist of upper case letters, decimal digits and only a limited number of special characters such as '-' or '.'. The main raid code will truncate the serial number to 10 characters if a revision level is present. If the revision level is present, the main raid code will append it immediately after the serial number and present the combined string as the serial number in an inquiry command.
  + Rev A: Prints / sets the board revision level. Without an argument the current revision level is printed as an upper case letter (from 'A' to 'Z'). If the 'Rev' command is immediately followed by a space and a single upper case letter, that revision level is set permanently. The revision level can be increased (e.g. from 'A' to 'B' to 'C' etc.), but can never be decreased.
  * DIAGNOSTICS: x000 The general form of a diagnostic command is an x immediately followed by a hexadecimal number. Most diagnostics loop permanently, requiring a power off/on cycle to stop the diagnostic.
  + x000 chip: Will continually assert / de-assert most SCSI bus signals. These include DB0-15, I/O, REQ, C/D, SEL, MSG, RST, ACK, BSY, ATN. The DP0 and DP1 lines (i.e. the parity bits) cannot be asserted / de-asserted in this fashion.
    + x001 chip: Will continually assert / de-assert most SCSI bus signals in such a fashion as to indicate the number of the SCA connector pin that the signal would be connected to. This count is with respect to LED1 and LED2 when used to trigger a scope (on the falling edge). Each complete group of ten counts is coalesced to facilitate counting on a scope. The counts are:-

      7  - DB11       14 - SEL         21 - DB7        28 - DB0
      8  - DB10       15 - MSG         22 - DB6        29 - DP1 (*2)
      9  - DB9        16 - RST (*1)    23 - DB5        30 - DB15
      10 - DB8        17 - ACK         24 - DB4        31 - DB14
      11 - I/O        18 - BSY         25 - DB3        32 - DB13
      12 - REQ        19 - ATN         26 - DB2        33 - DB12
      13 - C/D        20 - DP0 (*2)    27 - DB1

      Note that the RST line (marked *1 above) is pulsed only once after the last pin is pulsed (i.e. after DB12) and that the parity lines (marked *2 above) cannot be pulsed.

    + x002 dst_addr src_addr length: Calls the C library routine memcpy with three arguments of destination address, source address and length (all in hex).

    + x003 200000 ffffff and x004 200000 ffffff: Two commands are used to exercise Battery Backed-up DRAM. The x003 command will set the memory range as suggested above. The value of 200000 is one location past the last used flash monitor memory location, ffffff is the last memory location of 8Mb of DRAM (this latter figure can be varied according to available DRAM memory). The DRAM starting at 0 will be modified to contain :-

      00: 0x3C1A9FC0   [ lui k0,0x9fc0 ]
      04: 0x3C1B3C1A   [ lui k1,0x3c1a ]
      08: 0xAC1B0000   [ sw  k1,(zero) ]
      0C: 0x3C01A000   [ lui at,0xa000 ]

      Location 0 is the start of the restart pseudo interrupt handler. The x004 command is then used after restoring power to check for this bit pattern. It is important that the arguments given to x003 are exactly the same as those given to x004.
Any words detected that are different are printed; if there are no differences then nothing (but the next prompt) is printed.

  * NOTE: As a side effect of executing the above restart pseudo interrupt handler on restoring power, location 0 will be patched with 0x3C1A0000 to prevent memory tests and autobooting the main raid code. This feature can be re-enabled by depositing 0 into location 0 (i.e. `pl 0 0`).

3.9.78  RRSTRACE - disassemble scsihpmtr monitor data

  * SYNOPSIS: rrstrace -f monitor_file [-b bcnt] [-v]

  * DESCRIPTION: rrstrace will disassemble the monitoring data collected by scsihpmtr which has been transferred to a host via ssmon. rrstrace will additionally analyze all scsi reads, look for repeating sequences of reads and report them. Such a repeating sequence is referred to as a 'natural read'. This data can then be used to choose a more optimal configuration of the raid. The natural reads are reported via a starting block size followed by the total length (in blocks) of the natural read, how many reads made up the natural read and finally how many times it occurred. At the end, the number of natural reads found is printed along with how many were looked for and how many actually existed. By default up to 256 different natural reads can be scanned for, but if more are required then the -b bcnt option can be given to allocate a deeper search. Note that a natural read that occurs only once is recorded but not printed out.

  * OPTIONS:

    + -f monitor_file: Specify the name of the file from which the data is to be read.

    + -b bcnt: Specify a larger natural read analysis count.

    + -v: Turn on verbose mode and hence print out every scsi instruction.
In verbose mode, each line of output from rrstrace is of the form

      time chipno lun ssid msg_id tag command

where:

      o time: is the time the command occurred
      o chipno: is the chip number (hostport) upon which the command came in
      o lun: is the target lun the command was directed to
      o ssid: is the scsi id of the initiator (i.e. host)
      o msg_id: is the scsi message id of the command
      o tag: is the scsi tag number of the command or -1 if no tag is associated
      o command: is the scsi command itself in text or, if unknown to rrstrace, the 12 byte scsi command block will be printed. In the case of read and write commands, the block address and length are printed along with the block address of the next sequential read/write command if the next command were to be sequential (this address is parenthesized). If the command has the 'Force Unit Access' bit set, then FUA is appended to the line. The scsi Read_6, Write_6, Read_10 and Write_10 commands will be printed as R6, W6, R10 and W10 respectively.

  * SEE ALSO: ssmon, scsihpmtr

3.9.79  RSIZE - estimate the memory usage for a given raid set

  * SYNOPSIS: rsize type cachesize iosize nbackends

  * DESCRIPTION: rsize prints an estimate of the amount of memory (in bytes) a running raid set will use, given the raid set type, the amount of cache allocated to it, the raid set iosize and the number of backends in the raid set.

  * SEE ALSO: rconf

3.9.80  SCN2681 - access a scn2681 (serial IO device) as console

  * SYNOPSIS: bind -k {scn2681 [ value ]} bind_point

  * DESCRIPTION: scn2681 allows the target hardware using the SCN2681 serial IO chip to be accessed via the bind_point in the K9 namespace. If no optional argument is given, channel A in the chip is used. If value is 0 then channel A (serial port) is used. If value is 1 then channel B (serial port) is used. If value is 2 then the input port (parallel) is used. If value is 3 then the output port (parallel) is used.
The special file scn2681 is only available when the target is special hardware (containing a SCN2681).

  * SEE ALSO: cons

3.9.81  SCSICHIPS - print various details about a controller's scsi chips

  * SYNOPSIS: scsichips

  * DESCRIPTION: scsichips prints details about the RaidRunner scsi chips.

  * OPTIONS: Three sets of numbers are printed. The first two numbers are the number of hostports (chips) and the number of backend channels (chips). The rest of the numbers printed are the hostport then backend channel chip indexes. The chip addresses can be used in binding them to an entrypoint in the kernel.

  * EXAMPLES: For a RaidRunner controller with one hostport and six backend channels scsichips will print

      1 6 6 0 1 2 3 4 5

    which says that there is 1 hostport chip, 6 backend channels, the address of the hostport chip is 6 and the addresses of the backend channel chips are 0, 1, 2, 3, 4 and 5. For a RaidRunner controller with two hostports and six backend channels scsichips will print

      2 6 6 7 0 1 2 3 4 5

    which says that there are 2 hostport chips, 6 backend channels, the addresses of the hostport chips are 6 and 7 and the addresses of the backend channel chips are 0, 1, 2, 3, 4 and 5.

  * SEE ALSO: hwconf, scsihpfs, scsihdfs

3.9.82  SCSIHD - SCSI hard disk device (a SCSI initiator)

  * SYNOPSIS

      bind -k {scsihd chipNumber scsiTargetId [scsiTargetLUN]} bind_point
      cat bind_point
      ctl
      raw
      partition
      data
      rconfig

  * DESCRIPTION: scsihd is a SCSI hard disk device. In SCSI jargon it is called the "initiator" while the disk it is talking to is referred to as the "target". The disk is known in SCSI as a "direct access device". Currently this device adopts the SCSI id 7 and attempts to access a disk with a SCSI id of scsiTargetId. The "logical unit number" (LUN) within the scsiTargetId that is accessed is scsiTargetLUN (or LUN 0 if not given). Multiple SCSI buses are supported (with typically one device per bus) and are distinguished by the chipNumber.
The SCSI disk device is bound into the namespace at bind_point. A one level directory is made at the bind_point containing the files: "raw", "data", "partition", "rconfig" and "ctl". The "partition" file is a small portion at the end of the disk used for storing partition information (in plain ASCII). The "ctl" and "rconfig" files are for internal RaidRunner use. (The "rconfig" file will be a backup copy of the RaidRunner configuration area). The "data" file is the usable part (i.e. the vast majority) of the space available on the SCSI disk connected to the SCSI bus on which this device is an initiator. The file "raw" addresses the whole disk less the partition block. The current raid hardware has six fast wide SCSI-2 buses numbered 0 to 5 and two fast wide buses numbered 6 and 7. Usually the first six fast wide SCSI-2 buses are set up as initiators (i.e. bind scsihd) while the two fast wide SCSI buses are set up as targets (i.e. bind scsihp). Current implementations make the device name hd synonymous with scsihd. Therefore:

      bind -k {hd chipNumber scsiTargetId [scsiTargetLUN]} bind_point

has the same effect as the similar line in the synopsis.

  * SEE ALSO: scsihp

3.9.83  SCSIHP - SCSI target device

  * SYNOPSIS

      bind -k {scsihp chipNumber scsiid} bind_point
      cat bind_point
      scsiid      # this device's scsi id
      cat bind_point/scsiid
      0
      ...         # this device's LUN within SCSI id scsiid
      7
      cat bind_point/scsiid/0
      data        # scsi data channel for SCSI id scsiid LUN 0
      cmnd        # scsi command channel for SCSI id scsiid LUN 0

  * DESCRIPTION: scsihp is a SCSI target device. A "target" in SCSI jargon performs commands issued by an "initiator" which is usually a general purpose computer. Normally a SCSI target is an integral part of a "direct access device" (i.e. a disk) but in a raid, the controller needs to look like a target to external users. Multiple SCSI buses are supported (with typically one device per bus) and are distinguished by the chipNumber.
The SCSI target device is bound into the namespace at bind_point. A three level directory is made at the bind_point. The first level is the SCSI id (as specified by scsiid) for this device. The second level is the LUN within that scsi id. The third level contains the files "data" and "cmnd", which are used for SCSI data phases and command/status phases respectively. A scsi target command interpreter called stargd is designed to "mate" with this device. The current raid hardware has six fast narrow SCSI buses numbered 0 to 5 and two fast wide buses numbered 6 and 7. Usually the six fast narrow SCSI buses are set up as initiators (i.e. bind scsihd) while the two fast wide SCSI buses are set up as targets (i.e. bind scsihp).

  * SEE ALSO: scsihd, stargd

3.9.84  SET - set (or clear) an environment variable

  * SYNOPSIS

    + set name

    + set name value ...

  * DESCRIPTION: set allows the value (or list of values) associated with an environment variable to be changed. When no value is given then the environment variable name has its value set to NIL. When one value is given then the environment variable name is set to that value. When multiple values are given then the environment variable name is set to that list of values.

  * NOTE: set is a "built-in" in the Husky shell. This means a new shell is _not_ thrown to execute this command unless some other action forces it. Setting an environment variable in a sub-shell has no effect once the sub-shell exits to its parent shell. This is why throwing a sub-shell to execute set would be pointless. Operations such as command substitution (i.e. `set ...') and command line file redirection force a command to be executed in a sub-shell and hence defeat the purpose of set. Hence the set command (or at least "built-in") should appear in a simple expression to avoid surprises.
3.9.85  SCSIHPMTR - turn on host port debugging

  * SYNOPSIS:

    + scsihpmtr -c portnumber storage_size

    + scsihpmtr -b

    + scsihpmtr -d

    + scsihpmtr -e

    + scsihpmtr -p count

    + scsihpmtr -r

    + scsihpmtr -s

    + scsihpmtr -w

  * DESCRIPTION: scsihpmtr, if enabled in the RaidRunner kernel, will create a volatile internal store and save details of all scsi commands which come in on a specified host port or set of host ports. This data can then be uploaded to the host via the scsi monitor (ssmon) and analyzed as appropriate.

  * OPTIONS:

    + -c portnumber storage_size: Specify and allocate a volatile storage area of storage_size bytes and monitor and save all scsi commands that appear on the given portnumber. The portnumber is the chip number of the host port on the raid controller. If you specify a port number of 8, then all host ports will be monitored.

    + -b: Perform natural read analysis for use in possible reconfiguration of the RaidRunner.

    + -d: Delete any volatile storage containing monitored data and hence stop monitoring.

    + -e: Temporarily disable the monitoring (and hence storage) of scsi commands. Monitoring must be disabled BEFORE uploading the monitored data to the host via ssmon.

    + -p count: Print out both details of what is being monitored and the first count stored scsi commands. The details printed are the total number of commands that can be recorded, the current number of recorded commands, the current monitoring state (0 disabled, 1 enabled) and the monitored host port number. Each line of scsi commands printed will contain:

      o the time the command occurred
      o a hex word containing, in order: in the lower 3 bits, the chip number (hostport) the command was on; in the next 3 bits, the scsi id of the command initiator (i.e. host); in the next byte, the scsi message id of the command; in the next two bytes, the scsi tag number of the command
      o a 12 byte scsi command header block

    + -r: Reset the monitoring of scsi commands. This will cause all previously monitored data to be lost.
    + -s: Re-enable the monitoring (and hence storage) of scsi commands.

    + -w: Enable wrap around mode. By default, scsihpmtr will stop recording scsi commands once it has used all the storage area allocated. If wrap around mode is enabled, then scsi commands will continue to be recorded, wrapping round the storage area.

  * SEE ALSO: smon, ssmon, SCSI Standards for formats of scsi commands.

3.9.86  SETENV - set a GLOBAL environment variable

  * SYNOPSIS: setenv name value ...

  * DESCRIPTION: setenv allows the value (or list of values) associated with a GLOBAL environment variable to be changed (or created). When one value is given then the GLOBAL environment variable name is set to that value. When multiple values are given then the GLOBAL environment variable name is set to that list of values.

  * NOTE: A GLOBAL environment variable is one which is stored in a non volatile area on the RaidRunner and hence is available between successive power cycles or reboots. These variables ARE NOT the same as husky environment variables. The non volatile area is co-located with the RaidRunner Configuration area.

  * SEE ALSO: printenv, unsetenv, rconf

3.9.87  SDLIST - Set or display an internal list of attached disk drives

  * SYNOPSIS

    + sdlist

    + sdlist -p

    + sdlist backend_triplett_list

  * DESCRIPTION: sdlist maintains a list of disk backends, in chip, rank (scsi id), scsi lun form, for a RaidRunner. This list is used by various commands to save those commands from continuously probing for all possible backends. When run without arguments, sdlist zeroes the internal list and probes for all disk backends, rebuilding the list. Typically, sdlist will be run with a list provided to it by way of its arguments. For example, on a RaidRunner with one rank (at scsi id 1) and six disks with all luns at zero the following would be run.
      sdlist `hwconf -D'

which would equate to

      sdlist 0.1.0 1.1.0 2.1.0 3.1.0 4.1.0 5.1.0

and those six triplets would be stored in the list, with sdlist having first deleted any previously stored list. To see what this internal list contains, the -p option can be given and the list will be printed. This command is typically executed during autoboot and would not be executed interactively unless the user is performing unusual backend manipulations and is debugging the process.

  * SEE ALSO: rconf

3.9.88  SETIV - set an internal RaidRunner variable

  * SYNOPSIS:

    + setiv

    + setiv name value

  * DESCRIPTION: setiv allows the value of an internal RaidRunner variable to be changed or lists all changeable internal variables (no argument). When a value is given then the internal RaidRunner variable name is set to that value. If no value is given then all settable internal variables are listed.

  * NOTES: As different models of RaidRunners have different settable internal variables, see your RaidRunner's Hardware Reference manual for a list of variables together with the effect of changing them. These variables are run-time variables and hence must be set each time the RaidRunner is booted.

  * SEE ALSO: getiv

3.9.89  SHOWBAT - display information about battery backed-up ram

  * SYNOPSIS: showbat [-a] [-n] [-s] [-t]

  * DESCRIPTION: showbat will print (to standard output) information about the installed battery backed-up ram.

  * OPTIONS:

    + -a: Print address information about battery backed-up ram. The addresses printed are the base address, start of tag address, start of reserved area address, start of cache data address and the first unused byte address. This is the default if no options are given.

    + -n: Print the number of tags in battery backed-up ram.

    + -s: Print the status of the battery backed-up ram.

    + -t: For each tag in battery backed-up ram, print its value and the location of its data along with details of the cache range it belongs to.
If the cache filesystem has not been created and the ranges added then this option will only print out the tag addresses and inform you that no ranges are available. If the data associated with the tag has partially written data in it, the tag will be flagged in the list.

  * SEE ALSO: cachedump, cacherestore, cacheflush

3.9.90  SHUTDOWN - script to place the RaidRunner into a shutdown or quiescent state

  * SYNOPSIS: shutdown

  * DESCRIPTION: shutdown is a husky script which is used to place the RaidRunner into the "shutdown" or quiescent state. The "shutdown state" is one in which the configured raid sets cannot be accessed by a host and all devices (disks) attached to the RaidRunner are in a low power usage state (i.e. spun down). shutdown first sets all stargds to a spin state of 2, which means that all host scsi medium-access commands will result in a CHECK CONDITION status with the sense key set to NOT READY and the additional sense code and code qualifier set to "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED". It will then flush the cache and finally spin down all attached backend devices. Note that the scsi monitors (smon) are not affected.

  * SEE ALSO: stargd, mstargd, cache, spind

3.9.91  SLEEP - sleep for the given number of seconds

  * SYNOPSIS: sleep seconds

  * DESCRIPTION: sleep causes the current K9 process to "sleep" for the given number of seconds. The argument seconds may be any integer or a fixed point number (e.g. 1.5). The resolution of timing is one clock tick (which varies according to the target hardware). If the sleep is interrupted before it completes then a "sleep interrupted" status message is returned, otherwise NIL (i.e. true) is returned.
  * SEE ALSO: K9sleep

3.9.92  SMON - RaidRunner SCSI monitor daemon

  * SYNOPSIS: smon [-b blkprotocol] [-d debug_level] -m moniker -h hport [-n] [-r] [-1 stdout_fifo_file] [-2 stderr_fifo_file] [-3 status_file] -s capacity device id lun

  * DESCRIPTION: smon is a daemon process that simulates a SCSI-2 disk with modified Read and Write SCSI commands that implement a simple protocol which provides file transfer to and from the RaidRunner and remote command execution on the RaidRunner. The protocol is based on the SCSI Read and SCSI Write starting block addresses. smon usually works closely with a special device (file system) that does the hardware interfacing and message interpretation for a SCSI-2 target (see scsihp). smon has 3 mandatory switches, 8 optional switches and 3 mandatory positional arguments. The mandatory switches are the -s switch which specifies the size of the SCSI-2 monitor store, the -m switch which specifies the moniker (name) of the target and the -h switch which specifies the controller host port the monitor will communicate over. The 3 mandatory positional arguments are device, id and lun. These arguments are concatenated together to make the filename of the low level SCSI target device driver. The second argument is the SCSI identifier number this target will respond to while the third argument is the logical unit number (LUN) within that SCSI identifier which the target represents. An error will occur during daemon initialization unless the following files (with appropriate characteristics) are found:

    + device/id/lun/cmnd

    + device/id/lun/data

  * OPTIONS The optional switches may appear in any order but if present must be before the positional arguments.

  * -b blkprotocol: Change the block numbers used in the protocol (see below).
By default the block numbers used in the protocol are 7, 8, 9, 10, 11, 12, 13, 14 and 15 for protocol elements BLK_STDOUT, BLK_STDERR, BLK_STATUS, BLK_ENDPROC, BLK_UPLOAD, BLK_STDIN, BLK_MONITOR, BLK_RESET and BLK_RCONFIG respectively. To change this default list of block numbers, specify a comma separated list of nine (9) unique block numbers.

  * -d debug_level: This switch is a debug flag. When given, the command line arguments and other information derived during initialization are echoed back to standard out and the debug level is set to 1. The debug levels are:

    + 0 no debug messages (default when "-d" not given)

    + 1 debug messages on all non-read/write commands

    + 2 debug messages on all commands

  * Debug messages are typically a single line per command and are sent to standard out. Serious errors are reported to standard error irrespective of the current debug level. If the error is related to incoming data (from the SCSI initiator) then the daemon will continue. If the error cannot be recovered from, then the daemon terminates with an error message as its exit status. The debug level can be changed during the execution of a smon daemon by using the "-d" option on mstargd.

  * -m moniker: This switch associates the moniker (i.e. name) with the scsi monitor that smon is about to execute on. It provides a method of naming scsi monitors. The moniker can be a maximum of 32 characters. This switch is mandatory.

  * -h hport: This switch advises smon which controller host port number it will be communicating over. This switch is mandatory.

  * -n: This switch stops statistics being gathered. The default action is to collect statistics.

  * -r: This switch is for restarting the daemon with as little disturbance to the initiator as possible. Usually the SCSI Unit attention condition is set on reset (and when this daemon commences). When this switch is given the daemon starts in the idle state. Use with care. The default action (i.e.
when the "-r" is not given) is to set a SCSI Unit attention condition so that the first SCSI command received will have its status returned as "Check condition" (as required by the SCSI-2 standard).

  * -1 stdout_fifo_file: Specify the name of the fifo data file where the standard output of any executed commands is to be sent (and read from).

  * -2 stderr_fifo_file: Specify the name of the fifo data file where the standard error output of any executed commands is to be sent (and read from).

  * -3 status_file: Specify the name of the fifo data file where the exit status of any executed command is to be written to (and read from).

  * -s capacity: This mandatory switch informs the daemon of the size, in blocks, that this SCSI target will represent. The value for capacity affects the response this target makes to the SCSI Read Capacity and Mode Sense (page 4) commands. To simplify writing out large numbers certain suffixes can be used, see suffix.

  * PROTOCOL: smon appears to a host computer as a disk with the given capacity, scsi id and scsi lun. By reading and writing to specific block locations on this disk, the host computer can send husky commands to be executed on the RaidRunner and read back each command's standard output, standard error and status. Additionally, files can be both downloaded from the host onto the RaidRunner and uploaded from the RaidRunner onto the local host computer. All reads and writes by the host need to be in multiples of 512 byte blocks with null padding as appropriate. When WRITEing (from the host) to smon, the data contained in the write will, depending on the block number, be treated as follows -

    + BLK_ENDPROC: Will kill any processes, started by smon, currently running on the RaidRunner. The contents of the write's data will be read but ignored.

    + BLK_RESET: Will terminate any file transfers, kill any process(es) (started by smon) currently running on the RaidRunner, close any files opened by smon and reset smon to its initial state.
The contents of the write's data will be read but ignored.

    + BLK_STDIN: If smon is in DOWNLOAD file transfer mode, the contents of the write's data will be written out to the download file on the RaidRunner. If not in DOWNLOAD file transfer mode, smon will asynchronously execute the contents of the write as a husky command. Before execution, the command is checked to see if it is an internal smon file transfer command, and if so, smon will initialize for file transfer.

    + Any other: Will perform the write to the internal store. Up to 16 blocks of data will be stored in the raid configuration area.

When READing (by the host) from smon, the data contained in the read will, depending on the block number, be treated as follows -

    + BLK_STDOUT: This read will return the currently available standard output from the previously executed command. If the number of bytes of standard output is less than the size of the read's data, the data will be padded with NULLs ('\0'). If the number of bytes of standard output is more than the size of the read, then subsequent reads from BLK_STDOUT will transfer the rest of the standard output. Normally one would loop reads from BLK_STDOUT until the last character in the read buffer is NULL. If the previous command was a write to start a download, then a read of one block will next be expected and will return either a block of NULLs or an error message indicating a problem with the initialization of the download.

    + BLK_STDERR: This read will return the currently available standard error output from the previously executed command in the same manner as BLK_STDOUT.

    + BLK_ENDPROC: If the previously executed command (write to block BLK_STDIN) has not completed, this read will return with its buffer starting with "Process not finished" and NULL padded. If the command has completed then it will return a NULL filled buffer.
    + BLK_STATUS: A read from this block will return the command's husky exit status (the $status variable).

    + BLK_UPLOAD: Will first return either the UPLOAD handshake command (the response to an UPLOAD write command), which is of the form UPLOAD nbytes where nbytes is the number of bytes in the file that has just been requested to be uploaded from the RaidRunner, or an error message. All reads from this block from then onwards will contain the next buffer from the uploaded file.

    + BLK_MONITOR: Will first return either the MONITOR handshake command (the response to a MONITOR write command), which is of the form MONITORSIZ nbytes where nbytes is the number of bytes of monitor data that will need to be transferred up to the host, or an error message if there is no monitoring data. All subsequent reads will upload the monitoring data.

    + BLK_RCONFIG: Successively reading data from this block will return successive data from the in-core raid configuration area. When a BLK_RESET is written, the internal smon index into the in-core raid configuration area is reset (set to 0). When a number of blocks are read from BLK_RCONFIG, this internal index is incremented by the amount of data read. If more data is read than is available in the in-core raid configuration area, then the index is reset to 0.

    + Any other: Will perform the read from the internal store.

    + Internal Store: Typically, a host computer stores labels, boot and disk partition information on all disks. Most host computers store this information in only a few blocks at the start of the disk and/or near the end of the disk. Rather than smon reserving this space as a contiguous piece of memory, most of which would not be used, an internal store of 16 512-byte blocks is created, and smon maintains a mapping of host block addresses to the store. This store is copied to the raid configuration area, noting the controller number and host port number smon is running on.
If the store fills as a host computer writes many blocks to the smon disk, the additional data is discarded. If a host computer attempts to read blocks that have not been stored, then null fill data is returned.

  * SIGNALS: Two K9 signals are interpreted specially by smon.

    + If the signal K9_SIGINT is received then smon will check to see whether there is a command in progress and if so terminate it with a "Check condition" status and set the associated sense key to "Command aborted". smon will then return to its command (waiting) mode.

    + If the signal K9_SIGTERM is received then smon will do the same as it does for K9_SIGINT except it will exit the process rather than going to command (waiting) mode. The SCSI initiator should interpret a sense key of "Command aborted" as the target unilaterally aborting a command in progress. The SCSI-2 standard suggests the initiator should retry the aborted command.

  * EXAMPLE

      smon -s 2080 -h 1 -m SMON167 /dev/hostbus/1 6 7

    This line will invoke this daemon and try to open the following files: /dev/hostbus/1/6/7/cmnd and /dev/hostbus/1/6/7/data. The "-s 2080" switch instructs this daemon to tell SCSI initiators that it is a 1040K disk (2080 blocks of 512 bytes).

  * SEE ALSO: scsihp, suffix, mstargd, stargd, mconf

3.9.93  SOS - pulse the buzzer to emit sos's

  * SYNOPSIS: sos [count]

  * DESCRIPTION: sos will sound the buzzer in a plaintive "SOS" in Morse. If a count is given, it will repeat count times. The default count is 3.

  * SEE ALSO: buzzer, warble

3.9.94  SPEEDTST - Generate a set number of sequential writes then reads

  * SYNOPSIS: speedtst -d device -n io_cnt -X bufsize [-o offset] [-r drives] [-STRW]

  * DESCRIPTION: By default speedtst will perform io_cnt sequential writes of size bufsize bytes onto the device from the start of the device. It will then do io_cnt sequential reads of size bufsize bytes from the start of the device. By default reads and writes are not checked for success.
The output of this command provides the I/O transfer rates in Megabytes (MB) and million bytes (mb) for both the write and read sequences. The bufsize and offset values may have a suffix.

  * OPTIONS:

    + -o offset: All reads and writes are to be performed offset bytes into the device. All reported values will be relative to this offset. The default is 0, i.e. the start of the device.

    + -r drives: The bufsize (set by -X bufsize) is expected to be the iosize of a Raid 5 raid set, where drives is the number of data and parity drives. A single bufsize operation is performed to a drive in each stripe of the raid set in such a fashion that the data drive and the parity drive (the only two drives written to in a write operation) will not be the same drives as would have been written to if the previous operation on the previous stripe had been a write. The default value is 0, causing normal sequential operations. This option is used to simulate random operations on a raid set without the penalty of significant seek overheads.

    + -R: By default speedtst will perform a write test then a read test. When this option is specified, the write test will NOT be performed.

    + -S: Normally the device is opened in a default manner which may allow the host operating system to provide buffers when writing data to it. This may result in incorrect I/O throughput rates. When this option is set, the device is opened in such a way as to ensure that writes do not use any buffering and the individual write system calls do not return until the data is on the device.

    + -T: Normally the read and write operations are NOT checked for success, that is, the return values of the read and write system calls are not checked for errors. To ensure each read and write is checked for any system error, use this flag.

    + -W: By default speedtst will perform a write test then a read test. When this option is specified, the read test will NOT be performed.
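
The two rate units reported by speedtst can be illustrated with a short calculation. The following Python sketch is not part of the RaidRunner software; the function name transfer_rates and the elapsed time argument are hypothetical, and it assumes that "Megabytes (MB)" means units of 1024*1024 bytes while "million bytes (mb)" means units of 1,000,000 bytes.

```python
# Illustrative sketch only: speedtst itself is not reproduced here.
# Assumption: "MB" means 1024*1024 bytes and "mb" means 1,000,000 bytes.
BYTES_PER_MB = 1024 * 1024      # assumed size of a "Megabyte (MB)"
BYTES_PER_MILLION = 1_000_000   # size of a "million bytes (mb)"


def transfer_rates(io_cnt, bufsize, seconds):
    """Return (MB per second, mb per second) for io_cnt transfers of
    bufsize bytes each that took the given number of seconds."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    total_bytes = io_cnt * bufsize
    return (total_bytes / BYTES_PER_MB / seconds,
            total_bytes / BYTES_PER_MILLION / seconds)


if __name__ == "__main__":
    # 1024 transfers of 64 KB each (64 MB in total) taking 8 seconds.
    rate_mb, rate_million = transfer_rates(1024, 64 * 1024, 8.0)
    print("%.2f MB/s  %.2f mb/s" % (rate_mb, rate_million))
```

The same byte count always gives a slightly larger figure in million bytes than in Megabytes, which is worth remembering when comparing speedtst output with figures quoted by drive vendors.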
  * SEE ALSO: randio, suffix

3.9.95  SPIND - Spin up or down a disk device

  * SYNOPSIS:
    + spind up c.s.l
    + spind down c.s.l
  * DESCRIPTION: spind will either spin down or spin up a nominated disk drive via the SCSI-2 START command.
  * OPTIONS:
    + up: Spin up the disk drive. If the drive is already spun up nothing will occur.
    + down: Spin down the disk drive. If the drive is already spun down nothing will occur.
    + c.s.l: Identify the disk device by specifying its channel, SCSI ID (rank) and SCSI LUN provided in the format "c.s.l".
  * SEE ALSO: Product manual for disk drives used in your RaidRunner.

3.9.96  SPINDLE - Modify Spindle Synchronization on a disk device

  * SYNOPSIS:
    + spindle -c -p c.s.l [-o rpl_offset]
    + spindle -m|M -p c.s.l [-o rpl_offset]
    + spindle -s -p c.s.l [-o rpl_offset]
    + spindle -g -p c.s.l
  * DESCRIPTION: spindle will either modify or report on a disk device's spindle synchronization. This command will set or clear the Rotational Position Locking (RPL) bits in the disk device's Geometry Parameters mode sense page. If spindle correctly modifies the device's RPL bits, it will print out the resultant RPL bits and the Rotational Offset byte of the device's Geometry Parameters mode sense page.
  * OPTIONS:
    + -p c.s.l: Identify the disk device to modify by specifying its channel, SCSI ID (rank) and SCSI LUN provided in the format "c.s.l".
    + -c: Clear the RPL bits.
    + -s: Set the RPL bits to 01b (0x1) which typically sets the device to operate as a synchronized-spindle slave.
    + -m: Set the RPL bits to 10b (0x2) which typically sets the device to operate as a synchronized-spindle master.
    + -M: Set the RPL bits to 11b (0x3) which typically sets the device to operate as a synchronized-spindle master control.
    + -g: Get and print the current RPL bits and the Rotational Offset byte from the device's Geometry Parameters Page.
    + -o rpl_offset: When clearing (-c) or setting (-m, -M, -s) spindle synchronization, additionally set the rotational offset to the given value. Must be in range from 0 to 256. This value is the numerator of a fractional multiplier that has 256 as its denominator (e.g. a value of 128 indicates a one-half revolution skew). A value of zero (0) indicates that rotational offset shall not be used.
  * SEE ALSO: The mode sense geometry page references in the relevant product manual for the disks used in the RaidRunner.

3.9.97  SRANKS - set the accessible backend ranks for a controller

  * SYNOPSIS: sranks controller_id:ranklist [controller_id:ranklist]
  * DESCRIPTION: sranks allows you to set backend rank restrictions, as specified by the arguments, for the controller on which the command is run. Each argument is of the form controller_id:ranklist, where:
    + controller_id is the id of the controller (0, 1, ...).
    + ranklist is a comma separated list of rank ids (SCSI ids) to which the given controller is to have access.
    sranks will check each argument looking for the controller id corresponding to the controller it is running on and set the backend rank access as per the ranklist. Typically, this command is executed with the output of the BackendRanks GLOBAL environment variable at boot time.
  * SEE ALSO: environ, pranks

3.9.98  STARGD - daemon for SCSI-2 target

  * SYNOPSIS: stargd [-c] [-d] -m moniker -h hport -s capacity [-n] [-r] [-L cnt] [-P nprocs] [-R] [-S sectorsize] [-nr nrread:nrlen] [-C] [-stripesize stripesize] [-irgap nblks] [-stripe stripe_args] device id lun store [store2]
  * DESCRIPTION: stargd is a daemon process that interprets SCSI-2 commands as a "target". [In simple SCSI configurations the host computer is a SCSI "initiator" while its disk is a SCSI "target".] stargd usually works closely with a special device (file system) that does the hardware interfacing and message interpretation for a SCSI-2 target (see scsihp).
When stargd first starts, its "spin state" is set to indicate that the store file has yet to "spin-up" and hence is NOT READY. This means that until the spin state is changed to be marked as READY (via the mstargd "-o" option) all supported SCSI-2 medium-access commands will result in a CHECK CONDITION with the sense key set to "NOT READY" and additional sense information set to "LOGICAL UNIT IN PROCESS OF BECOMING READY". When the spin state is marked as READY all supported SCSI-2 medium-access commands are processed correctly. Additional NOT READY states can be set by mstargd's -Z option.

  * OPTIONS: stargd has 3 mandatory switches, 8 optional switches and 4 mandatory positional arguments and an optional trailing positional argument. The mandatory switches are the -m switch which associates the moniker (i.e. name) with the raid set that stargd is about to execute on (this provides a method of naming raid sets), the -s switch which specifies the capacity of the backend and the -h switch which advises stargd which controller host port it will be communicating over. The 4 mandatory positional arguments are device, id, lun and store. The first 3 are concatenated together to make the filename of the low level SCSI target device driver.
    + The second argument is the SCSI identifier number this target will respond to while the third argument is the logical unit number (LUN) within that SCSI identifier which the target represents. An error will occur during daemon initialization unless the following files (with appropriate characteristics) are found: device/id/lun/cmnd and device/id/lun/data.
    + The fourth positional argument is store. It will typically be a cache device although it could be a single disk or a raid level. The trailing optional positional argument is store2. If the filename store2 can be opened and the "-c" argument is _not_ given then writes to store are echoed to store2. The SCSI commands that cause writes at this level are Write_6 and Write_10.
      If store2 cannot be opened (for writing) or the "-c" argument is given then a warning message is output during stargd initialization and it continues as if store2 had not been given.
    + All switches may appear in any order but if present must be before the positional arguments.
    + The "-L cnt" switch indicates that stargd is to implement a lookahead process for SCSI-2 reads (Read_6 and Read_10) which pre-fetches data into the cache based on the last cnt SCSI-2 reads. The default is a readahead of 16 reads (i.e. -L 16). The minimum readahead is 2 and the maximum is 63. You can specify readahead to be 0 which turns off all lookahead. This switch can only be used when cache (-c) is being used.
    + The "-nr nread:nrlen" switch is used to specify 'natural read' information to stargd. Some applications may perform reads of a certain size which is greater than the largest read the host operating system will allow. In this case, the host operating system will break the application's read up into smaller reads. stargd can trigger on a certain discontinuous read (by specifying a size in blocks - nread) and will prefetch the rest of the application read (nrlen). For example, if an application performs a read of 304 blocks on a host operating system which has a maximum read size of 128 blocks, then you could set the -nr arguments nread:nrlen to 128:304 which will cause stargd, when it sees a discontinuous read of size 128 blocks, to service the read and also prefetch into cache the next 304 - 128 = 176 blocks of data. See scsihpmtr for analysis of host operating system reads.
    + The "-stripesize stripesize" switch informs stargd of the stripe size of the raid set which stargd is fronting. The additional -C switch toggles whether this value (stripesize) will be used to calculate a cylinder size (heads x sectors per track) that ensures that a cylinder is a multiple of the given stripe size.
      If the stripe size is such that no combination of heads and sectors per track gives a product that is a multiple of the stripe size, then the default head and sectors per track sizes are used - currently 16 heads, 128 sectors per track.
    + The "-stripe stripe_args" switch informs stargd that a number of host ports will be concurrently accessing this raid set. This switch is only useful when the host based disk access software can "stripe" io to N different devices and each access (read or write) is always less than a set stripe size.
    + The stripe_args are of the form N:S:I=c.h.l,I=c.h.l, where:
      o N is the number of stripes (or host ports) that will be concurrently accessing the raid set,
      o S is the stripe size (in 512-byte blocks) that the host will perform i/o in,
      o I is an index or "stripe number" specifier ranging from 0 to N-1,
      o c.h.l is the controller, host port, lun triplet associated with the given index.
      For example, if we have two host ports and the given stripe size from the host is, say, 1368 blocks, we would then have the following raid set additions (in agui form):
          Host Interfaces (2): M 0.0.0 M 0.1.0
          Additional stargd args: -stripe 2:1368:0=0.0.0,1=0.1.0
      NOTE: it has to be guaranteed that the host stripe software will send block 0 (size 1368 or less) to controller, host port, lun triplet 0.0.0, block 1 (size 1368 or less) to controller, host port, lun triplet 0.1.0, block 2 (size 1368 or less) to controller, host port, lun triplet 0.0.0, and so forth.
    + The "-C" switch is used to toggle whether stargd calculates a cylinder size such that it is a multiple of the given stripe size. The default is to calculate a cylinder size based on the given stripe size.
    + The "-irgap nblks" switch specifies the inter-read gap, nblks (in blocks). When sequential reads arrive from a host there may be a small gap between successive reads.
      Normally the lookahead algorithm will ignore these gaps providing they are no larger than the average length of the group of sequential reads that have occurred. By specifying this value, you can increase this gap. This switch has no meaning if the lookahead feature is turned off (i.e. specifying "-L 0").
    + The "-R" switch indicates that store should have read-only access. If this switch is NOT present, then the store is assumed to have read-write access.
    + The "-c" switch indicates that store is a cache (rather than a disk or something else remote). This knowledge decreases the number of internal copies this daemon needs to do so it is a performance enhancement. The default (i.e. when this switch is not present) is to do the extra copy which is the safe course if the exact identity of the store is uncertain.
    + The "-d" switch is a debug flag. When given, the command line arguments and other information derived during initialization are echoed back to standard out and the debug level is set to 1. The debug levels are:
      o 0 no debug messages (default when "-d" not given)
      o 1 debug messages on all non-read/write commands
      o 2 debug messages on all commands
      Debug messages are typically a single line per command and are sent to standard out. Serious errors are reported to standard error irrespective of the current debug level. If the error is related to incoming data (from the SCSI initiator) then the daemon will continue. If the error cannot be recovered from, then the daemon terminates with an error message as its exit status. The debug level can be changed during the execution of a stargd daemon by using the "-d" option on mstargd.
    + The "-m" switch associates the moniker (i.e. name) with the raid set that stargd is about to execute on. It provides a method of naming raid sets. The moniker can be a maximum of 32 characters. This switch is mandatory.
    + The "-h" switch informs stargd which controller host port it will be communicating over.
      This switch is mandatory.
    + The "-n" switch toggles statistics being gathered. The default action is to collect statistics.
    + The "-r" switch is for restarting the daemon with as little disturbance to the initiator as possible. Usually the SCSI Unit attention condition is set on reset (and when this daemon commences). When this switch is given the daemon starts in the idle state. Use with care. The default action (i.e. when the "-r" is not given) is to set a SCSI Unit attention condition so that the first SCSI command received will have its status returned as "Check condition" (as required by the SCSI-2 standard).
    + The "-s capacity" switch informs the daemon of the size in blocks this SCSI target will represent. The value for capacity affects the response this target makes to the SCSI Read Capacity and Mode Sense (page 4) commands. This switch is mandatory. To simplify writing out large numbers certain suffixes can be used, see suffix.
    + The "-P nprocs" switch causes the daemon to spawn off nprocs copies of itself to allow concurrent processing of SCSI commands. If not specified, the default number of stargd processes created is 4. This value can range from 1 to 8 inclusive. The additional processes will not be created until the first access from the host port.
    + The "-S sectorsize" switch informs the daemon of the sector size, in bytes, this SCSI target will present. The value defaults to 512 - the typical disk block size. To simplify writing out large numbers certain suffixes can be used, see suffix.
  * SIGNALS: Two K9 signals are interpreted specially by stargd.
    + If the signal K9_SIGINT is received then stargd will check to see whether there is a command in progress and if so terminate it with a "Check condition" status and set the associated sense key to "Command aborted". stargd will then return to its command (waiting) mode.
    + If the signal K9_SIGTERM is received then stargd will do the same as it does for K9_SIGINT except it will exit the process rather than going to command (waiting) mode.
    + The SCSI initiator should interpret a sense key of "Command aborted" as the target unilaterally aborting a command in progress. The SCSI-2 standard suggests the initiator should retry the aborted command.
  * EXAMPLE:
        stargd -c -s 2M -m RS -h 1 /dev/hostbus/1 6 0 /cache/data
    This line will invoke this daemon and try to open the following files: /dev/hostbus/1/6/0/cmnd and /dev/hostbus/1/6/0/data. The file "/cache/data" will be used as store. The "-c" switch identifies this file as a cache to stargd. The "-s 2M" switch instructs this daemon to tell SCSI initiators that it is a 1 GigaByte disk (i.e. 2 MegaBlocks).
  * SUPPORTED SCSI-2 COMMANDS: The table below lists the supported SCSI-2 commands (Code and Name) and their action under stargd.

      Code  Name              Action
      00    Test Unit Ready   If backend is ready returns GOOD Status, else
                              sets Sense Key to Not Ready and returns CHECK
                              CONDITION Status
      01    Rezero Unit       Does nothing, returns GOOD Status
      03    Request Sense     Sense data held on a per initiator basis (plus
                              extra for bad LUN's)
      04    Format Unit       Does nothing, returns GOOD Status
      07    Reassign Blocks   Consumes data but does nothing, returns GOOD
                              Status
      08    Read_6            DPO, FUA and RelAdr not supported
      0a    Write_6           DPO, FUA and RelAdr not supported
      0b    Seek_6            Does nothing, returns GOOD Status
      12    Inquiry           Only standard 36 byte data format supported
                              (not vital product data pages)
      15    Mode Select       Support pages 1, 2, 3, 4, 8 and 10 (but none
                              writable)
      16    Reserve           Doesn't support extents + 3rd parties
      17    Release           Doesn't support extents + 3rd parties
      1a    Mode Sense        Support pages 1, 2, 3, 4, 8 and 10
      1b    Start Stop        If Start is requested and the Immediate bit is
                              0 then waits for backend to become ready, else
                              does nothing and returns GOOD Status. If
                              backend does not become ready within 20
                              seconds sets Sense Key to Not Ready and
                              returns CHECK CONDITION Status
      1d    Send Diagnostics  Returns GOOD Status when self test else
                              complains (does nothing internally)
      25    Read Capacity     RelAdr, PMI and logical address > 0 are not
                              supported
      28    Read_10           Same as Read_6
      2a    Write_10          Same as Write_6
      2b    Seek_10           Does nothing, returns GOOD Status
      2f    Verify            Does nothing, returns GOOD Status
      55    Mode Select_10    Same as Mode Select
      5a    Mode Sense_10     Same as Mode Sense

  * SEE ALSO: scsihp, suffix, mstargd

3.9.99  STAT - get status information on the named files (or stdin)

  * SYNOPSIS: stat [-b] [file...]
  * DESCRIPTION: stat gets status information on each given file, or the standard input when a file named `-' is given, and sends it to the standard output. For files that are found a line with 6 columns is output. The meaning of each column is summarized below:
    + 1st filename
    + 2nd type of file system containing this file
    + 3rd instance of the containing file system
    + 4th unique (internal) file identifier
    + 5th version number of this file
    + 6th length of this file (in bytes)
  * If the -b option is given, then the length of each file (6th field) is printed in 512-byte blocks. Only whole blocks are reported, so files shorter than 512 bytes are reported as having a size of zero (0) blocks.
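The whole-block rule for the -b option amounts to integer division by 512. A minimal Python sketch of that rule follows; the function name is hypothetical and this is not taken from the stat source.

```python
BLOCK_SIZE = 512  # bytes per block, as stated for the -b option


def whole_blocks(length_in_bytes):
    """Return the number of complete 512-byte blocks in a file length."""
    if length_in_bytes < 0:
        raise ValueError("length cannot be negative")
    # Integer division drops any partial trailing block.
    return length_in_bytes // BLOCK_SIZE
```

Under this rule a 144-byte file counts as 0 blocks, and a 1023-byte file counts as 1 block.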
  * EXAMPLE:
        : raid; stat /bin/ps
        ps                               ram    0 0x00049049    2 144
        : raid;
  * SEE ALSO: K9getstat, intro

3.9.100  STATS - Print cumulative performance statistics on a Raid Set or Cache Range

  * SYNOPSIS: stats [-c cache_moniker] [-r raid_set] [-g] [-z]
  * DESCRIPTION: stats is a process that will print and/or zero the cumulative performance statistics which Raid Sets and Cache Ranges maintain.
  * OPTIONS:
    + -c cache_moniker: Zero (-z) and/or print (-g) the cache statistics of the cache range specified by cache_moniker which was set in the add moniker=cache_moniker first= ... command (see cache). The statistics printed are the number of cache hits, cache misses, cache probes per hit and probes per miss. Output is in the form
          moniker : hits+misses hits misses hit+miss_probes hit_probes miss_probes
    + -r raidset_name: Zero (-z) and/or print (-g) the raid set statistics of the raid set specified by raidset_name. When printing (-g) statistics, for each backend in the raid set, the cumulative number of reads, writes, read failures and write failures to that backend are printed. Output is in the form
          D0 r0_cnt r0_fails w0_cnt w0_fails; D1 r1_cnt r1_fails w1_cnt w1_fails; D...;
  * GENERAL: When gathering (or zeroing) a Cache Range's statistics, a special system call to the RaidRunner's cache is made. When gathering (or zeroing) a Raid Set's statistics, a read, or a write of the control string "zerostats", is made to the raid_bind_point/stats control file.
  * SEE ALSO: cache, raid0, raid1, raid3, raid5

3.9.101  STRING - perform a string operation on a given value

  * SYNOPSIS: string option value
  * DESCRIPTION: string provides various string type operations on the given value which is treated as a string. Options are length, range and split.
  * OPTIONS:
    + length string: The length option returns the number of characters in the given string.
    + range string first last: The range option returns the substring from the given string that lies between the indices given by first and last. An index of 0 refers to the first character in the string. If last is beyond the length of the string, then it becomes the index of the last character in the string. If first is greater than last then the substring is extracted backwards.
    + split string split_ch: The split option will replace each occurrence of the character split_ch in the given string with a space.
  * EXAMPLES: Some simple examples:
        set string ABCDEFGHIJ                   # create the string
        set subs `string length $string'        # get its length
        echo $subs
        10
        set subs `string range $string 2 2'     # extract character at index 2
        echo $subs
        C
        set subs `string range $string 3 6'     # extract from indices 3 to 6
        echo $subs
        DEFG
        set subs `string range $string 6 3'     # backwards
        echo $subs
        GFED
        set subs `string range $string 4 70'    # extract from index 4 to 70 (or end)
        echo $subs
        EFGHIJ
        set string D1,D2,D4,D8                  # create the string
        set subs `string split $string ,'       # split the string
        echo $subs
        D1 D2 D4 D8

3.9.102  SUFFIX - Suffixes permitted on some big decimal numbers

  * DESCRIPTION: In some commands that can take big decimal numbers as arguments certain suffixes are allowed. The suffix is a single alphabetical character that must follow immediately after the decimal number it is qualifying. The alphabetical character may be upper or lower case. Negative numbers cannot have suffixes. The accepted suffixes are:
    + w: multiply number by 2
    + b: multiply number by 512
    + k: multiply number by 1024
    + m: multiply number by 1048576
    + g: multiply number by 1073741824
    The resulting number must fit in a 32 bit unsigned number which means "3G" can be represented (== 3,221,225,472) but "4G" cannot. "0G" is allowable and is interpreted as zero.
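The suffix rules above can be sketched in Python as follows. This is an illustration of the documented behaviour only, not the RaidRunner parser: the function name is hypothetical, and for simplicity the sketch rejects every negative number rather than only suffixed ones.

```python
# Multipliers copied from the suffix list; the lookup is case-insensitive.
SUFFIX_MULTIPLIERS = {
    "w": 2,
    "b": 512,
    "k": 1024,
    "m": 1048576,
    "g": 1073741824,
}
UINT32_MAX = 0xFFFFFFFF  # the result must fit in 32 unsigned bits


def parse_suffixed(text):
    """Convert text such as '2M' or '2080' into an integer."""
    digits, multiplier = text, 1
    if text and text[-1].isalpha():
        key = text[-1].lower()
        if key not in SUFFIX_MULTIPLIERS:
            raise ValueError("unknown suffix: " + text[-1])
        digits, multiplier = text[:-1], SUFFIX_MULTIPLIERS[key]
    # Only plain decimal digits are accepted (assumption of this sketch).
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("not a non-negative decimal number: " + text)
    value = int(digits) * multiplier
    if value > UINT32_MAX:
        raise ValueError("does not fit in 32 unsigned bits: " + text)
    return value
```

With this sketch, "3G" yields 3221225472 and "0G" yields 0, while "4G" raises an error, matching the limits described above. The "-s 2M" capacity used in the stargd example would parse to 2097152 blocks.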
  * COMMANDS USING SUFFIXES: dd, stargd

3.9.103  SYSLOG - device to send system messages for logging

  * SYNOPSIS: bind -k syslog bind_point
  * DESCRIPTION: The syslog file system provides user level entry of messages into the system logging facility (see syslogd). Writes to this device will be time-stamped and stored with a message class of "NOTICE" in the system log. Reads from this device will return EOF. By default the system logging device is bound at /dev/syslog.
  * EXAMPLE:
        > /dev/syslog
        bind -k syslog /dev/syslog
        ls -l /dev/syslog
        /dev/syslog                   syslog    0 0x00000000    0 0
        echo {Some important message} > /dev/syslog
        syslogd
        1: INFO: RaidRunner Syslog Boot
        10: NOTICE: Some important message
  * SEE ALSO: syslogd

3.9.104  SYSLOGD - initialize or access messages in the system log area

  * SYNOPSIS: syslogd [-I] [-p cnt]
  * DESCRIPTION: syslogd either initializes the system message logging area or prints stored messages from that area.
  * OPTIONS:
    + -I: Initialize the system message logging area. This command is usually executed at boot time and is not normally invoked by the user.
    + -p cnt: Print the last cnt messages stored in the system message logging area. This is the default action if no options are given, with cnt set to 20. Messages are printed in the format
          timestamp: message class: message
      where timestamp is the time the message was logged, recorded as the number of seconds from the time the RaidRunner was booted, message class is the type of message logged, indicating the importance (or class) of the message, and message is the message itself.
  * MESSAGE CLASS: There are currently nine (9) message classes:
    1. EMERG: messages of an extremely serious nature from which the RaidRunner cannot recover
    2. ALERT: messages of a serious nature from which the RaidRunner can only partially recover
    3. CRIT: messages of a serious nature from which the RaidRunner can almost fully recover
    4. ERR: messages indicating internal errors
    5. WARNING: messages of a serious nature from which the RaidRunner can fully recover, for example automatic allocation of a hot spare to a Raid 1, 3 or 5 file system
    6. NOTICE: messages logged via writes to the syslog device
    7. INFO: informative messages
    8. DEBUG: debugging messages
    9. REPEATS: indicates that the previous message has been repeated N times every S seconds since its initial entry
  * SYSLOG OUTPUT: When messages are logged, they are stored in the system message logging area. If the global environment variable, SYSLOG_CONF, is set to a value of 'C' or is not set at all then messages will be printed to the console as well. If the message has a suffix of RPT N/S then this message has been logged N times every S seconds since its initial entry into the system message logging area.
  * SEE ALSO: syslog, setenv

3.9.105  TEST - condition evaluation command

  * SYNOPSIS: test expr
  * DESCRIPTION: test evaluates the expr. If its value is true then test returns a K9 status of NIL which is equivalent to the true command. If its value is false then test returns a K9 status of "false" which is equivalent to the false command. If no arguments are given then a K9 status of "false" is returned. If the expression doesn't obey the syntax given below then an error message is written to standard error and returned as the K9 status. Note that any non-NIL K9 status is interpreted as false by husky condition logic (e.g. "if"). The basic expressions used by test fall into 3 categories: file tests, string tests and numeric tests. These basic expressions can be modified or combined into larger expressions by a 4th category called operators.
  * FILE TESTS: The following file tests are supported:
    + -d file: True if file exists and is a directory.
    + -f file: True if file exists.
    + -s file: True if file exists and is non-zero length.
* STRING TESTS: The command "test {}" is syntactically correct and has the value "false" because "{}" is an empty string. The command "test" by itself has the value "false". The commands "test hello" and "test -z" both have the value "true". The following string tests are supported:   + -n str: True if the length of str is non-zero.   + -z str: True if the length of str is zero.   + str1 = str2: True if str1 and str2 are identical. Spaces surrounding "=" are required.   + str1 != str2: True if str1 and str2 are not identical. Spaces surrounding "!=" are required.   + str: True if str is not a null (empty) string   * NUMERIC TESTS: These numeric arguments are evaluated as signed integers. Currently an internal 32 bit integer representation is used limiting the range of valid integers from -2,147,483,648 to 2,147,483,647 (inclusive). Thus fixed point and floating point numbers cannot be compared. The spaces surrounding the numeric comparison tokens (e.g. "-eq") are required. Instead of a number an expression of the form "-l str" can be given. This evaluates to the number of characters in the string str. The following numeric tests are supported:   + n1 -eq n2: True if n1 and n2 are numerically equal   + n1 -ge n2: True if n1 is greater than or equal to n2   + n1 -gt n2: True if n1 is greater than n2   + n1 -le n2: True if n1 is less than or equal to n2   + n1 -lt n2: True if n1 is less than n2   + n1 -ne n2: True if n1 is not equal to n2   * OPERATORS: The following operators are supported:   + !: Unary negation operator. Appears to the left of the expression it is negating.   + -a: Binary AND operator. Appears between the 2 expressions it is logically combining.   + -o: Binary OR operator. Appears between the 2 expressions it is logically combining.   + ( expr ): Parentheses are used to group expressions. Since parentheses are husky special characters they need to be quoted or escaped.   + The relative precedence of these operators from high to low is: () ! -a -o. 
      Thus the expression:
          test ! -f /bin/ls -a -d /env
      is the same as:
          test {(} ! -f /bin/ls {)} -a -d /env
      The "!" operator should appear to the left of other unary operators. Basic binary operators have higher precedence than "-a" and "-o".
  * SEE ALSO: husky

3.9.106  TIME - Print the number of seconds since boot (or reset of clock)

  * SYNOPSIS: time
  * DESCRIPTION: time will print the time in seconds since the RaidRunner was booted or since the clock was last reset.

3.9.107  TRAP - intercept a signal and perform some action

  * SYNOPSIS:
    + trap
    + trap n ...
    + trap {} n ...
    + trap arg n ...
  * DESCRIPTION: trap allows this process to intercept signals directed at it and potentially take some special action. When trap is used with no arguments then all current (non-default) traps for this process are printed. When the first argument is a number (optionally followed by other numbers) then this number is interpreted as a signal number whose action is to be set to the default for that signal (extra numbers are treated in a similar fashion). When the first argument is an empty string (i.e. {}) then the signals nominated by the following numbers are ignored. When the first argument is a non-empty string then if the signals nominated by the following numbers occur, then "arg" will be executed. N.B. Signal number 4 (kill) can be neither caught nor ignored. The following table maps signal numbers to an explanation:
    + 0 unused signal
    + 1 hangup
    + 2 interrupt (rubout)
    + 3 quit (ASCII FS)
    + 4 kill (cannot be caught or ignored)
    + 5 write on a pipe with no one to read it
    + 6 alarm clock
    + 7 software termination signal
    + 8 child process has changed state
    + 9 process could not obtain memory (from heap)
  * SEE ALSO: kill

3.9.108  TRUE - returns the K9 true status

  * SYNOPSIS: true
  * DESCRIPTION: true does nothing other than return the K9 true status.
K9 processes return a pointer to a C string (null terminated array of characters) on termination. If that pointer is NULL then a true exit value is assumed while all other returned pointer values are interpreted as false (with the string being some explanation of what went wrong). This command returns a NULL pointer value as its return value. Returning a NULL pointer value is sometimes referred to as returning NIL. The husky shell interprets the token ":" to have the same meaning as true.

  * SEE ALSO: false

3.9.109  STTY or TTY - print the user's terminal mount point or terminfo status

  * SYNOPSIS:
    + tty
    + stty [ttyname]
  * DESCRIPTION: tty examines the device on which it is running, and if the device is on a DUART, then the mount point of the device is printed. If the device is not a DUART, then an appropriate message is printed. stty prints the terminfo structure associated with either the terminal device that stty is being executed from or the given terminal device - ttyname. If an invalid device name is given or the device stty is being executed from is not a DUART, an appropriate message is printed. This command is mainly useful for debugging the RaidRunner kernel.
  * RETURN STATUS: On success a return code of 0 is returned, else 1 is returned.

3.9.110  UNSET - delete one or more environment variables

  * SYNOPSIS:
    + unset name
    + unset name name ...
  * DESCRIPTION: unset removes the given environment variable(s). This is done by removing the given variable(s) from the /env file system.
  * SEE ALSO: set, env

3.9.111  UNSETENV - unset (delete) a GLOBAL environment variable

  * SYNOPSIS: unsetenv name [name ...]
  * DESCRIPTION: unsetenv deletes the GLOBAL environment variable name along with its contents. If multiple names are given, then each GLOBAL environment variable is deleted. If the given name is not a GLOBAL environment variable then nothing is done and no error status is set.
* NOTE: A GLOBAL environment variable is one which is stored in a non volatile area on the RaidRunner and hence is available between successive power cycles or reboots. These variables ARE NOT the same as husky environment variables. The non volatile area is co-located with the RaidRunner Configuration area.   * SEE ALSO: printenv, setenv, rconf 3.9.112     VERSION - print out the version of the RaidRunner kernel   * SYNOPSIS: version   * DESCRIPTION: version prints out the version of the RaidRunner code which comprises the version number, date of creation and creator. 3.9.113     WAIT - wait for a process (or my children) to terminate   * SYNOPSIS: wait [pid]   * DESCRIPTION: wait either waits for the given pid (process identifier) to terminate or, if no argument is given, waits for all the children of this process to terminate. When a pid is given then the exit status of that process is returned by this command. When no pid is given then this command returns a NIL status (when all children have terminated).   * SEE ALSO: K9wait, K9kill 3.9.114     WARBLE - periodically pulse the buzzer   * SYNOPSIS: warble period count   * DESCRIPTION: warble will save the current state of the buzzer (on or off), then turn the buzzer on for period milliseconds, then off for period/2 and repeat this for count times.   * SEE ALSO: buzzer, sos 3.9.115     XD- dump given file(s) in hexa-decimal to standard out   * SYNOPSIS: xd [ file ... ]   * DESCRIPTION: xd dumps each given file in hexadecimal format to standard out. If no file is given (or it is "-") then standard in is used. 3.9.116     ZAP - write zeros to a file SYNOPSIS: zap [-b blockSize] [-f byteVal] count offset <>[3] store DESCRIPTION: zap writes count * 8192 bytes of zeros at byte position offset * 8192 into file store (which is opened and associated with file descriptor 3). Both count and offset may have a suffix. The optional "-b" switch allows the block size to be set to blockSize bytes. 
The default block size is 8192 bytes. The optional "-f" switch allows the fill character to be set to byteVal, which should be a number in the range 0 to 255 (inclusive). The default fill character is 0 (i.e. zero). Every 100 write operations the current count is output (usually overwriting the previous count output). Errors on the write operations are ignored.
  SEE ALSO: suffix

3.9.117     ZCACHE - Manipulate the zone optimization IO table of a Raid Set's cache
  * SYNOPSIS: zcache -c cache_moniker [-E extents] [-P portion] [-W 0|1] [-p]
  * DESCRIPTION: zcache is a utility that will print and/or modify the zone optimized IO table of a given cache range.
  * OPTIONS:
  + -c cache_moniker: Specify the cache range, cache_moniker, whose zone table is to be printed or modified. cache_moniker is the name which was set in the add moniker=cache_moniker first= ... command (see cache). If no other options are given the zone table is printed. When the zone table is printed it is in the form
        n state z1_lo..z1_hi@z1_blks z2_lo..z2_hi@z2_blks ... zn_lo..zn_hi@zn_blks
    where n is the number of IO optimized zones, state is either "on" for zone optimized IO, "off" for no zone optimized IO or "none" if no zones are available. z1_lo..z1_hi@z1_blks ... zn_lo..zn_hi@zn_blks lists the starting and ending blocks and the optimized block count for each zone.
  + -E extents: Specify the number of data drives in the Raid Set for which the cache range exists.
  + -P portion: Specify the percentage, limited to be between 50 and 300, by which the optimized block count for each zone is to be adjusted. That is, if you need to reduce the optimized block count for each zone by, say, 10%, you would set portion to 90.
  + -W 0|1: Turn on (1) or off (0) zone IO optimizations for the given cache range.
  * SEE ALSO: cache

3.9.118     ZERO - file when read yields zeros continuously
  * SYNOPSIS: bind -k {zero [ # ]} bind_point
  * DESCRIPTION: zero is a special file that when written to is an infinite sink of data (i.e. anything can be written to it and it will be disposed of quickly). When zero is read it is an infinite source of zeros (i.e. the byte value 0). The zero file will appear in the K9 namespace at the bind_point. If the optional "#" is given then the zero file still is an infinite sink of data. However, when read, each 512 bytes is viewed as a block (starting at block 0 at the beginning of the file). Each block contains 128 4-byte integers ("unsigned long" in C), each of which contains the current block number. This option is meant as a debugging aid.
  * EXAMPLE: Husky installs a zero special file as follows:
        bind -k zero /dev/zero
    Example of use, to make a 32 kilobyte file (called "/fill") full of zeros:
        dd if=/dev/zero of=/fill bs=8k count=4
  * SEE ALSO: null, log

3.9.119     ZLABELS - Write zeros to the front and end of Raid Sets
  * SYNOPSIS:
  + zlabels -a
  + zlabels raidset [raidset .... ]
  * DESCRIPTION: zlabels is a husky script which writes zeros to both the front and end of either all configured raid sets or the given raid sets. Typically a host operating system will write label(s) onto a disk. Nearly all operating systems write a few blocks at the start of a disk and some write copies at the end of the disk as well. As this label contains formatting and modified disk geometry information, a change to a raid set that affects the geometry of its offered disk drive will conflict with any labels written before. zlabels writes zeros at the start of a raid set and also at the end. The number of blocks zeroed is dependent on the raid set type and io chunksize. Typically 50 io chunk size blocks are written at the start and 49 at the end. In the case of a raid type 3, the number of data drives times 50 (and 49) are written.
  * OPTIONS:
  + -a: All configured raid sets will be zeroed.
  + raidset: The named raid set, or raid sets, are zeroed.
  * SEE ALSO: dd

3.10  Advanced Topics: SCSI Monitor Daemon (SMON)

Another way of communicating with the onboard controller from the host operating system is the SCSI Monitor (SMON) facility. SMON provides an ASCII communication channel on an assigned SCSI ID and LUN. The commands discussed in section 7 may also be issued over this channel to manipulate the RAID configuration and operation. This mechanism is used under Solaris to provide a communication channel between an X based GUI and the RAID controller. It is currently not used under Linux. See the description of the smon daemon in the 5070 command reference above.

3.11  Further Reading
  * The Linux software-RAID-HOWTO by Linas Vepstas
  * The Plan9 pages at AT&T Bell Labs: http://plan9.bell-labs.com/plan9/index.htm

-----------------------------------------------------------------------------
This document was translated from LATEX by HEVEA.

Apache Compile HOWTO

Luc de Louw

Revision History
Revision 1.9.18 2003-02-09
  Added XML and Sablotron support to PHP, dropped support for mod_jserv, added mod_jk support, enhanced support for Tomcat, updated software mentioned in the HOWTO, minor SGML enhancements.
Revision 1.9.17 2002-10-16
  Updated software mentioned in the HOWTO. Further SGML enhancements and cleanups like more metadata, callouts and others.
Revision 1.9.16 2002-07-04
  Updated the software mentioned in the HOWTO, added LogFormat config for mod_gzip. Added gdbm to prerequisites. Lots of SGML enhancements like more metadata, and a revised FAQ section.
Revision 1.9.15 2002-06-19
  Updated to mod_ssl-2.8.9-1.3.26 and removed the temporary patch.
Revision 1.9.14 2002-06-19
  Updated to Apache 1.3.26 to fix the security hole CERT CA-2002-17; it is strongly recommended that users update immediately. Added (temporary) patch to get mod_ssl 2.8.8 working with 1.3.26. Added --without-debug to MySQL configure.
Revision 1.9.13 2002-06-15
  Updates of software mentioned in the HOWTO, added how to bind MySQL to a specific IP, some minor changes and corrections.
Revision 1.9.12 2002-04-22
  Added mod_gzip and mod_gunzip, corrected some typos, updates of software mentioned in the HOWTO, separated the additional modules into their own section.
Revision 1.9.11 2002-04-07
  Corrected lots of typos (non-technical), updates of software mentioned in the HOWTO.
Revision 1.9.11-pre1 2002-03-15
  Corrected some grammar, updates of software mentioned in the HOWTO.
Revision 1.9.10 2002-03-09
  Corrected some grammar, updates of software mentioned in the HOWTO.
Revision 1.9.9 2002-02-11
  Fixed a major bug in openssl config, restructured the document, added sources for further information.
Revision 1.9.8 2002-02-08
  Updates of software mentioned in the HOWTO, and fixed some bugs.
Revision 1.9.7 2001-12-26
  Updates of software mentioned in the HOWTO, tested the HOWTO procedures on Linux running on IBM S/390 (zSeries) machines (see "Platforms" for more info), added some basic support for Tomcat (binaries only).
Revision 1.9.6 2001-10-27
  Updates of software mentioned in the HOWTO, and fixed some bugs.
Revision 1.9.5 2001-08-27
  Yet another rewrite in DocBook 3.1.
Revision 1.9.4 2001-08-26
  Updated the software versions mentioned in the document, corrected some typos.
Revision 1.9.3 2001-06-23
  Current Version 2.0.0-pre3 in Linux DocBook format.
Revision 1.0.0 2000-08-05
  First publication of the html-based document.

This document describes how to compile the Apache Webserver with the most important modules like mod_perl, mod_dav, mod_auth_ldap, mod_dynvhost, mod_roaming, mod_jserv, and mod_php
-----------------------------------------------------------------------------
Table of Contents
1. Introduction
    1.1. Contributors and Contacts
    1.2. Why I wrote this document
    1.3. What this document is supposed to be
    1.4. What this document doesn't do for you
    1.5. Platforms
    1.6. Copyright Information
    1.7. Disclaimer
    1.8. New Versions
    1.9. Credits
    1.10. Feedback
    1.11. Translations
    1.12. About the author
2. Prerequisites
    2.1. General
    2.2. OpenSSL
    2.3. GNU Database System
    2.4. MySQL
    2.5. Building mm
3. Getting, build and install Apache with its basic modules
    3.1. Get and untar the Apache Source
    3.2. mod_ssl
    3.3. mod_perl
    3.4. Configure and build Apache
4. Additional modules
    4.1. mod_dav
    4.2. auth_ldap
    4.3. mod_auth_mysql
    4.4. mod_dynvhost
    4.5. mod_roaming
5. Compressed delivery
    5.1. mod_gzip
    5.2. mod_gunzip
6. mod_php and its prerequisites
    6.1. What is mod_php
    6.2. Prerequisites
    6.3. Building and installing PHP4
7. PHP extensions
    7.1. APC (Alternative PHP-cache)
    7.2. Zend-Optimizer (Do _NOT_ combine with APC-Cache!)
8. Jakarta Tomcat
    8.1. What is Tomcat
    8.2. Prerequisites
    8.3. Download the binaries
    8.4. mod_jk
9. Further Information
    9.1. News groups
    9.2. Mailing Lists
    9.3. HOWTO
    9.4. Local Resources
    9.5. Web Sites
10. Questions and Answers

Warning
Security hole in Apache older than 1.3.26

  Do NOT use any Apache version older than 1.3.26. See http://www.cert.org/advisories/CA-2002-17.html for more information.

-----------------------------------------------------------------------------
1. Introduction

1.1. Contributors and Contacts

First I would like to thank all those people who sent questions and suggestions that made the further development of this document possible. It shows me that sharing knowledge is the right way. I would encourage you to send me more suggestions; just write me an email.

-----------------------------------------------------------------------------
1.2.
Why I wrote this document

All Linux distributions I tested had a non-optimal default setup of Apache. Additionally, none of the major distributions ship current versions of Apache. Finally, most commercial Unixes are delivered without a pre-installed Apache, or with a very strange setup.

Since I install a lot of customized webservers on different Unixes, I wrote a plaintext document and placed it on my website so I could access it at work. Later a friend posted the URL to a mailing list, and the first questions arrived. So I decided to put more information on the page.

After a lot of people requested the document as an »official« HOWTO written in SGML, I decided to prepare it to be one.

-----------------------------------------------------------------------------
1.3. What this document is supposed to be

Compiling all the items described below needs a lot of configure options that nobody can memorize. This is supposed to be a copy-paste-ready text to compile Apache and friends.

Also, people should learn how to build a full-featured Apache webserver by themselves, to be independent of any Linux distributor.

-----------------------------------------------------------------------------
1.4. What this document doesn't do for you

It is just a document, not a script that does the work for you. You have to do all the steps yourself.

-----------------------------------------------------------------------------
1.5. Platforms

The original document was for all major Unix platforms. Now the HOWTOs are separated for each platform. You will find the same document adapted for:
  * Linux (This Document)
  * IBM AIX 4.3 and 5.1L
  * Sun Solaris 6/7/8
  * Hewlett-Packard HP-UX 11
  * {Free|Net|Open}-BSD

Important notice for users running Linux on IBM S/390 (zSeries): PostgreSQL and Jserv won't compile on that system.
All other programs and modules mentioned in the HOWTO work perfectly.

Other Unix platforms: Feel free to create a guest account for me on your Unix platform, so I can have a look at the differences.

Windows users: I'm sorry, I'm too young for a heart attack. You will need to upgrade your machine to a »real« operating system ;-)

-----------------------------------------------------------------------------
1.6. Copyright Information

This document is copyrighted (c) 2000, 2001, 2002, 2003 Luc de Louw and is distributed under the terms of the Linux Documentation Project (LDP) license, stated below.

Unless otherwise stated, Linux HOWTO documents are copyrighted by their respective authors. Linux HOWTO documents may be reproduced and distributed in whole or in part, in any medium physical or electronic, as long as this copyright notice is retained on all copies. Commercial redistribution is allowed and encouraged; however, the author would like to be notified of any such distributions.

All translations, derivative works, or aggregate works incorporating any Linux HOWTO documents must be covered under this copyright notice. That is, you may not produce a derivative work from a HOWTO and impose additional restrictions on its distribution. Exceptions to these rules may be granted under certain conditions; please contact the Linux HOWTO coordinator at the address given below.

In short, we wish to promote dissemination of this information through as many channels as possible. However, we do wish to retain copyright on the HOWTO documents, and would like to be notified of any plans to redistribute the HOWTOs.

If you have any questions, please contact

-----------------------------------------------------------------------------
1.7. Disclaimer

No liability for the contents of this document can be accepted. Use the concepts, examples and other content at your own risk.
As this is a new edition of this document, there may be errors and inaccuracies that may of course be damaging to your system. Proceed with caution, and although this is highly unlikely, the author(s) do not take any responsibility for that.

All copyrights are held by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark.

Naming of particular products or brands should not be seen as endorsements.

You are strongly recommended to take a backup of your system before major installation, and backups at regular intervals.

-----------------------------------------------------------------------------
1.8. New Versions

This is the 15th Revision.

New revisions of this document will be announced at http://freshmeat.net/projects/apache-compile-howto/?topic_id=905

The latest version of this document is to be found at http://www.delouw.ch/linux
  * [http://www.delouw.ch/linux/Apache-Compile-HOWTO/html/index.html] HTML.
  * [http://www.delouw.ch/linux/Apache-Compile-HOWTO/Apache-Compile-HOWTO.ps] Postscript (ISO A4 format).
  * [http://www.delouw.ch/linux/Apache-Compile-HOWTO/Apache-Compile-HOWTO.pdf] Acrobat PDF.
  * [http://www.delouw.ch/linux/Apache-Compile-HOWTO/Apache-Compile-HOWTO.sgml] SGML Source.
  * [http://www.delouw.ch/linux/Apache-Compile-HOWTO/Apache-Compile-HOWTO.html.tar.gz] HTML gzipped tarball.

-----------------------------------------------------------------------------
1.9. Credits

I would like to thank all the nice people at <discuss at linuxdoc.org> for supporting me in writing HOWTOs.

-----------------------------------------------------------------------------
1.10. Feedback

Feedback is most certainly welcome for this document. Without your submissions and input, this document wouldn't exist.
Please send your additions, comments and criticism to the author by email.

-----------------------------------------------------------------------------
1.11. Translations

At the moment there are translations available for:
  * [http://www.delouw.ch/linux/DE-Apache-Compile-HOWTO/html/index.html] German
  * [http://www.delouw.ch/linux/FR-Apache-Compile-HOWTO/html/index.html] French

Translations to other languages are always welcome. If you have translated this document, please let me know, so I can set a link here.

-----------------------------------------------------------------------------
1.12. About the author

Luc (in English, Luke) is 29 years old and has been playing around with computers for 20 years. Currently he is working as a Unix System Engineer for an IT corporation located in Kloten (Zurich), Switzerland. His main focus is developing all flavors of innovative systems running on Linux (and other Un*xes). Further, for all major Un*x platforms, all the »impossible« tasks end up on his desk (yes, it's fun and he loves it!)

-----------------------------------------------------------------------------
2. Prerequisites

2.1. General
  * flex 2.54
  * bison 1.28
  * autoconf 2.52
  * automake 1.4
  * libtool 1.4
  * yacc 91.7.30
  * freetype2-devel [1]
  * re2c [2]

To be continued.

All major distributions should include these general prerequisites.

-----------------------------------------------------------------------------
2.2. OpenSSL

2.2.1. What is OpenSSL

  The OpenSSL Project is a collaborative effort to develop a robust, commercial-grade, full-featured, and Open Source toolkit implementing the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) protocols as well as a full-strength general purpose cryptography library. The project is managed by a worldwide community of volunteers that use the Internet to communicate, plan, and develop the OpenSSL toolkit and its related documentation.
  OpenSSL is based on the excellent SSLeay library developed by Eric A. Young and Tim J. Hudson. The OpenSSL toolkit is licensed under an Apache-style license, which basically means that you are free to get and use it for commercial and non-commercial purposes subject to some simple license conditions.

  --www.openssl.org

From the author's point of view, it is the basis for building a secure Unix server with open source software; it is needed for all major products like mod_ssl, OpenSSH and a lot of other software that provides encrypted data processing.

OpenSSL provides the libraries and include files needed by the products mentioned above, and also provides an application to build server and client certificates.

-----------------------------------------------------------------------------
2.2.2. Download the source

Origin-Site: http://www.openssl.org

-----------------------------------------------------------------------------
2.2.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|tar -xvzf openssl-0.9.7.tar.gz                                             |
|                                                                           |
|cd openssl-0.9.7                                                           |
|                                                                           |
|./config shared                                                            |
|                                                                           |
|make                                                                       |
|make test                                                                  |
|make install                                                               |
|                                                                           |
|echo "/usr/local/ssl/lib" >> /etc/ld.so.conf                               |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

Tip
Select your CPU to improve speed

  By default the Makefile generates code for the i486 CPU. You can change this by editing the Makefile after running config shared. Search for -m486 and replace it with, e.g., -march=athlon

-----------------------------------------------------------------------------
2.3. GNU Database System

2.3.1. What is gdbm

  GNU dbm is a set of database routines that use extensible hashing. It works similar to the standard UNIX dbm routines.

  --www.gnu.org/software/gdbm

GNU dbm is a very important library used by almost every distribution, so it is installed by default on all distributions I tested.
In all probability, however, the header files that are mandatory for building Apache with mod_rewrite and PHP are not installed by default. Please consult your distribution's CD/DVD and install the devel package (the version can vary):

+---------------------------------------------------------------------------+
|rpm -i gdbm-devel-1.8.0-546                                                |
+---------------------------------------------------------------------------+

This procedure is verified for SuSE and Redhat. Please confirm for other RPM based systems like Mandrake. Debian will follow as soon as possible.

Users of Debian-based systems can install gdbm as follows:

+---------------------------------------------------------------------------+
|apt-get install libgdbmg1-dev                                              |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
2.3.2. Building and installing by yourself

In the unlikely case that your distribution does not contain gdbm, here are the instructions for building it.

+---------------------------------------------------------------------------+
|./configure                                                                |
|                                                                           |
|make                                                                       |
|make install                                                               |
|                                                                           |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
2.4. MySQL

2.4.1. What is MySQL

MySQL is a very fast, powerful and easy-to-handle database. Especially for web applications where most accesses are reads and few are writes, MySQL is the first choice. The newest version is also transaction-capable.

If you plan a web application that writes a lot of data into the database, PostgreSQL may be better suited for your project; see Section 6.2.4 for installation hints.

You need the C API from MySQL for compiling PHP if you want MySQL support in PHP. It is also needed if you want to use mod_auth_mysql; see Section 4.3 for more information.

-----------------------------------------------------------------------------
2.4.2.
Download

Origin-Site: http://www.mysql.com/downloads/

-----------------------------------------------------------------------------
2.4.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|tar -xvzf mysql-3.23.55.tar.gz                                             |
|cd mysql-3.23.55                                                           |
|                                                                           |
|./configure \                                                              |
|--prefix=/usr/local/mysql \                                                |
|--enable-assembler \                                                       |
|--with-innodb \                                                            |
|--without-debug                                                            |
|                                                                           |
|make                                                                       |
|make install                                                               |
|                                                                           |
|/usr/local/mysql/bin/mysql_install_db                                      |
|echo /usr/local/mysql/lib/mysql >> /etc/ld.so.conf                         |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

To improve security, add a MySQL user to your system, e.g. »mysql«, and make it the owner of the data directory:

+---------------------------------------------------------------------------+
|chown -R mysql /usr/local/mysql/var                                        |
+---------------------------------------------------------------------------+

If you wish to start MySQL automatically at boot time, copy /usr/local/mysql/share/mysql/mysql.server to /etc/init.d/ (or wherever your rc scripts are located) and create the corresponding symbolic links in the runlevel directories.

+---------------------------------------------------------------------------+
|cp /usr/local/mysql/share/mysql/mysql.server /etc/init.d/                  |
|                                                                           |
|ln -s /etc/init.d/mysql.server /etc/init.d/rc3.d/S20mysql                  |
|ln -s /etc/init.d/mysql.server /etc/init.d/rc3.d/K20mysql                  |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
2.4.4. Securing MySQL

This part is optional, and describes how to bind the MySQL daemon to the localhost IP.

I suggest binding MySQL to the loopback interface 127.0.0.1 only. This makes sure nobody can connect to your MySQL daemon via the network. Of course, it only makes sense if MySQL runs on the same box as the webserver.
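Whichever of the variants described in this section you choose, it can be worth confirming the result after restarting MySQL. The following is only a sketch and not part of the original procedure: the helper name check_mysql_bind is made up for illustration, it assumes MySQL uses its default TCP port 3306 unless another port is passed, and it assumes netstat (from net-tools) prints the local address in the fourth column, as "netstat -ltn" does on Linux.

```shell
# Hypothetical helper, not part of MySQL: classify listeners on a TCP port.
# Reads "netstat -ltn" style lines on stdin; column 4 is the local address.
# Assumes the default MySQL port 3306 unless another port is given as $1.
check_mysql_bind() {
    port="${1:-3306}"
    awk -v port="$port" '
        $4 ~ (":" port "$") {
            found = 1
            # Any listener on this port other than 127.0.0.1 is reachable
            # from outside the loopback interface.
            if ($4 != ("127.0.0.1:" port)) exposed = 1
        }
        END {
            if (!found)       print "not listening"
            else if (exposed) print "exposed"
            else              print "loopback only"
        }'
}

# Typical use (requires netstat):
#   netstat -ltn | check_mysql_bind
```

With the loopback binding one would expect "loopback only"; with networking switched off entirely, "not listening". An "exposed" result suggests the daemon is still reachable from other hosts.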
Edit /etc/init.d/mysql.server and change line 107 as follows:

Original line:

+---------------------------------------------------------------------------+
|$bindir/safe_mysqld --datadir=$datadir --pid-file=$pid_file&               |
+---------------------------------------------------------------------------+

Changed line:

+---------------------------------------------------------------------------+
|$bindir/safe_mysqld --datadir=$datadir --pid-file=$pid_file \              |
|--bind-address=127.0.0.1& (1)                                              |
+---------------------------------------------------------------------------+

(1) Here you can define which interface MySQL should be bound to.

Alternatively you can completely disable the networking functionality of MySQL.

+---------------------------------------------------------------------------+
|$bindir/safe_mysqld --datadir=$datadir --pid-file=$pid_file \              |
|--skip-networking &                                                        |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
2.5. Building mm

2.5.1. What is mm

  The MM library is a 2-layer abstraction library which simplifies the usage of shared memory between forked (and this way strongly related) processes under Unix platforms. On the first layer it hides all platform dependent implementation details (allocation and locking) when dealing with shared memory segments and on the second layer it provides a high-level malloc(3)-style API for a convenient and well known way to work with data-structures inside those shared memory segments.

  --www.engelschall.com

It is a common library that lets Unix programmers simplify shm (shared memory) accesses. It is used by many products, e.g. PHP and mod_ssl.

-----------------------------------------------------------------------------
2.5.2.
Download

Origin-Site: ftp://ftp.ossp.org/pkg/lib/mm/mm-1.2.2.tar.gz

-----------------------------------------------------------------------------
2.5.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf mm-1.2.2.tar.gz                                                  |
|                                                                           |
|cd mm-1.2.2                                                                |
|                                                                           |
|./configure                                                                |
|make                                                                       |
|make test                                                                  |
|make install                                                               |
|                                                                           |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
3. Getting, build and install Apache with its basic modules

3.1. Get and untar the Apache Source

3.1.1. What is Apache

  The Apache Project is a collaborative software development effort aimed at creating a robust, commercial-grade, featureful, and freely-available source code implementation of an HTTP (Web) server. The project is jointly managed by a group of volunteers located around the world, using the Internet and the Web to communicate, plan, and develop the server and its related documentation. These volunteers are known as the Apache Group. In addition, hundreds of users have contributed ideas, code, and documentation to the project. This file is intended to briefly describe the history of the Apache Group and recognize the many contributors.

  --www.apache.org

It is simply the best webserver software: it is very flexible to configure to match your needs, and it is E-X-T-R-E-M-E-L-Y stable. I personally have never experienced a crash in a production (= non-experimental) environment.

-----------------------------------------------------------------------------
3.1.2.
Download the source

Origin-Site: http://www.apache.org/dist/httpd/

+---------------------------------------------------------------------------+
|cd /usr/local/                                                             |
|                                                                           |
|tar -xvzf apache_1.3.27.tar.gz                                             |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
3.1.3. Patch for large-scale sites

If your webserver has to answer a very large number of requests at the same time, and your machine is strong enough to serve such a number of requests, you can raise the limit on the maximum number of running processes.

Download the patch from: http://www.delouw.ch/linux/apache-patch_HARD_SERVER_LIMIT.txt

+---------------------------------------------------------------------------+
|--- httpd.h Thu Mar 21 18:07:34 2002                                       |
|+++ httpd.h-new Sun Apr 7 13:34:11 2002                                    |
|@@ -320,7 +320,7 @@                                                        |
| #elif defined(NETWARE)                                                    |
| #define HARD_SERVER_LIMIT 2048                                            |
| #else                                                                     |
|-#define HARD_SERVER_LIMIT 256                                             |
|+#define HARD_SERVER_LIMIT 512                                             |
| #endif                                                                    |
| #endif                                                                    |
+---------------------------------------------------------------------------+

This patch increases the maximum number of concurrently connected clients to 512. Feel free to increase it further, if you have tuned your kernel and edited your /etc/security/limits.conf accordingly.

Caution
Avoid running out of tasks

  With wrong settings this could end as a »self-denial-of-service-attack«. Be sure you have enough processes left for root.

Apply the patch using:

+---------------------------------------------------------------------------+
|cd /usr/local/apache_1.3.27/src/include                                    |
|                                                                           |
|patch -p0 < apache-patch_HARD_SERVER_LIMIT.txt                             |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
3.2. mod_ssl

3.2.1.
What is mod_ssl

  This module provides strong cryptography for the Apache 1.3 webserver via the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) protocols by the help of the Open Source SSL/TLS toolkit OpenSSL, which is based on SSLeay from Eric A. Young and Tim J. Hudson.

  --www.modssl.org

This module is needed to enable Apache for SSL requests (https). It applies a patch to the Apache source code and extends its API (Application Programming Interface). The result is called EAPI (Extended Application Programming Interface).

Caution
Use of compiler flags while compiling modules

  Make sure every module for your Apache server is compiled with the compiler flag -DEAPI, or your webserver might crash or fail to start. Almost all modules I know add the -DEAPI flag by themselves, except mod_jserv and mod_jk.

-----------------------------------------------------------------------------
3.2.2. Download the source

Origin-Site: http://www.modssl.org

-----------------------------------------------------------------------------
3.2.3. Applying the patch to the Apache source

+---------------------------------------------------------------------------+
|cd /usr/local/                                                             |
|                                                                           |
|tar -xvzf mod_ssl-2.8.12-1.3.27.tar.gz                                     |
|cd mod_ssl-2.8.12-1.3.27/                                                  |
|                                                                           |
|./configure --with-apache=../apache_1.3.27                                 |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
3.3. mod_perl

3.3.1. What is mod_perl

  With mod_perl it is possible to write Apache modules entirely in Perl. In addition, the persistent interpreter embedded in the server avoids the overhead of starting an external interpreter and the penalty of Perl start-up time.

  --perl.apache.org

mod_perl is a kind of substitute for cgi-bin programs. CGI programs typically fork a new process for each request, which produces overhead.
With mod_perl the Perl interpreter is loaded persistently in the Apache server and does not need to fork a process for each request.

-----------------------------------------------------------------------------
3.3.2. Download the source

Origin-Site: http://www.apache.org/dist/perl

-----------------------------------------------------------------------------
3.3.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf mod_perl-1.27.tar.gz                                             |
|                                                                           |
|cd mod_perl-1.27                                                           |
|                                                                           |
|perl Makefile.PL \                                                         |
|EVERYTHING=1 \                                                             |
|APACHE_SRC=../apache_1.3.27/src \                                          |
|USE_APACI=1 \                                                              |
|PREP_HTTPD=1 \                                                             |
|DO_HTTPD=1                                                                 |
|                                                                           |
|make                                                                       |
|make install                                                               |
+---------------------------------------------------------------------------+

Caution
mod_perl can not be compiled as DSO

  Do not compile mod_perl as a DSO (Dynamic Shared Object)! According to various sources, Apache will crash (I never tried).

-----------------------------------------------------------------------------
3.4. Configure and build Apache

Now that the two static modules mod_ssl and mod_perl are configured and the Apache source has been patched, we can proceed with building Apache.

-----------------------------------------------------------------------------
3.4.1.
Building and installing

+---------------------------------------------------------------------------+
|EAPI_MM="/usr/local/mm-1.2.2" SSL_BASE="/usr/local/ssl" \                  |
|./configure \                                                              |
|--enable-module=unique_id \                                                |
|--enable-module=rewrite \                                                  |
|--enable-module=speling \                                                  |
|--enable-module=expires \                                                  |
|--enable-module=info \                                                     |
|--enable-module=log_agent \                                                |
|--enable-module=log_referer \                                              |
|--enable-module=usertrack \                                                |
|--enable-module=proxy \                                                    |
|--enable-module=userdir \                                                  |
|--enable-module=so \                                                       |
|--enable-shared=ssl \                                                      |
|--enable-module=ssl \                                                      |
|--activate-module=src/modules/perl/libperl.a \                             |
|--enable-module=perl                                                       |
|                                                                           |
|make                                                                       |
|make install                                                               |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
3.4.2. Create self-signed SSL-certificate

+---------------------------------------------------------------------------+
|cd /usr/local/ssl/bin                                                      |
|                                                                           |
|./openssl req -new > new.cert.csr                                          |
|./openssl rsa -in privkey.pem -out new.cert.key                            |
|./openssl x509 -in new.cert.csr -out new.cert.cert \                       |
|-req -signkey new.cert.key -days 999                                       |
|                                                                           |
|cp new.cert.key /usr/local/apache/conf/ssl.key/server.key                  |
|cp new.cert.cert /usr/local/apache/conf/ssl.crt/server.crt                 |
+---------------------------------------------------------------------------+

Tip
Common name

  OpenSSL asks for several things. A common error is to enter a wrong "common name". This should be the FQHN (Fully Qualified HostName) of your server, e.g. www.foo.org

-----------------------------------------------------------------------------
4. Additional modules

4.1. mod_dav

4.1.1. What is mod_dav

  mod_dav is an Apache module to provide DAV capabilities (RFC 2518) for your Apache web server. It is an Open Source module, provided under an Apache-style license.

  --www.webdav.org

From the author's point of view: DAV means »Distributed Authoring and Versioning«. It allows you to manage your website similarly to a filesystem.
It is meant to replace FTP uploads to your webserver. DAV is supported by
newer versions of all major web development tools and is becoming a widely
accepted standard for web publishing.

-----------------------------------------------------------------------------

4.1.2. Download the source

Origin-Site: http://www.webdav.org/mod_dav/

-----------------------------------------------------------------------------

4.1.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf mod_dav-1.0.3-1.3.6.tar.gz                                       |
|cd mod_dav-1.0.3-1.3.6                                                     |
|                                                                           |
|./configure --with-apxs=/usr/local/apache/bin/apxs                         |
|                                                                           |
|make                                                                       |
|make install                                                               |
+---------------------------------------------------------------------------+

Tip Confusing filename

  The filename mod_dav-1.0.3-1.3.6 suggests that it will only run with
  Apache 1.3.6, but it actually runs with all Apache versions >= 1.3.6

-----------------------------------------------------------------------------

4.2. auth_ldap

4.2.1. What is auth_ldap

  auth_ldap is an LDAP authentication module for Apache, the world's most
  popular web server. auth_ldap has excellent performance, and supports
  Apache on both Unix and Windows NT. It also has support for LDAP over
  SSL, and a mode that lets Frontpage clients manage their web permissions
  while still using LDAP for authentication.

  --www.rudedog.org

From the author's point of view: If you want to consolidate your login
facilities onto a common user/password base, LDAP (Lightweight Directory
Access Protocol) is the right way. LDAP is an open standard and widely
supported. Login facilities for LDAP:

  * Unix logins for Linux, Solaris (others?)
  * FTP logins (some ftp daemons)
  * HTTP Basic Authentication
  * Tarantella Authentication and Role-Management
  * Samba Authentication (2.2.x should support this)

LDAP is role based. That means, for example,
you can define a role »manager«, assign a user as a member, and that user
can log in wherever a manager is allowed to log in.

-----------------------------------------------------------------------------

4.2.2. Download the source

Origin-Site: http://www.rudedog.org/auth_ldap/

-----------------------------------------------------------------------------

4.2.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf auth_ldap-1.6.0.tar.gz                                           |
|                                                                           |
|cd auth_ldap-1.6.0                                                         |
|                                                                           |
|./configure --with-apxs=/usr/local/apache/bin/apxs \                       |
|--with-sdk=openldap                                                        |
|                                                                           |
|make                                                                       |
|make install                                                               |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

4.3. mod_auth_mysql

4.3.1. What is mod_auth_mysql

It is an HTTP Basic Authentication module. It lets you conveniently
maintain your users in a MySQL database.

-----------------------------------------------------------------------------

4.3.2. Download the source

Origin-Site: ftp://ftp.kciLink.com/pub/mod_auth_mysql.c.gz

-----------------------------------------------------------------------------

4.3.3.
Building and installing

+---------------------------------------------------------------------------+
|gunzip mod_auth_mysql.c.gz                                                 |
|                                                                           |
|/usr/local/apache/bin/apxs \                                               |
|-c -I/usr/local/mysql/include \                                            |
|-L/usr/local/mysql/lib/mysql \                                             |
|-lmysqlclient -lm mod_auth_mysql.c                                         |
|                                                                           |
|cp mod_auth_mysql.so /usr/local/apache/libexec/                            |
+---------------------------------------------------------------------------+

Add this line to your httpd.conf:

+---------------------------------------------------------------------------+
|LoadModule mysql_auth_module libexec/mod_auth_mysql.so                     |
+---------------------------------------------------------------------------+

And where the other modules are added:

+---------------------------------------------------------------------------+
|AddModule mod_auth_mysql.c                                                 |
+---------------------------------------------------------------------------+

Take care that the paths to the MySQL libraries and includes are correct.

Tip Library path

  Be sure that /usr/local/mysql/lib/mysql is listed in /etc/ld.so.conf
  before compiling

-----------------------------------------------------------------------------

4.3.4. Sample configuration

Example 1. /usr/local/apache/conf/httpd.conf

+---------------------------------------------------------------------------+
|                                                                           |
| AuthType Basic                                                            |
| AuthUserFile /dev/null                                                    |
| AuthName Testing                                                          |
| AuthGroupFile /dev/null                                                   |
| AuthMySQLHost localhost                                                   |
| AuthMySQLCryptedPasswords Off                                             |
| AuthMySQLUser root                                                        |
| AuthMySQLDB http_users                                                    |
| AuthMySQLUserTable user_info                                              |
|                                                                           |
| require valid-user                                                        |
|                                                                           |
|                                                                           |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

4.3.4.1. Script for creating the MySQL-Database

Just type:

+---------------------------------------------------------------------------+
|mysql < authmysql.sql                                                      |
+---------------------------------------------------------------------------+

The file authmysql.sql contains:

Example 2.
authmysql.sql

+---------------------------------------------------------------------------+
| create database http_users;                                               |
| connect http_users;                                                       |
|                                                                           |
| CREATE TABLE user_info (                                                  |
|   user_name CHAR(30) NOT NULL,                                            |
|   user_passwd CHAR(20) NOT NULL,                                          |
|   user_group CHAR(10),                                                    |
|   PRIMARY KEY (user_name)                                                 |
| );                                                                        |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

4.4. mod_dynvhost

4.4.1. What is mod_dynvhost

It is a module that lets you define new virtual hosts "on-the-fly". Just
create a new directory in your vhost path, that's it. There is no need to
restart your webserver. It is a good solution for mass virtual hosting at
ISPs.

-----------------------------------------------------------------------------

4.4.2. Download the source

Origin-Site: http://funkcity.com/0101/projects/dynvhost/mod_dynvhost.tar.gz

-----------------------------------------------------------------------------

4.4.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf mod_dynvhost.tar.gz                                              |
|                                                                           |
|cd dynvhost/                                                               |
|                                                                           |
|/usr/local/apache/bin/apxs -i -a -c mod_dynvhost.c                         |
+---------------------------------------------------------------------------+

Tip Check httpd.conf

  Check in httpd.conf that mod_dynvhost.so is loaded at startup:

  +-----------------------------------------------------------------------+
  |LoadModule dynvhost_module libexec/mod_dynvhost.so                     |
  +-----------------------------------------------------------------------+

-----------------------------------------------------------------------------

4.4.4. Sample configuration

Example 3.
/usr/local/apache/conf/httpd.conf

+---------------------------------------------------------------------------+
|                                                                           |
| HomeDir /                                                                 |
|                                                                           |
+---------------------------------------------------------------------------+

Now create a directory for each virtual host in
/usr/local/apache/htdocs/vhosts/, e.g.
/usr/local/apache/htdocs/vhosts/foo.bar.org

You don't need to restart your webserver.

-----------------------------------------------------------------------------

4.5. mod_roaming

4.5.1. What is mod_roaming

  With mod_roaming you can use your Apache webserver as a Netscape Roaming
  Access server. This allows you to store your Netscape Communicator 4.5
  preferences, bookmarks, address books, cookies etc. on the server so that
  you can use (and update) the same settings from any Netscape Communicator
  4.5 that can access the server.

  --www.klomp.org/mod_roaming/

From the author's point of view: mod_roaming is indeed valuable.
Unfortunately it does not work over a proxy connection. You can keep your
Netscape 4.x bookmarks etc. synchronized on different machines. It is not
supported by any other browsers, including Mozilla and Netscape 6.x.

-----------------------------------------------------------------------------

4.5.2. Download the source

Origin-Site: http://www.klomp.org/mod_roaming/

-----------------------------------------------------------------------------

4.5.3.
Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf mod_roaming-1.0.2.tar.gz                                         |
|                                                                           |
|cd mod_roaming-1.0.2                                                       |
|                                                                           |
|/usr/local/apache/bin/apxs -i -a -c mod_roaming.c                          |
+---------------------------------------------------------------------------+

Tip Check httpd.conf

  Check in httpd.conf that mod_roaming is loaded at startup:

  +-----------------------------------------------------------------------+
  |LoadModule roaming_module libexec/mod_roaming.so                       |
  +-----------------------------------------------------------------------+

-----------------------------------------------------------------------------

4.5.4. Sample configuration

Example 4. /usr/local/apache/conf/httpd.conf

+---------------------------------------------------------------------------+
|RoamingAlias /roaming /usr/local/apache/roaming                            |
|                                                                           |
| AuthUserFile /usr/local/apache/conf/roaming-htpasswd                      |
| AuthType Basic                                                            |
| AuthName "Roaming Access"                                                 |
|                                                                           |
| require valid-user                                                        |
|                                                                           |
|                                                                           |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

5. Compressed delivery

There are basically two modules available for output compression: mod_gzip
and mod_gunzip. They use different approaches to reach the goal of
bandwidth reduction.

mod_gunzip expects compressed files on the filesystem, and uncompresses
them if the browser cannot handle compressed data. The benefit is low CPU
usage, because most browsers are able to handle gzipped content. On the
other hand, most of today's content is served dynamically, e.g. by PHP, and
this content will be delivered uncompressed.

mod_gzip does not need compressed files on the system; all defined content
is compressed before delivery. The benefit is that dynamically generated
content is compressed as well; the downside is higher CPU usage, because
every request has to be compressed on-the-fly.
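
To get a feeling for the trade-off described above, it can help to measure
how much gzip actually saves on a typical page before choosing a module.
The following is only a rough sketch: the file name sample.html and its
generated content are placeholders invented for this illustration, not
something either module requires. Substitute one of your own pages.

```shell
#!/bin/sh
# Rough sketch: compare the raw and the gzipped size of one page.
# sample.html is a hypothetical placeholder; use a real page from your site.
f=sample.html

# Generate some repetitive HTML so that the example is self-contained.
{
  echo '<html><body>'
  i=0
  while [ "$i" -lt 200 ]; do
    echo '<p>hello world</p>'
    i=$((i + 1))
  done
  echo '</body></html>'
} > "$f"

orig=$(wc -c < "$f" | tr -d ' ')           # size in bytes as stored on disk
comp=$(gzip -c "$f" | wc -c | tr -d ' ')   # size in bytes after gzip

echo "original: $orig bytes, gzipped: $comp bytes"
echo "saved: $(( (orig - comp) * 100 / orig ))%"
```

A large saving on your typical content argues for compression in general.
Whether you pay for it once (pre-compressed files served by mod_gunzip) or
on every request (mod_gzip) is the CPU-versus-bandwidth decision discussed
in this section.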
mod_gzip can also handle already compressed data, e.g. index.html.gz, and
send it as-is.

The conclusion: You have to decide carefully which of the two modules makes
more sense for you. If you have to pay for every GB delivered and CPU power
does not matter, then mod_gzip is the choice for you. If response time
matters (the delay between request and delivery), and your bandwidth is
cheap or unlimited, mod_gunzip matches your needs better.

A good page that helps you to make this decision is Martin Kiff's document
about mod_gunzip: http://www.innerjoin.org/apache-compression/howto.html

-----------------------------------------------------------------------------

5.1. mod_gzip

5.1.1. Download the source

Origin-Site:
http://prdownloads.sourceforge.net/mod-gzip/mod_gzip-1.3.26.1a.tgz?download

-----------------------------------------------------------------------------

5.1.2. Building and installing

To successfully compile mod_gzip you need to edit the Makefile and provide
the correct path to apxs.

+---------------------------------------------------------------------------+
|make                                                                       |
|make install                                                               |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

5.1.3. Sample configuration

Put the following in your /usr/local/apache/conf/httpd.conf:

Example 5.
/usr/local/apache/conf/httpd.conf

+---------------------------------------------------------------------------+
|mod_gzip_on Yes                                                            |
|mod_gzip_can_negotiate Yes                                                 |
|mod_gzip_dechunk Yes                                                       |
|mod_gzip_minimum_file_size 600                                             |
|mod_gzip_maximum_file_size 0                                               |
|mod_gzip_maximum_inmem_size 100000                                         |
|mod_gzip_keep_workfiles No                                                 |
|mod_gzip_temp_dir /usr/local/apache/gzip                                   |
|mod_gzip_item_include file \.html$                                         |
|mod_gzip_item_include file \.txt$                                          |
|mod_gzip_item_include file \.jsp$                                          |
|mod_gzip_item_include file \.php$                                          |
|mod_gzip_item_include file \.pl$                                           |
|mod_gzip_item_include mime ^text/.*                                        |
|mod_gzip_item_include mime ^application/x-httpd-php                        |
|mod_gzip_item_include mime ^httpd/unix-directory$                          |
|mod_gzip_item_include handler ^perl-script$                                |
|mod_gzip_item_include handler ^server-status$                              |
|mod_gzip_item_include handler ^server-info$                                |
|mod_gzip_item_exclude file \.css$                                          |
|mod_gzip_item_exclude file \.js$                                           |
|mod_gzip_item_exclude mime ^image/.*                                       |
+---------------------------------------------------------------------------+

You may wish to log the result of the compression to your access log. This
can be done by changing the LogFormat directive in
/usr/local/apache/conf/httpd.conf

+------------------------------------------------------------------------------------------------------------------------------+
|LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" mod_gzip: %{mod_gzip_compression_ratio}npct." combined|
+------------------------------------------------------------------------------------------------------------------------------+

-----------------------------------------------------------------------------

5.2. mod_gunzip

5.2.1. Download the source

Origin-Site: http://www.oldach.net/mod_gunzip.tar.gz

-----------------------------------------------------------------------------

5.2.2.
Building and installing

+---------------------------------------------------------------------------+
|tar -xvzf mod_gunzip.tar.gz                                                |
|cd mod_gunzip-2                                                            |
|                                                                           |
|/usr/local/apache/bin/apxs -i -a -c -lz mod_gunzip.c                       |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

5.2.3. Sample configuration

Put the following in your /usr/local/apache/conf/httpd.conf:

Example 6. /usr/local/apache/conf/httpd.conf

+---------------------------------------------------------------------------+
|AddType text/html .htmz                                                    |
|AddHandler send-gunzipped .htmz                                            |
+---------------------------------------------------------------------------+

Now you can gzip your HTML files and rename them, e.g.:

+---------------------------------------------------------------------------+
|gzip index.html                                                            |
|mv index.html.gz index.htmz                                                |
+---------------------------------------------------------------------------+

Of course you have to change all links to htmz, e.g. Some page

-----------------------------------------------------------------------------

6. mod_php and its prerequisites

6.1. What is mod_php

  PHP is a server-side, cross-platform, HTML embedded scripting language.
  In the beginning it was just a simple guestbook-processor, and it was
  growing and growing. Since Version 3 it is a really powerful web
  development language.

  --www.php.net

Since version 4, PHP is capable and robust enough for enterprise web
applications. It is powerful, supports almost all important databases
natively, and others through ODBC (Open DataBase Connectivity). It is a few
times faster than ASP on Windows systems on the same hardware. There are
other extensions available, like APC (Alternative PHP Cache), which speed up
processing by about 50-400% (depending on the PHP code you wrote).

-----------------------------------------------------------------------------

6.2.
Prerequisites

Depending on your needs, some software has to be installed first. One
package already installed according to this document is MySQL, because it
is needed by mod_auth_mysql.

-----------------------------------------------------------------------------

6.2.1. IMAP client

6.2.1.1. What is IMAP client

IMAP means »Internet Message Access Protocol« and is a substitute for POP
(Post Office Protocol). It lets you keep all mail in different folders on
the server, which should be backed up. Never again lose important email
because your local hard drive crashed.

-----------------------------------------------------------------------------

6.2.2. Download the source

Origin-Site: http://www.washington.edu/imap/

-----------------------------------------------------------------------------

6.2.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf imap.tar.Z                                                       |
|                                                                           |
|cd imap                                                                    |
|                                                                           |
|make slx SSLTYPE=nopwd (1)                                                 |
+---------------------------------------------------------------------------+

(1) With the SSLTYPE parameter (e.g. SSLTYPE=unix or, as here,
    SSLTYPE=nopwd) you define whether you need SSL support or not.
    Omitting it means no SSL support.

Tip Filename to download

  imap.tar.Z is usually a symlink to the latest release; at the time of
  writing it is linked to imap-2001a.tar.Z

-----------------------------------------------------------------------------

6.2.4. PostgreSQL

6.2.4.1. What is PostgreSQL

PostgreSQL is a very powerful and fast database. Like MySQL, it is
wonderful for web applications. From my point of view, it is not as
comfortable to handle as MySQL. If your web application performs mostly
writes, or you need proven transaction capabilities, PostgreSQL is your
friend.

-----------------------------------------------------------------------------

6.2.4.2.
Download the source

Origin-Site: http://www.postgresql.org (select a mirror close to you)

-----------------------------------------------------------------------------

6.2.4.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf postgresql-7.3.2.tar.gz                                          |
|                                                                           |
|cd postgresql-7.3.2                                                        |
|                                                                           |
|./configure \                                                              |
|--with-perl \                                                              |
|--enable-odbc \                                                            |
|--with-unixodbc \                                                          |
|--with-pam \                                                               |
|--with-openssl                                                             |
|                                                                           |
|make                                                                       |
|make install                                                               |
|                                                                           |
|echo /usr/local/pgsql/lib >> /etc/ld.so.conf                               |
|                                                                           |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

6.2.5. Sablotron

6.2.5.1. What is Sablotron

  Sablotron is a fast, compact and portable XML toolkit implementing XSLT
  1.0, DOM Level2 and XPath 1.0. Sablotron is an open project; other users
  and developers are encouraged to use it or to help us testing or
  improving it. The goal of this project is to create a lightweight,
  reliable and fast XML library processor conforming to the W3C
  specification, which is available for public and can be used as a base
  for multi-platform XML applications.

  --http://www.gingerall.com/charlie/ga/xml/p_sab.xml

-----------------------------------------------------------------------------

6.2.5.2. Download the source

Origin-Site:
http://download-2.gingerall.cz/download/sablot/Sablot-0.97.tar.gz

-----------------------------------------------------------------------------

6.2.5.3.
Building and installing

+---------------------------------------------------------------------------+
|tar -xvzf Sablot-0.97.tar.gz                                               |
|cd Sablot-0.97                                                             |
|                                                                           |
|./configure                                                                |
|make                                                                       |
|make install                                                               |
|                                                                           |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

6.2.6. pdflib

6.2.6.1. What is pdflib

  PDFlib: A library for generating PDF on the fly

  PDFlib is the premier software component if you want to generate PDF on
  your server, convert text and graphics, or implement PDF output in your
  own products.

  --www.pdflib.com

From the author's point of view:

Caution This is a commercial product

  PDFlib is a commercial product. Read the license carefully to see
  whether you need a commercial license or not.

-----------------------------------------------------------------------------

6.2.6.2. Download the source

Origin-Site: http://www.pdflib.com/pdflib/download/pdflib-4.0.3.tar.gz

-----------------------------------------------------------------------------

6.2.6.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local/                                                             |
|tar -xvzf pdflib-4.0.3.tar.gz                                              |
|                                                                           |
|cd pdflib-4.0.3                                                            |
|                                                                           |
|./configure --enable-shared-pdflib --enable-cxx                            |
|                                                                           |
|make                                                                       |
|make install                                                               |
|                                                                           |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

6.2.7. gettext

6.2.7.1. What is gettext

gettext is a library for i18n (internationalization: "i", 18 letters, and
"n") of software, and is needed by PHP.

-----------------------------------------------------------------------------

6.2.7.2.
Download the source

Origin-Site: ftp://ftp.gnu.org/gnu/gettext (select a mirror close to you)

-----------------------------------------------------------------------------

6.2.7.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf gettext-0.11.2.tar.gz                                            |
|                                                                           |
|cd gettext-0.11.2                                                          |
|                                                                           |
|./configure                                                                |
|                                                                           |
|make                                                                       |
|make check                                                                 |
|make install                                                               |
|                                                                           |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

6.2.8. zlib

6.2.8.1. What is zlib

zlib is a lossless data-compression library for use on virtually any
computer hardware and operating system.

-----------------------------------------------------------------------------

6.2.8.2. Download the source

Origin-Site: ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.4.tar.gz
(select a mirror close to you)

-----------------------------------------------------------------------------

6.2.8.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf zlib-1.1.4.tar.gz                                                |
|                                                                           |
|cd zlib-1.1.4/                                                             |
|                                                                           |
|./configure                                                                |
|                                                                           |
|make                                                                       |
|make test                                                                  |
|make install                                                               |
|                                                                           |
|ldconfig                                                                   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

6.3.
Building and installing PHP4

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf php-4.3.0.tar.gz                                                 |
|                                                                           |
|cd php-4.3.0                                                               |
|                                                                           |
|export LDFLAGS=-lstdc++                                                    |
|                                                                           |
|./configure \                                                              |
|--with-apxs=/usr/local/apache/bin/apxs \                                   |
|--with-mysql=/usr/local/mysql \                                            |
|--with-pgsql=/usr/local/pgsql \                                            |
|--enable-track-vars \                                                      |
|--with-openssl=/usr/local/ssl \                                            |
|--with-imap=/usr/local/imap \                                              |
|--with-gd --with-ldap \                                                    |
|--enable-ftp \                                                             |
|--enable-sysvsem \                                                         |
|--enable-sysvshm \                                                         |
|--enable-sockets \                                                         |
|--with-pdflib=/usr/local \                                                 |
|--with-gettext \                                                           |
|--with-mm=/usr/local/mm-1.2.2 \                                            |
|--with-jpeg-dir=/usr/lib \                                                 |
|--with-zlib-dir=/usr/local \                                               |
|--enable-wddx \                                                            |
|--with-mcrypt \                                                            |
|--with-mhash \                                                             |
|--with-mcal=/usr \                                                         |
|--enable-exif \                                                            |
|--enable-xslt \                                                            |
|--with-xslt-sablot=/usr/local \                                            |
|--with-dom \                                                               |
|--with-dom-xslt                                                            |
+---------------------------------------------------------------------------+

Edit the Makefile and add -lstdc++ to the variable EXTRA_LIBS. This is
currently only needed when using Sablotron version 0.97.

+---------------------------------------------------------------------------+
|make                                                                       |
|make install                                                               |
+---------------------------------------------------------------------------+

After installing, your httpd.conf has been modified by apxs. It should now
look as follows:

+---------------------------------------------------------------------------+
|<IfDefine SSL>                                                             |
|LoadModule ssl_module libexec/libssl.so                                    |
|LoadModule php4_module libexec/libphp4.so                                  |
|</IfDefine>                                                                |
+---------------------------------------------------------------------------+

If you compiled Apache with mod_ssl then the PHP module will only be loaded
when starting Apache with SSL (apachectl startssl).
If you want to start Apache without SSL support (but compiled as described
in this document) you need to change this to:

+---------------------------------------------------------------------------+
|<IfDefine SSL>                                                             |
|LoadModule ssl_module libexec/libssl.so                                    |
|</IfDefine>                                                                |
|LoadModule php4_module libexec/libphp4.so                                  |
+---------------------------------------------------------------------------+

Copy the sample php.ini-dist to /usr/local/lib/php.ini

+---------------------------------------------------------------------------+
|cp /usr/local/php-4.3.0/php.ini-dist /usr/local/lib/php.ini                |
+---------------------------------------------------------------------------+

Uncomment (remove the # at the beginning of the line) the following lines
in /usr/local/apache/conf/httpd.conf. The default httpd.conf of Apache
1.3.27 lacks these entries, so there you have to add them instead of
uncommenting them.

+---------------------------------------------------------------------------+
|AddType application/x-httpd-php .php                                       |
|AddType application/x-httpd-php .phtml                                     |
|AddType application/x-httpd-php .php3                                      |
|                                                                           |
|# If you want to display PHP source                                        |
|                                                                           |
|AddType application/x-httpd-php-source .phps (1)                           |
+---------------------------------------------------------------------------+

(1) This line is only needed if you want to display source code in the
    browser. The file extension of such files should be phps.

Tip register_globals

  Since PHP version 4.2.0, »register_globals« is set to Off by default.
  This can cause problems when running PHP code that does not use the
  $HTTP_GET_VARS methods. To enable register_globals, edit the following
  line in your /usr/local/lib/php.ini:

  +-----------------------------------------------------------------------+
  |register_globals = On                                                  |
  +-----------------------------------------------------------------------+

  Please be sure, if you write new software, to use the new methods.
  Support for the old methods will be dropped sooner or later.

Restart Apache by issuing the following command:

+---------------------------------------------------------------------------+
|/usr/local/apache/bin/apachectl restart                                    |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

7. PHP extensions

There are many different extensions available for PHP, which can be enabled
in your php.ini

-----------------------------------------------------------------------------

7.1. APC (Alternative PHP-cache)

7.1.1. What is APC

  APC is the Alternative PHP Cache. It was conceived of to provide a free,
  open, and robust framework for compiling and caching php scripts. APC was
  conceived of to provide a way of boosting the performance of PHP on
  heavily loaded sites by providing a way for scripts to be cached in a
  compiled state, so that the overhead of parsing and compiling can be
  almost completely eliminated. There are commercial products which provide
  this functionality, but they are neither open-source nor free. Our goal
  was to level the playing field by providing an implementation that allows
  greater flexibility and is universally accessible. We also wanted the
  cache to provide visibility into its own workings and those of PHP, so
  time was invested in providing internal diagnostic tools which allow for
  cache diagnostics and maintenance. Thus arrived APC. Since we were
  committed to developing a product which can easily grow with new version
  of PHP, we implemented it as a zend extension, allowing it to either be
  compiled into PHP or added post facto as a drop in module. As with PHP,
  it is available completely free for commercial and non-commercial use,
  under the same terms as PHP itself. APC has been tested under PHP 4.0.3,
  4.0.3pl1 and 4.0.4. It currently compiles under Linux and FreeBSD.
  Patches for ports to other OSs/PHP versions are welcome.
  --www.apc.communityconnect.com/

The author ran some performance tests with APC and the result was a real
surprise: a PHP web page with MySQL queries in a loop (10 queries in total)
was more than 50% faster.

Contra APC: If you have other users on the system coding PHP, they may not
be comfortable with APC, because their changes are ignored until you reset
the cache or restart Apache. The alternative, namely that APC checks the
PHP script for a newer version before every run, costs speed.

-----------------------------------------------------------------------------

7.1.2. Download the source

Origin-Site: http://apc.communityconnect.com/sources/apc-cvs.tar.gz

-----------------------------------------------------------------------------

7.1.3. Building and installing

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf apc-cvs.tar.gz                                                   |
|                                                                           |
|cd apc                                                                     |
|                                                                           |
|./configure --enable-apc --with-php-config=/usr/local/bin/php-config       |
|                                                                           |
|make                                                                       |
|make install                                                               |
|                                                                           |
|cp modules/php_apc.so /usr/local/lib/php/extensions                        |
|                                                                           |
|echo 'zend_extension="/usr/local/lib/php/extensions/php_apc.so"' \         |
|>> /usr/local/lib/php.ini                                                  |
|echo 'apc.mode = shm' >> \                                                 |
|/usr/local/lib/php.ini                                                     |
+---------------------------------------------------------------------------+

Restart your Apache webserver. Try it out: create a PHP file with the
following content:

Example 7. apctest.php

+---------------------------------------------------------------------------+
|                                                                           |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

7.2. Zend-Optimizer (Do _NOT_ combine with APC-Cache!)

7.2.1. What is Zend-optimizer

  The Zend Optimizer goes over the intermediate code generated by the
  standard Zend run-time compiler located in the Zend Engine, and then
  optimizes it for faster execution.
  --www.zend.com

Zend Optimizer is a freeware, closed-source product. On the same test code
used for the APC test, there was a speed decrease of about 5% compared to
PHP without APC. You have to run your own tests to see whether you get any
improvement with your own code. Be sure NOT to use Zend Optimizer together
with APC, or your whole setup will not work.

-----------------------------------------------------------------------------

7.2.2. Download the binary

Origin-Site: https://www.zend.com/store/free_download.php?pid=13

Tip Register at zend.com

  You have to register at zend.com to get access to the download page.

-----------------------------------------------------------------------------

7.2.3. Installing

There is nothing to build; this product is closed-source and so only
available as a binary for different platforms. The filename varies
according to your platform; the sample is for Linux on IA32.

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf ZendOptimizer-2.1.0-Linux_glibc21-i386.tar.gz                    |
|                                                                           |
|cd ZendOptimizer-2.1.0-Linux_glibc21-i386                                  |
|                                                                           |
|./install.sh                                                               |
+---------------------------------------------------------------------------+

The install script is self-explanatory. If you compiled Apache and PHP as
described in this document, you can just press ENTER on all questions about
the pathnames.

-----------------------------------------------------------------------------

8. Jakarta Tomcat

8.1. What is Tomcat

  Tomcat is the servlet container that is used in the official Reference
  Implementation for the Java Servlet and JavaServer Pages technologies.
  The Java Servlet and JavaServer Pages specifications are developed by Sun
  under the Java Community Process.

  --jakarta.apache.org

From the author's point of view: Tomcat is the successor of jserv, which is
no longer developed.
Tomcat supports the latest JSP and servlet APIs defined by Sun.
Unfortunately Tomcat is very difficult to build from source, because it uses
its own build system called "ant". There is also a very long list of
prerequisites if you want to build from source. See
http://jakarta.apache.org/tomcat/tomcat-4.0-doc/BUILDING.txt for more
details. Good luck, and give some feedback to the author.

In the meantime this HOWTO provides some basic support for Tomcat installed
from binaries. The author is looking for volunteers who try to build Tomcat
from source and report which steps are required.

-----------------------------------------------------------------------------

8.2. Prerequisites

8.2.1. Java2

8.2.1.1. What is Java2

This is too much for this HOWTO; please see
http://java.sun.com/j2se/1.3/docs/relnotes/features.html

-----------------------------------------------------------------------------

8.2.2. Download the binaries

Go to http://java.sun.com/j2se/1.3/ [3], choose your platform and follow the
steps on the site.

-----------------------------------------------------------------------------

8.2.3. Installing the binaries

Execute the binary:

+---------------------------------------------------------------------------+
|chmod +x j2sdk-1_3_1_02-linux-i386.bin                                     |
|                                                                           |
|./j2sdk-1_3_1_02-linux-i386.bin                                            |
+---------------------------------------------------------------------------+

After accepting the license, let the installer unpack the files, move the
resulting directory to /usr/lib and set an appropriate symbolic link (for
example /usr/lib/java, which JAVA_HOME points to below).

-----------------------------------------------------------------------------

8.3. Download the binaries

Origin-Site: http://jakarta.apache.org/builds/jakarta-tomcat-4.0/release/v4.1.18/src/jakarta-tomcat-4.1.18-src.tar.gz

-----------------------------------------------------------------------------

8.3.1. Installing the binaries

+---------------------------------------------------------------------------+
|cd /usr/local                                                              |
|                                                                           |
|tar -xvzf jakarta-tomcat-4.1.18.tar.gz                                     |
|                                                                           |
|cd jakarta-tomcat-4.1.18                                                   |
|                                                                           |
|cd bin                                                                     |
|                                                                           |
|rm *.bat                                                                   |
|                                                                           |
|echo export JAVA_HOME=/usr/lib/java/ >> /etc/profile                       |
|. /etc/profile                                                             |
+---------------------------------------------------------------------------+

To enable the Tomcat manager, you need to modify
/usr/local/jakarta-tomcat-4.1.18/conf/tomcat-users.xml: add a user »admin«
with the role »manager«. The result should look like this:

+---------------------------------------------------------------------------+
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
+---------------------------------------------------------------------------+

Now you should be able to start up Tomcat:

+---------------------------------------------------------------------------+
|/usr/local/jakarta-tomcat-4.1.18/bin/startup.sh                            |
+---------------------------------------------------------------------------+

You should now be able to connect to: http://localhost:8080/index.jsp

-----------------------------------------------------------------------------

8.4. mod_jk

8.4.1. Download the source

If you would like a native interface into your Apache web server, you need
to build mod_jk, which must be downloaded separately from:
http://jakarta.apache.org/builds/jakarta-tomcat-4.0/release/v4.1.18/src/jakarta-tomcat-connectors-4.1.18-src.tar.gz
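
For the tomcat-users.xml step in section 8.3.1, the following is a minimal
sketch and not the author's original listing. It assumes the usual Tomcat 4.1
layout of that file (a role element plus a user element with username,
password and roles attributes). The helper name write_tomcat_users, the
password and the path in the example call are placeholders, so compare the
result with the tomcat-users.xml shipped in your conf directory before
relying on it.

```shell
#!/bin/sh
# Sketch only: write_tomcat_users is a hypothetical helper, not part of Tomcat.
# usage: write_tomcat_users CONF_DIR USER PASSWORD
write_tomcat_users() {
    conf_dir=$1
    user=$2
    password=$3
    # Keep a copy of any existing file before overwriting it.
    if [ -f "$conf_dir/tomcat-users.xml" ]; then
        cp "$conf_dir/tomcat-users.xml" "$conf_dir/tomcat-users.xml.orig"
    fi
    # Assumed file layout: one role and one user that holds that role.
    cat > "$conf_dir/tomcat-users.xml" <<EOF
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="manager"/>
  <user username="$user" password="$password" roles="manager"/>
</tomcat-users>
EOF
}

# Example call (path and password are placeholders):
# write_tomcat_users /path/to/tomcat/conf admin changeme
```

Tomcat reads this file at startup, so restart it after changing the file.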
-----------------------------------------------------------------------------

8.4.2. Building and installing

+---------------------------------------------------------------------------+
|tar -xvzf jakarta-tomcat-connectors-4.1.18-src.tar.gz                      |
|                                                                           |
|cd jakarta-tomcat-connectors-4.1.18-src/jk/native                          |
|                                                                           |
|./buildconf                                                                |
|./configure --with-apxs=/usr/local/apache/bin/apxs                         |
|                                                                           |
|make                                                                       |
|make install                                                               |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

8.4.3. Customizing

Now follows the annoying part: customizing the config files.

First edit
/usr/local/jakarta-tomcat-connectors-4.1.18-src/jk/conf/workers.properties
and copy the file to /usr/local/apache/conf.

I made a sample workers.properties that works with the example JSPs and
servlets that come with the Tomcat distribution. It is based on the sample
workers.properties from Tomcat.

Example 8. workers.properties

+-------------------------------------------------------------------------------------------------+
|workers.tomcat_home=/usr/local/jakarta-tomcat-4.1.18                                             |
|                                                                                                 |
|# workers.java_home should point to your Java installation. Normally                             |
|# you should have a bin and lib directories beneath it.                                          |
|#                                                                                                |
|workers.java_home=/usr/lib/java2                                                                 |
|                                                                                                 |
|# You should configure your environment slash... ps=\ on NT and / on UNIX                        |
|# and maybe something different elsewhere.                                                       |
|#                                                                                                |
|ps=/                                                                                             |
|                                                                                                 |
|# The workers that your plugins should create and work with                                      |
|#                                                                                                |
|worker.list=worker1                                                                              |
|                                                                                                 |
|#------ DEFAULT ajp13 WORKER DEFINITION ------------------------------                           |
|#---------------------------------------------------------------------                           |
|# Defining a worker named worker1 and of type ajp13                                              |
|# Note that the name and the type do not have to match.                                          |
|#                                                                                                |
|worker.worker1.port=8009                                                                         |
|worker.worker1.host=localhost                                                                    |
|worker.worker1.type=ajp13                                                                        |
|                                                                                                 |
|#------ CLASSPATH DEFINITION -----------------------------------------                           |
|#---------------------------------------------------------------------                           |
|# Additional class path components.                                                              |
|#                                                                                                |
|worker.inprocess.class_path=$(workers.tomcat_home)$(ps)lib$(ps)tomcat.jar                        |
|                                                                                                 |
|# The JVM that we are about to use                                                               |
|#                                                                                                |
|# Unix - Sun VM or blackdown                                                                     |
|worker.inprocess.jvm_lib=$(workers.java_home)$(ps)jre$(ps)lib$(ps)i386$(ps)classic$(ps)libjvm.so |
|                                                                                                 |
|# Setting the place for the stdout and stderr of tomcat                                          |
|#                                                                                                |
|worker.inprocess.stdout=$(workers.tomcat_home)$(ps)logs$(ps)inprocess.stdout                     |
|worker.inprocess.stderr=$(workers.tomcat_home)$(ps)logs$(ps)inprocess.stderr                     |
+-------------------------------------------------------------------------------------------------+

Next, you need to configure your Apache config file httpd.conf. The
following example matches the examples provided by Tomcat.

+---------------------------------------------------------------------------+
|LoadModule jk_module libexec/mod_jk.so                                     |
|AddModule mod_jk.c                                                         |
|                                                                           |
|JkWorkersFile /usr/local/apache/conf/workers.properties                    |
|JkLogFile /var/log/httpd/mod_jk.log                                        |
|JkLogLevel info                                                            |
|JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "                                 |
|JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories            |
|JkRequestLogFormat "%w %V %T"                                              |
|JkMount /examples/servlet/* worker1                                        |
|JkMount /examples/*.jsp worker1                                            |
+---------------------------------------------------------------------------+

After restarting Apache, you should be able to reach your JSPs via Apache,
e.g. http://localhost/examples/jsp/num/numguess.jsp

Further steps, such as installing your own servlets and JSP files, are up to
you.
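
Before restarting Apache it can help to check that every worker named in
worker.list is actually defined. The following is a minimal sketch under
stated assumptions: check_workers is a hypothetical helper, not part of
mod_jk, and it only looks for the three keys that the sample
workers.properties above sets explicitly (type, host, port). mod_jk may
supply defaults for host and port, so treat a message about those as a hint
rather than an error.

```shell
#!/bin/sh
# Sketch only: check_workers is a hypothetical helper, not part of mod_jk.
# usage: check_workers /path/to/workers.properties
check_workers() {
    file=$1
    status=0
    # Take the value of worker.list and split it on commas.
    list=$(sed -n 's/^worker\.list=//p' "$file" | tr ',' ' ')
    if [ -z "$list" ]; then
        echo "no worker.list entry in $file"
        return 1
    fi
    for name in $list; do
        # The sample file sets these three keys for each worker.
        for key in type host port; do
            if ! grep -q "^worker\.$name\.$key=" "$file"; then
                echo "worker $name: missing $key"
                status=1
            fi
        done
    done
    return $status
}

# Example call (path as used in this HOWTO):
# check_workers /usr/local/apache/conf/workers.properties
```

The function prints one line per missing key and returns a non-zero status
if anything is missing, so it can be used in front of an Apache restart in a
script.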
Tip  Environment Variables

  If Tomcat fails to start, or your servlets cannot be started, the most
  common cause is that not all needed classes are in the CLASSPATH variable.

-----------------------------------------------------------------------------

9. Further Information

Here are some other resources available on the Internet.

-----------------------------------------------------------------------------

9.1. News groups

Some of the most interesting news groups are:

  * alt.apache.configuration
  * comp.infosystems.www.servers.unix
  * alt.comp.lang.php
  * alt.php
  * comp.databases

Also check out your country's newsgroups, e.g. ch.comp.os.linux.

Most newsgroups have their own FAQ, designed to answer most of your
questions, as the name Frequently Asked Questions indicates. Fresh versions
should be posted regularly to the relevant newsgroups. If you cannot find it
in your news spool you can go directly to the [ftp://rtfm.mit.edu/] FAQ main
archive FTP site. The WWW versions can be browsed at the FAQ main archive
WWW site.

-----------------------------------------------------------------------------

9.2. Mailing Lists

-----------------------------------------------------------------------------

9.2.1.

Send an empty email to the list's subscription address.

Before writing to the list, check out the archive:
http://marc.theaimsgroup.com/?l=apache-httpd-users

-----------------------------------------------------------------------------

9.2.2.

Send a mail to the list server with the following content (in the body, not
the subject):

+---------------------------------------------------------------------------+
|subscribe modperl                                                          |
+---------------------------------------------------------------------------+

Before writing to the list, check out the archive:
http://outside.organic.com/mail-archives/modperl/

-----------------------------------------------------------------------------

9.2.3.

Send a mail to the list server with the following content (in the body, not
the subject):

+---------------------------------------------------------------------------+
|subscribe openssl-users                                                    |
+---------------------------------------------------------------------------+

Before writing to the list, check out the archive:
http://www.mail-archive.com/openssl-users@openssl.org/

-----------------------------------------------------------------------------

9.2.4.

Send a mail to the list server with the following content (in the body, not
the subject):

+---------------------------------------------------------------------------+
|subscribe modssl-users                                                     |
+---------------------------------------------------------------------------+

Before writing to the list, check out the archive:
http://www.mail-archive.com/modssl-users@modssl.org/

-----------------------------------------------------------------------------

9.2.5.

Send an empty mail to the list's subscription address.

Before writing to the list, check out the archive:
http://lists.mysql.com/cgi-ez/ezmlm-cgi/

-----------------------------------------------------------------------------

9.2.6.

Fill out the subscription form at
http://developer.postgresql.org/mailsub.php

Before writing to the list, check out the archive:
http://archives.postgresql.org/pgsql-general/

-----------------------------------------------------------------------------

9.2.7.

Fill out the subscription form at http://www.php.net/mailing-lists.php

There are several PHP-related mailing lists to subscribe to; some of them
are also available on php.net's news server.

Before writing to a list, check out the archives, which are also linked on
the subscription page.

-----------------------------------------------------------------------------

9.2.8.

Send a mail to the list server with the following content (in the body, not
the subject):

+---------------------------------------------------------------------------+
|subscribe                                                                  |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------

9.3. HOWTO

These are intended as the primary starting points for background
information. They also show you how to solve specific problems. Some
relevant HOWTOs are:

  * [http://www.linuxdoc.org/HOWTO/Apache-Overview-HOWTO.html]
    Apache-Overview-HOWTO
  * [http://www.linuxdoc.org/HOWTO/Apache-WebDAV-LDAP-HOWTO/index.html]
    Apache-WebDAV-LDAP-HOWTO
  * [http://www.linuxdoc.org/HOWTO/LDAP-HOWTO.html] LDAP-HOWTO
  * [http://www.linuxdoc.org/HOWTO/LDAP-Implementation-HOWTO/index.html]
    LDAP-Implementation-HOWTO
  * [http://www.linuxdoc.org/HOWTO/PHP-HOWTO.html] PHP-HOWTO

The main site for these is the [http://www.linuxdoc.org/] LDP archive.

-----------------------------------------------------------------------------

9.4. Local Resources

Usually distributions install some documentation on your system.
Usually they are located in /usr/share/doc/packages or /usr/local/share/doc.

The software products mentioned here provide a lot of documentation in their
source directories.

Apache installs its documentation in the default DocumentRoot, under
/usr/local/apache/htdocs/manual.

-----------------------------------------------------------------------------

9.5. Web Sites

There are a large number of informative web sites available. By their very
nature they change quickly, so do not be surprised if these links become
outdated very fast.

A good starting point is of course the Linux Documentation Project home
page, a central information repository for documentation, project pages and
much more.

For more information about the software mentioned in this document, the
following sites are good starting points:

  * http://httpd.apache.org
  * http://www.openssl.org
  * http://www.modssl.org
  * http://perl.apache.org/
  * http://www.webdav.org
  * http://www.mysql.com
  * http://www.postgresql.org
  * http://www.pdflib.com
  * http://www.php.net
  * http://www.phpbuilder.com

Please let me know if you have any other leads that may be of interest.

-----------------------------------------------------------------------------

10. Questions and Answers

1. FAQ

  10.1.1. Is there such a HOWTO for Apache 2.0?
  10.1.2. Why don't you add a description of how to compile and set up
          mod_xyz?
  10.1.3. When my clients connect to https://myserver.org, an error message
          similar to "Certificate not valid" appears
  10.1.4. When I request a PHP file, the browser wants to download it.
          What's wrong?
  10.1.5. Does this HOWTO also work on other platforms?

1. FAQ

10.1.1. Is there such a HOWTO for Apache 2.0?

  Not yet. The reason is that PHP 4.2.1 supports the Apache 2.0 API only
  experimentally, and PHP is very slow with Apache 2.0.

  As the new Apache brings lots of new features and massive speed
  improvements, I will write such a HOWTO as soon as the PHP support is
  stable and performs better.

  I am now collecting ideas and wishes from users about what they would like
  to see in an Apache 2.0 HOWTO. Feel free to write me an email.

10.1.2. Why don't you add a description of how to compile and set up
        mod_xyz?

  Because nobody has requested it yet, and I either did not know about
  mod_xyz or did not find it useful.

  Feel free to send me suggestions about what to add to the HOWTO. If there
  is more than one request, and it makes sense, it may be added in a later
  release.

10.1.3. When my clients connect to https://myserver.org, an error message
        similar to "Certificate not valid" appears

  The certificate produced as described in this HOWTO is just a self-signed
  certificate. This means the CA (Certification Authority) is you. Your CA
  is not recognized as a valid CA by your users' browsers.

  You can either install the certificate on your users' machines (which
  makes sense in small intranet environments) or buy a certificate from a CA
  that is recognized by all major browsers. An example of such a CA is
  Verisign, http://www.verisign.com. Such a certificate costs approx. 300
  USD a year, depending on the strength of the key (56 or 128 bits).

10.1.4. When I request a PHP file, the browser wants to download it. What's
        wrong?

  You forgot to tell Apache what to do with PHP files, so they are not
  processed by the PHP engine. To fix this, add the application type as
  described in Section 6.3.

10.1.5. Does this HOWTO also work on other platforms?

  Not sure. Solaris should work; AIX and HP-UX do not. I have not had the
  time to try FreeBSD yet.

  My goal is to provide a version of the HOWTO for all major Un*x platforms.
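
For question 10.1.4, a quick way to see whether the PHP application type is
present in your Apache configuration can be sketched as follows. The helper
name has_php_handler is hypothetical, and the MIME type
application/x-httpd-php is an assumption (it is the type commonly used with
PHP 4); Section 6.3 remains the reference for the exact lines this HOWTO
uses.

```shell
#!/bin/sh
# Sketch only: has_php_handler is a hypothetical helper name.
# It succeeds if the given Apache config contains an uncommented AddType
# line that maps application/x-httpd-php to the .php extension.
has_php_handler() {
    conf=$1
    grep -Eq '^[[:space:]]*AddType[[:space:]]+application/x-httpd-php[[:space:]]+.*\.php([[:space:]]|$)' "$conf"
}

# Example call (path as used in this HOWTO):
# has_php_handler /usr/local/apache/conf/httpd.conf || echo "PHP type missing"
```

Commented-out lines and the separate application/x-httpd-php-source type are
deliberately not counted as a match.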
Notes

[1] This RPM contains the header files needed for PHP.

[2] Only needed if PHP is being built from the CVS tree.

[3] There is also version 1.4.1 of Java available, but Tomcat does not seem
    to run with that version of Java.


  Apache+DSO+mod_ssl+mod_perl+php+mod_auth_nds+mod_auth_mysql+mod_fastcgi
  Ray Van Dolson, rayvd@firetail.org
  v0.91, 5 April 2000

  Details the installation of an Apache based webserver suite configured to
  handle DSO, and various useful modules including mod_perl, mod_ssl and
  php.
  ______________________________________________________________________

  Table of Contents

  1. Legal Stuff

  2. Introduction
     2.1 Description of the Components
     2.2 History

  3. Component Installation
     3.1 Preparations
     3.2 mod_ssl
        3.2.1 Installing and Compiling OpenSSL
        3.2.2 Installing and Compiling RSAREF 2.0
        3.2.3 Installing and Compiling MM
        3.2.4 Installing and Compiling mod_ssl (at last!)
     3.3 Apache
     3.4 MySQL
     3.5 PHP 3.0.15
        3.5.1 GD
        3.5.2 IMAP
        3.5.3 OpenLDAP
        3.5.4 Installing and Compiling PHP 3.0.15
     3.6 mod_perl
        3.6.1 Required Perl Modules
        3.6.2 Installing and Compiling mod_perl 1.2x
     3.7 mod_auth_mysql
     3.8 mod_auth_nds
        3.8.1 ncpfs
        3.8.2 Compiling and Installing mod_auth_nds
     3.9 mod_fastcgi

  4. Final Words
     4.1 Credits
     4.2 Contact Information
     4.3 Anything Else
  ______________________________________________________________________

  1. Legal Stuff

  Apache+mods mini-HOWTO for Linux Systems

  Copyright (C)2000 Ray Van Dolson.

  This document is free; you can redistribute it and/or modify it under the
  terms of the GNU General Public License as published by the Free Software
  Foundation; either version 2 of the License, or (at your option) any
  later version.

  This document is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
  Public License for more details.

  You can get a copy of the GNU GPL at
  http://www.gnu.org/copyleft/gpl.html .

  2. Introduction

  This document outlines the process used to install Apache and its modules
  on the web-server at Walla Walla College (www.wwc.edu). While it is, for
  the most part, system specific, it can hopefully serve as a useful
  reference for performing other installations.

  This document attempts to outline the exact process used to install the
  server. Notes are made where things should have been done differently,
  but the original steps are given (assuming they worked).

  2.1. Description of the Components

  The platform on which the web-server was set up is a Red Hat 6.1 based
  system: Linux kernel 2.2.14 (compiled from scratch) running on a dual
  PIII 600 based system with RAID5 and lots of other goodies.

  The web-server software is Apache 1.3.12. The following modules were
  added to the server:

  ·  mod_fastcgi SNAP (also mod_rewrite), for use with Zope.

  ·  Auth-MySQL 2.20

  ·  mod_ssl 2.6.2 (OpenSSL 0.9.5)

  ·  mod_perl 1.21

  ·  PHP 3.0.15

  ·  mod_auth_nds 0.3a

  2.2. History

  v0.91 (April 5, 2000)

  ·  Updated mod_fastcgi to correct version.

  v0.9 (April 4, 2000)

  ·  Completed first draft

  ·  Spelling/Grammar errors

  v0.1 (March 2000)

  ·  Initial draft

  3. Component Installation

  3.1. Preparations

  You will need the following software:

  ·  Apache 1.3.12

  ·  PHP 3.0.15

     ·  GD 1.3 (to make use of GIF files)

        ·  Source

        ·  RPM

        ·  RPM-devel

     ·  GD 1.8.1 (to make use of PNG files)

        ·  Source

        ·  RPM

        ·  RPM-devel

     ·  IMAP 4.5+

        ·  Source

        ·  RPM

     ·  OpenLDAP 1.2.9+

        ·  Source

        ·  RPM

        ·  RPM-devel

  ·  mod_perl 1.22+

     ·  Perl5 Modules Required

        ·  MIME::Base64

        ·  URI

        ·  HTML-Parser

        ·  Digest-MD5

        ·  libnet

        ·  libwww

  ·  mod_ssl 2.6.2+

     ·  OpenSSL 0.9.5

        ·  Source

        ·  RPM

        ·  RPM-devel

     ·  RSAREF 2.0

     ·  MM 1.0.12

  ·  MySQL 3.22.32

  ·  mod_auth_nds 0.4

     ·  ncpfs 2.2.0.17
        Note: the kernel must also be compiled with IPX support.
  ·  mod_auth_mysql 2.20

  ·  mod_fastcgi SNAP Oct06

  This is the directory layout scheme I use and recommend:

       + /usr/src
       |
       +-+ apache
         |
         +-+ apache-1.3.12
         |
         +-+ modules
         | |
         | +-+ mod_perl
         | | |
         | | +- mod_perl-1.21
         | | |
         | | +-+ depend
         | |   |
         | |   +- (perl modules)
         | |
         | +-+ mod_ssl
         | | |
         | | +- mod_ssl-2.6.2-1.3.12
         | | |
         | | +-+ depend
         | |   |
         | |   +- openssl-0.9.5
         | |   |
         | |   +- rsaref-2.0
         | |   |
         | |   +- mm-1.0.12
         | |
         | +-+ mod_fastcgi_SNAP
         | |
         | +-+ php
         | | |
         | | +- php-3.0.15
         | | |
         | | +-+ depend
         | |   |
         | |   +- gd-1.3
         | |   |
         | |   +- imap-4.5
         | |   |
         | |   +- openldap-1.2.9
         | |
         | +-+ mod_auth_nds
         | | |
         | | +- mod_auth_nds-0.4
         | | |
         | | +-+ depend
         | |   |
         | |   +- ncpfs-2.2.0.17
         | |
         | +-+ mod_auth_mysql
         |
         +-+ mysql

  Check whether some of the above modules and software packages are already
  installed on your system. It usually doesn't hurt, however, to download
  everything and install it anyway, since the version installed on your
  system may be older.

  3.2. mod_ssl

  3.2.1. Installing and Compiling OpenSSL

  mod_ssl requires some sort of SSL engine to be installed. OpenSSL is the
  natural choice for the Linux environment. You can either install it via
  RPM (as I did), or compile it from source. Since I did not compile it
  from source, you're on your own there, although I would expect it to be
  pretty straightforward. Most likely you'll either install it on the
  system (into /usr/local or similar) or leave it in its directory and
  simply point whichever applications need OpenSSL to that directory.

  RPM will install OpenSSL into system directories.

  3.2.2. Installing and Compiling RSAREF 2.0

  Create the rsaref-2.0 directory wherever you like. For me this is in
  /usr/src/apache/modules/mod_ssl-blah/depend/. Extract RSAREF there, then
  run:

       cd rsaref-2.0
       cp -rp install/unix local
       cd local
       make
       mv rsaref.a librsaref.a

  These commands should build the rsaref library. Just leave the files
  where they are, and when you need to link against the library, point the
  appropriate configure script to this location.

  3.2.3. Installing and Compiling MM

  Extract mm-1.0.12 (or whichever version is most current) to the depend
  directory of the mod_ssl-blah subdirectory. Perform the following steps:

       cd mm-1.0.12
       ./configure --disable-shared
       make

  This should build your mm libraries. As above, reference this path when
  needed. You're on your own if you want to install this library to the
  system.

  3.2.4. Installing and Compiling mod_ssl (at last!)

  The normal procedure with apxs is to compile Apache first and then, using
  apxs, compile the modules you want to use and insert them into the
  server. However, mod_ssl needs to be compiled into the server the normal
  way before you can use it via apxs. Once mod_ssl is in the server for the
  first time, you can upgrade it via apxs without having to completely
  recompile Apache.

  Enter the directory where you are compiling mod_ssl and run the following
  configuration script (this is the file I use) for the initial compile:

       #!/bin/sh
       ./configure \
         --with-apache=/usr/src/apache/apache_1.3.12 \
         --with-ssl \
         --with-rsa=../depend/rsaref-2.0/local \
         --with-mm=../depend/mm-1.0.12 \
         --enable-shared=ssl

  You don't need to run 'make' or anything else here. When we compile
  Apache, it will do it all for us.

  This configuration line shows two ways your system could be set up. In my
  case, OpenSSL was already installed somewhere in the system (probably in
  /usr/lib and /usr/include), so I didn't need to pass it any location
  parameters. However, rsa and mm were -not- on the system; I compiled them
  myself and left them within their source trees (I didn't run make
  install, et al). In that case, you need to point configure to the
  appropriate directory so it can find the headers and libraries.

  From this point on, unless you upgrade Apache (in which case you'd need
  to perform the above step again for the new version of Apache), you can
  use apxs to upgrade and recompile mod_ssl.
  Here is the configure script I use for this:

       ./configure \
         --with-apxs=/apps/apache-1.3.12/bin/apxs \
         --with-ssl=../depend/openssl-0.9.4 \
         --with-rsa=../depend/rsaref-2.0/local \
         --with-mm=../depend/mm-1.0.12

  Or some combination of the above. Then run:

       make
       make install
       make distclean

  to complete the installation.

  Notes:

  MM is -not- required to compile mod_ssl. If you're having problems
  getting it to work, simply skip compiling it and omit it from the
  ./configure line(s).

  When I compiled mod_ssl, I had errors regarding DBM. To fix this, I had
  to add -lndbm to the Makefile:

  ·  Run the above configure script.

  ·  cd to pkg.sslmod

  ·  Edit the Makefile and add -lndbm to LIBS_SHLIB. It should look like:

  ·  LIBS_SHLIB=-lm -lcrypt -lndbm

  Hopefully that will save you some grief.

  3.3. Apache

  Extract apache-1.3.12.tar.gz to /usr/src/apache or wherever you like.

  Next we want to compile Apache with the following options enabled:

  ·  mod_ssl (In order to compile mod_ssl as a DSO, it first has to be
     compiled into the server normally. After doing this, the module can be
     upgraded via apxs.)

  ·  mod_proxy

  ·  mod_so

  ·  mod_rewrite (For use with Zope)

  Here is the configuration file I used to initially compile Apache:

       #!/bin/sh
       SSL_BASE=../depend/openssl-0.9.4 \
       RSA_BASE=../depend/rsaref-2.0/local \
       EAPI_MM=../depend/mm-1.0.12 \
       ./configure \
         --enable-module=ssl \
         --enable-module=proxy \
         --enable-shared=proxy \
         --enable-module=rewrite \
         --enable-shared=rewrite \
         --prefix=/apps/apache-1.3.12 \
         --enable-shared=ssl \
         --enable-rule=SHARED_CORE \
         --enable-rule=SHARED_CHAIN \
         --enable-module=so

  Then run

       make
       make certificate
       make install

  Apache should now be compiled and installed into whichever directory you
  specified with --prefix.

  Test it out and make sure it starts up.

       /path/to/apache/bin/apachectl start

  or

       /path/to/apache/bin/apachectl startssl

  Hopefully it all runs smoothly. If not, trace back over your steps and
  make sure you didn't forget anything.

  3.4. MySQL

  php, mod_auth_mysql and possibly mod_perl require that MySQL be installed
  and running on your system. It is beyond the scope of this document to go
  into the details of installing MySQL, so download the archive and follow
  the directions in the INSTALL file(s). Getting MySQL up and running is a
  fairly straightforward procedure. Something like:

       ./configure
       make
       make install

  should get everything installed so that you can compile the other Apache
  modules.

  3.5. PHP 3.0.15

  We will compile php-3.0.15 as a DSO, which means that it is a separate
  module that can be loaded into and unloaded from the server. This makes
  it easy to upgrade php without having to recompile everything (which can
  be a pain if you use a lot of modules with Apache).

  3.5.1. GD

  In our installation of Apache, php uses gd to create images and such. I
  linked php against an older version of gd (installed via RPM), so that we
  can output GIF files. This probably isn't too desirable due to copyright
  issues, and thus you may wish to use a version later than 1.3, which only
  supports PNG files.

  Either install via RPM (rpm -i gd*.rpm) or compile from source and
  install to the system.

  3.5.2. IMAP

  If you want IMAP support, the procedure is similar to that of gd. I used
  the RPM since I'm on a Red Hat system, but installing from source should
  be a relatively simple procedure of ./configure; make; make install.

  3.5.3. OpenLDAP

  Once again you can install OpenLDAP either via RPM or from source. I
  chose to do it from source since the latest version was not yet available
  via RPM at the time we were setting things up.

       ./configure
       make
       make install

  should do the trick! (Or rpm -i openldap*.rpm)

  3.5.4. Installing and Compiling PHP 3.0.15

  Once the above items are installed and working, we can go ahead and
  compile PHP as a DSO. The process is very straightforward and simple.
       cd /usr/src/apache/modules/php/php-3.0.15
       ./configure \
         --with-apxs=/apps/apache/bin/apxs \
         --with-config-file-path=/apps/etc \
         --with-gd \
         --with-imap \
         --with-mysql=/apps/mysql \
         --with-ldap=/apps \
         --with-zlib \
         --enable-track-vars

  If any of your --with libraries are not installed in /usr/local or /usr,
  make sure you tack an =/location/ onto the option so that configure can
  find what it needs!

       make
       make install

  If everything completes properly, 'make install' will use apxs to install
  libphp3.so to /apache/libexec/libphp3.so, add the proper entries to
  httpd.conf and activate php3. Pretty slick.

  3.6. mod_perl

  This section documents the installation of mod_perl as a DSO for Apache.
  There are a number of perl modules (in addition, of course, to perl5,
  which I will assume you already have installed) that must be added before
  mod_perl will compile without complaining. If you don't install these
  modules, mod_perl should complain and tell you which ones you are
  missing.

  The modules must be installed in a certain order, because some depend on
  others. I've listed the install order that I used without any problems.

  3.6.1. Required Perl Modules

  The perl modules can be obtained from the locations detailed further up
  in this document. Download them and put them wherever you like, or in the
  location I used as depicted in the directory map (also above).

  Installing a module is fairly simple. After extracting the module to a
  directory (usually with tar xvfz), change to that directory and execute
  the following commands:

       perl Makefile.PL
       make
       make install

  If everything goes as it should, this will configure, build and install
  the perl module for you. Of course, check the README for each module if
  things don't work quite as expected.

  Here is the order I used to install the modules necessary for mod_perl:

  1. MIME::Base64

  2. URI

  3. HTML::Parser

  4. Digest-MD5

  5. libnet

  6. libwww

  3.6.2. Installing and Compiling mod_perl 1.2x

  After installing the perl modules, we're ready to compile and install
  mod_perl into Apache. Change to the directory where you extracted
  mod_perl, and run the following script:

       perl Makefile.PL \
         USE_APXS=1 \
         WITH_APXS=/path/to/apache/bin/apxs \
         EVERYTHING=1

  This sets up your Makefile and tells mod_perl to compile itself as a DSO
  using apxs (the location of which you must specify). After this step,
  simply run

       make
       make install

  and mod_perl will be moved to the appropriate directory and the necessary
  lines added to your httpd.conf file.

  3.7. mod_auth_mysql

  mod_auth_mysql lets the Apache web-server authenticate against a MySQL
  user database. Installation of the module as a DSO isn't exactly
  documented in the README file, but it can be done.

  First, change to the directory you extracted mod_auth_mysql to. I assume
  that you have MySQL installed somewhere (along with the headers, etc).
  Make sure you know the location of the MySQL libraries and header files.
  If in doubt, check /usr/lib/mysql and /usr/include/mysql.

  In order to compile mod_auth_mysql, the 'config.h' file first has to be
  made available under the name 'auth_mysql_config.h'. I'm not sure why
  this file wasn't named correctly, but simply execute the following
  command:

       cp config.h auth_mysql_config.h

  Now for the final step:

       /path/to/apache/bin/apxs -i -a -I/usr/include/mysql -L/usr/lib/mysql \
         -lmysqlclient -c mod_auth_mysql.c

  You may need to run this as root if you do not have read/write access to
  the Apache directory.

  3.8. mod_auth_nds

  At my school, the Windows network of choice is Netware. It's been in
  place for a long time, and although hopefully someday it will be retired,
  for now it is still the main network on campus for filesharing and email.
  Every student has a Netware account on which their personal files,
  including their webpages, are stored. We mount these directories on our
  Linux server, and it's nice to be able to password protect certain ones
  with the Netware username and password information.
With this module, Apache can authenticate straight to the Netware server itself. 3.8.1. ncpfs In order to compile mod_auth_nds, we need to have ncpfs installed (along with its headers of course). Before compiling ncpfs, you must ensure that your kernel has IPX support compiled in. If this is the case, simply running ./configure make make install (optional) will compile (and install) the libraries. 3.8.2. Compiling and Installing mod_auth_nds With ncpfs installed, running the following command should compile mod_auth_nds as a DSO: /path/to/apache/bin/apxs -c -lncp -L/usr/lib -I/usr/include mod_auth_nds.c /path/to/apache/bin/apxs -i mod_auth_nds.so Then add the following lines to your httpd.conf (by hand): LoadModule nds_auth_module libexec/mod_auth_nds.so AddModule mod_auth_nds.c Then, restart Apache! 3.9. mod_fastcgi Installing mod_fastcgi is necessary if you want to allow access to your Zope server through Apache. This might be useful simply because Apache is inherently more secure and much more configurable than the Zope server itself. The current stable version of mod_fastcgi is 2.2.2, however, this version does not work properly with Zope. You must get the SNAP release which is dated Oct 06. The link is provided above. Change to the mod_fastcgi directory and run the following commands: /path/to/apache/bin/apxs -o mod_fastcgi.so -c *.c /path/to/apache/bin/apxs -i -a -n fastcgi mod_fastcgi.so See the mod_fastcgi documentation for a description of its use. 4. Final Words Much of this information can be obtained by reading the README and INSTALL files included with the various modules. However, this document is useful in the cases which didn't work as expected for me, or else for which the installation procedure was not as well defined as I would have liked. It also has the added benefit of being one, sequential document, which should hopefully be easier to follow and understand than a slew of README files. 4.1. Credits Phillip R. 
Wilson, author of mod_auth_nds, for helping me get mod_auth_nds to compile and install with apxs.

John Ash, my boss, for all sorts of help and, of course, a job.

Marcus Faure, author of the Apache SSL PHP/FI frontpage mini-HOWTO, on which this document is loosely based.

4.2. Contact Information

If you find any blatant errors in this document, whether spelling, grammar, content or otherwise, please don't hesitate to drop me an email. You can get hold of me by a number of means.

Ray Van Dolson
Email:
IRC: DALnet, #Bludgeon (nick Variant)

4.3. Anything Else

Everything mentioned in this document will eventually be available for ftp from ftp.wwc.edu/pub/apache. I will have everything laid out as described above, and hopefully installation scripts to install everything from scratch. (A very dumb script, mind you.)

Apache Overview HOWTO
Daniel Lopez Ridruejo, ridruejo@apache.org
v0.9, 2002-10-10

This document gives you an overview of the different Apache projects, such as the Apache HTTP server and the Tomcat Servlet and JSP engine. It provides pointers for further information and implementation details.
______________________________________________________________________

Table of Contents

1. Introduction
   1.1 Apache Software Foundation
   1.2 Structure of this document

2. Apache
   2.1 Architecture
      2.1.1 Apache 1.3
         2.1.1.1 Process-based Web server
         2.1.1.2 Windows support
         2.1.1.3 Modular
      2.1.2 Apache 2.0
         2.1.2.1 Multi Processing Modules
         2.1.2.2 Protocol Modules
         2.1.2.3 Module and filter architecture.
         2.1.2.4 Compatibility issues
         2.1.2.5 Portable
   2.2 Security
      2.2.1 Authentication
      2.2.2 Access Control
      2.2.3 SSL/TLS
   2.3 Proxy
   2.4 Performance and scalability
      2.4.1 Load Balancing
      2.4.2 Compression
   2.5 CGI scripts
   2.6 Development Platform Integration
      2.6.1 Perl
      2.6.2 PHP
      2.6.3 Python
      2.6.4 Tcl
      2.6.5 Microsoft technologies
         2.6.5.1 .Net
         2.6.5.2 ASP
         2.6.5.3 ISAPI
      2.6.6 Java
      2.6.7 Modules for other languages
   2.7 Management
      2.7.1 Build tools
      2.7.2 User Interfaces for Apache
      2.7.3 SNMP
   2.8 Publishing
   2.9 Protocol modules
   2.10 Virtual Hosting
   2.11 Commercial support

3. ASF Projects
   3.1 Applications and Frameworks
      3.1.1 Servers
         3.1.1.1 Tomcat
         3.1.1.2 JAMES (Java Apache Mail Enterprise Server)
         3.1.1.3 Lucene
         3.1.1.4 Jetspeed
      3.1.2 Content management
         3.1.2.1 Slide
         3.1.2.2 Alexandria
      3.1.3 Frameworks
         3.1.3.1 Turbine
         3.1.3.2 Avalon
   3.2 Presentation
      3.2.1 Cocoon
      3.2.2 Velocity
      3.2.3 AxKit
      3.2.4 Xalan
      3.2.5 FOP
   3.3 Parsers and Document Access libraries
      3.3.1 Xerces
      3.3.2 Batik
      3.3.3 POI
   3.4 Interoperability
      3.4.1 SOAP
      3.4.2 XML-RPC
      3.4.3 XML security
   3.5 Development
      3.5.1 Apache Portable Runtime
      3.5.2 Ant
      3.5.3 Byte Code Library
      3.5.4 Log4j
      3.5.5 ORO and Regexp
      3.5.6 Struts
      3.5.7 Taglibs
      3.5.8 Database
      3.5.9 Commons
   3.6 Testing
      3.6.1 httpd-test
      3.6.2 Cactus
      3.6.3 JMeter
      3.6.4 Latka
      3.6.5 Watchdog

4. Where to find more information
   4.1 Websites
   4.2 Books
   4.3 Support forums

5. Contacting the Author
   5.1 Translations

6. Open Content Open Publication License
   6.1 REQUIREMENTS ON BOTH UNMODIFIED AND MODIFIED VERSIONS
   6.2 COPYRIGHT
   6.3 SCOPE OF LICENSE
   6.4 REQUIREMENTS ON MODIFIED WORKS
   6.5 GOOD-PRACTICE RECOMMENDATIONS
   6.6 LICENSE OPTIONS
______________________________________________________________________

1. Introduction

This document gives you an overview of the Apache world, including Apache Software Foundation projects such as the Apache web server, as well as commercial and open source third-party software.

Apache is the most popular web server on the Internet.
New Apache users, especially those coming from a Windows background, are often unaware of the possibilities of Apache, its useful add-ons and, more generally, how everything works together. This document aims to give a general picture of those possibilities, with a brief description of each one and pointers for further information.

The information has been gathered from many sources, including projects' web pages, conference talks, mailing lists, Apache websites and my own hands-on experience. Full credit is given to these authors. Without them and their work, this document would not have been possible or necessary.

Copyright 2002 Daniel Lopez Ridruejo

Permission is granted to copy, distribute and/or modify this document under the terms of the Open Content Open Publication License, Version 1.1. A copy of the license is included in the appendix entitled "Open Content Open Publication License", or at www.opencontent.org/openpub/.

1.1. Apache Software Foundation

     The Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are characterized by a collaborative, consensus based development process, an open and pragmatic software license, and a desire to create high quality software that leads the way in its field. We consider ourselves not simply a group of projects sharing a server, but rather a community of developers and users.

The ASF is home to many successful Open Source projects, such as the Tomcat Servlet/JSP engine and the Ant build tool. You can learn more about the foundation on its website.

1.2. Structure of this document

The first part of this document deals with the Apache Web Server and related modules. It covers the history, architecture and capabilities of the server and describes ways in which you can extend and customize it.

The second part of this document covers projects of the Apache Software Foundation, such as those from the Jakarta and Java XML communities.
Rather than being organized around a particular programming language or technology, the projects are grouped by the functionality they provide.

2. Apache

Apache is the leading Internet web server, with over 60% market share according to the Netcraft survey. Several key factors have contributed to Apache's success:

·  The Apache license. It is an open source, BSD-like license that allows for both commercial and non-commercial uses of Apache.

·  A talented community of developers with a variety of backgrounds and an open development process based on technical merit.

·  Modular architecture. Apache users can easily add functionality or tailor Apache to their specific environment.

·  Portability. Apache runs on nearly all flavors of Unix (and Linux), Windows, BeOS, mainframes...

·  Robustness and security.

Many commercial vendors have adopted Apache-based solutions for their products, including Oracle, Red Hat and IBM. In addition, Covalent provides add-on modules and 24x7 support for Apache.

The following websites use Apache or derivatives. Chances are that if Apache is good enough for them, it is also good enough for you :)

·  Amazon.com

·  Yahoo!

·  W3 Consortium

·  Financial Times

·  Apple

·  MP3.com

·  Stanford

From the Apache website:

     The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server for modern operating systems including UNIX and Windows NT. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards.

Apache started its life as a set of modifications to the NCSA Web server, one of the first HTTP servers. You can learn more about Apache's history on the Apache website.

The Apache project has grown beyond building just a web server into developing other critical server-side technologies. The Apache Software Foundation, described in a later section, serves as an umbrella for these projects.

2.1.
Architecture

There are two main versions of Apache, the 1.3 series and the 2.0 series. Although both versions are considered production quality, they differ in architecture and capabilities.

2.1.1. Apache 1.3

Apache 1.3 has been ported to a great variety of Unix platforms and is the most widely deployed Web server on the Internet.

2.1.1.1. Process-based Web server

Apache 1.3 on Unix is a process-based Web server. The Apache program forks several children at startup. Forking means that a parent process makes identical copies of itself, called children. Each of the children can serve a request independently of the others. This approach has the advantage of improved stability: if one of the children misbehaves (runs out of control or leaks memory), it can be terminated without affecting the others.

The stability comes with a performance penalty. In most Unix operating systems, creating processes and context switching (assigning processor time to each process) are expensive operations. Since processes are isolated from each other, they cannot easily share code and data, so they consume more system resources.

2.1.1.2. Windows support

Apache 1.3 is the first version of Apache to support Windows, although the port is not considered to be as stable as its Unix counterparts. This is because the server was designed with Unix in mind and the Windows port was a later addition that did not integrate very well.

2.1.1.3. Modular

Apache 1.3 has a modular architecture. You can enable or disable modules to add and remove Web server functionality, and so customize Apache to improve performance and security. In addition to the modules bundled with the server, there is a great number of third-party modules providing extended functionality.

2.1.2. Apache 2.0

Apache 2.0 is the latest and greatest version of the Apache server. Its architecture contains significant improvements over the 1.3 series. The following are some of them.

2.1.2.1.
Multi Processing Modules

Apache 2.0 abstracts the request processing architecture into special server modules, called Multi Processing Modules (MPMs). This means that Apache can be configured to be a pure process-based server, a purely threaded server or a mixture of the two. Threads are contained inside processes and run simultaneously. Unlike processes, threads can share data and code. Threads are thus more "lightweight" than processes, and in most cases threaded servers scale better than process-based servers. The disadvantage is that the server is less reliable, since a misbehaving thread can corrupt data or code belonging to other threads.

2.1.2.2. Protocol Modules

Protocol handling has been encapsulated in its own layer in Apache 2.0. That means it is possible to write modules to serve protocols other than HTTP, such as POP3 for mail or FTP for file transfer. These protocol modules can take advantage of a solid server framework and of module functionality such as authentication and dynamic content generation. For example, you can authenticate your POP3 users against the same user database Apache uses for web requests, and FTP content can be generated dynamically using PHP, CGI or any of the other technologies explained later in this document.

2.1.2.3. Module and filter architecture.

Apache 2.0 maintains the 1.3 modular architecture and adds an additional extension mechanism: filters. Filters allow modules to modify the content generated by other modules. They can encrypt, scan for viruses or compress not only static files but also dynamically generated content.

2.1.2.4. Compatibility issues

Unfortunately, though the module API is similar between versions, the two are not identical, and Apache 1.3 modules need to be ported to the new architecture. Most mainstream modules such as PHP and mod_perl already have Apache 2.0 versions, and others, such as mod_dav and mod_ssl, are now part of the server distribution.
Running modules on a threaded architecture requires specific changes to the modules. Modules distributed with Apache have undergone those changes and are considered `thread-safe', but third-party modules or libraries may not be. If you need one of those, you will be limited to running Apache as a pure process-based server.

2.1.2.5. Portable

Apache now runs equally well on Windows and Unix platforms thanks to the Apache Portable Runtime (APR) library. It abstracts the differences among operating systems, such as file or network access APIs. Porting Apache to a new platform is often as simple as porting the Apache Portable Runtime. This abstraction layer also provides for platform-specific tuning and optimization.

2.2. Security

Apache provides several security-related modules for securing and restricting access to the server.

2.2.1. Authentication

Authentication modules allow you to determine the identity of a client, usually by verifying a username and password against a backend database. Apache includes modules to authenticate against plain text and database files. Additional authentication modules exist that connect Apache to existing security frameworks or databases, including NT domain controllers, Oracle, MySQL, PostgreSQL and so on.

The LDAP modules are especially interesting, as they allow integration with existing company-wide and enterprise-wide directory services. These modules are available from third parties. An Apache 2.0 LDAP module can be found at the Apache website.

2.2.2. Access Control

Apache provides the mod_access module, which can restrict access to resources based on parameters of the client request, such as the presence of a specific header or the IP address or hostname of the client. Third-party modules allow you to restrict access to clients that misbehave, as explained in later sections on performance and bandwidth control.

2.2.3. SSL/TLS

The Secure Sockets Layer/Transport Layer Security protocols allow data between the Web server and client to be encrypted.
In Apache 1.3, the protocols are implemented by mod_ssl, which is distributed separately, from the mod_ssl website, and requires applying patches to the server. This was necessary because of export regulations on encryption. Most of those restrictions have since been lifted and, starting with Apache 2.0, mod_ssl is included as a base module with Apache.

2.3. Proxy

A proxy is a program that performs requests on behalf of another. There are different kinds of Web proxies. A traditional HTTP proxy, also called a forward proxy, accepts requests from clients (usually Web browsers), contacts the remote server, and returns the responses. A reverse proxy is a Web server that is placed in front of other servers, providing a unified front end and offloading certain tasks, such as SSL processing, from the backend Web servers.

Apache supports both types of proxy, caching of proxied content and different proxy backends such as FTP.

2.4. Performance and scalability

Raw performance is only one of the factors to consider in a web server (flexibility and stability usually come first). Having said that, there are solutions to improve performance on heavily loaded web servers serving static content. If you are in the hosting business, Apache also provides ways in which you can measure and control bandwidth usage. Throttling in this context usually means slowing down the delivery of content based on the file requested, a specific client IP address and so on. This is done to prevent abuse.

·  mod_mmap: Included in current Apache 1.3 releases, it maps into memory a statically configured list of files that are frequently requested but infrequently changed. This functionality is included in mod_file_cache in Apache 2.

·  Mod_bandwidth: This Apache 1.3 module enables the setting of server-wide or per-connection bandwidth limits, based on the specific directory, size of files and remote IP/domain.

·  Bandwidth share module: provides bandwidth throttling and balancing by client IP address.
   It supports Apache 1.3 and earlier versions of Apache 2.

·  Mod_throttle: Throttles bandwidth per virtual host or user. For Apache 1.3.

2.4.1. Load Balancing

Using the Apache reverse proxy and mod_rewrite, you can have an Apache process distribute requests among a variety of backend web servers. You can find more information in the Apache documentation.

Additionally, mod_backhand is an Apache 1.3 module that allows seamless redirection of HTTP requests from one web server to another. This redirection can be used to target machines with under-utilized resources, thus providing fine-grained, per-request load balancing of web requests. More information is available on the mod_backhand website.

2.4.2. Compression

Apache 2.0 includes mod_deflate, a filtering module that compresses content before delivering it to clients. This saves bandwidth but can have a performance impact. The mod_gzip module provides this functionality for Apache 1.3.

2.5. CGI scripts

CGI stands for Common Gateway Interface. CGI programs are external programs that are called when a user requests a certain page. The CGI program receives information from the web server (form variable values, type of browser, IP address of the client and so on) and uses that information to output a web page to the client.

Apache has support for CGIs, and there is a third-party Apache 1.3 module that provides support for the FastCGI protocol. FastCGI avoids the performance penalties associated with starting and stopping a CGI program with every request. The module is available on the FastCGI website.

2.6. Development Platform Integration

Web applications are written in high-level languages such as Java, Perl, C# and so on, and Apache has several modules that integrate them with the server. In many cases the modules expose the Apache API, so entire Apache modules can be written in those languages.

2.6.1. Perl

mod_perl is one of the oldest and most successful Apache projects. It embeds a Perl interpreter in Apache and allows access to the web server internals from Perl.
This allows entire modules to be written in Perl or in a mixture of Perl and C code. In the 1.3 Apache versions, one interpreter has to be embedded in each child, since the server is multiprocess based. On heavy-traffic dynamic sites, the increased size can make a difference. In threaded versions of Apache 2.0, mod_perl allows code, data and session state to be shared among interpreters. This results in a faster, leaner solution.

mod_perl is in itself another platform, with a great variety of modules available, such as Mason and Embperl for embedding Perl in HTML pages and AxKit for XML-driven templates.

2.6.2. PHP

From the PHP website:

     PHP is a server-side, cross-platform, HTML embedded scripting language.

It is the most popular module for Apache, for a variety of reasons:

·  The learning curve is quite low

·  Great documentation

·  Extensive database support

·  Modularity

PHP has a modular design. Among many others, there are modules that provide support for:

·  Database connectivity for popular databases such as Oracle, MS-SQL Server, ODBC interface, MySQL, mSQL, PostgreSQL and so on.

·  XML support

·  File transfer: FTP

·  HTTP

·  Directory support: LDAP

·  Mail support: IMAP, POP3, NNTP

·  PDF document generation

·  CORBA

·  SNMP

You only need to compile and use the modules you need.

PHP can be used with Apache, as an external CGI or with other web servers. It is cross-platform and runs on most flavors of Unix and on Windows.

If you come from a Windows background, you have probably used Internet Information Server with Active Server Pages and MS-SQL Server. A common replacement in the Unix world for this trio is Apache with PHP and MySQL. Since PHP works:

·  with Apache and with Microsoft IIS

·  with MySQL and with MS-SQL Server

·  on Unix and on Windows

you have a nice, gradual migration path from a Microsoft-centric solution to Unix-based solutions.

2.6.3. Python

Python is a popular object-oriented scripting language.
Mod_Python, which is now an official Apache project, allows you to integrate Python with the Apache web server. You can develop complex web applications or accelerate existing Python CGI scripts. Recent versions run on Apache 2.0.

2.6.4. Tcl

The Tcl Apache project integrates Tcl with the Apache web server. Tcl is a lightweight, extensible scripting language. You can learn more about it on the Tcl website. There are several modules currently under the Apache Tcl umbrella:

·  Both Mod_dtcl and Neowebscript allow embedding Tcl in HTML pages. Rivet combines the best of both modules.

·  Mod_tcl takes an approach similar to mod_perl, exposing the Apache API.

·  WebSH provides a Tcl Web application environment.

2.6.5. Microsoft technologies

Several modules allow integration with Microsoft languages and technologies such as the .Net framework or Active Server Pages.

2.6.5.1. .Net

mod_haydn integrates Mono with Apache and exposes the Apache API to the .Net framework, allowing you to write modules in C#, for example.

Covalent provides mod_asp.net, a commercial Windows module that allows Apache to run ASP.Net applications, so that Apache can replace Microsoft IIS.

2.6.5.2. ASP

ASP stands for Active Server Pages and is a Microsoft technology that allows you to embed code, usually Visual Basic, in HTML pages. Several companies, such as ChilliSoft and Stryon, provide products that can run ASP applications in Unix environments.

2.6.5.3. ISAPI

ISAPI is an API that you can use to extend Microsoft IIS, much as you would use the Apache API to extend Apache. Apache includes a module, mod_isapi, that mirrors this functionality and allows you to run ISAPI modules.

2.6.6. Java

Most application servers, such as those from Oracle, IBM and BEA, provide modules to integrate with the Apache web server. Additionally, several modules such as mod_jk and mod_webapp allow you to connect to Tomcat, a Servlet and JavaServer Pages container that is also part of the Apache Software Foundation.

2.6.7.
Modules for other languages

This document has described modules for popular server-side languages such as Perl, Python and PHP. You can find additional language modules (JavaScript, Haskell, Ruby and others) at the Apache modules directory.

2.7. Management

An important part of Web server administration is building, configuring and monitoring different servers.

2.7.1. Build tools

Apache can be extended and customized in many different ways. Integrating different modules with the server can sometimes be a difficult task. Tools such as the Apache Toolbox can make this task easier by providing a menu-driven build framework.

2.7.2. User Interfaces for Apache

Apache is configured through text configuration files, which can sometimes be hard, especially for people coming from a Windows background. There are open source graphical tools that make this task easier:

·  Comanche, by yours truly, is cross-platform and runs on Unix/Linux, Windows and Mac.

·  Webmin: A nice web-based interface.

·  gui.apache.org: The GUI interfaces for Apache project. Its programs are in various stages of development.

2.7.3. SNMP

SNMP stands for Simple Network Management Protocol. It allows monitoring and management of network servers, equipment and so on. SNMP modules for Apache help you manage large deployments of web servers, measure the quality of service offered and integrate Apache with existing management frameworks.

·  Mod SNMP is an open source module for Apache 1.3.

·  Covalent SNMP is a commercial SNMP module with support for the latest SNMPv3 standard and integration with HP OpenView, Tivoli and so on.

2.8. Publishing

Authors of Web content require a means of managing that content and uploading it to the server. One of the protocols used for this purpose is DAV (Distributed Authoring and Versioning). DAV is an extension to the HTTP protocol that enables users and applications to publish and modify Web content.
DAV technology is widely implemented: Microsoft supports it at the operating system level (WebFolders) and in its Office suite. The same goes for Apple's Mac OS X and a variety of third-party products from Adobe, Oracle and so on.

The mod_dav module for Apache 1.3 is available separately. In Apache 2.0, mod_dav is included with the base distribution.

Before DAV, Microsoft had its own publishing protocol, integrated with the Microsoft FrontPage tool. You can add server-side support for FrontPage using add-on modules, though because of the way they integrate with Apache they are not considered secure.

2.9. Protocol modules

Apache 2.0 introduced the concept of protocol modules. That means that developers can reuse the Apache server framework to implement new protocols, such as those dealing with mail and file transfer. mod_ftp is a commercial Apache-based FTP module from Covalent. mod_pop3 is an open source module that implements the POP3 protocol, commonly used by mail readers to retrieve messages from mail servers.

2.10. Virtual Hosting

Apache provides extensive virtual hosting support, which means that you can serve multiple websites from a single server. In Apache 2.0, with the per-child MPM, you can have multiple children, each one serving a different domain under a different Unix user id. This is very important for security in shared hosting scenarios, as it allows you to isolate customers from each other.

The following are additional, alternative virtual hosting modules.

·  mod_dynvhost

·  mod_pweb

·  mod_v2h

2.11. Commercial support

Apache is the web server of choice for many commercial entities, including big enterprises. These companies have certain requirements when adopting a technology, especially one, such as a Web server, that is at the core of their Internet strategy. Those requirements include performance, stability, management capabilities, support, professional services and integration with legacy systems.
A number of commercial companies, such as IBM, Red Hat and Covalent, provide the products and services necessary to make Apache meet the needs of enterprise customers. In addition, many other companies and OEMs ship Apache as a bundled web server with their products.

3. ASF Projects

Although the Apache web server is probably the most popular, the Apache Software Foundation is home to many other projects. This section provides an overview of the most relevant ones, organized logically. Most of them belong either to the Jakarta project or to the XML project. The Jakarta project hosts Java-based projects and the XML project hosts, surprise, XML-related projects.

3.1. Applications and Frameworks

The following are application and development frameworks that are part of the ASF.

3.1.1. Servers

The following are some ASF server projects.

3.1.1.1. Tomcat

Tomcat is the flagship product of the Jakarta project. It is the official reference implementation for the Java Servlet and JavaServer Pages technologies. You can learn more at the Tomcat homepage.

3.1.1.2. JAMES (Java Apache Mail Enterprise Server)

Complementary to the other Apache server-side technologies, JAMES provides a 100% pure Java server designed to be a complete and portable enterprise mail engine solution based on currently available open protocols (SMTP, POP3, IMAP, HTTP). More information can be found on the JAMES website.

3.1.1.3. Lucene

Jakarta Lucene is a high-performance, full-featured text search engine written in Java and part of the Jakarta project. More information is available on the Lucene website.

3.1.1.4. Jetspeed

Jetspeed is a web-based portal written in Java. It has a modular API that allows aggregation of different data sources (XML, SMTP, iCalendar).

3.1.2. Content management

The following are projects related to content management.

3.1.2.1. Slide

Slide is a high-level content management framework.
Conceptually, it provides a hierarchical organization of binary content which can be stored in arbitrary, heterogeneous, distributed data stores. In addition, Slide integrates security, locking and versioning services. It also provides a WebDAV server and client implementation. You can learn more at the Slide home page.

3.1.2.2. Alexandria

Alexandria is an integrated documentation management system. It brings together technologies common to many open source projects, like CVS and JavaDoc. The goal is to integrate source code and documentation to encourage code documentation and sharing. More information is available on the Alexandria website.

3.1.3. Frameworks

The following are application development frameworks.

3.1.3.1. Turbine

Turbine is a servlet-based framework that allows experienced Java developers to quickly build secure web applications. Turbine brings together a platform for running Java code and reusable components. Its features include integration with template systems, MVC-style development, Access Control Lists, localization support and so on. You can find more information at the Turbine web site.

3.1.3.2. Avalon

If you are familiar with Perl or BSD systems, Avalon is roughly the equivalent of CPAN or the Ports collection for Java Apache technologies. It not only provides guidelines for a common repository of code but goes one step further: it is an effort to create, design, develop and maintain a common framework for server applications written in the Java language. It provides the means for server-side Java projects to be easily integrated and to build on each other. You can find more information at the Avalon web site.

3.2. Presentation

The following are template systems, transformation engines and other presentation-related projects.

3.2.1. Cocoon

Cocoon leverages other Apache XML technologies like Xerces, Xalan and FOP to provide a comprehensive XML publishing framework.
The framework can talk to many different data sources and can transform the content into several different delivery formats such as PDF, HTML, XML and RTF. It can run as a servlet or as a command line program. You can learn more about Cocoon at the project homepage.

3.2.2. Velocity

Velocity is a Java-based template engine. It can be used as a stand-alone utility for generating source code, HTML and reports, or it can be combined with other systems to provide template services. Velocity follows the Model-View-Controller paradigm, which enforces separation of Java code and the HTML template. You can learn more on the Velocity website.

3.2.3. AxKit

AxKit is a popular XML-based application server for mod_perl and Apache. It allows separation of content and presentation and provides on-the-fly conversion from XML to any format.

3.2.4. Xalan

Xalan is an XSLT processor available for Java and C++. XSL is a style sheet language for XML. The T is for Transformation. XML is good at storing structured data (information). You sometimes need to display this data to the user or apply some other transformation. Xalan takes the original XML document, reads the transformation configuration (the stylesheet) and outputs HTML, plain text or another XML document. You can learn more about Xalan at the Xalan Java and Xalan C++ project homepages.

3.2.5. FOP

From the website:

     FOP is a Java application that reads a formatting object tree and then turns it into a PDF document.

So FOP takes an XML document and outputs PDF, in much the same way that Xalan outputs HTML or text. You can learn more on the FOP website.

3.3. Parsers and Document Access libraries

The following are different libraries that can be used to parse and manipulate a variety of document formats.

3.3.1. Xerces

The Xerces project provides XML parsers for a variety of languages, including Java, C++ and Perl. The Perl bindings are based on the C++ sources. An XML parser is a tool used for programmatic access to XML documents.
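To give a rough idea of what programmatic access to an XML document looks like, here is a minimal sketch. It uses Python's standard xml.dom.minidom module rather than Xerces itself, and the sample document and the add_server and server_names helpers are made up for illustration. It parses a document into a tree, adds a node and serializes the result.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, not taken from any Apache project.
SOURCE = "<servers><server name='alpha'/><server name='beta'/></servers>"


def add_server(xml_text, name):
    """Parse xml_text, append a <server> element and return the new XML."""
    doc = parseString(xml_text)          # parse the document and build the tree
    root = doc.documentElement
    node = doc.createElement("server")   # create a new node...
    node.setAttribute("name", name)
    root.appendChild(node)               # ...and attach it to the tree
    return root.toxml()                  # serialize the tree back to text


def server_names(xml_text):
    """Return the name attribute of every <server> element, in order."""
    doc = parseString(xml_text)
    return [e.getAttribute("name") for e in doc.getElementsByTagName("server")]


if __name__ == "__main__":
    updated = add_server(SOURCE, "gamma")
    print(server_names(updated))
```

The Xerces parsers offer the same kind of tree interface to Java, C++ and Perl programs; only the API names differ.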
Xerces supports the following standards:

·  DOM: DOM stands for Document Object Model. XML documents are hierarchical by nature (nested tags), so they can be accessed through a tree-like interface. The process is as follows:

   ·  Parse the document

   ·  Build the tree

   ·  Add, delete or modify nodes

   ·  Serialize the tree

·  SAX: Simple API for XML. This is a stream-based API, which means that the application receives callbacks as elements are encountered. These callbacks can be used to construct a DOM tree, for example.

·  XML Namespaces

·  XML Schema: The XML standard provides the syntax for writing documents. XML Schema provides the tools for defining the contents of the XML document (semantics). It lets you specify, for example, that a certain element in the document must be an integer between 10 and 20 or contain an IP address.

The initial code base of the Xerces XML project was donated by IBM. You can find more information at the Xerces Java, Xerces C++ and Xerces Perl homepages.

3.3.2. Batik

Batik is a Java-based toolkit for applications that want to use images in the Scalable Vector Graphics (SVG) format for various purposes, such as viewing, generation or manipulation. It is XML centric and compliant with the W3C specification. It is somewhat atypical among Apache projects in that it provides a graphical component. Batik provides hooks to extend the framework through custom tags, and it allows conversion from SVG to other formats like JPEG or PNG. You can learn more at the Batik homepage.

3.3.3. POI

The POI project consists of APIs for manipulating, in pure Java, various file formats based upon Microsoft's OLE 2 Compound Document format. This includes Word and Excel documents. More information is available on the POI website.

3.4. Interoperability

The following are libraries for remote communication and interoperability between servers.

3.4.1.
SOAP

Apache SOAP ("Simple Object Access Protocol") and Axis are implementations of the SOAP protocol. SOAP is a lightweight protocol for exchange of information in a decentralized, distributed environment. It is an XML based protocol that consists of three parts:

· an envelope that defines a framework for describing what is in a message and how to process it,
· a set of encoding rules for expressing instances of application-defined datatypes, and
· a convention for representing remote procedure calls and responses.

Basically you can think of SOAP as a remote procedure call system based on HTTP and XML. On the one hand this means it is verbose and slow compared to other systems. On the other hand it eases interoperability, debugging and development of clients and servers for a variety of languages, since most modern languages have HTTP and XML modules. You can learn more at the Apache SOAP homepage.

3.4.2. XML-RPC

The XML-RPC project is a Java implementation of the XML-RPC protocol, a lightweight protocol that is similar to, and a predecessor of, SOAP.

3.4.3. XML security

The XML security project provides XML document signature verification for secure exchange of documents.

3.5. Development

3.5.1. Apache Portable Runtime

The APR project provides a portability layer that abstracts a number of APIs for file manipulation, network access and so on. It is written in C and works on most Unix flavors, Windows and a variety of other systems. It is the basis for Apache 2.0.

3.5.2. Ant

Ant is a Java based build tool. It has a modular API and can be extended by creating new tasks. It is driven by XML configuration files.

3.5.3. Byte Code Library

The Byte Code Engineering Library (BCEL) is a library to analyze, create, and manipulate binary Java class files.

3.5.4. Log4j

This package provides a logging framework that Java applications can use. It can be enabled at runtime without modifying the binary and has been designed with performance in mind. It is available from the Log4j project homepage.

3.5.5.
ORO and Regexp

ORO is a complete package that provides regular expression support for Java. It includes Perl5 regular expression support, glob expressions and so on, all under the Apache license. You can learn more about ORO on its project homepage. There is another lightweight ASF regular expression package, Regexp.

3.5.6. Struts

Struts is an Apache project that tries to bring the Model-View-Controller (MVC) design paradigm to web development. It builds on Servlet and JavaServer Pages technologies. The model part is made up of Java server objects, which represent the internal state of the application. The view part is constructed via JavaServer Pages (JSP), which is a combination of static HTML/XML and Java. JSPs also allow the developer to define new tags. The controller part consists of servlets, which take requests (GET/POST) from the client, perform actions on the model and update the view by providing the appropriate JSP. You can learn more at the Struts project pages.

3.5.7. Taglibs

The JavaServer Pages technology allows developers to provide functionality by adding custom tags. The Taglibs project intends to be a common repository for these extensions. It includes tags for common utilities (e.g. dates), SQL database access and so on. You can learn about Taglibs on its project homepage. More documentation is included in the package.

3.5.8. Database

OJB is a database mapping tool that allows persistence and storage of Java objects in relational databases. Xindice is a native XML database for storing and querying XML documents.

3.5.9. Commons

The Commons project provides a great variety of reusable Java components with minimal dependencies.

3.6. Testing

The following ASF projects cover testing and performance analysis.

3.6.1. httpd-test

The httpd-test project provides a testing framework for the Apache web server and tools such as flood for HTTP load testing.

3.6.2. Cactus

Cactus is a testing framework for testing server side Java code such as Servlets and EJBs.

3.6.3.
JMeter

This is a testing tool written in Java with a GUI frontend. It can be obtained from the JMeter project homepage.

3.6.4. Latka

Latka is an end-to-end HTTP testing tool.

3.6.5. Watchdog

The Watchdog project is a suite of validation sets for the Servlet and JavaServer Pages specification.

4. Where to find more information

Additional Apache related resources

4.1. Websites

The following are some useful websites:

· Apache Website
· Apache Week
· Apache modules directory
· Apache today
· Apache World
· Slashdot Apache section

4.2. Books

I maintain a list of books related to this document. It is not a comprehensive list, but rather I include only those books that I have personally found well-written and useful.

4.3. Support forums

There is an Apache users mailing list, and similar lists exist for the rest of the projects mentioned here. Make sure you read the Frequently Asked Questions document before posting. You can also get support in the newsgroup comp.infosystems.www.servers.unix.

If you want commercial support, consider contacting Covalent, which provides expert support for Apache (at a fee, of course). If you are using Apache on Linux, your Linux vendor may have support plans that include Apache.

5. Contacting the Author

You can contact me at daniel @ rawbyte.com. I welcome suggestions and corrections, but please, please, do not send me messages asking me to troubleshoot your Apache installation. I just do not have the time to answer people individually. If you need support, please refer to the resources mentioned above.

5.1. Translations

If you want to contribute a translation of this document you should use the SGML source. Please drop me a note so I can make sure you get the most recent version.

6. Open Content Open Publication License

Open Publication License Draft v1.0, 8 June 1999 (text version)

6.1.
REQUIREMENTS ON BOTH UNMODIFIED AND MODIFIED VERSIONS

The Open Publication works may be reproduced and distributed in whole or in part, in any medium physical or electronic, provided that the terms of this license are adhered to, and that this license or an incorporation of it by reference (with any options elected by the author(s) and/or publisher) is displayed in the reproduction.

Proper form for an incorporation by reference is as follows:

Copyright (c) <year> by <author's name or designee>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, vX.Y or later (the latest version is presently available at http://www.opencontent.org/openpub/).

The reference must be immediately followed with any options elected by the author(s) and/or publisher of the document (see section VI).

Commercial redistribution of Open Publication-licensed material is permitted.

Any publication in standard (paper) book form shall require the citation of the original publisher and author. The publisher and author's names shall appear on all outer surfaces of the book. On all outer surfaces of the book the original publisher's name shall be as large as the title of the work and cited as possessive with respect to the title.

6.2. COPYRIGHT

The copyright to each Open Publication is owned by its author(s) or designee.

6.3. SCOPE OF LICENSE

The following license terms apply to all Open Publication works, unless otherwise explicitly stated in the document.

Mere aggregation of Open Publication works or a portion of an Open Publication work with other works or programs on the same media shall not cause this license to apply to those other works. The aggregate work shall contain a notice specifying the inclusion of the Open Publication material and appropriate copyright notice.

SEVERABILITY. If any part of this license is found to be unenforceable in any jurisdiction, the remaining portions of the license remain in force.

NO WARRANTY.
Open Publication works are licensed and provided "as is" without warranty of any kind, express or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose or a warranty of non-infringement. 6.4. REQUIREMENTS ON MODIFIED WORKS All modified versions of documents covered by this license, including translations, anthologies, compilations and partial documents, must meet the following requirements: · 1. The modified version must be labeled as such. · 2. The person making the modifications must be identified and the modifications dated. · 3. Acknowledgement of the original author and publisher if applicable must be retained according to normal academic citation practices. · 4. The location of the original unmodified document must be identified. · 5. The original author's (or authors') name(s) may not be used to assert or imply endorsement of the resulting document without the original author's (or authors') permission. 6.5. GOOD-PRACTICE RECOMMENDATIONS In addition to the requirements of this license, it is requested from and strongly recommended of redistributors that: · 1. If you are distributing Open Publication works on hardcopy or CD-ROM, you provide email notification to the authors of your intent to redistribute at least thirty days before your manuscript or media freeze, to give the authors time to provide updated documents. This notification should describe modifications, if any, made to the document. · 2. All substantive modifications (including deletions) be either clearly marked up in the document or else described in an attachment to the document. · 3. Finally, while it is not mandatory under this license, it is considered good form to offer a free copy of any hardcopy and CD- ROM expression of an Open Publication-licensed work to its author(s). 6.6. 
LICENSE OPTIONS The author(s) and/or publisher of an Open Publication-licensed document may elect certain options by appending language to the reference to or copy of the license. These options are considered part of the license instance and must be included with the license (or its incorporation by reference) in derived works. A. To prohibit distribution of substantively modified versions without the explicit permission of the author(s). "Substantive modification" is defined as a change to the semantic content of the document, and excludes mere changes in format or typographical corrections. To accomplish this, add the phrase `Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder.' to the license reference or copy. B. To prohibit any publication of this work or derivative works in whole or in part in standard (paper) book form for commercial purposes is prohibited unless prior permission is obtained from the copyright holder. To accomplish this, add the phrase 'Distribution of the work or derivative of the work in any standard (paper) book form is prohibited unless prior permission is obtained from the copyright holder.' to the license reference or copy. Linux Apache SSL PHP/FI frontpage mini-HOWTO Marcus Faure, marcus@faure.de v1.1, July 1998 This document is about building a multipurpose webserver that will support dynamic web content via the PHP/FI scripting language, secure transmission of data based on Netscape's SSL, secure execution of CGI's and M$ Frontpage Server Extensions ______________________________________________________________________ Table of Contents 1. Introduction 1.1 Description of the components 1.2 Working configurations 1.3 History 2. Component installation 2.1 Preparations 2.2 Adding PHP 2.3 Adding SSL 2.4 Adding frontpage 3. 
Putting it all together 3.1 Apache modules to try 3.2 Giving CGI's more security 3.3 Compiling and installing the server daemon 3.4 Adding frontpage support to a web 3.5 Starting the daemon 3.6 Some considerations left 3.7 Known bugs 3.8 The final word
______________________________________________________________________

1. Introduction

Before you start reading: I am not a native speaker, so there are probably spelling/grammatical errors in this document. Feel encouraged to inform me of mistakes.

1.1. Description of the components

The webserver you will hopefully get after having read this howto is composed of several parts: the original apache sources with some (well, many) patches, and some external executables. I recommend using the software versions I tried; they will probably compile without major problems and result in a fairly stable daemon. If you are courageous, you can try to compile all the latest-stuff-with-tons-of-new-features, but don't blame me if something fails ;-). However, you may report other working configurations to be included in future versions of this document.

All of the steps were tested on a linux 2.0.35 box, so the howto is somewhat linux-specific, but you should be able to use it for other unixes as well.

You do not necessarily have to compile in all components. I tried to structure this howto so that you can skip the parts you are not interested in.

This document is not a user manual for Apache, SSL, PHP/FI or frontpage. Its prime intention is to save webservice providers some headaches when installing their server and to make my small contribution to the linux community.

PHP is a scripting language that supports dynamic HTML pages. It is a bit like Apache's SSI, but far more complex, and it has database modules for many popular databases. The GD libraries are needed by PHP.

SSL is an implementation of Netscape's Secure Socket Layer that allows secure connections over insecure networks, e.g.
to transmit credit card numbers to web based forms.

frontpage is a wysiwyg web authoring tool that makes use of some server-specific extensions called webbots. Some people think frontpage is cool because you can create feedback forms and discussion webs without having to know a bit about html or cgi. It even spares the designer from uploading his/her site via ftp by using a builtin publisher. If you wish to support frontpage but do not want to set up a windows server, the apache server extensions are your choice.

1.2. Working configurations

Though this document has been downloaded some 100 times since I published it, I have received very little feedback. In particular, no one has told me of other working combinations.

Combinations that work for me are:

· Linux 2.0.31, Apache 1.2.4, PHP 2.0.0, SSL 0.8.0, fp 98 3.0.3 (*)
· Linux 2.0.33, Apache 1.2.5, PHP 2.0.1, SSL 0.8.0, fp 98 3.0.3 (*)
· Linux 2.0.35, Apache 1.2.6, PHP 3, SSL 0.8.0, fp 98 3.0.4

(*) version 3.0.3 is ``not recommended''

1.3. History

v0.0/Apr 98: Preview version
v1.0/Jun 98: Now using Apache 1.2.6, updated fp section, minor corrections
v1.1/Jul 98: Sgmlized and restructured version

The latest version of this document is available online.

2. Component installation

2.1. Preparations

You will need:

· Apache 1.2.6
· PHP/FI Extensions
· GD Library
· SSL 0.8.0
· SSL patch for Apache 1.2.6
· frontpage 98 server extensions and install script

Get the sources you want. Untar apache, php, gd and ssl to /usr/src. Untar the SSL patch to /usr/src/apache_1.2.6.

2.2. Adding PHP

cd to /usr/src/gd1.2 and type make. This will build the GD library libgd.a, which should be copied to /usr/lib. Now cd to php-2.0.1 and run ./install. The relevant questions are:

Would you like to compile PHP/FI as an Apache module? [yN] y
Are you compiling for an Apache 1.1 or later server? [Yn] y
Are you using Apache-Stronghold? [yN] y
Does your Apache server support ELF dynamic loading? [yN] y
Apache include directory (which has httpd.h)?
[/usr/local/include/apache] /usr/src/apache_1.2.6/src
Would you like to build an ELF shared library? [yN] y
Additional directories to search for .h files []: /usr/src/gd1.2
Would you like the bundled regex library? [yN] n

Like the frontpage extensions, phtml poses a security problem because it is run under the uid of the webserver. Be sure to turn on safe mode in src/php.h and restrict the search path to a safe value. There are some other options in php.h you may want to edit. If you are very concerned about security, compile php as a cgi. However, this costs performance and is not as smart as the module version.

Type make to build all files. When the compilation is done, copy mod_php.* and libphp.a to /usr/src/apache_1.2.6/src. Then:

· add the line Module php_module mod_php.o to the end of /usr/src/apache_1.2.6/src/Configuration,
· add -lphp -lm -lgdbm -lgd to the EXTRA_LIBS in the same file,
· add application/x-httpd-php phtml to Apache's mime.types, and
· add AddType application/x-httpd-php .phtml to Apache's srm.conf.

You may also want to add index.phtml to DirectoryIndex in srm.conf so that a file index.phtml is automatically loaded when its directory is requested.

2.3. Adding SSL

cd /usr/src/SSL-0.8.0; ./Configure linux-elf; make; make rehash

This will create the libraries needed by apache. You may issue make test to verify the compilation.

You have to apply a patch to apache. It is important that you apply it before the frontpage patch, otherwise frontpage will not work. cd to /usr/src/apache_1.2.6/src and issue patch < /usr/src/apache_1.2.6/SSLpatch. Set SSL_BASE=/usr/src/SSLeay-0.8.0 in Configuration. Make sure that Module proxy_module is disabled, otherwise Apache won't compile. If you are in need of a proxy, go for Squid http://squid.nlanr.net/

Now run make certificate to generate SSLconf/conf/httpsd.pem.

2.4. Adding frontpage

Rename the fp30.linux.tar.Z file to fp30.linux.tar.gz, otherwise the install script will not find it.
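The rename step can be sketched as a small shell function. Only the two archive names come from the text above; the function name rename_fp_archive and its optional directory argument are illustrative assumptions and are not part of the frontpage install script.

```shell
# Sketch only: rename the frontpage archive so that fp_install can find it.
# rename_fp_archive is a hypothetical helper; the directory argument
# defaults to the current directory.
rename_fp_archive() {
    dir=${1:-.}
    src="$dir/fp30.linux.tar.Z"
    dst="$dir/fp30.linux.tar.gz"
    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    # Refuse to overwrite an archive that already carries the new name.
    if [ -e "$dst" ]; then
        echo "$dst already exists, leaving it alone" >&2
        return 1
    fi
    mv "$src" "$dst"
}
```

Called without an argument, rename_fp_archive acts on the current directory; it only changes the file name and does not recompress the archive.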
Run ./fp_install to copy the extension files to /usr/local/frontpage. zcat can usually be invoked as /usr/bin/zcat.

You now have to apply the FP patch. cd to /usr/src/apache_1.2.6/src and type

patch < /usr/src/frontpage/version3.0/apache-fp/fp-patch-apache_1.2.5

This will create the mod_frontpage.* files and make some modifications to Configuration etc. The 1.2.5 patch will work with both apache 1.2.5 and 1.2.6. Skip the part about installing webs, you can do that later.

3. Putting it all together

3.1. Apache modules to try

The modules I use besides SSL, PHP and frontpage are:

Module env_module mod_env.o
Module config_log_module mod_log_config.o
Module mime_module mod_mime.o
Module negotiation_module mod_negotiation.o
Module dir_module mod_dir.o
Module cgi_module mod_cgi.o
Module asis_module mod_asis.o
Module imap_module mod_imap.o
Module action_module mod_actions.o
Module alias_module mod_alias.o
Module rewrite_module mod_rewrite.o
Module access_module mod_access.o
Module auth_module mod_auth.o
Module anon_auth_module mod_auth_anon.o
Module digest_module mod_digest.o
Module expires_module mod_expires.o
Module headers_module mod_headers.o
Module browser_module mod_browser.o

3.2. Giving CGI's more security

If you are an ISP (you probably are when you read this) you will want to improve security. The suexec utility allows you to do so; it will execute cgi's under the UID of the webowner instead of executing them under the webserver's UID. Go to /usr/src/apache_1.2.6/support and make suexec. chmod 4711 suexec and copy it to the location specified in ../src/httpd.h, which is /usr/local/etc/httpd/sbin/suexec by default. If the path seems a little cryptic to you - it did to me - edit httpd.h and set the path to a more comfortable value.

3.3. Compiling and installing the server daemon

Enter /usr/src/apache_1.2.6/src and edit Configuration to set all the Modules you want to include in your Apache daemon. When done, run ./Configure and make.
This is the last (and most complicated) compilation step, so cross your fingers. If it succeeds, cp httpsd to /usr/sbin. The daemon is somewhat big, consider this when assembling your webserver.

Create the directory /var/httpd with subdirectories cgi-bin, conf, htdocs, icons, virt1, virt2 and logs. In /usr/src/apache_1.2.6/conf edit access.conf-dist, mime.types and srm.conf-dist to suit your needs and copy them to /var/httpd/conf as access.conf, mime.types and srm.conf. Copy the httpsd.pem you created with make certificate to /var/httpd/conf. Use the following httpd.conf:

ServerType standalone
Port 80
Listen 80
Listen 443
User wwwrun
Group wwwrun
ServerAdmin webmaster@yourhost.com
ServerRoot /var/httpd
ErrorLog logs/error_log
TransferLog logs/access_log
PidFile logs/httpd.pid
ServerName www.yourhost.com
MinSpareServers 3
MaxSpareServers 20
StartServers 3
SSLCACertificatePath /var/httpd/conf
SSLCACertificateFile /var/httpd/conf/httpsd.pem
SSLCertificateFile /var/httpd/conf/httpsd.pem
SSLLogFile /var/httpd/logs/ssl.log
SSLDisable
ServerAdmin webmaster@virt1.com
DocumentRoot /var/httpd/virt1
ScriptAlias /cgi-bin/ /var/httpd/virt1/cgi-bin/
ServerName www.virt1.com
ErrorLog logs/virt1-error.log
TransferLog logs/virt1-access.log
User virt1admin
Group users
ServerAdmin webmaster@virt1.com
DocumentRoot /var/httpd/virt1
ScriptAlias /cgi-bin/ /var/httpd/virt1/cgi-bin/
ServerName www.virt1.com
ErrorLog logs/virt1-ssl-error.log
TransferLog logs/virt1-ssl-access.log
User virt1admin
Group users
SSLCACertificatePath /var/httpd/conf
SSLCACertificateFile /var/httpd/conf/httpsd.pem
SSLCertificateFile /var/httpd/conf/httpsd.pem
SSLLogFile /var/httpd/logs/virt1-ssl.log
SSLVerifyClient 0
SSLFakeBasicAuth
SSLDisable
ServerAdmin webmaster@virt2.com
DocumentRoot /var/httpd/virt2
ScriptAlias /cgi-bin/ /var/httpd/virt2/cgi-bin/
ServerName www.virt2.com
ErrorLog logs/virt2-error.log
TransferLog logs/virt2-access.log

Depending on the modules compiled in, not all directives may be available.
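To see which directives a configuration file actually relies on, so that they can be checked against what the compiled daemon supports, something like the following may help. It is a minimal sketch: used_directives is a hypothetical helper name, not an Apache tool, and it assumes the usual layout of one directive per line, with comments starting with # and container sections written as tags starting with <.

```shell
# Sketch only: print the distinct directive names used in an Apache
# configuration file, one per line, sorted in C-locale order.
# used_directives is a hypothetical helper, not part of Apache.
used_directives() {
    awk '
        NF == 0    { next }   # skip blank lines
        $1 ~ /^#/  { next }   # skip comments
        $1 ~ /^</  { next }   # skip container tags (assumed <...> syntax)
                   { print $1 }
    ' "$1" | LC_ALL=C sort -u
}
```

For example, used_directives /var/httpd/conf/httpd.conf prints each directive name once, however many times it appears in the file.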
You can retrieve a list of available directives with httpsd -h.

3.4. Adding frontpage support to a web

Enter /usr/local/frontpage/version3.0/bin and run ./fpsrvadm. Choose install and apache-fp. The next questions should be answered the following way:

Enter server config filename: /var/httpd/conf/httpd.conf
Enter host name for multi-hosting []: www.virt2.com
Starting install, port: www.virt2.com:80, web: ""
Enter user's name []: virt2admin
Enter user's password:
Confirm password:
Creating root web
Recalculate links for root web
Install completed.

The user name must be the unix login of the webowner. The password does not necessarily have to match the system password.

You have to manually add sendmailcommand:/usr/sbin/sendmail %r to /usr/local/frontpage/www.virt2.com:80.conf, otherwise your users will not be able to send web-generated eMails. kill -HUP your httpsd to make fp reread its config. You can now access www.virt2.com with your frontpage client.

Under some circumstances fpsrvadm complains that a root web has to be installed first. This is pretty useless, but you should do so to silence fpsrvadm.

3.5. Starting the daemon

Start Apache with httpsd -f /var/httpd/conf/httpd.conf. You can now access www.virt1.com both through http and https, which is pretty cool. Of course you have to pay for a real certificate if you want to offer webwide SSL, or users might laugh at you. Copy one of the demo files from the php examples directory to virt1 to test phtml.

3.6. Some considerations left

Do not use frontpage 97 extensions. They do not work, at least under Linux. When installing specific versions of the c++ libraries, they appear to work, but your logs will soon fill with "premature end of script headers" errors and your mailbox will fill with complaints.

Do not use frontpage 98 extensions before version 3.0.2.1330. Do not be confused, version numbers are somewhat inconsistent.
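Because the 3.0.2.1330 threshold has four fields while other reported version numbers have only three, comparing them by eye is error-prone. The comparison can be sketched as below. The function name version_at_least is a hypothetical helper, it assumes every field is a plain non-negative integer, and it treats missing trailing fields as 0.

```shell
# Sketch only: succeed if dotted version $1 is at least $2, comparing
# field by field as integers. version_at_least is a hypothetical helper;
# it assumes purely numeric fields and counts missing fields as 0.
# Note: plain sh has no local variables, so a, b, x and y are global.
version_at_least() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the field just compared; stop once no dot is left.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}
```

With this sketch, version_at_least 3.0.4 3.0.2.1330 succeeds, while version_at_least 3.0.2 3.0.2.1330 fails because the missing fourth field counts as 0.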
When telnetting to port 80, typing "get / http/1.0" and hitting return twice, you get a version number 3.0.4 for frontpage. You can find out the more specific version number by executing /usr/local/frontpage/currentversion/exes/_vti_bin/shtml.exe -version.

Older versions have a nasty bug that requires httpd.conf to be writable by the gid of the webserver. This should make you scream if you are at all concerned about security. Versions since 3.0.2.1330 are more usable.

3.7. Known bugs

When touching Recalculate Links in the frontpage client, the server starts a process that consumes 99% of the CPU and some 10 MB of memory. Even for medium-sized webs and fast machines, the client sometimes receives a timeout message, though the calculation will be finished correctly. Inform frontpage users to be patient and not to hit Recalculate Links several times. Remind yourself to equip the server with at least 64MB.

Please note that at the time of writing both SSL and frontpage work, but not at the same time. That means you can neither publish your web using ssl nor make use of the webbots through https. You can publish your web on port 80 and access it encrypted on port 443, but your counters etc. will be broken. I consider this a bug. This problem should be fixed in SSL 0.9.0.

3.8. The final word

For those who think the title of this howto is nearly as long as the document: Did you ever listen to Meat Loaf?

O.K. readers, you're done for today. Feel free to send me your feedback, eternal gratitude, flowers, ecash, cars, oil sources etc.

Apache based WebDAV Server with LDAP and SSL

Saqib Ali [http://www.xml-dev.com]
Offshore XML/XHTML Development

Revision History
Revision v4.1.2 2003-10-17 Revised by: sa
Added the SSL performance tuning section.
Revision v4.1.1 2003-09-29 Revised by: sa
Updated the SSL section based on the feedback received from readers.
Revision v4.1.0 2003-09-02 Revised by: sa
Updated the SSL section based on the feedback received from readers.
Revision v4.0.2 2003-08-01 Revised by: sa
Minor updates to the Apache configure cmd line. /dev/random referenced in the SSL section.
Revision v4.0.1 2003-07-27 Revised by: sa
Added more information to the SSL section.
Revision v4.0 2003-06-29 Revised by: sa
Updated the HOWTO for Apache 2.0. Also the source is in XML.

This document is a HOWTO on installing an Apache based WebDAV server with LDAP for authentication and SSL encryption.
-----------------------------------------------------------------------------
Table of Contents
1. Introduction
   1.1. About this document
   1.2. Contributions to the document
   1.3. What is Apache?
   1.4. What is WebDAV?
   1.5. What is PHP?
   1.6. What is mySQL?
   1.7. What do we need?
   1.8. Assumptions
2. Requirements
   2.1. Basics
   2.2. Apache 2.0.46
   2.3. OpenSSL
   2.4. iPlanet LDAP Library
   2.5. mod_auth_ldap
   2.6. mySQL DB Engine
   2.7. PHP
3. Installation
   3.1. Pre-requisites
   3.2. mySQL
   3.3. Apache 2.0
   3.4. mod_auth_ldap
   3.5. CERT DB for LDAPS://
   3.6. PHP
4. Configuring and Setting up the WebDAV services
   4.1. Modifications to the /usr/local/apache/conf/httpd.conf
   4.2. Creating a directory for DAVLockDB
   4.3. Enabling DAV
   4.4. Create a Directory called DAVtest
   4.5. Restart Apache
   4.6. WebDAV server protocol compliance testing
5. WebDAV server management
   5.1. Restricting access to DAV shares
   5.2. Restricting write access to DAV shares
6. Implementing and using SSL to secure HTTP traffic
   6.1. Introduction to SSL
   6.2. Test Certificates
   6.3. Certificates for Production use
   6.4. How to generate a CSR
   6.5. Installing Server Private Key, and Server Certificate
   6.6. Removing passphrase from the RSA Private Key
   6.7. SSL Performance Tuning
A. HTTP/HTTPS Benchmarking tools
B. Hardware based SSL encryption solutions
C. Certificate Authorities
Glossary of PKI Terms

1.
Introduction

The objective of this document is to set up an Apache + mySQL + PHP + WebDAV based Web Application Server that uses LDAP for authentication. The document also provides details on encrypting LDAP transactions.

Note: If you encounter any problems installing Apache or any of the modules please feel free to contact me.
-----------------------------------------------------------------------------
1.1. About this document

This document was originally written in 2001. Since then many updates and new additions have been made. Thanks to all the people who submitted updates and corrections.

The XML source of this document is available at http://www.xml-dev.com:8080/cocoon/mount/docbook/Apache-WebDAV-LDAP-HOWTO.xml.

The latest version of the document is available at http://www.xml-dev.com:8080/cocoon/mount/docbook/Apache-WebDAV-LDAP-HOWTO.html.
-----------------------------------------------------------------------------
1.2. Contributions to the document

If you would like to contribute to the HOWTO, you can download the XML source from http://www.xml-dev.com:8080/cocoon/mount/docbook/Apache-WebDAV-LDAP-HOWTO.xml and send the updated source to saqib@seagate.com ALONG WITH YOUR NAME IN THE LIST OF AUTHORS AND REVISION HISTORY :). That makes it easier for me to contact the person if there are any updates/corrections. Thanks.
-----------------------------------------------------------------------------
1.3. What is Apache?

The Apache HTTP Server is an open-source HTTP server for modern operating systems including UNIX and Windows NT. It provides HTTP services in sync with the current HTTP standards.
The Apache WebServer is available for free download from http://httpd.apache.org/
-----------------------------------------------------------------------------
1.4. What is WebDAV?

WebDAV stands for Web enabled Distributed Authoring and Versioning. It provides a collaborative environment for users to edit/manage files on web-servers. Technically DAV is an extension to the http protocol.

Here is a brief description of the extensions provided by DAV:

Overwrite Protection: Lock and Unlock mechanism to prevent the "lost update problem". The DAV protocol supports both shared and exclusive locks.

Properties: Metadata (title, subject, creator, etc).

Name-space management: Copy, Rename, Move and Deletion of files.

Access Control: Limit access to various resources. Currently DAV assumes access control is already in place, and does not provide a strong authentication mechanism.

Versioning: Revision control for the documents. Versioning is not implemented yet.
-----------------------------------------------------------------------------
1.5. What is PHP?

PHP (recursive acronym for "PHP: Hypertext Preprocessor") is a widely-used Open Source general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.

PHP is available from http://www.php.net
-----------------------------------------------------------------------------
1.6. What is mySQL?

MySQL, the most popular Open Source SQL database, is developed, distributed, and supported by MySQL AB.

The mySQL DB Engine can be downloaded from http://www.mysql.com/
-----------------------------------------------------------------------------
1.7. What do we need?

The tools needed to achieve this objective are:

i. C Compiler e.g. GCC
ii. Apache 2 Web Server
iii. LDAP Module for Apache
iv. iPlanet LDAP lib files
v. SSL engine
vi. PHP
vii.
mySQL DB Engine

Note: All of these packages are free and are available for download on the net.
-----------------------------------------------------------------------------
1.8. Assumptions

This document assumes that you have the following already installed on your system.

i. gzip or gunzip - available from http://www.gnu.org
ii. gcc and GNU make - available from http://www.gnu.org
-----------------------------------------------------------------------------
2. Requirements

You'll have to download and compile several packages. This document will explain the compilation process, but you should be familiar with installing from source code.
-----------------------------------------------------------------------------
2.1. Basics

You will need a machine running Solaris / Linux and the GCC Compiler. GNU gzip and GNU tar are also needed.
-----------------------------------------------------------------------------
2.2. Apache 2.0.46

Apache is the HTTP server; it will be used to run the Web Application Server. Please download the Apache 2.0.46 source code from http://www.apache.org/dist/httpd/.
-----------------------------------------------------------------------------
2.3. OpenSSL

You will need to download OpenSSL from http://www.openssl.org/source/. Please download the latest version. The OpenSSL installation will be used for the SSL libraries needed to compile mod_ssl with Apache, and for managing SSL certificates on the WebServer. Please download the gzipped OpenSSL source code file into /tmp/downloads
-----------------------------------------------------------------------------
2.4. iPlanet LDAP Library

Download the iPlanet LDAP SDK from http://wwws.sun.com/software/download/products/3ec28dbd.html.
We will use the iPlanet LDAP SDK because it includes libraries for ldaps://
(LDAP over SSL).
-----------------------------------------------------------------------------

2.5. mod_auth_ldap

mod_auth_ldap will be used for compiling LDAP support into Apache. Please
download mod_auth_ldap from
http://www.muquit.com/muquit/software/mod_auth_ldap/mod_auth_ldap_apache2.html
-----------------------------------------------------------------------------

2.6. mySQL DB Engine

Download the appropriate mySQL build for your platform from
http://www.mysql.com/downloads/index.html
-----------------------------------------------------------------------------

2.7. PHP

Download the PHP source code from http://www.php.net/downloads.php
-----------------------------------------------------------------------------

3. Installation

First we have to take care of a few prerequisites, and then we will get into
the main installation.
-----------------------------------------------------------------------------

3.1. Pre-requisites

The application server, as we plan to install it, requires the SSL libraries
and the LDAP libraries. The SSL engine is also required for managing the SSL
certificates for Apache 2.x.
-----------------------------------------------------------------------------

3.1.1. iPlanet LDAP SDK

Become root by using the su command:

    $ su -

Create the /usr/local/iplanet-ldap-sdk.5 directory. Copy
ldapcsdk5.08-Linux2.2_x86_glibc_PTH_OPT.OBJ.tar.gz from /tmp/downloads to the
/usr/local/iplanet-ldap-sdk.5 directory.
    # mkdir /usr/local/iplanet-ldap-sdk.5
    # cp /tmp/downloads/ldapcsdk5.08-Linux2.2_x86_glibc_PTH_OPT.OBJ.tar.gz /usr/local/iplanet-ldap-sdk.5
    # cd /usr/local/iplanet-ldap-sdk.5
    # gzip -d ldapcsdk5.08-Linux2.2_x86_glibc_PTH_OPT.OBJ.tar.gz
    # tar -xvf ldapcsdk5.08-Linux2.2_x86_glibc_PTH_OPT.OBJ.tar

Now you should have all the required iPlanet LDAP lib files in the correct
directory.
-----------------------------------------------------------------------------

3.1.2. OpenSSL Engine

Next we need to install the OpenSSL Engine.

OpenSSL is an open source implementation of the SSL/TLS protocol. It is
required to create and manage SSL certificates on the web server. The
installation is also necessary for the lib files that will be used by the SSL
module for Apache.

Change to the directory where you placed the OpenSSL source code files, then
unpack, configure and build:

    # cd /tmp/downloads
    # gzip -d openssl.x.x.tar.gz
    # tar -xvf openssl.x.x.tar
    # cd openssl.x.x
    # ./config
    # make
    # make test
    # make install

Upon successful completion of make install, the openssl binaries should
reside in /usr/local/ssl
-----------------------------------------------------------------------------

3.2. mySQL

Installing mySQL is quite simple. The downloaded binaries have to be placed
in the appropriate directory. We start by creating a user and group for the
mysql daemon, and unpacking the files into the appropriate directory.

    # groupadd mysql
    # useradd -g mysql mysql
    # cd /usr/local
    # gunzip < /path/to/mysql-VERSION-OS.tar.gz | tar xvf -
    # ln -s full-path-to-mysql-VERSION-OS mysql

Next run the install_db script, and change the ownership of the files:

    # cd mysql
    # scripts/mysql_install_db
    # chown -R mysql .
-----------------------------------------------------------------------------

3.2.1. Starting mySQL

Now start the mySQL server to verify the installation:

    # bin/mysqld_safe --user=mysql &

Verify that the mySQL daemon is running by using the ps -ef command. You
should see output similar to the following:

    # ps -ef | grep mysql
    root 3237 1 0 May29 ? 00:00:00 /bin/sh bin/safe_mysqld
    mysql 3256 3237 0 May29 ?
    00:06:58 /usr/local/mysql/bin/mysqld --defaults-extra-file=/usr/local/mysql/data/my.cnf --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --user=mysql --pid-file=/usr/local/mysql/data/downloa
-----------------------------------------------------------------------------

3.2.2. Stopping mySQL

To stop the mySQL server, follow the instructions below:

    # cd /usr/local/mysql
    # ./bin/mysqladmin -u root -p shutdown
-----------------------------------------------------------------------------

3.2.3. Locating Data Directory

The mySQL daemon stores all its information in a directory called the "Data
Directory". If you followed the installation instructions above, your Data
Directory should be located under /usr/local/mysql/data.

To find out where your Data Directory is located, use the mysqladmin utility
as follows:

    # /usr/local/mysql/bin/mysqladmin variables -u root --password={your_password} | grep datadir
-----------------------------------------------------------------------------

3.3. Apache 2.0

Start by setting some flags for the compiler:

    # export LDFLAGS="-L/usr/local/iplanet-ldap-sdk.5/lib/ -R/usr/local/iplanet-ldap-sdk.5/lib/:/usr/local/lib"
    # export CPPFLAGS="-I/usr/local/iplanet-ldap-sdk.5/include"

Next untar the Apache 2.0 source files, and execute the configure script:

    # cd /tmp/downloads
    # gzip -d httpd-2.0.46.tar.gz
    # tar -xvf httpd-2.0.46.tar
    # cd httpd-2.0.46
    # ./configure --enable-so --with-ssl --enable-ssl --enable-rewrite --enable-dav

Next run the make command:

    # make
    # make install
-----------------------------------------------------------------------------

3.3.1. Starting Apache

    # /usr/local/apache2/bin/apachectl start
-----------------------------------------------------------------------------

3.3.2. Stopping Apache

    # /usr/local/apache2/bin/apachectl stop
-----------------------------------------------------------------------------

3.4.
mod_auth_ldap

Untar modauthldap_apache2.tar.gz:

    # cd /tmp/downloads
    # gzip -d modauthldap_apache2.tar.gz
    # tar -xvf modauthldap_apache2.tar
    # cd modauthldap_apache2

Now configure and install mod_auth_ldap:

    # ./configure --with-apxs=/usr/local/apache2/bin/apxs --with-ldap-dir=/usr/local/iplanet-ldap-sdk.5/
    # make
    # make install
-----------------------------------------------------------------------------

3.5. CERT DB for LDAPS://

You will also need to get cert7.db and key3.db from
http://www.xml-dev.com/xml/key3.db and http://www.xml-dev.com/xml/cert7.db
and place them in the /usr/local/apache2/sslcert/ directory.
-----------------------------------------------------------------------------

3.6. PHP

Unzip the PHP source files:

    # gzip -d php-xxx.tar.gz
    # tar -xvf php-xxx.tar

Run the configure script:

    # cd php-xxx
    # ./configure --with-mysql --with-apxs=/usr/local/apache2/bin/apxs

Compile the source code:

    # make
    # make install

Copy the php.ini file to the appropriate directory:

    # cp php.ini-dist /usr/local/lib/php.ini
-----------------------------------------------------------------------------

4. Configuring and Setting up the WebDAV services

Now for the easy part. In this section we will WebDAV-enable a directory
under the Apache root.
-----------------------------------------------------------------------------

4.1. Modifications to the /usr/local/apache2/conf/httpd.conf

Please verify that Apache knows about the DAV capability. The Apache 1.3
directive

    Addmodule mod_dav.c

does not exist in Apache 2.0 and must not be added to httpd.conf. With the
build from section 3.3 (configured with --enable-dav) the module is compiled
into the server, and mod_dav.c should appear in the output of
/usr/local/apache2/bin/httpd -l.

Next we must specify where Apache should store the DAVLockDB file. DAVLockDB
is the lock database for WebDAV. Its directory must be writable by the httpd
process.

I store the DAVLock file under /usr/local/apache2/var. I use this directory
for other purposes as well.
Please add the following line to your /usr/local/apache2/conf/httpd.conf to
specify that the DAVLockDB file will be under /usr/local/apache2/var :

    DAVLockDB /usr/local/apache2/var/DAVLock

The directive must be placed outside any container.
-----------------------------------------------------------------------------

4.2. Creating a directory for DAVLockDB

As mentioned above, a directory must be created for DAVLockDB that can be
written to by the web server process. Usually the web server process runs
under the user 'nobody'. Please verify this for your system using the
command:

    ps -ef | grep httpd

Under /usr/local/apache2 create the directory and set the permissions on it
using the following commands:

    # cd /usr/local/apache2
    # mkdir var
    # chmod -R 755 var/
    # chown -R nobody var/
    # chgrp -R nobody var/
-----------------------------------------------------------------------------

4.3. Enabling DAV

Enabling DAV is a trivial task. To enable DAV for a directory under the
Apache root, just add the following directive in the <Directory> container
for that particular directory:

    DAV On

This directive will enable DAV for the directory and its sub-directories.

The following is a sample configuration that will enable WebDAV and LDAP
authentication on /usr/local/apache2/htdocs/DAVtest. Place this in the
/usr/local/apache2/conf/httpd.conf file.
    DavLockDB /tmp/DavLock

    <Directory "/usr/local/apache2/htdocs/DAVtest">
    Options Indexes FollowSymLinks
    AllowOverride None
    order allow,deny
    allow from all
    AuthName "SMA Development server"
    AuthType Basic
    LDAP_Debug On
    #LDAP_Protocol_Version 3
    #LDAP_Deref NEVER
    #LDAP_StartTLS On
    LDAP_Server you.ldap.server.com
    #LDAP_Port 389
    # If SSL is on, must specify the LDAP SSL port, usually 636
    LDAP_Port 636
    LDAP_CertDbDir /usr/local/apache2/sslcert
    Base_DN "o=SDS"
    UID_Attr uid
    DAV On
    #require valid-user
    require valid-user
    #require roomnumber "123 Center Building"
    #require filter "(&(telephonenumber=1234)(roomnumber=123))"
    #require group cn=rcs,ou=Groups
    </Directory>
-----------------------------------------------------------------------------

4.4. Create a Directory called DAVtest

As mentioned in an earlier section, all DAV directories have to be writable
by the web server process. In this example we assume the web server is
running under the username 'nobody'. This is usually the case. To check which
user httpd is running under, please use:

    # ps -ef | grep httpd

Create a test directory called 'DAVtest' under /usr/local/apache2/htdocs :

    # mkdir /usr/local/apache2/htdocs/DAVtest

Change the permissions on the directory to make it readable and writable by
the httpd process. Assuming httpd is running under the username 'nobody', use
the following commands:

    # cd /usr/local/apache2/htdocs
    # chmod -R 755 DAVtest/
    # chown -R nobody DAVtest/
    # chgrp -R nobody DAVtest/
-----------------------------------------------------------------------------

4.5. Restart Apache

Finally you must run the configuration test routine that comes with Apache to
verify the syntax of httpd.conf :

    # /usr/local/apache2/bin/apachectl configtest

If you get error messages, please verify that you followed all of the above
mentioned steps correctly. If you cannot figure out the error message, feel
free to email me with the error message (saqib@seagate.com).
If the configtest is successful, restart the Apache web server:

    # /usr/local/apache2/bin/apachectl restart

Now you have a WebDAV-enabled Apache server with LDAP authentication and SSL
encryption.
-----------------------------------------------------------------------------

4.6. WebDAV server protocol compliance testing

It is very important that the WebDAV server that we just implemented be fully
compliant with the WebDAV-2 protocol. If it is not fully compatible, the
client side WebDAV applications will not function properly.

To test the compliance we will use a tool called Litmus. Litmus is a WebDAV
server protocol compliance test suite, which aims to test whether a server is
compliant with the WebDAV protocol as specified in RFC2518.

Please download the Litmus source code from
http://www.webdav.org/neon/litmus/ and place it in the /tmp/downloads
directory.

Then use gzip and tar to extract the files:

    # cd /tmp/downloads
    # gzip -d litmus-0.6.x.tar.gz
    # tar -xvf litmus-0.6.x.tar
    # cd litmus-0.6.x

Compiling and installing Litmus is easy:

    # ./configure
    # make
    # make install

make install will install the Litmus binary files under /usr/local/bin and
the help files under /usr/local/man

To test the compliance of the WebDAV server that you just installed, please
use the following command:

    # /usr/local/bin/litmus http://you.dav.server/DAVtest userid passwd
-----------------------------------------------------------------------------

5. WebDAV server management

In this section we will discuss the various management tasks, e.g. using LDAP
for access control, and working with DAV methods on Apache.

Most of the configuration changes for DAV will have to be done in the
httpd.conf file. This file is located at /usr/local/apache2/conf/httpd.conf

httpd.conf is a text based configuration file that Apache uses. It can be
edited using any text editor - I prefer using vi. Please make a backup copy
of this file before changing it.
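
The backup step can be sketched as a small shell helper. This is only a
sketch under stated assumptions: the function name backup_conf and the .bak
suffix are made up for illustration and are not something Apache provides,
and the path to pass in is whatever httpd.conf your install actually uses.

```shell
# backup_conf FILE
# Copy FILE to FILE.bak, preserving mode and timestamps (cp -p).
# Refuses to overwrite an existing backup, so an older known-good copy
# is not silently lost.
# The function name and the .bak suffix are illustrative assumptions.
backup_conf() {
    conf=$1
    if [ -e "$conf.bak" ]; then
        echo "backup already exists: $conf.bak" >&2
        return 1
    fi
    cp -p "$conf" "$conf.bak"
}
```

As root, the helper would be called with the configuration file as its only
argument, e.g. backup_conf /usr/local/apache2/conf/httpd.conf. If an edit
goes wrong, copying httpd.conf.bak back over httpd.conf restores the previous
configuration.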
After making changes to httpd.conf, the Apache server has to be restarted
using the /usr/local/apache2/bin/apachectl restart command. However, before
restarting you should test the validity of httpd.conf by using the
/usr/local/apache2/bin/apachectl configtest command.
-----------------------------------------------------------------------------

5.1. Restricting access to DAV shares

In the previous section, when we created the DAVtest share, we used LDAP for
authentication purposes. However, anyone who can authenticate using their
LDAP userid/passwd will be able to access that folder.

Using the require directive in the httpd.conf file, we can limit access to
certain individuals or groups of individuals.

If we look at the DAVtest configuration from the previous section:

    <Directory /usr/local/apache2/htdocs/DAVtest>
    Dav On
    #Options Indexes FollowSymLinks
    AllowOverride None
    order allow,deny
    allow from all
    AuthName "LDAP_userid_password_required"
    AuthType Basic
    <Limit GET PUT POST DELETE PROPFIND PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
    Require valid-user
    </Limit>
    LDAP_Server ldap.server.com
    LDAP_Port 389
    Base_DN "o=ROOT"
    UID_Attr uid
    </Directory>

We see that require is set to valid-user, which means any valid authenticated
user has access to this folder.
-----------------------------------------------------------------------------

5.1.1. Restricting access based on Individual UID(s)

The LDAP UID can be used to restrict access to a DAV folder.

The require valid-user directive can be changed to

    require user 334455 445566

This will restrict access to the individuals with UID 334455 and 445566.
Anyone else will not be able to access this folder.
-----------------------------------------------------------------------------

5.1.2. Restricting access based on groups of individuals.

require can also be used to restrict access to groups of individuals. This
can be done using either LDAP groups or LDAP filters. The filter must be
valid LDAP filter syntax.
-----------------------------------------------------------------------------

5.2.
Restricting write access to DAV shares

It may be required that editing of the resources on the DAV shares be
restricted to certain individuals, while anyone can view the resources. This
can easily be done using the <Limit> tags in the httpd.conf file:

    <Directory /usr/local/apache2/htdocs/DAVtest>
    Dav On
    #Options Indexes FollowSymLinks
    AllowOverride None
    order allow,deny
    allow from all
    AuthName "LDAP_userid_password_required"
    AuthType Basic
    <Limit GET PUT POST DELETE PROPFIND PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
    Require valid-user
    </Limit>
    LDAP_Server ldap.server.com
    LDAP_Port 389
    Base_DN "o=ROOT"
    UID_Attr uid
    </Directory>

You restrict write access to certain individuals by changing the <Limit> to

    <Limit PUT POST DELETE PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
    Require user 334455
    </Limit>

Basically we are limiting PUT, POST, DELETE, PROPPATCH, MKCOL, COPY, MOVE,
LOCK and UNLOCK to the individual who has the UID 334455. Everyone else will
be able to use the methods GET and PROPFIND on the resources, but not any
other method.
-----------------------------------------------------------------------------

6. Implementing and using SSL to secure HTTP traffic

Security of the data stored on a file server is very important these days.
Compromised data can cost a company thousands of dollars. In the last
section, we compiled the LDAP authentication module into the Apache build to
provide an authentication mechanism. However, HTTP traffic is very insecure,
and all data is transferred in clear text - meaning that the LDAP
authentication (userid/passwd) will be transmitted as clear text as well.
This creates a problem: anyone can sniff these userid/passwd pairs and gain
access to the DAV store. To prevent this we have to encrypt the HTTP traffic,
essentially HTTP + SSL, or HTTPS. Anything transferred over HTTPS is
encrypted, so the LDAP userid/passwd cannot be easily deciphered. HTTPS runs
on port 443. The build resulting from the last section's compilation process
will have Apache listen on both port 80 (normal HTTP) and port 443 (HTTPS).
If you are just going to use this server for DAV, then I highly suggest that
you close port 80.
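
Before closing port 80 it helps to see which ports the configuration
actually opens. The following is a minimal sketch, assuming the ports are
set with Listen directives in the httpd.conf and ssl.conf files of your
install; the function name list_listen is made up for this example.

```shell
# list_listen FILE...
# Print every active (uncommented) Listen directive with its line number.
# Apache directive names are case-insensitive, hence grep -i.
# The function name is an illustrative assumption.
list_listen() {
    grep -in '^[[:space:]]*Listen[[:space:]]' "$@"
}
```

Called as list_listen /usr/local/apache2/conf/httpd.conf
/usr/local/apache2/conf/ssl.conf, it prints the file and line number of each
Listen directive. Commenting out the Listen 80 line and restarting Apache
leaves only the HTTPS listener. Make sure a Listen 443 line is really active
first (in a stock ssl.conf it typically only takes effect when the server is
started with apachectl startssl), because Apache will not start without any
listening socket.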
In this section of the HOWTO I will provide some information regarding SSL
and maintaining SSL on an Apache HTTP server.
-----------------------------------------------------------------------------

6.1. Introduction to SSL

SSL (Secure Socket Layer) is a protocol layer that exists between the Network
Layer and the Application Layer. As the name suggests, SSL provides a
mechanism for encrypting all kinds of traffic - LDAP, POP, IMAP and, most
importantly, HTTP.

The following is an over-simplified structure of the layers involved in SSL.

    +-------------------------------------------+
    |   LDAP   |   HTTP   |   POP    |   IMAP   |
    +-------------------------------------------+
    |                    SSL                    |
    +-------------------------------------------+
    |               Network Layer               |
    +-------------------------------------------+
-----------------------------------------------------------------------------

6.1.1. Encryption algorithms used in SSL

There are three kinds of cryptographic techniques used in SSL: Public-Private
Key, Symmetric Key, and Digital Signature.

Public-Private Key Cryptography - Initiating SSL connection: In this
algorithm, encryption and decryption are performed using a pair of private
and public keys. The web server holds the Private Key, and sends the Public
Key to the client in the Certificate.

  1. The client requests content from the web server using HTTPS.
  2. The web server responds with a Digital Certificate which includes the
     server's public key.
  3. The client checks to see if the certificate has expired.
  4. Then the client checks whether the Certificate Authority that signed the
     certificate is a trusted authority listed in the browser. This explains
     why we need to get a certificate from a trusted CA.
  5. The client then checks to see if the Fully Qualified Domain Name (FQDN)
     of the web server matches the Common Name (CN) on the certificate.
  6. If everything is successful, the SSL connection is initiated.

Note: Anything encrypted with the Private Key can only be decrypted by using
the Public Key.
Similarly, anything encrypted using the Public Key can only be decrypted
using the Private Key. There is a common misconception that only the Public
Key is used for encryption and the Private Key is used for decryption. This
is not the case. Either key can be used for encryption or decryption.
However, if one key is used for encryption then the other key must be used
for decryption, e.g. a message cannot be encrypted and then decrypted using
only the Public Key.

Using the Private Key to encrypt and the Public Key to decrypt assures the
recipients of the authenticity of the sender (the owner of the Private Key).
Using the Public Key to encrypt and the Private Key to decrypt ensures that
only the intended recipient (the owner of the Private Key) will have access
to the data, i.e. only the person who holds the Private Key will be able to
decipher the message.

Symmetric Cryptography - Actual transmission of data: After the SSL
connection has been established, symmetric cryptography is used for
encrypting data, as it uses fewer CPU cycles. In symmetric cryptography the
data is encrypted and decrypted using the same key. The key for symmetric
cryptography is exchanged during the initiation process, using Public Key
Cryptography.

Message Digest: The server uses a message digest algorithm such as HMAC,
SHA-1 or MD5 to verify the integrity of the transferred data.
-----------------------------------------------------------------------------

6.1.2. Ensuring Authenticity and Integrity

Encryption Process

[Diagram: encryption process. Clear Text -> (Step1, Sender's Private Key) ->
CipherText 1 -> (Step2, Receiver's Public Key) -> CipherText 2. Clear Text ->
(Step3) -> SHA 1 MsgDigest -> (Step4, Sender's Private Key) -> Digital
Signature. CipherText 2 and Digital Signature -> (Step5) -> Receiver.]

  * Step1: In this step the original "Clear Text" message is encrypted using
    the Sender's Private Key, which results in "CipherText 1". This ensures
    the authenticity of the Sender.
  * Step2: In this step "CipherText 1" is encrypted using the Receiver's
    Public Key, resulting in "CipherText 2". This ensures that only the
    intended Receiver can decipher the message, using his Private Key.
  * Step3: Here the SHA1 Message Digest of the "Clear Text" is created.
  * Step4: The SHA1 Message Digest is then encrypted using the Sender's
    Private Key, resulting in the Digital Signature of the "Clear Text". This
    Digital Signature can be used by the Receiver to verify the integrity of
    the message and the authenticity of the Sender.
  * Step5: The "Digital Signature" and "CipherText 2" are then sent to the
    Receiver.

Decryption Process

[Diagram: decryption process. CipherText 2 -> (Step1, Receiver's Private Key)
-> CipherText 1 -> (Step2, Sender's Public Key) -> ClearText -> (Step3) ->
SHA 1 MsgDigest #1. Digital Signature -> (Step4, Sender's Public Key) ->
SHA 1 MsgDigest #2. Step5 compares MsgDigest #1 with MsgDigest #2.]

  * Step1: In this step the "CipherText 2" message is decrypted using the
    Receiver's Private Key, which results in "CipherText 1".
  * Step2: In this step "CipherText 1" is decrypted using the Sender's Public
    Key, resulting in the "Clear Text".
  * Step3: Here the SHA1 Message Digest of the "Clear Text" is created.
  * Step4: The "Digital Signature" is then decrypted using the Sender's
    Public Key, resulting in the "SHA 1 MsgDigest".
  * Step5: "SHA1 MsgDigest #1" is then compared against "SHA1 MsgDigest #2".
    If they are equal, the data was not modified during transmission, and the
    integrity of the original "Clear Text" has been maintained.
-----------------------------------------------------------------------------

6.2. Test Certificates

While compiling Apache we created a test certificate. We used the makefile
provided by mod_ssl to create this custom certificate. We used the command:

    # make certificate TYPE=custom

This certificate can be used for testing purposes.
-----------------------------------------------------------------------------

6.3. Certificates for Production use

For production use you will need a certificate from a Certificate Authority
(hereafter CA). Certificate Authorities are certificate vendors who are
listed as a Trusted CA in the user's browser. As mentioned in the Encryption
Algorithms section, if the CA is not listed as a trusted authority, your user
will get a warning message when trying to connect to a secure location.

Similarly, the test certificates will also cause a warning message to appear
in the user's browser.
-----------------------------------------------------------------------------

6.4.
How to generate a CSR

A CSR, or Certificate Signing Request, must be sent to the trusted CA for
signing. This section discusses how to create a CSR and send it to the CA of
your choice. The openssl req command can be used to generate a CSR as
follows:

    # cd /usr/local/apache2/conf/
    # /usr/local/ssl/bin/openssl req -new -nodes -keyout private.key -out public.csr
    Generating a 1024 bit RSA private key
    ............++++++
    ....++++++
    writing new private key to 'private.key'
    -----
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:US
    State or Province Name (full name) [Some-State]:California
    Locality Name (eg, city) []:San Jose
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:Seagate
    Organizational Unit Name (eg, section) []:Global Client Server
    Common Name (eg, YOUR name) []:xml.seagate.com
    Email Address []:saqib@seagate.com

    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:badpassword
    An optional company name []:

Note: "PRNG not seeded"

If you do not have /dev/random on your system you will get a "PRNG not
seeded" error message. In that case you can use the following command:

    # /usr/local/ssl/bin/openssl req -rand some_file.ext -new -nodes -keyout private.key -out public.csr

Replace some_file.ext with the name of an existing file on your file system.
Any file can be specified. OpenSSL will use that file to generate the seed.

Solaris 9 comes with /dev/random.
However, on Solaris you might have to install the
[http://sunsolve.sun.com/pub-cgi/findPatch.pl?patchId=112438] 112438 patch to
get /dev/random.

At this point you will be asked several questions about your server in order
to generate the Certificate Signing Request.

Note: Your Common Name (CN) is the Fully Qualified Domain Name (FQDN) of your
web server, e.g. dav.server.com . If you put in anything else, it will NOT
work.

Remember the password that you use, for future reference.

Once the process is complete, you will have a private.key and a public.csr .
You will need to submit the public.csr to the Certification Authority. At
this point the private.key is not encrypted. To encrypt it:

    # mv private.key private.key.unencrypted
    # /usr/local/ssl/bin/openssl rsa -in private.key.unencrypted -des3 -out private.key
-----------------------------------------------------------------------------

6.5. Installing Server Private Key, and Server Certificate

Once the Certification Authority processes your request, they will send an
encoded certificate (Digital Certificate) back to you. The Digital
Certificate is in the format defined by X.509 v3. The following shows the
structure of a typical X.509 v3 Digital Certificate:

  * Certificate
      + Version
      + Serial Number
      + Algorithm ID
      + Issuer
      + Validity
          o Not Before
          o Not After
      + Subject
      + Subject Public Key Info
          o Public Key Algorithm
          o RSA Public Key
      + Extensions
  * Certificate Signature Algorithm
  * Certificate Signature
-----------------------------------------------------------------------------

6.5.1. Verifying a Digital Certificate

To verify an X.509 Certificate use the following command:

    # openssl verify server.crt
    server.crt: OK

where server.crt is the name of the file that contains the Digital
Certificate.
-----------------------------------------------------------------------------

6.5.2.
Viewing the contents of a Digital Certificate

The contents of a Digital Certificate can be viewed by using the openssl x509
command as follows:

    # openssl x509 -text -in server.crt
    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number: 312312312 (0x0)
            Signature Algorithm: md5WithRSAEncryption
            Issuer: C=US, O=GTE Corporation, CN=GTE CyberTrust Root
            Validity
                Not Before: Feb 8 03:25:50 2000 GMT
                Not After : Feb 8 03:25:50 2001 GMT
            Subject: C=US, ST=New York, L=Pelham, O=xml-dev, OU=web, CN=www.xml-dev.com/Email=saqib@xml-dev.com
            Subject Public Key Info:
                Public Key Algorithm: rsaEncryption
                RSA Public Key: (1024 bit)
                    Modulus (1024 bit):
                        ............
                        ............
                    Exponent: 65537 (0x10001)
        Signature Algorithm: md5WithRSAEncryption
            ............
            ............
-----------------------------------------------------------------------------

6.5.3. Modifying the httpd.conf to Install the Certificates

You will need to place this certificate on the server, and tell Apache where
to find it.

For this example, the Private Key is placed in the
/usr/local/apache2/conf/ssl.key/ directory, and the Server Certificate is
placed in the /usr/local/apache2/conf/ssl.crt/ directory.

Copy the file received from the Certification Authority to a file called
server.crt in /usr/local/apache2/conf/ssl.crt/, and place the private.key
generated in the previous step in /usr/local/apache2/conf/ssl.key/

Then modify /usr/local/apache2/conf/ssl.conf to point to the correct Private
Key and Server Certificate files:

    # Server Certificate:
    # Point SSLCertificateFile at a PEM encoded certificate. If
    # the certificate is encrypted, then you will be prompted for a
    # pass phrase. Note that a kill -HUP will prompt again. Keep
    # in mind that if you have both an RSA and a DSA certificate you
    # can configure both in parallel (to also allow the use of DSA
    # ciphers, etc.)
    SSLCertificateFile /usr/local/apache2/conf/ssl.crt/server.crt
    #SSLCertificateFile /usr/local/apache2/conf/ssl.crt/server-dsa.crt

    # Server Private Key:
    # If the key is not combined with the certificate, use this
    # directive to point at the key file. Keep in mind that if
    # you've both a RSA and a DSA private key you can configure
    # both in parallel (to also allow the use of DSA ciphers, etc.)
    SSLCertificateKeyFile /usr/local/apache2/conf/ssl.key/private.key
    #SSLCertificateKeyFile /usr/local/apache2/conf/ssl.key/server-dsa.key
-----------------------------------------------------------------------------

6.6. Removing passphrase from the RSA Private Key

The RSA Private Key stored on the web server is usually encrypted, and you
need a passphrase to parse the file. That is why you are prompted for a
passphrase when you start Apache with mod_ssl:

    # apachectl startssl
    Apache/1.3.23 mod_ssl/2.8.6 (Pass Phrase Dialog)
    Some of your private key files are encrypted for security reasons.
    In order to read them you have to provide us with the pass phrases.
    Server your.server.dom:443 (RSA)
    Enter pass phrase:

Encrypting the RSA Private Key is very important. If a cracker gets hold of
your unencrypted RSA Private Key, he/she can easily impersonate your web
server. If the Key is encrypted, the cracker cannot do anything without brute
forcing the passphrase. Use of a strong (i.e. long) passphrase is encouraged.

However, encrypting the Key can sometimes be a nuisance, since you will be
prompted for a passphrase every time you start the web server, especially if
you are using rc scripts to start the web server at boot time. The prompt for
a passphrase will stop the boot process, waiting for your input.

You can get rid of the passphrase prompt easily by decrypting the Key.
However, make sure that no one can get hold of this Key. I would recommend
that hardening and securing guidelines be followed before decrypting the Key
on the web server.
To decrypt the Key:

First make a copy of the encrypted key:

    # cp server.key server.key.cryp

Then re-write the key without encryption. You will be prompted for the
passphrase of the original encrypted Key:

    # /usr/local/ssl/bin/openssl rsa -in server.key.cryp -out server.key
    read RSA key
    Enter PEM pass phrase:
    writing RSA key

One way to secure the decrypted Private Key is to make it readable only by
root:

    # chmod 400 server.key
-----------------------------------------------------------------------------

6.7. SSL Performance Tuning

6.7.1. Inter Process SSL Session Cache

Apache uses a multi-process model, in which all the requests are NOT handled
by the same process. This causes the SSL Session Information to be lost when
a client makes multiple requests. Multiple SSL handshakes cause a lot of
overhead on the web server and the client. To avoid this, the SSL Session
Information must be stored in an inter-process Session Cache, allowing all
the processes to have access to the handshake information. The
SSLSessionCache directive in the /usr/local/apache2/conf/ssl.conf file can be
used to specify the location of the SSL Session Cache:

    SSLSessionCache shmht:logs/ssl_scache(512000)
    #SSLSessionCache shmcb:logs/ssl_scache(512000)
    #SSLSessionCache dbm:logs/ssl_scache
    SSLSessionCacheTimeout 300

Using dbm:logs/ssl_scache creates the Cache as a DBM hashfile on the local
disk.

Using shmht:logs/ssl_scache(512000) creates the Cache in a Shared Memory
Segment.

Note: shmht vs shmcb

shmht: uses a Hash Table to cache the SSL handshake information in the Shared
Memory

shmcb: uses a Cyclic Buffer to cache the SSL handshake information in the
Shared Memory

Note: Not all platforms/OS support creation of a Hash Table in the Shared
Memory. On those platforms dbm:logs/ssl_scache must be used instead.
-----------------------------------------------------------------------------

6.7.2.
Verifying SSLSession Cache

To verify that the SSLSessionCache is working properly, you can use the
openssl utility with the -reconnect option as follows:

# openssl s_client -connect your.server.dom:443 -state -reconnect
CONNECTED(00000003)
.......
.......
Reused, TLSv1/SSLv3, Cipher is EDH-RSA-DES-CBC3-SHA
SSL-Session:
.....
Reused, TLSv1/SSLv3, Cipher is EDH-RSA-DES-CBC3-SHA
SSL-Session:
.....
Reused, TLSv1/SSLv3, Cipher is EDH-RSA-DES-CBC3-SHA
SSL-Session:
.....
Reused, TLSv1/SSLv3, Cipher is EDH-RSA-DES-CBC3-SHA
SSL-Session:
.....
Reused, TLSv1/SSLv3, Cipher is EDH-RSA-DES-CBC3-SHA
SSL-Session:
.....

-reconnect forces s_client to connect to the server 5 times using the same
SSL session ID. You should see 5 attempts at reusing the same Session-ID, as
shown above.
-----------------------------------------------------------------------------

A. HTTP/HTTPS Benchmarking tools

The following is a list of some of the OpenSource benchmarking tools for
webservers:

 i. [http://distcache.sourceforge.net/] SSLswamp - For stress-testing/
    benchmarking connections to an SSL-enabled server
ii. [http://www.hpl.hp.com/personal/David_Mosberger/httperf.html] HTTPERF -
    A Tool for Measuring Web Server Performance
iii. [http://httpd.apache.org/docs-2.1/en/programs/ab.html] ab - Apache HTTP
    server benchmarking tool
-----------------------------------------------------------------------------

B. Hardware based SSL encryption solutions

The following hardware based SSL encryption solution is available:

 i. [http://www.ncipher.com] CHIL (Cryptographic Hardware Interface Library)
    by nCipher
-----------------------------------------------------------------------------

C. Certificate Authorities

The following is a list of Certificate Authorities that are trusted by the
various browsers:

 i. [http://www.baltimore.com/] Baltimore
ii. [http://www.entrust.com/] Entrust
iii.
[http://www.globalsign.net/] GeoTrust
iv. [http://www.thawte.com] Thawte
 v. [http://www.trustcenter.de/] TrustCenter

Glossary of PKI Terms

A

Asymmetric Cryptography
In this kind of cryptography a Key Pair (a Private Key and a Public Key) is
used. The Private Key is kept secret and the Public Key is widely
distributed.

C

Certificate
A Data Record that contains the information as defined in the X.509 Format.

Certificate Authority (CA)
Issuer of the Digital Certificate. Also validates the Identity of the
End-Entity that possesses the Digital Certificate.

Certificate Signing Request (CSR)
A Certificate Signing Request (CSR) is what you send to a Certificate
Authority (CA) to get enrolled. A CSR contains the Public Key of the
End-Entity that is requesting the Digital Certificate.

Common Name (CN)
The Common Name is the name of the End-Entity, e.g. Saqib Ali. If the
End-Entity is a WebServer, the CN is the Fully Qualified Domain Name (FQDN)
of the WebServer.

D

Digital Certificate
A certificate that binds a Public Key to a Subject (End-Entity). This
certificate also contains other identifying information about the subject as
defined in the X.509 Format. It is signed by the Issuing CA, using the CA's
private key. e.g. of a digital certificate

Digital Signature
A Digital Signature is created by signing the Message Digest (Message Hash)
using the Private Key. It ensures the Identity of the Sender, and the
Integrity of the Data.

E

End-Entity
An entity that participates in the PKI. Usually a Server, Service, Router,
or a Person. A CA is not an End-Entity. An RA is an End-Entity to the CA.

H

Hash
A hash is a hexadecimal number generated from a string of text in such a way
that it is computationally infeasible to find two different strings that
produce the same hash.

HMAC: Keyed Hashing for Message Authentication (HMAC)
HMAC is an implementation of a Message Authentication Code Algorithm.

M

Message Authentication Code (MAC)
Similar to a Message Digest (Hash/Fingerprint), except that the Shared
Secret Key is used in the process of calculating the Hash.
Since a shared secret key is used, an attacker can not change the Message
Digest. However, the shared secret key has to be communicated to the
participating entities first, unlike a Digital Signature, where the Message
Digest is signed using the Private Key. HMAC is an example of a Message
Authentication Code Algorithm.

Message Digest 5 - MD5 (MD5)
Message Digest 5 (MD5) is a 128-bit one-way hash function.

P

Private Key
The Private Key is the Key in Asymmetric Cryptography that is kept secret by
the owner (End-Entity). Can be used for encryption or decryption.

Public Key
The Public Key is the Key in Asymmetric Cryptography that is widely
distributed. Can be used for encryption or decryption.

Public Key Infrastructure (PKI)
Public Key Infrastructure

S

SHA-1: Secure Hash Algorithm (SHA-1)
Secure Hash Algorithm (SHA-1) is a 160-bit one-way hash function. The
message length must be less than 2^64 bits.

Secure Socket Layer (SSL)
Secure Socket Layer (SSL) is a security protocol that provides
authentication (Digital Certificate), confidentiality (encryption), and data
integrity (Message Digest - MD5, SHA etc).

Symmetric Cryptography
In this kind of cryptography the message is encrypted and decrypted by the
same key. (n^2 - n)/2 keys are required for n users who want to participate
in this system of cryptography; for example, 10 users need 45 keys.


Linux Assembly HOWTO

Konstantin Boldyshev
Linux Assembly
    konst@linuxassembly.org

Francois-Rene Rideau
Tunes project
    fare@tunes.org

Copyright © 1999-2002 by Konstantin Boldyshev
Copyright © 1996-1999 by Francois-Rene Rideau

$Date: 2002/08/17 08:35:59 $

This is the Linux Assembly HOWTO, version 0.6f. This document describes how
to program in assembly language using free programming tools, focusing on
development for or from the Linux Operating System, mostly on the IA-32
(i386) platform. Included material may or may not be applicable to other
hardware and/or software platforms.
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.1; with no
Invariant Sections, with no Front-Cover Texts, and no Back-Cover texts.
-----------------------------------------------------------------------------

Table of Contents
1. Introduction
    1.1. Legal Blurb
    1.2. Foreword
    1.3. Contributions
    1.4. Translations
2. Do you need assembly?
    2.1. Pros and Cons
    2.2. How to NOT use Assembly
    2.3. Linux and assembly
3. Assemblers
    3.1. GCC Inline Assembly
    3.2. GAS
    3.3. NASM
    3.4. AS86
    3.5. Other Assemblers
4. Metaprogramming
    4.1. External filters
    4.2. Metaprogramming
5. Calling conventions
    5.1. Linux
    5.2. DOS and Windows
    5.3. Your own OS
6. Quick start
    6.1. Introduction
    6.2. Hello, world!
    6.3. Building an executable
7. Resources
    7.1. Pointers
    7.2. Mailing list
8. Frequently Asked Questions
A. History
B. Acknowledgements
C. Endorsements
D. GNU Free Documentation License
-----------------------------------------------------------------------------

Chapter 1. Introduction

Note: You can skip this chapter if you are familiar with HOWTOs, or just
hate to read all this assembly-unrelated crap.
-----------------------------------------------------------------------------

1.1. Legal Blurb

Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU [http://www.gnu.org/copyleft/fdl.html] Free
Documentation License Version 1.1; with no Invariant Sections, with no
Front-Cover Texts, and no Back-Cover texts. A copy of the license is
included in the GNU Free Documentation License appendix.

The most recent official version of this document is available from the
[http://linuxassembly.org/howto.html] Linux Assembly and
[http://linuxdoc.org/docs.html] LDP sites. If you are reading a
few-months-old copy, consider checking the above URLs for a new version.
-----------------------------------------------------------------------------

1.2.
Foreword

This document aims at answering the questions of those who program, or want
to program, 32-bit x86 assembly using free software, particularly under the
Linux operating system. In many places Universal Resource Locators (URLs)
are given for some software or documentation repository. This document also
points to other documents about non-free, non-x86, or non-32-bit assemblers,
although this is not its primary goal. Also note that there are FAQs and
docs about programming on your favorite platform (whatever it is), which you
should consult for platform-specific issues not directly related to assembly
programming.

Because the main interest of assembly programming is to build the guts of
operating systems, interpreters, compilers, and games, where a C compiler
fails to provide the needed expressiveness (performance is more and more
seldom an issue), we are focusing on the development of this kind of
software.

If you don't know what free software is, please do read carefully the GNU
[http://www.gnu.org/copyleft/gpl.html] General Public License (GPL or
copyleft), which is used in a lot of free software, and is the model for
most of their licenses. It generally comes in a file named COPYING (or
COPYING.LIB). Literature from the [http://www.fsf.org] Free Software
Foundation (FSF) might help you too. In particular, the interesting feature
of free software is that it comes with source code which you can consult and
correct, or sometimes even borrow from. Read your particular license
carefully and do comply with it.
-----------------------------------------------------------------------------

1.3. Contributions

This is an interactively evolving document: you are especially invited to
ask questions, to answer questions, to correct given answers, to give
pointers to new software, and to point the current maintainer to bugs or
deficiencies in the pages. In one word, contribute!

To contribute, please contact the maintainer.
Note: At the time of writing, the maintainer is Konstantin Boldyshev and no
longer Francois-Rene Rideau (since version 0.5). I (Fare) had been looking
for some time for a serious hacker to replace me as maintainer of this
document, and am pleased to announce Konstantin as my worthy successor.
-----------------------------------------------------------------------------

1.4. Translations

A Korean translation of this HOWTO is available at
[http://kldp.org/HOWTO/html/Assembly-HOWTO/]
http://kldp.org/HOWTO/html/Assembly-HOWTO/. There was also a French
translation of the early HOWTO versions, but I couldn't find it now.
-----------------------------------------------------------------------------

Chapter 2. Do you need assembly?

Well, I wouldn't want to interfere with what you're doing, but here is some
advice from hard-earned experience.
-----------------------------------------------------------------------------

2.1. Pros and Cons

2.1.1. The advantages of Assembly

Assembly can express very low-level things:

  * you can access machine-dependent registers and I/O
  * you can control the exact code behavior in critical sections that might
    otherwise involve deadlock between multiple software threads or hardware
    devices
  * you can break the conventions of your usual compiler, which might allow
    some optimizations (like temporarily breaking rules about memory
    allocation, threading, calling conventions, etc)
  * you can build interfaces between code fragments using incompatible
    conventions (e.g. produced by different compilers, or separated by a
    low-level interface)
  * you can get access to unusual programming modes of your processor (e.g.
    16 bit mode to interface startup, firmware, or legacy code on Intel PCs)
  * you can produce reasonably fast code for tight loops to cope with a bad
    non-optimizing compiler (but then, there are free optimizing compilers
    available!)
  * you can produce hand-optimized code perfectly tuned for your particular
    hardware setup, though not for someone else's
  * you can write some code for your new language's optimizing compiler
    (something that very few people will ever do, and even they not often)
  * i.e. you can be in complete control of your code
-----------------------------------------------------------------------------

2.1.2. The disadvantages of Assembly

Assembly is a very low-level language (the lowest above hand-coding the
binary instruction patterns). This means

  * it is long and tedious to write initially
  * it is quite bug-prone
  * your bugs can be very difficult to chase
  * your code can be fairly difficult to understand and modify, i.e. to
    maintain
  * the result is non-portable to other architectures, existing or upcoming
  * your code will be optimized only for a certain implementation of the
    same architecture: for instance, among Intel-compatible platforms each
    CPU design and its variations (relative latency, throughput, and
    capacity of processing units, caches, RAM, bus, disks, presence of FPU,
    MMX, 3DNOW, SIMD extensions, etc) implies potentially completely
    different optimization techniques. CPU designs already include: Intel
    386, 486, Pentium, PPro, PII, PIII, PIV; Cyrix 5x86, 6x86, M2; AMD K5,
    K6 (K6-2, K6-III), K7 (Athlon, Duron). New designs keep popping up, so
    don't expect either this listing or your code to be up-to-date.
  * you spend more time on a few details and can't focus on small and large
    algorithmic design, which is known to bring the largest part of the
    speed-up (e.g. you might spend some time building very fast list/array
    manipulation primitives in assembly, when a hash table would have sped
    up your program much more; or, in another context, a binary tree; or
    some high-level structure distributed over a cluster of CPUs)
  * a small change in algorithmic design might completely invalidate all
    your existing assembly code.
    So either you're ready (and able) to rewrite it all, or you're tied to
    a particular algorithmic design
  * On code that ain't too far from what's in standard benchmarks,
    commercial optimizing compilers outperform hand-coded assembly (well,
    that's less true on the x86 architecture than on RISC architectures,
    and perhaps less true for widely available/free compilers; anyway, for
    typical C code, GCC is fairly good);
  * And in any case, as moderator John Levine says on [news:comp.compilers]
    comp.compilers, "compilers make it a lot easier to use complex data
    structures, and compilers don't get bored halfway through and generate
    reliably pretty good code." They will also correctly propagate code
    transformations throughout the whole (huge) program when optimizing code
    across procedure and module boundaries.
-----------------------------------------------------------------------------

2.1.3. Assessment

All in all, you might find that though using assembly is sometimes needed,
and might even be useful in a few cases where it is not, you'll want to:

  * minimize use of assembly code
  * encapsulate this code in well-defined interfaces
  * have your assembly code automatically generated from patterns expressed
    in a higher-level language than assembly (e.g. GCC inline assembly
    macros)
  * have automatic tools translate these programs into assembly code
  * have this code be optimized if possible
  * All of the above, i.e. write (an extension to) an optimizing compiler
    back-end.

Even when assembly is needed (e.g. OS development), you'll find that not so
much of it is required, and that the above principles still hold.

See the Linux kernel sources concerning this: as little assembly as needed,
resulting in a fast, reliable, portable, maintainable OS. Even a successful
game like DOOM was written almost entirely in C, with only a tiny part
written in assembly for speed.
-----------------------------------------------------------------------------

2.2.
How to NOT use Assembly

2.2.1. General procedure to achieve efficient code

As Charles Fiterman says on [news:comp.compilers] comp.compilers about human
vs computer-generated assembly code:

    The human should always win and here is why.

      + First the human writes the whole thing in a high level language.
      + Second he profiles it to find the hot spots where it spends its
        time.
      + Third he has the compiler produce assembly for those small sections
        of code.
      + Fourth he hand tunes them looking for tiny improvements over the
        machine generated code.

    The human wins because he can use the machine.
-----------------------------------------------------------------------------

2.2.2. Languages with optimizing compilers

Languages like ObjectiveCAML, SML, CommonLISP, Scheme, ADA, Pascal, C, C++,
among others, all have free optimizing compilers that will optimize the bulk
of your programs, and often do better than hand-coded assembly even for
tight loops, while allowing you to focus on higher-level details, and
without forbidding you to grab a few percent of extra performance in the
above-mentioned way, once you've reached a stable design. Of course, there
are also commercial optimizing compilers for most of these languages, too!

Some languages have compilers that produce C code, which can be further
optimized by a C compiler: LISP, Scheme, Perl, and many others. Speed is
fairly good.
-----------------------------------------------------------------------------

2.2.3. General procedure to speed your code up

As for speeding code up, you should do it only for parts of a program that a
profiling tool has consistently identified as being a performance
bottleneck.
Hence, if you identify some code portion as being too slow, you should

  * first try to use a better algorithm;
  * then try to compile it rather than interpret it;
  * then try to enable and tweak optimization from your compiler;
  * then give the compiler hints about how to optimize (typing information
    in LISP; register usage with GCC; lots of options in most compilers,
    etc).
  * then possibly fall back to assembly programming

Finally, before you end up writing assembly, you should inspect the
generated code, to check that the problem really is with bad code
generation, as this might really not be the case: compiler-generated code
might be better than what you'd have written, particularly on modern
multi-pipelined architectures! Slow parts of a program might be
intrinsically so. The biggest problems on modern architectures with fast
processors are due to delays from memory access, cache-misses, TLB-misses,
and page-faults; register optimization becomes useless, and you'll more
profitably re-think data structures and threading to achieve better locality
in memory access. Perhaps a completely different approach to the problem
might help, then.
-----------------------------------------------------------------------------

2.2.4. Inspecting compiler-generated code

There are many reasons to inspect compiler-generated assembly code. Here is
what you'll do with such code:

  * check whether generated code can be obviously enhanced with hand-coded
    assembly (or by tweaking compiler switches)
  * when that's the case, start from generated code and modify it instead of
    starting from scratch
  * more generally, use generated code as stubs to modify, which at least
    gets right the way your assembly routines interface to the external
    world
  * track down bugs in your compiler (hopefully the rarer)

The standard way to have assembly code be generated is to invoke your
compiler with the -S flag. This works with most Unix compilers, including
the GNU C Compiler (GCC), but YMMV.
As for GCC, it will produce more understandable assembly code with the
-fverbose-asm command-line option. Of course, if you want to get good
assembly code, don't forget your usual optimization options and hints!
-----------------------------------------------------------------------------

2.3. Linux and assembly

As you have probably noticed, in the general case you don't need to use
assembly language in Linux programming. Unlike DOS, you do not have to write
Linux drivers in assembly (well, actually you can do it if you really want).
And with modern optimizing compilers, if you care about speed optimization
for different CPUs, it's much simpler to write in C. However, if you're
reading this, you might have some reason to use assembly instead of C/C++.

You may need to use assembly, or you may want to use assembly. In short, the
main practical (need) reasons for diving into the assembly realm are small
code and libc independence. The impractical (want), and most frequent,
reason is being just an old crazy hacker who has a twenty-year-old habit of
doing everything in assembly language.

However, if you're porting Linux to some embedded hardware you can be quite
short on the size of the whole system: you need to fit the kernel, libc and
all that stuff of (file|find|text|sh|etc.) utils into several hundreds of
kilobytes, and every kilobyte costs much. So, one of the possible ways is to
rewrite some (or all) parts of the system in assembly, and this will really
save you a lot of space. For instance, a simple httpd written in assembly
can take less than 600 bytes; you can fit a server consisting of kernel,
httpd and ftpd in 400 KB or less... Think about it.
-----------------------------------------------------------------------------

Chapter 3. Assemblers

3.1.
GCC Inline Assembly

The well-known GNU C/C++ Compiler (GCC), an optimizing 32-bit compiler at
the heart of the GNU project, supports the x86 architecture quite well, and
includes the ability to insert assembly code in C programs, in such a way
that register allocation can be either specified or left to GCC. GCC works
on most available platforms, notably Linux, *BSD, VSTa, OS/2, *DOS, Win*,
etc.
-----------------------------------------------------------------------------

3.1.1. Where to find GCC

The original GCC site is the GNU FTP site
[ftp://prep.ai.mit.edu/pub/gnu/gcc/] ftp://prep.ai.mit.edu/pub/gnu/gcc/
together with all released application software from the GNU project.
Linux-configured and pre-compiled versions can be found in
[ftp://metalab.unc.edu/pub/Linux/GCC/] ftp://metalab.unc.edu/pub/Linux/GCC/
There are a lot of FTP mirrors of both sites everywhere around the world, as
well as CD-ROM copies.

GCC development split into two branches some time ago (GCC 2.8 and EGCS),
but they merged back, and the current GCC webpage is [http://gcc.gnu.org]
http://gcc.gnu.org.

Sources adapted to your favorite OS and pre-compiled binaries should be
found at your usual FTP sites.

The DOS port of GCC is called [http://www.delorie.com/djgpp/] DJGPP.

There are two Win32 GCC ports: [http://sourceware.cygnus.com/cygwin/] cygwin
and [http://www.mingw.org] mingw

There is also an OS/2 port of GCC called EMX; it works under DOS too, and
includes lots of unix-emulation library routines. Look around the following
site: [ftp://ftp-os2.cdrom.com/pub/os2/emx09c/]
ftp://ftp-os2.cdrom.com/pub/os2/emx09c.
-----------------------------------------------------------------------------

3.1.2. Where to find docs for GCC Inline Asm

The documentation of GCC includes documentation files in TeXinfo format.
You can compile them with TeX and print the result, or convert them to
.info and browse them with emacs, or convert them to .html, or (with the
right tools) to nearly whatever you like, or just read them as is. The .info
files are generally found on any good installation of GCC.

The right section to look for is C Extensions::Extended Asm::

Section Invoking GCC::Submodel Options::i386 Options:: might help too. In
particular, it gives the i386 specific constraint names for registers:
abcdSDB correspond to %eax, %ebx, %ecx, %edx, %esi, %edi and %ebp
respectively (no letter for %esp).

The DJGPP Games resource (not only for game hackers) had a page specifically
about assembly, but it's down. Its data have nonetheless been recovered on
the DJGPP site, which contains a mine of other useful information:
[http://www.delorie.com/djgpp/doc/brennan/]
http://www.delorie.com/djgpp/doc/brennan/, and in the
[http://www.castle.net/~avly/djasm.html] DJGPP Quick ASM Programming Guide.

GCC depends on GAS for assembling and follows its syntax (see below); do
mind that inline asm needs percent characters to be quoted, since they will
be passed on to GAS. See the section about GAS below.

Find lots of useful examples in the linux/include/asm-i386/ subdirectory of
the sources for the Linux kernel.
-----------------------------------------------------------------------------

3.1.3. Invoking GCC to build proper inline assembly code

Because assembly routines from the kernel headers (and most likely your own
headers, if you try making your assembly programming as clean as it is in
the linux kernel) are embedded in extern inline functions, GCC must be
invoked with the -O flag (or -O2, -O3, etc), for these routines to be
available. If not, your code may compile, but not link properly, since it
will be looking for non-inlined extern functions in the libraries against
which your program is being linked!
Another way is to link against libraries that include fallback versions of
the routines.

Inline assembly can be disabled with -fno-asm, which will have the compiler
die when using extended inline asm syntax, or else generate calls to an
external function named asm() that the linker can't resolve. To counter such
a flag, -fasm restores treatment of the asm keyword.

More generally, good compile flags for GCC on the x86 platform are

gcc -O2 -fomit-frame-pointer -W -Wall

-O2 is the good optimization level in most cases. Optimizing beyond it takes
more time, and yields code that is much larger, but only a bit faster; such
over-optimization might be useful for tight loops only (if any), which you
may be doing in assembly anyway. In cases when you need really strong
compiler optimization for a few files, do consider using up to -O6.

-fomit-frame-pointer allows generated code to skip the stupid frame pointer
maintenance, which makes code smaller and faster, and frees a register for
further optimizations. It precludes the easy use of debugging tools (gdb),
but when you use these, you just don't care about size and speed anymore
anyway.

-W -Wall enables all useful warnings and helps you to catch obvious stupid
errors.

You can add some CPU-specific -m486 or similar flag so that GCC will produce
code that is more adapted to your precise CPU. Note that modern GCC has
-mpentium and such flags (and [http://goof.com/pcg/] PGCC has even more),
whereas GCC 2.7.x and older versions do not. A good choice of CPU-specific
flags should be in the Linux kernel. Check the TeXinfo documentation of your
current GCC installation for more.

-m386 will help optimize for size, hence also for speed on computers whose
memory is tight and/or loaded, since big programs cause swap, which more
than counters any "optimization" intended by the larger code.
In such settings, it might be useful to stop using C, and use instead a
language that favors code factorization, such as a functional language
and/or FORTH, and use a bytecode- or wordcode-based implementation.

Note that you can vary code generation flags from file to file, so
performance-critical files will use maximum optimization, whereas other
files will be optimized for size.

To optimize even more, option -mregparm=2 and/or the corresponding function
attribute might help, but might pose lots of problems when linking to
foreign code, including libc. There are ways to correctly declare foreign
functions so the right call sequences are generated, or you might want to
recompile the foreign libraries to use the same register-based calling
convention...

Note that you can make these flags the default by editing the file
/usr/lib/gcc-lib/i486-linux/2.7.2.3/specs or wherever that is on your system
(better not add -W -Wall there, though). The exact location of the GCC specs
files on your system can be found with gcc -v.
-----------------------------------------------------------------------------

3.1.4. Macro support

GCC allows (and requires) you to specify register constraints in your inline
assembly code, so the optimizer always knows about it; thus, inline assembly
code is really made of patterns, not necessarily exact code.

Thus, you can put your assembly into CPP macros and inline C functions, so
anyone can use it like any C function/macro. Inline functions resemble
macros very much, but are sometimes cleaner to use. Beware that in all those
cases code will be duplicated, so only local labels (of 1: style) should be
defined in that asm code. However, a macro would allow the name of a
non-local defined label to be passed as a parameter (or else, you should use
additional meta-programming methods). Also, note that propagating inline asm
code will spread potential bugs in it; so watch out doubly for register
constraints in such inline asm code.
Lastly, the C language itself may be considered a good abstraction of
assembly programming, which relieves you of most of the trouble of
assembling.
-----------------------------------------------------------------------------

3.2. GAS

GAS is the GNU Assembler, which GCC relies upon.
-----------------------------------------------------------------------------

3.2.1. Where to find it

Find it at the same place where you've found GCC, in the binutils package.
The latest version of binutils is available from
[http://sources.redhat.com/binutils/] http://sources.redhat.com/binutils/.
-----------------------------------------------------------------------------

3.2.2. What is this AT&T syntax

Because GAS was invented to support a 32-bit unix compiler, it uses standard
AT&T syntax, which closely resembles the syntax of standard m68k assemblers,
and is standard in the UNIX world. This syntax is neither worse, nor better
than the Intel syntax. It's just different. When you get used to it, you
find it much more regular than the Intel syntax, though a bit boring.

Here are the major caveats about GAS syntax:

  * Register names are prefixed with %, so that registers are %eax, %dl and
    so on, instead of just eax, dl, etc. This makes it possible to include
    external C symbols directly in assembly source, without any risk of
    confusion, or any need for ugly underscore prefixes.
  * The order of operands is source(s) first, and destination last, as
    opposed to the Intel convention of destination first and sources last.
    Hence, what in Intel syntax is mov eax,edx (move contents of register
    edx into register eax) will be in GAS syntax mov %edx,%eax.
  * The operand size is specified as a suffix to the instruction name. The
    suffix is b for (8-bit) byte, w for (16-bit) word, and l for (32-bit)
    long. For instance, the correct syntax for the above instruction would
    have been movl %edx,%eax.
    However, gas does not require strict AT&T syntax, so the suffix is
    optional when the size can be guessed from register operands, and
    otherwise defaults to 32-bit (with a warning).
  * Immediate operands are marked with a $ prefix, as in addl $5,%eax (add
    immediate long value 5 to register %eax).
  * A missing operand prefix indicates that it is memory-contents; hence
    movl $foo,%eax puts the address of variable foo into register %eax, but
    movl foo,%eax puts the contents of variable foo into register %eax.
  * Indexing or indirection is done by enclosing the index register or
    indirection memory cell address in parentheses, as in
    testb $0x80,17(%ebp) (test the high bit of the byte value at offset 17
    from the cell pointed to by %ebp).

Note: There are a few programs which may help you to convert source code
between AT&T and Intel assembler syntaxes; some of them are capable of
performing conversion in both directions.

GAS has comprehensive documentation in TeXinfo format, which comes at least
with the source distribution. Browse extracted .info pages with Emacs or
whatever. There used to be a file named gas.doc or as.doc around the GAS
source package, but it was merged into the TeXinfo docs. Of course, in case
of doubt, the ultimate documentation is the sources themselves! A section
that will particularly interest you is Machine
Dependencies::i386-Dependent::

Again, the sources for Linux (the OS kernel) come in as excellent examples;
see under linux/arch/i386/ the following files: kernel/*.S,
boot/compressed/*.S, math-emu/*.S.

If you are writing some kind of language, a thread package, etc., you might
as well see how other languages ([http://para.inria.fr/] OCaml,
[http://www.jwdt.com/~paysan/gforth.html] Gforth, etc.), or thread packages
(QuickThreads, MIT pthreads, LinuxThreads, etc), or whatever else do it.

Finally, just compiling a C program to assembly might show you the syntax
for the kind of instructions you want. See section Do you need assembly?
above.
----------------------------------------------------------------------------- 3.2.3. Intel syntax Good news are that starting from binutils 2.10 release, GAS supports Intel syntax too. It can be triggered with .intel_syntax directive. Unfortunately this mode is not documented (yet?) in the official binutils manual, so if you want to use it, try to examine [http://home.snafu.de/phpr/lhpas86.html.gz] http://home.snafu.de/phpr/lhpas86.html.gz, which is an extract from AMD 64bit port of binutils 2.11. ----------------------------------------------------------------------------- 3.2.4. 16-bit mode Binutils (2.9.1.0.25+) now fully support 16-bit mode (registers and addressing) on i386 PCs. Use .code16 and .code32 to switch between assembly modes. Also, a neat trick used by several people (including the oskit authors) is to force GCC to produce code for 16-bit real mode, using an inline assembly statement asm(".code16\n"). GCC will still emit only 32-bit addressing modes, but GAS will insert proper 32-bit prefixes for them. ----------------------------------------------------------------------------- 3.2.5. Macro support GAS has some macro capability included, as detailed in the texinfo docs. Moreover, while GCC recognizes .s files as raw assembly to send to GAS, it also recognizes .S files as files to pipe through CPP before feeding them to GAS. Again and again, see Linux sources for examples. GAS also has GASP (GAS Preprocessor), which adds all the usual macroassembly tricks to GAS. GASP comes together with GAS in the GNU binutils archive. It works as a filter, like CPP and M4. I have no idea on details, but it comes with its own texinfo documentation, which you would like to browse (info gasp ), print, grok. GAS with GASP looks like a regular macro-assembler to me. ----------------------------------------------------------------------------- 3.3. 
NASM

The Netwide Assembler project provides a cool i386 assembler, written in C, that should be modular enough to eventually support all known syntaxes and object formats.

-----------------------------------------------------------------------------
3.3.1. Where to find NASM

[http://nasm.sourceforge.net] http://nasm.sourceforge.net, [http://www.cryogen.com/nasm/] http://www.cryogen.com/nasm/

Binary release on your usual metalab mirror in devel/lang/asm/ directory. Should also be available as .rpm or .deb in your usual RedHat/Debian distributions' contrib.

-----------------------------------------------------------------------------
3.3.2. What it does

The syntax is Intel-style. Comprehensive macroprocessing support is integrated.

Supported object file formats are bin, aout, coff, elf, as86, obj (DOS), win32, rdf (their own format).

NASM can be used as a backend for the free LCC compiler (support files included).

Unless you're using BCC as a 16-bit compiler (which is out of scope of this 32-bit HOWTO), you should definitely use NASM instead of, say, AS86 or MASM, because it runs on all platforms.

Note: NASM comes with a disassembler, NDISASM.

Its hand-written parser makes it much faster than GAS, though of course, it doesn't support three bazillion different architectures. If you like Intel-style syntax, as opposed to GAS syntax, then it should be the assembler of choice.

Note: There are a few programs which may help you to convert source code between AT&T and Intel assembler syntaxes; some of them are capable of performing conversion in both directions.

-----------------------------------------------------------------------------
3.4. AS86

AS86 is an 80x86 assembler, both 16-bit and 32-bit, with integrated macro support. It has mostly Intel syntax, though it differs slightly in its addressing modes.

-----------------------------------------------------------------------------
3.4.1.
Where to get AS86

The current version is 0.16; it can be found at [http://www.cix.co.uk/~mayday/] http://www.cix.co.uk/~mayday/, in the bin86 package with the linker (ld86), or as a separate archive.

Note: A completely outdated version 0.4 of AS86 is distributed by HJLu just to compile Linux kernel versions prior to 2.4, in a package named bin86, available in any Linux GCC repository. But I advise no one to use it for anything other than compiling Linux. This version supports only a hacked minix object file format, which is not supported by the GNU binutils or anything else, and it has a few bugs in 32-bit mode, so you really had better keep it only for compiling Linux.

-----------------------------------------------------------------------------
3.4.2. Where to find docs

See the man page and as.doc from the source package. When in doubt, the sources themselves are often good docs: they aren't very well commented, but the programming style is straightforward. You might try to see how as86 is used in ELKS, LILO, or Tunes 0.0.0.25...

-----------------------------------------------------------------------------
3.4.3. Using AS86 with BCC

Here's the GNU Makefile entry for using BCC to transform .s asm into both a.out .o object and .l listing:

%.o %.l: %.s
	bcc -3 -G -c -A-d -A-l -A$*.l -o $*.o $<

Remove the %.l, -A-l, and -A$*.l, if you don't want any listing. If you want something other than a.out, you can examine the BCC docs about the other supported formats, and/or use the objcopy utility from the GNU binutils package.

-----------------------------------------------------------------------------
3.5. Other Assemblers

There are other assemblers with various interesting and outstanding features which may be of interest to you as well.

Note: They can be in various stages of development, and can be non-classic/high-level/whatever else.

-----------------------------------------------------------------------------
3.5.1.
YASM YASM is a complete rewrite of the NASM assembler under the GNU GPL (some portions are under the "new" BSD License). It is designed from the ground up to allow for multiple syntaxes to be supported (eg, NASM, TASM, GAS, etc.) in addition to multiple output object formats. Another primary module of the overall design is an optimizer module. It looks promising; it is under heavy development, and you may want to take part. See [http://www.tortall.net/projects/yasm/] http://www.tortall.net/ projects/yasm/. ----------------------------------------------------------------------------- 3.5.2. FASM FASM (flat assembler) is a fast, efficient 80x86 assembler that runs in 'flat real mode'. Unlike many other 80x86 assemblers, FASM only requires the source code to include the information it really needs. It is written in itself and is very small and fast. It runs on DOS/Windows/Linux and can produce flat binary, DOS EXE, Win32 PE and COFF output. See [http://fasm.sourceforge.net] http://fasm.sourceforge.net. ----------------------------------------------------------------------------- 3.5.3. OSIMPA (SHASM) osimpa is an assembler for Intel 80386 processors and subsequent, written entirely in the GNU Bash command interpreter shell. The predecessor of osimpa was shasm. osimpa is much cleaned up, can create useful Linux ELF executables, and has various HLL-like extensions and programmer convenience commands. It is (of course) slower than other assemblers. It has its own syntax (and uses its own names for x86 opcodes) Fairly good documentation is included. Check it out: [ftp://linux01.gwdg.de/pub/cLIeNUX/interim/] ftp:// linux01.gwdg.de/pub/cLIeNUX/interim/. Probably you'll not use it on regular basis, but at least it deserves your interest as an interesting idea. ----------------------------------------------------------------------------- 3.5.4. TDASM The Table Driven Assembler (TDASM) is a free portable cross assembler for any kind of assembly language. 
It should be possible to use it as a compiler to any target microprocessor using a table that defines the compilation process.

It is available from [http://www.penguin.cz/~niki/tdasm/] http://www.penguin.cz/~niki/tdasm/.

-----------------------------------------------------------------------------
3.5.5. HLA

[http://webster.cs.ucr.edu] HLA is a High Level Assembly language. It uses a high level language like syntax (similar to Pascal, C/C++, and other HLLs) for variable declarations, procedure declarations, and procedure calls. It uses a modified assembly language syntax for the standard machine instructions. It also provides several high level language style control structures (if, while, repeat..until, etc.) that help you write much more readable code.

HLA is free and comes with source; Linux and Win32 versions are available. On Win32 you need MASM and a 32-bit version of MS-link, on Linux you need GAS, because HLA produces assembler code for the specified assembler and uses that assembler for final assembling and linking.

-----------------------------------------------------------------------------
3.5.6. TALC

[http://www.cs.cornell.edu/talc/] TALC is another free MASM/Win32 based compiler (however it supports ELF output, does it?).

TAL stands for Typed Assembly Language. It extends traditional untyped assembly languages with typing annotations, memory management primitives, and a sound set of typing rules, to guarantee the memory safety, control flow safety, and type safety of TAL programs. Moreover, the typing constructs are expressive enough to encode most source language programming features including records and structures, arrays, higher-order and polymorphic functions, exceptions, abstract data types, subtyping, and modules. Just as importantly, TAL is flexible enough to admit many low-level compiler optimizations.
Consequently, TAL is an ideal target platform for type-directed compilers that want to produce verifiably safe code for use in secure mobile code applications or extensible operating system kernels.

-----------------------------------------------------------------------------
3.5.7. Free Pascal

[http://www.freepascal.org] Free Pascal has an internal 32-bit assembler (based on NASM tables) and a switchable output that allows:

  * Binary (ELF and coff when crosscompiled .o) output

  * NASM

  * MASM

  * TASM

  * AS (aout, coff, elf32)

The MASM and TASM outputs are not as well debugged as the others, but can be handy sometimes.

The assembler's look and feel are based on Turbo Pascal's internal BASM, the IDE supports similar highlighting, and FPC can fully integrate with gcc (on C level, not C++).

Using a dummy RTL, one can even generate pure assembler programs.

-----------------------------------------------------------------------------
3.5.8. Win32Forth assembler

Win32Forth is a free 32-bit ANS FORTH system that successfully runs under Win32s, Win95, Win/NT. It includes a free 32-bit assembler (either prefix or postfix syntax) integrated into the reflective FORTH language. Macro processing is done with the full power of the reflective language FORTH; however, the only supported input and output context is Win32For itself (no dumping of .obj file, but you could add that feature yourself, of course). Find it at [ftp://ftp.forth.org/pub/Forth/Compilers/native/windows/Win32For/] ftp://ftp.forth.org/pub/Forth/Compilers/native/windows/Win32For/.

-----------------------------------------------------------------------------
3.5.9. Terse

[http://www.terse.com] Terse is a programming tool that provides THE most compact assembler syntax for the x86 family! However, it is evil proprietary software. It is said that there was a project for a free clone somewhere, that was abandoned after worthless pretenses that the syntax would be owned by the original author.
Thus, if you're looking for a nifty programming project related to assembly hacking, I invite you to develop a terse-syntax frontend to NASM, if you like that syntax. As an interesting historic remark, on [news:comp.compilers] comp.compilers, 1999/07/11 19:36:51, the moderator wrote: "There's no reason that assemblers have to have awful syntax.  About 30 years ago I used Niklaus Wirth's PL360, which was basically a S/360 assembler with Algol syntax and a a little syntactic sugar like while loops that turned into the obvious branches.  It really was an assembler, e.g., you had to write out your expressions with explicit assignments of values to registers, but it was nice.  Wirth used it to write Algol W, a small fast Algol subset, which was a predecessor to Pascal.  As is so often the case, Algol W was a significant improvement over many of its successors. -John" ----------------------------------------------------------------------------- 3.5.10. Non-free and/or Non-32bit x86 assemblers You may find more about them, together with the basics of x86 assembly programming, in the Raymond Moon's x86 assembly FAQ. Note that all DOS-based assemblers should work inside the Linux DOS Emulator, as well as other similar emulators, so that if you already own one, you can still use it inside a real OS. Recent DOS-based assemblers also support COFF and/or other object file formats that are supported by the GNU BFD library, so that you can use them together with your free 32-bit tools, perhaps using GNU objcopy (part of the binutils) as a conversion filter. ----------------------------------------------------------------------------- Chapter 4. Metaprogramming Assembly programming is a bore, but for critical parts of programs. You should use the appropriate tool for the right task, so don't choose assembly when it does not fit; C, OCaml, perl, Scheme, might be a better choice in the most cases. 
However, there are cases when these tools do not give fine enough control over the machine, and assembly is useful or needed. In these cases you'll appreciate a system of macroprocessing and metaprogramming that allows recurring patterns to be factored each into one indefinitely reusable definition, which allows safer programming, automatic propagation of pattern modification, etc. Plain assembler often is not enough, even when one is doing only small routines to link with C.

-----------------------------------------------------------------------------
4.1. External filters

Whatever macro support your assembler has, and whatever language you use (even C!), if the language is not expressive enough for you, you can have files passed through an external filter with a Makefile rule like this:

%.s: %.S other_dependencies
	$(FILTER) $(FILTER_OPTIONS) < $< > $@

-----------------------------------------------------------------------------
4.1.1. CPP

CPP is truly not very expressive, but it's enough for easy things, it's standard, and called transparently by GCC.

As an example of its limitations, you can't declare objects so that destructors are automatically called at the end of the declaring block; you don't have diversions or scoping, etc.

CPP comes with any C compiler. However, considering how mediocre it is, stay away from it if by chance you can make it without C.

-----------------------------------------------------------------------------
4.1.2. M4

M4 gives you the full power of macroprocessing, with a Turing equivalent language, recursion, regular expressions, etc. You can do with it everything that CPP cannot.

See [ftp://ftp.forth.org/pub/Forth/Compilers/native/unix/this4th.tar.gz] macro4th (this4th) or [ftp://ftp.tunes.org/pub/tunes/obsolete/dist/tunes.0.0.0/tunes.0.0.0.25.src.zip] the Tunes 0.0.0.25 sources as examples of advanced macroprogramming using m4.
However, its dysfunctional quoting and unquoting semantics force you to use explicit continuation-passing tail-recursive macro style if you want to do advanced macro programming (which is reminiscent of TeX -- BTW, has anyone tried to use TeX as a macroprocessor for anything other than typesetting?). This is NOT worse than CPP, which does not allow quoting and recursion anyway.

The right version of M4 to get is GNU m4 1.4 (or later if it exists), which has the most features and the fewest bugs or limitations of all. m4 is designed to be slow for anything but the simplest uses, which might still be ok for most assembly programming (you are not writing million-line assembly programs, are you?).

-----------------------------------------------------------------------------
4.1.3. Macroprocessing with your own filter

You can write your own simple macro-expansion filter with the usual tools: perl, awk, sed, etc. It can be made rather quickly, and you control everything. But, of course, power in macroprocessing implies "the hard way".

-----------------------------------------------------------------------------
4.2. Metaprogramming

Instead of using an external filter that expands macros, one way to do things is to write programs that write part or all of other programs.

For instance, you could use a program outputting source code

  * to generate sine/cosine/whatever lookup tables,

  * to extract a source-form representation of a binary file,

  * to compile your bitmaps into fast display routines,

  * to extract documentation, initialization/finalization code, description tables, as well as normal code from the same source files,

  * to have customized assembly code, generated from a perl/shell/scheme script that does arbitrary processing,

  * to propagate data defined at one point only into several cross-referencing tables and code chunks,

  * etc.

Think about it!

-----------------------------------------------------------------------------
4.2.1.
Backends from compilers

Compilers like GCC, SML/NJ, Objective CAML, MIT-Scheme, CMUCL, etc, do have their own generic assembler backend, which you might choose to use if you intend to generate code semi-automatically from the corresponding languages, or from a language you hack: rather than write great assembly code, you may instead modify a compiler so that it dumps great assembly code!

-----------------------------------------------------------------------------
4.2.2. The New-Jersey Machine-Code Toolkit

There is a project, using the programming language Icon (with an experimental ML version), to build a basis for producing assembly-manipulating code. See around [http://www.eecs.harvard.edu/~nr/toolkit/] http://www.eecs.harvard.edu/~nr/toolkit/

-----------------------------------------------------------------------------
4.2.3. TUNES

The [http://www.tunes.org] TUNES Project for a Free Reflective Computing System is developing its own assembler as an extension to the Scheme language, as part of its development process. It doesn't run at all yet, though help is welcome.

The assembler manipulates abstract syntax trees, so it could equally serve as the basis for an assembly syntax translator, a disassembler, a common assembler/compiler back-end, etc. Also, the full power of a real language, Scheme, makes it unchallenged for macroprocessing/metaprogramming.

-----------------------------------------------------------------------------
Chapter 5. Calling conventions

5.1. Linux

5.1.1. Linking to GCC

This is the preferred way if you are developing a mixed C-asm project. Check the GCC docs and examples from Linux kernel .S files that go through gas (not those that go through as86).

32-bit arguments are pushed down the stack in reverse syntactic order (hence accessed/popped in the right order), above the 32-bit near return address. %ebp, %esi, %edi, %ebx are callee-saved, other registers are caller-saved; %eax is to hold the result, or %edx:%eax for 64-bit results.
FP stack: I'm not sure, but I think result is in st(0), whole stack caller-saved. The SVR4 i386 ABI specs at [http://www.caldera.com/developer/ devspecs/] http://www.caldera.com/developer/devspecs/ is a good reference point if you want more details. Note that GCC has options to modify the calling conventions by reserving registers, having arguments in registers, not assuming the FPU, etc. Check the i386 .info pages. Beware that you must then declare the cdecl or regparm(0) attribute for a function that will follow standard GCC calling conventions. See C Extensions::Extended Asm:: section from the GCC info pages. See also how Linux defines its asmlinkage macro... ----------------------------------------------------------------------------- 5.1.2. ELF vs a.out problems Some C compilers prepend an underscore before every symbol, while others do not. Particularly, Linux a.out GCC does such prepending, while Linux ELF GCC does not. If you need to cope with both behaviors at once, see how existing packages do. For instance, get an old Linux source tree, the Elk, qthreads, or OCaml... You can also override the implicit C->asm renaming by inserting statements like void foo asm("bar") (void); to be sure that the C function foo() will be called really bar in assembly. Note that the objcopy utility from the binutils package should allow you to transform your a.out objects into ELF objects, and perhaps the contrary too, in some cases. More generally, it will do lots of file format conversions. ----------------------------------------------------------------------------- 5.1.3. Direct Linux syscalls Often you will be told that using C library (libc) is the only way, and direct system calls are bad. This is true. To some extent. In general, you must know that libc is not sacred, and in most cases it only does some checks, then calls kernel, and then sets errno. 
You can easily do this in your program as well (if you need to), and your program will be a dozen times smaller, and this will result in improved performance as well, just because you're not using shared libraries (static binaries are faster). Using or not using libc in assembly programming is more a question of taste/belief than something practical. Remember, Linux is aiming to be POSIX compliant, and so is libc. This means that the syntax of almost all libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). Besides, GNU libc (glibc) becomes slower and slower from version to version, and eats more and more memory; and so, cases of using direct system calls become quite usual. But the main drawback of throwing libc away is that you will possibly need to implement several libc specific functions (that are not just syscall wrappers) on your own (printf() and Co.), and you are ready for that, aren't you? :)

Here is a summary of direct system calls pros and cons.

Pros:

  * the smallest possible size; squeezing the last byte out of the system

  * the highest possible speed; squeezing cycles out of your favorite benchmark

  * full control: you can adapt your program/library to your specific language or memory requirements or whatever

  * no pollution by libc cruft

  * no pollution by C calling conventions (if you're developing your own language or environment)

  * static binaries make you independent from libc upgrades or crashes, or from dangling #! path to an interpreter (and are faster)

  * just for the fun of it (don't you get a kick out of assembly programming?)

Cons:

  * If any other program on your computer uses the libc, then duplicating the libc code will actually waste memory, not save it.

  * Services redundantly implemented in many static binaries are a waste of memory. But you can make your libc replacement a shared library.
* Size is much better saved by having some kind of bytecode, wordcode, or structure interpreter than by writing everything in assembly. (the interpreter itself could be written either in C or assembly.) The best way to keep multiple binaries small is to not have multiple binaries, but instead to have an interpreter process files with #! prefix. This is how OCaml works when used in wordcode mode (as opposed to optimized native code mode), and it is compatible with using the libc. This is also how Tom Christiansen's [http://language.perl.com/ppt/] Perl PowerTools reimplementation of unix utilities works. Finally, one last way to keep things small, that doesn't depend on an external file with a hardcoded path, be it library or interpreter, is to have only one binary, and have multiply-named hard or soft links to it: the same binary will provide everything you need in an optimal space, with no redundancy of subroutines or useless binary headers; it will dispatch its specific behavior according to its argv[0]; in case it isn't called with a recognized name, it might default to a shell, and be possibly thus also usable as an interpreter!   * You cannot benefit from the many functionalities that libc provides besides mere linux syscalls: that is, functionality described in section 3 of the manual pages, as opposed to section 2, such as malloc, threads, locale, password, high-level network management, etc.   * Therefore, you might have to reimplement large parts of libc, from printf () to malloc() and gethostbyname. It's redundant with the libc effort, and can be quite boring sometimes. Note that some people have already reimplemented "light" replacements for parts of the libc -- check them out! 
(Redhat's minilibc, Rick Hohensee's [ftp://linux01.gwdg.de/pub/cLIeNUX/interim/libsys.tgz] libsys, Felix von Leitner's [http://www.fefe.de/dietlibc/] dietlibc, Christian Fowelin's [http://www.fowelin.de/christian/computer/libASM/] libASM; the [http://linuxassembly.org/asmutils.html] asmutils project is working on a pure assembly libc)

  * Static libraries prevent you from benefiting from libc upgrades as well as from libc add-ons such as the zlibc package, which does on-the-fly transparent decompression of gzip-compressed files.

  * The few instructions added by the libc are a ridiculously small speed overhead compared to the cost of a system call. If speed is a concern, your main problem is in your usage of system calls, not in their wrapper's implementation.

  * Using the standard assembly API for system calls is much slower than using the libc API when running in micro-kernel versions of Linux such as L4Linux, which have their own faster calling convention, and pay a high convention-translation overhead when using the standard one (L4Linux comes with libc recompiled with their syscall API; of course, you could recompile your code with their API, too).

  * See the previous discussion for general speed optimization issues.

  * If syscalls are too slow for you, you might want to hack the kernel sources (in C) instead of staying in userland.

If you've pondered the above pros and cons, and still want to use direct syscalls, then here is some advice.

  * You can easily define your system calling functions in a portable way in C (as opposed to unportably, using assembly), by including asm/unistd.h, and using the provided macros.

  * Since you're trying to replace it, go get the sources for the libc, and grok them. (And if you think you can do better, then send feedback to the authors!)

  * As an example of pure assembly code that does everything you want, examine Linux assembly resources.
Basically, you issue an int 0x80, with the __NR_syscallname number (from asm/ unistd.h) in eax, and parameters (up to six) in ebx, ecx, edx, esi, edi, ebp respectively. Result is returned in eax, with a negative result being an error, whose opposite is what libc would put into errno. The user-stack is not touched, so you needn't have a valid one when doing a syscall. Note Passing sixth parameter in ebp appeared in Linux 2.4, previous Linux versions understand only 5 parameters in registers. [http://www.linuxdoc.org/LDP/lki/] Linux Kernel Internals, and especially [http://www.linuxdoc.org/LDP/lki/lki-2.html#ss2.11] How System Calls Are Implemented on i386 Architecture? chapter will give you more robust overview. As for the invocation arguments passed to a process upon startup, the general principle is that the stack originally contains the number of arguments argc, then the list of pointers that constitute *argv, then a null-terminated sequence of null-terminated variable=value strings for the environment. For more details, do examine Linux assembly resources, read the sources of C startup code from your libc (crt0.S or crt1.S), or those from the Linux kernel (exec.c and binfmt_*.c in linux/fs/). ----------------------------------------------------------------------------- 5.1.4. Hardware I/O under Linux If you want to perform direct port I/O under Linux, either it's something very simple that does not need OS arbitration, and you should see the IO-Port-Programming mini-HOWTO; or it needs a kernel device driver, and you should try to learn more about kernel hacking, device driver development, kernel modules, etc, for which there are other excellent HOWTOs and documents from the LDP. Particularly, if what you want is Graphics programming, then do join one of the [http://www.ggi-project.org/] GGI or [http://www.XFree86.org/] XFree86 projects. 
Some people have even done better, writing small and robust XFree86 drivers in an interpreted domain-specific language, [http://www.irisa.fr/compose/gal/] GAL, and achieving the efficiency of hand-written C drivers through partial evaluation (drivers not only not in asm, but not even in C!). The problem is that the partial evaluator they used to achieve efficiency is not free software. Any taker for a replacement?

Anyway, in all these cases, you'll be better off using GCC inline assembly with the macros from linux/asm/*.h than writing full assembly source files.

-----------------------------------------------------------------------------
5.1.5. Accessing 16-bit drivers from Linux/i386

Such a thing is theoretically possible (proof: see how [http://www.dosemu.org] DOSEMU can selectively grant hardware port access to programs), and I've heard rumors that someone somewhere did actually do it (in the PCI driver? Some VESA access stuff? ISA PnP? dunno). If you have some more precise information on that, you'll be most welcome. Anyway, good places to look for more information are the Linux kernel sources, DOSEMU sources (and other programs in the [ftp://tsx-11.mit.edu/pub/linux/ALPHA/dosemu/] DOSEMU repository), and sources for various low-level programs under Linux... (perhaps GGI if it supports VESA).

Basically, you must either use 16-bit protected mode or vm86 mode.

The first is simpler to set up, but only works with well-behaved code that won't do any kind of segment arithmetic or absolute segment addressing (particularly addressing segment 0), unless by chance it happens that all segments used can be set up in advance in the LDT.

The latter allows for more "compatibility" with vanilla 16-bit environments, but requires more complicated handling.
In both cases, before you can jump to 16-bit code, you must   * mmap any absolute address used in the 16-bit code (such as ROM, video buffers, DMA targets, and memory-mapped I/O) from /dev/mem to your process' address space,   * setup the LDT and/or vm86 mode monitor.   * grab proper I/O permissions from the kernel (see the above section) Again, carefully read the source for the stuff contributed to the DOSEMU project, particularly these mini-emulators for running ELKS and/or simple .COM programs under Linux/i386. ----------------------------------------------------------------------------- 5.2. DOS and Windows Most DOS extenders come with some interface to DOS services. Read their docs about that, but often, they just simulate int 0x21 and such, so you do "as if" you are in real mode (I doubt they have more than stubs and extend things to work with 32-bit operands; they most likely will just reflect the interrupt into the real-mode or vm86 handler). Docs about DPMI (and much more) can be found on [ftp://x2ftp.oulu.fi/pub/ msdos/programming/] ftp://x2ftp.oulu.fi/pub/msdos/programming/ (again, the original x2ftp site is closing (no more?), so use a [ftp://ftp.lip6.fr/pub/pc /x2ftp/README.mirror_sites] mirror site). DJGPP comes with its own (limited) glibc derivative/subset/replacement, too. It is possible to cross-compile from Linux to DOS, see the devel/msdos/ directory of your local FTP mirror for metalab.unc.edu; Also see the MOSS DOS-extender from the [http://www.cs.utah.edu/projects/flux/] Flux project from the university of Utah. Other documents and FAQs are more DOS-centered; we do not recommend DOS development. Windows and Co. This document is not about Windows programming, you can find lots of documents about it everywhere.. 
The thing you should know is that [http://www.cygnus.com] Cygnus Solutions developed the [http://sourceware.cygnus.com/cygwin/] cygwin32.dll library, for GNU programs to run on the Win32 platform; thus, you can use GCC, GAS, all the GNU tools, and many other Unix applications.

-----------------------------------------------------------------------------
5.3. Your own OS

Control is what attracts many OS developers to assembly; it is often what leads to or stems from assembly hacking. Note that any system that allows self-development could be qualified as an "OS", though it can run "on top" of an underlying system (much like Linux over Mach or OpenGenera over Unix).

Hence, for easier debugging, you might like to develop your "OS" first as a process running on top of Linux (despite the slowness), then use the [http://www.cs.utah.edu/projects/flux/oskit/] Flux OS kit (which grants use of Linux and BSD drivers in your own OS) to make it stand-alone. When your OS is stable, it is time to write your own hardware drivers if you really love that.

This HOWTO will not cover topics such as bootloader code, getting into 32-bit mode, handling interrupts, the basics of Intel protected mode or V86/R86 braindeadness, or defining your object format and calling conventions.

The main place to find reliable information about all that is the source code of existing OSes and bootloaders. Lots of pointers are on the following webpage: [http://www.tunes.org/Review/OSes.html] http://www.tunes.org/Review/OSes.html

-----------------------------------------------------------------------------
Chapter 6. Quick start

6.1. Introduction

Finally, if you still want to try this crazy idea and write something in assembly (if you've reached this section, you're a real assembly fan), here's what you need to start.

As you've read before, you can write for Linux in different ways; I'll show how to use direct kernel calls, since this is the fastest way to call a kernel service; our code is not linked to any library and does not use the ELF interpreter; it communicates with the kernel directly.

I will show the same sample program in two assemblers, nasm and gas, thus showing Intel and AT&T syntax.

You may also want to read the [http://linuxassembly.org/intro.html] Introduction to UNIX assembly programming tutorial; it contains sample code for other UNIX-like OSes.

-----------------------------------------------------------------------------
6.1.1. Tools you need

First of all you need an assembler (compiler): nasm or gas.

Second, you need a linker, ld, since the assembler produces only object code.

Almost all distributions have gas and ld, in the binutils package.

As for nasm, you may have to download and install binary packages for Linux and docs from the nasm site; note that several distributions (Stampede, Debian, SuSe, Mandrake) already have nasm, so check first.

If you're going to dig in, you should also install the include files for your OS and, if possible, the kernel source.

-----------------------------------------------------------------------------
6.2. Hello, world!

6.2.1. Program layout

Linux is 32-bit, runs in protected mode, has a flat memory model, and uses the ELF format for binaries.

A program can be divided into sections: .text for your code (read-only), .data for your data (read-write), .bss for uninitialized data (read-write); there can actually be a few other standard sections, as well as some user-defined sections, but there is rarely a need to use them and they are out of our interest here. A program must have at least a .text section.

Now we will write our first program. Here is sample code:

-----------------------------------------------------------------------------
6.2.2. NASM (hello.asm)

section .data                           ;section declaration

msg     db      "Hello, world!",0xa     ;our dear string
len     equ     $ - msg                 ;length of our dear string

section .text                           ;section declaration

                        ;we must export the entry point to the ELF linker or
    global _start       ;loader. They conventionally recognize _start as their
                        ;entry point. Use ld -e foo to override the default.

_start:

;write our string to stdout

        mov     edx,len ;third argument: message length
        mov     ecx,msg ;second argument: pointer to message to write
        mov     ebx,1   ;first argument: file handle (stdout)
        mov     eax,4   ;system call number (sys_write)
        int     0x80    ;call kernel

;and exit

        mov     ebx,0   ;first syscall argument: exit code
        mov     eax,1   ;system call number (sys_exit)
        int     0x80    ;call kernel

-----------------------------------------------------------------------------
6.2.3. GAS (hello.S)

.data                                   # section declaration

msg:
        .ascii  "Hello, world!\n"       # our dear string
        len = . - msg                   # length of our dear string

.text                                   # section declaration

                        # we must export the entry point to the ELF linker or
    .global _start      # loader. They conventionally recognize _start as their
                        # entry point. Use ld -e foo to override the default.

_start:

# write our string to stdout

        movl    $len,%edx       # third argument: message length
        movl    $msg,%ecx       # second argument: pointer to message to write
        movl    $1,%ebx         # first argument: file handle (stdout)
        movl    $4,%eax         # system call number (sys_write)
        int     $0x80           # call kernel

# and exit

        movl    $0,%ebx         # first argument: exit code
        movl    $1,%eax         # system call number (sys_exit)
        int     $0x80           # call kernel

-----------------------------------------------------------------------------
6.3. Building an executable

6.3.1. Producing object code

The first step of building an executable is compiling (or assembling) an object file from the source:

For the nasm example:

+---------------------------------------------------------------------------+
|$ nasm -f elf hello.asm                                                    |
+---------------------------------------------------------------------------+

For the gas example:

+---------------------------------------------------------------------------+
|$ as -o hello.o hello.S                                                    |
+---------------------------------------------------------------------------+

This makes the hello.o object file.

-----------------------------------------------------------------------------
6.3.2. Producing executable

The second step is producing the executable file itself from the object file by invoking the linker:

+---------------------------------------------------------------------------+
|$ ld -s -o hello hello.o                                                   |
+---------------------------------------------------------------------------+

This will finally build the hello executable.

Hey, try to run it... Works? That's it. Pretty simple.

-----------------------------------------------------------------------------
Chapter 7. Resources

7.1. Pointers

Your main resource for Linux/UNIX assembly programming material is:

[http://linuxassembly.org/resources.html] http://linuxassembly.org/resources.html

Do visit it, and get plenty of pointers to assembly projects, tools, tutorials, documentation, guides, etc, concerning different UNIX operating systems and CPUs. Because it evolves quickly, I will no longer duplicate it here.

If you are new to assembly in general, here are a few starting pointers:

  * [http://webster.cs.ucr.edu/Page_asm/ArtOfAsm.html] The Art Of Assembly
  * [http://www2.dgsys.com/~raymoon/faq/] x86 assembly FAQ
  * [ftp://ftp.luth.se/pub/msdos/] ftp.luth.se mirrors the hornet and x2ftp former archives of msdos assembly coding stuff
  * [http://www.koth.org] CoreWars, a fun way to learn assembly in general
  * Usenet: [news://comp.lang.asm.x86] comp.lang.asm.x86; [news://alt.lang.asm] alt.lang.asm

-----------------------------------------------------------------------------
7.2. Mailing list

If you are interested in Linux/UNIX assembly programming (or have questions, or are just curious) I especially invite you to join the Linux assembly programming mailing list.

This is an open discussion of assembly programming under Linux, *BSD, BeOS, or any other UNIX/POSIX-like OS; it is also not limited to x86 assembly (Alpha, Sparc, PPC and other hackers are welcome too!).

Mailing list address is . To subscribe send a message to with the following line in the body of the message: subscribe linux-assembly

Detailed information and list archives are available at [http://linuxassembly.org/list.html] http://linuxassembly.org/list.html.

-----------------------------------------------------------------------------
Chapter 8. Frequently Asked Questions

Here are frequently asked questions (with answers) about Linux assembly programming. Some of the questions (and the answers) were taken from the linux-assembly mailing list.

8.1. How do I do graphics programming in Linux?
8.2. How do I debug pure assembly code under Linux?
8.3. Any other useful debugging tools?
8.4. How do I access BIOS functions from Linux (BSD, BeOS, etc)?
8.5. Is it possible to write kernel modules in assembly?
8.6. How do I allocate memory dynamically?
8.7. I can't understand how to use select system call!

8.1. How do I do graphics programming in Linux?

An answer from [mailto:paulf@icom.co.za] Paul Furber:

+---------------------------------------------------------------------------+
|Ok you have a number of options to graphics in Linux. Which one you use    |
|depends on what you want to do. There isn't one Web site with all the      |
|information but here are some tips:                                        |
|                                                                           |
|SVGALib: This is a C library for console SVGA access.                      |
|Pros: very easy to learn, good coding examples, not all that different     |
|from equivalent gfx libraries for DOS, all the effects you know from DOS   |
|can be converted with little difficulty.                                   |
|Cons: programs need superuser rights to run since they write directly to   |
|the hardware, doesn't work with all chipsets, can't run under X-Windows.   |
|Search for svgalib-1.4.x on http://ftp.is.co.za                            |
|                                                                           |
|Framebuffer: do it yourself graphics at SVGA res                           |
|Pros: fast, linear mapped video access, ASM can be used if you want :)     |
|Cons: has to be compiled into the kernel, chipset-specific issues, must    |
|switch out of X to run, relies on good knowledge of linux system calls     |
|and kernel, tough to debug                                                 |
|Examples: asmutils (http://www.linuxassembly.org) and the leaves example   |
|and my own site for some framebuffer code and tips in asm                  |
|(http://ma.verick.co.za/linux4k/)                                          |
|                                                                           |
|Xlib: the application and development libraries for XFree86.               |
|Pros: Complete control over your X application                             |
|Cons: Difficult to learn, horrible to work with and requires quite a bit   |
|of knowledge as to how X works at the low level.                           |
|Not recommended but if you're really masochistic go for it. All the        |
|include and lib files are probably installed already so you have what      |
|you need.                                                                  |
|                                                                           |
|Low-level APIs: include PTC, SDL, GGI and Clanlib                          |
|Pros: very flexible, run under X or the console, generally abstract away   |
|the video hardware a little so you can draw to a linear surface, lots of   |
|good coding examples, can link to other APIs like OpenGL and sound libs,   |
|Windows DirectX versions for free                                          |
|Cons: Not as fast as doing it yourself, often in development so versions   |
|can (and do) change frequently.                                            |
|Examples: PTC and GGI have excellent demos, SDL is used in sdlQuake,       |
|Myth II, Civ CTP and Clanlib has been used for games as well.              |
|                                                                           |
|High-level APIs: OpenGL - any others?                                      |
|Pros: clean api, tons of functionality and examples, industry standard     |
|so you can learn from SGI demos for example                                |
|Cons: hardware acceleration is normally a must, some quirks between        |
|versions and platforms                                                     |
|Examples: loads - check out www.mesa3d.org under the links section.        |
|                                                                           |
|To get going try looking at the svgalib examples and also install SDL      |
|and get it working. After that, the sky's the limit.                       |
+---------------------------------------------------------------------------+

8.2. How do I debug pure assembly code under Linux?

There's an early version of the [http://ellipse.mcs.drexel.edu/ald.html] Assembly Language Debugger, which is designed to work with assembly code, and is portable enough to run on Linux and *BSD. It is already functional and should be the right choice, check it out!

You can also try gdb ;). Although it is a source-level debugger, it can be used to debug pure assembly code, and with some trickery you can make gdb do what you need (unfortunately, the nasm '-g' switch does not generate proper debug info for gdb; this is a nasm bug, I think). Here's an answer from [mailto:dl@gazeta.ru] Dmitry Bakhvalov:

+---------------------------------------------------------------------------+
|Personally, I use gdb for debugging asmutils. Try this:                    |
|                                                                           |
|1) Use the following stuff to compile:                                     |
|  $ nasm -f elf -g smth.asm                                                |
|  $ ld -o smth smth.o                                                      |
|                                                                           |
|2) Fire up gdb:                                                            |
|  $ gdb smth                                                               |
|                                                                           |
|3) In gdb:                                                                 |
|  (gdb) disassemble _start                                                 |
|  Place a breakpoint at _start+1 (If placed at _start the breakpoint       |
|  wouldn't work, dunno why)                                                |
|  (gdb) b *0x8048075                                                       |
|                                                                           |
|  To step thru the code I use the following macro:                         |
|  (gdb)define n                                                            |
|  >ni                                                                      |
|  >printf "eax=%x ebx=%x ...etc...",$eax,$ebx,...etc...                    |
|  >disassemble $pc,$pc+15                                                  |
|  >end                                                                     |
|                                                                           |
|  Then start the program with r command and debug with n.                  |
|                                                                           |
|  Hope this helps.                                                         |
+---------------------------------------------------------------------------+

An additional note from ???:

+---------------------------------------------------------------------------+
| I have such a macro in my .gdbinit for quite some time now, and it        |
| for sure makes life easier. A small difference : I use "x /8i $pc",       |
| which guarantee a fixed number of disassembled instructions. Then,        |
| with a well chosen size for my xterm, gdb output looks like it is         |
| refreshed, and not scrolling.                                             |
+---------------------------------------------------------------------------+

If you want to set breakpoints across your code, you can just use the int 3 instruction as a breakpoint (instead of entering the address manually in gdb).

If you're using gas, you should consult gas and gdb related [http://linuxassembly.org/resources.html#tutorials] tutorials.

8.3. Any other useful debugging tools?

Definitely strace can help a lot (ktrace and kdump on FreeBSD); it is used to trace system calls and signals. Read its manual page (man strace) and strace --help output for details.

8.4. How do I access BIOS functions from Linux (BSD, BeOS, etc)?

The short answer is: no way. This is protected mode; use OS services instead. Again, you can't use int 0x10, int 0x13, etc. Fortunately almost everything can be implemented by means of system calls or library functions.
In the worst case you may go through direct port access, or make a kernel patch to implement the needed functionality, or use the LRMI library to access BIOS functions.

8.5. Is it possible to write kernel modules in assembly?

Yes, indeed it is. While in general it is not a good idea (it will hardly speed anything up), there may be a need for such wizardry. The process of writing a module itself is not that hard: a module must have some predefined global functions, and it may also need to call some external functions from the kernel. Examine kernel source code (that can be built as a module) for details.

Meanwhile, here's an example of a minimal dumb kernel module (module.asm) (source is based on an example by mammon_ from APJ #8):

section .text

        global init_module
        global cleanup_module
        global kernel_version

        extern printk

init_module:
        push    dword str1      ;argument for printk
        call    printk
        pop     eax             ;drop the argument from the stack
        xor     eax,eax         ;return 0
        ret

cleanup_module:
        push    dword str2
        call    printk
        pop     eax
        ret

str1            db      "init_module done",0xa,0
str2            db      "cleanup_module done",0xa,0

kernel_version  db      "2.2.18",0

The only thing this example does is report its actions. Modify kernel_version to match yours, and build the module with:

+---------------------------------------------------------------------------+
|$ nasm -f elf -o module.m module.asm                                       |
+---------------------------------------------------------------------------+

+---------------------------------------------------------------------------+
|$ ld -r -o module.o module.m                                               |
+---------------------------------------------------------------------------+

Now you can play with it using insmod/rmmod/lsmod (root privileges are required); a lot of fun, huh?

8.6. How do I allocate memory dynamically?

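For comparison with the assembly answers that follow, here is the same idea seen from C: a minimal sketch, assuming Linux with glibc, where sbrk() is the library wrapper that moves the program break. The helper names grow_break and shrink_break are made up for this illustration and are not part of any library.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes sbrk() in <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: raise the program break by `size` bytes and
 * return the start of the new region, or NULL if the request fails.
 * sbrk() returns the previous break on success, which is exactly
 * where the newly added memory begins. */
static void *grow_break(size_t size)
{
    assert(size <= INTPTR_MAX);           /* sbrk() takes a signed increment */
    void *old_top = sbrk((intptr_t)size);
    return old_top == (void *)-1 ? NULL : old_top;
}

/* Hypothetical helper: give `size` bytes back by lowering the break.
 * Returns 0 on success, -1 on failure. */
static int shrink_break(size_t size)
{
    assert(size <= INTPTR_MAX);
    return sbrk(-(intptr_t)size) == (void *)-1 ? -1 : 0;
}
```

A caller would use the pointer returned by grow_break(4096) as 4096 bytes of scratch memory and hand it back with shrink_break(4096). Mixing this with malloc in the same program is best avoided, since malloc may move the break as well.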
A laconic answer from [mailto:phpr@snafu.de] H-Peter Recktenwald:

        ebx := 0        (in fact, any value below .bss seems to do)
sys_brk
        eax := current top (of .bss section)

        ebx := [ current top < ebx < (esp - 16K) ]
sys_brk
        eax := new top of .bss

An extensive answer from [mailto:ee97034@fe.up.pt] Tiago Gasiba:

; sys_brk, erro1 and erro2 are not shown in this fragment;
; they are assumed to be defined elsewhere

section .bss
var1    resb    1

section .text

;
;allocate memory
;

%define LIMIT   0x4000000                       ; 64 MB

        mov     ebx,0                           ; get bottom of data segment
        call    sys_brk

        cmp     eax,-1                          ; ok?
        je      erro1

        add     eax,LIMIT                       ; allocate +LIMIT memory
        mov     ebx,eax
        call    sys_brk

        cmp     eax,-1                          ; ok?
        je      erro1

        cmp     eax,var1+1                      ; has the data segment grown?
        je      erro1

;
;use allocated memory
;
                                                ; now eax contains bottom of
                                                ; data segment
        mov     ebx,eax                         ; save bottom
        mov     eax,var1                        ; eax=beginning of data segment
repeat:
        mov     byte [eax],1                    ; fill up with 1's, one byte at a time
        inc     eax
        cmp     ebx,eax                         ; current pos = bottom?
        jne     repeat

;
;free memory
;

        mov     ebx,var1                        ; deallocate memory
        call    sys_brk                         ; by forcing its beginning=var1

        cmp     eax,-1                          ; ok?
        je      erro2

8.7. I can't understand how to use select system call!

An answer from [mailto:mochel@transmeta.com] Patrick Mochel:

When you call sys_open, you get back a file descriptor, which is simply an index into a table of all the open file descriptors that your process has. stdin, stdout, and stderr are always 0, 1, and 2, respectively, because that is the order in which they are always opened for your process. Also, notice that the first file descriptor that you open yourself (w/o first closing any of those magic three descriptors) is always 3, and they increment from there.

Understanding the index scheme will explain what select does. When you call select, you are saying that you are waiting on certain file descriptors to read from, certain ones to write to, and certain ones to watch for exceptions on. Your process can have up to 1024 file descriptors open, so an fd_set is just a bit mask describing which file descriptors are valid for each operation. Make sense?

Since each fd that you have open is just an index, and it only needs to be on or off for each fd_set, you need only 1024 bits for an fd_set structure. 1024 / 32 = 32 longs needed to represent the structure.

Now, for the loose example. Suppose you want to read from a file descriptor (w/o timeout).

- Allocate the equivalent to an fd_set.

section .data

my_fds: times 32 dd 0

- Open the file descriptor that you want to read from.

- Set that bit in the fd_set structure.

First, you need to figure out which of the 32 dwords the bit is in.

Then, use bts to set the bit in that dword. With a register destination, bts takes the bit offset modulo 32, which is why this example first works out which dword to start with. (With a memory destination and a register bit offset, bts can address bits beyond the first dword, so bts dword [my_fds], eax should set the same bit directly.)

        ; on entry, eax holds the file descriptor
        mov     edx, 0                  ; clear the high half of the dividend
        mov     ebx, 32
        div     ebx                     ; eax = fd / 32 (dword), edx = fd % 32 (bit)
        mov     ebx, my_fds
        bts     dword [ebx + eax*4], edx

- Repeat the last step for any file descriptors you want to read from.

- Repeat the entire exercise for either of the other two fd_sets if you want action from them.

That leaves two other parts of the equation: the n parameter and the timeout parameter. I'll leave the timeout parameter as an exercise for the reader (yes, I'm lazy), but I'll briefly talk about the n parameter. It is the value of the largest file descriptor you are selecting from (from any of the fd_sets), plus one. Why plus one? Well, because it's easy to determine a mask from that value. Suppose that there is data available on x file descriptors, but the highest one you care about is (n - 1). Since an fd_set is just a bitmask, the kernel needs some efficient way of determining whether to return or not from select. So, it masks off the bits that you care about, checks if anything is available from the bits that are still set, and returns if there is (pause as I rummage through kernel source). Well, it's not as easy as I fantasized it would be. To see how the kernel determines that mask, look in fs/select.c in the kernel source tree.

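The dword/bit split and the n parameter described above can also be sketched in C. This is only an illustration under the text's assumptions (1024 descriptors, 32-bit words); the type my_fd_set and the functions my_fd_set_bit, my_fd_is_set and my_select_n are hypothetical names invented here, not the kernel's or libc's fd_set interface.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS       1024   /* per-process limit quoted in the text */
#define BITS_PER_WORD 32

/* Hypothetical stand-in for an fd_set: 1024 bits = 32 dwords,
 * mirroring "my_fds: times 32 dd 0". */
typedef struct { uint32_t words[MAX_FDS / BITS_PER_WORD]; } my_fd_set;

/* Mark fd in the set: fd / 32 picks the dword, fd % 32 the bit in it. */
static void my_fd_set_bit(my_fd_set *set, unsigned fd)
{
    assert(fd < MAX_FDS);
    set->words[fd / BITS_PER_WORD] |= (uint32_t)1 << (fd % BITS_PER_WORD);
}

/* Return 1 if fd is marked in the set, 0 otherwise. */
static int my_fd_is_set(const my_fd_set *set, unsigned fd)
{
    assert(fd < MAX_FDS);
    return (int)((set->words[fd / BITS_PER_WORD] >> (fd % BITS_PER_WORD)) & 1u);
}

/* The n argument for one set: highest marked descriptor plus one
 * (0 for an empty set). With several sets, take the largest result. */
static unsigned my_select_n(const my_fd_set *set)
{
    for (unsigned fd = MAX_FDS; fd > 0; fd--)
        if (my_fd_is_set(set, fd - 1))
            return fd;
    return 0;
}
```

For example, after marking descriptors 0 and 35 in a zeroed set, words[1] holds 0x8 (bit 3 of the second dword) and my_select_n returns 36.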
Anyway, you need to know that number, and the easiest way to do it is to save the value of the last file descriptor open somewhere so you don't lose it. Ok, that's what I know. A warning about the code above (as always) is that it is not tested. I think it should work, but if it doesn't let me know. But, if it starts a global nuclear meltdown, don't call me. ;-) That's all for now, folks. ----------------------------------------------------------------------------- Appendix A. History Each version includes a few fixes and minor corrections, that need not to be repeatedly mentioned every time. Revision History Revision 0.6f 17 Aug 2002 Revised by: konst Added FASM, added URL to Korean translation, added URL to SVR4 i386 ABI specs, update on HLA/Linux, small fix in hello.S example, misc URL updates; Revision 0.6e 12 Jan 2002 Revised by: konst Added URL describing GAS Intel syntax; Added OSIMPA(former SHASM); Added YASM; FAQ update. Revision 0.6d 18 Mar 2001 Revised by: konst Added Free Pascal; new NASM URL again Revision 0.6c 15 Feb 2001 Revised by: konst Added SHASM; new answer in FAQ, new NASM URL, new mailing list address Revision 0.6b 21 Jan 2001 Revised by: konst new questions in FAQ, corrected few URLs Revision 0.6a 10 Dec 2000 Revised by: konst Remade section on AS86 (thanks to Holluby Istvan for pointing out obsolete information). Fixed several URLs that can be incorrectly rendered from sgml to html. Revision 0.6 11 Nov 2000 Revised by: konst HOWTO is completely rewritten using DocBook DTD. Layout is totally rearranged; too much changes to list them here. 
Revision 0.5n 07 Nov 2000 Revised by: konst Added question regarding kernel modules to FAQ, fixed NASM URLs, GAS has Intel syntax too Revision 0.5m 22 Oct 2000 Revised by: konst Linux 2.4 system calls can have 6 args, Added ALD note to FAQ, fixed mailing list subscribe address Revision 0.5l 23 Aug 2000 Revised by: konst Added TDASM, updates on NASM Revision 0.5k 11 Jul 2000 Revised by: konst Few additions to FAQ Revision 0.5j 14 Jun 2000 Revised by: konst Complete rearrangement of Introduction and Resources sections. FAQ added to Resources, misc cleanups and additions. Revision 0.5i 04 May 2000 Revised by: konst Added HLA, TALC; rearrangements in Resources, Quick Start Assemblers sections. Few new pointers. Revision 0.5h 09 Apr 2000 Revised by: konst finally managed to state LDP license on document, new resources added, misc fixes Revision 0.5g 26 Mar 2000 Revised by: konst new resources on different CPUs Revision 0.5f 02 Mar 2000 Revised by: konst new resources, misc corrections Revision 0.5e 10 Feb 2000 Revised by: konst URL updates, changes in GAS example Revision 0.5d 01 Feb 2000 Revised by: konst Resources (former "Pointers") section completely redone, various URL updates. Revision 0.5c 05 Dec 1999 Revised by: konst New pointers, updates and some rearrangements. Rewrite of sgml source. Revision 0.5b 19 Sep 1999 Revised by: konst Discussion about libc or not libc continues. New web pointers and and overall updates. Revision 0.5a 01 Aug 1999 Revised by: konst Quick Start section rearranged, added GAS example. Several new web pointers. Revision 0.5 01 Aug 1999 Revised by: konstfare GAS has 16-bit mode. New maintainer (at last): Konstantin Boldyshev. Discussion about libc or not libc. Added Quick Start section with examples of assembly code. Revision 0.4q 22 Jun 1999 Revised by: fare process argument passing (argc, argv, environ) in assembly. This is yet another "last release by Fare before new maintainer takes over". Nobody knows who might be the new maintainer. 
Revision 0.4p 06 Jun 1999 Revised by: fare clean up and updates Revision 0.4o 01 Dec 1998 Revised by: fare Revision 0.4m 23 Mar 1998 Revised by: fare corrections about gcc invocation Revision 0.4l 16 Nov 1997 Revised by: fare release for LSL 6th edition Revision 0.4k 19 Oct 1997 Revised by: fare Revision 0.4j 07 Sep 1997 Revised by: fare Revision 0.4i 17 Jul 1997 Revised by: fare info on 16-bit mode access from Linux Revision 0.4h 19 Jun 1997 Revised by: fare still more on "how not to use assembly"; updates on NASM, GAS. Revision 0.4g 30 Mar 1997 Revised by: fare Revision 0.4f 20 Mar 1997 Revised by: fare Revision 0.4e 13 Mar 1997 Revised by: fare Release for DrLinux Revision 0.4d 28 Feb 1997 Revised by: fare Vapor announce of a new Assembly-HOWTO maintainer Revision 0.4c 09 Feb 1997 Revised by: fare Added section Do you need assembly?. Revision 0.4b 03 Feb 1997 Revised by: fare NASM moved: now is before AS86 Revision 0.4a 20 Jan 1997 Revised by: fare CREDITS section added Revision 0.4 20 Jan 1997 Revised by: fare first release of the HOWTO as such Revision 0.4pre1 13 Jan 1997 Revised by: fare text mini-HOWTO transformed into a full linuxdoc-sgml HOWTO, to see what the SGML tools are like Revision 0.3l 11 Jan 1997 Revised by: fare Revision 0.3k 19 Dec 1996 Revised by: fare What? I had forgotten to point to terse??? Revision 0.3j 24 Nov 1996 Revised by: fare point to French translated version Revision 0.3i 16 Nov 1996 Revised by: fare NASM is getting pretty slick Revision 0.3h 06 Nov 1996 Revised by: fare more about cross-compiling -- See on sunsite: devel/msdos/ Revision 0.3g 02 Nov 1996 Revised by: fare Created the History. Added pointers in cross-compiling section. Added section about I/O programming under Linux (particularly video). 
Revision 0.3f 17 Oct 1996 Revised by: fare Revision 0.3c 15 Jun 1996 Revised by: fare Revision 0.2 04 May 1996 Revised by: fare Revision 0.1 23 Apr 1996 Revised by: fare Francois-Rene "Fare" Rideau creates and publishes the first mini-HOWTO, because "I'm sick of answering ever the same questions on comp.lang.asm.x86" ----------------------------------------------------------------------------- Appendix B. Acknowledgements I would like to thank all the people who have contributed ideas, answers, remarks, and moral support, and additionally the following persons, by order of appearance:   * [mailto:buried.alive@in.mail] Linus Torvalds for Linux   * [mailto:bde@zeta.org.au] Bruce Evans for bcc from which as86 is extracted   * [mailto:anakin@pobox.com] Simon Tatham and [mailto:jules@earthcorp.com] Julian Hall for NASM   * [mailto:gregh@metalab.unc.edu] Greg Hankins and now [mailto: linux-howto@metalab.unc.edu] Tim Bynum for maintaining HOWTOs   * [mailto:raymoon@moonware.dgsys.com] Raymond Moon for his FAQ   * [mailto:dumas@linux.eu.org] Eric Dumas for his translation of the mini-HOWTO into French (sad thing for the original author to be French and write in English)   * [mailto:paul@geeky1.ebtech.net] Paul Anderson and [mailto: rahim@megsinet.net] Rahim Azizarab for helping me, if not for taking over the HOWTO   * [mailto:pcg@goof.com] Marc Lehman for his insight on GCC invocation   * [mailto:ams@wiw.org] Abhijit Menon-Sen for helping me figure out the argument passing convention ----------------------------------------------------------------------------- Appendix C. Endorsements This version of the document is endorsed by Konstantin Boldyshev. Modifications (including translations) must remove this appendix according to the license agreement. $Id: Assembly-HOWTO.sgml,v 1.7 2002/08/17 08:35:59 konst Exp $ ----------------------------------------------------------------------------- Appendix D. 
GNU Free Documentation License GNU Free Documentation License Version 1.1, March 2000     Copyright (C) 2000  Free Software Foundation, Inc.     59 Temple Place, Suite 330, Boston, MA  02111-1307  USA     Everyone is permitted to copy and distribute verbatim copies     of this license document, but changing it is not allowed. 0. PREAMBLE The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software. We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference. 1. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language. 
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque". Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. 
Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only. The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text. 2. VERBATIM COPYING You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies. 3. COPYING IN QUANTITY If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. 
Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects. If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document. 4. MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version: A. 
Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission. B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five). C. State on the Title page the name of the publisher of the Modified Version, as the publisher. D. Preserve all the copyright notices of the Document. E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below. G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. H. Include an unaltered copy of this License. I. Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. 
You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission. K. In any section entitled "Acknowledgements" or "Dedications", preserve the section's title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles. M. Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version. N. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section. If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles. You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. 
If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one. The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. 5. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice. The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements." 6. 
COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 7. AGGREGATION WITH INDEPENDENT WORKS A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate. 8. TRANSLATION Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. 
You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail. 9. TERMINATION You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 10. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See [http://www.gnu.org/copyleft/] http://www.gnu.org/copyleft/. Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. How to use this License for your documents To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:       Copyright (c)  YEAR  YOUR NAME.       
Permission is granted to copy, distribute and/or modify this document       under the terms of the GNU Free Documentation License, Version 1.1       or any later version published by the Free Software Foundation;       with the Invariant Sections being LIST THEIR TITLES, with the       Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.       A copy of the license is included in the section entitled "GNU       Free Documentation License". If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being LIST"; likewise for Back-Cover Texts. If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software. Linux Astronomy HOWTO Elwood Downey John Huggins $Revision: 1.48 $ Copyright © 2000-2004 Elwood Downey and John Huggins $Date: 2004/02/07 17:58:56 $ This document shares tips and resources to utilize Linux solutions in the pursuit of Astronomy. ----------------------------------------------------------------------------- Table of Contents 1. Introduction 1.1. Knowledge Required 1.2. Scope 1.3. Disclaimer 1.4. Version 1.5. Copyright 1.6. Contributions 1.7. Translations 1.8. About the authors 2. Software 2.1. Collections 2.2. Planetarium Programs 2.3. Portable and Handheld Applications 2.4. Simulators 2.5. Image Processing 2.6. Sun and Moon 2.7. Libraries 2.8. Games 2.9. Other 3. Online Tools 3.1. Traditional Form Based Programs 3.2. Java Applets 4. Astronomical Images over the web 4.1. List 5. Organizations 6. Hardware Control 6.1. Telescope Control 6.2. CCD Camera Control 7. Installation Help 8. Projects using Linux 9. Revision History 1. Introduction 1.1.
Knowledge Required With all the help from major Linux distributions such as SuSE, Red Hat and many others, Linux based systems are becoming easier to use. However, a grasp of basic UNIX skills is still needed to make the most of Linux. Thus, this HOWTO will assume that the reader has at least a basic knowledge of using a UNIX system, including the ability to compile and install programs. A few resources we have found useful over the years include:   * "A Practical Guide to the UNIX System", Mark G. Sobell   * "Advanced Programming in the UNIX Environment", the late W. Richard Stevens   * "Running LINUX", Matt Welsh et al.   * "LINUX Device Drivers", Alessandro Rubini Similarly, this is not a tutorial or reference for astronomy principles or astronomical instrumentation. Astronomy is perhaps the grandest of all sciences, employing widely disparate disciplines in a bold attempt to understand nothing less than the universe itself. Your interests will lead in many directions. A few references we have used include:   * "Explanatory Supplement to the Astronomical Almanac", P. Kenneth Seidelmann   * "Astronomy with your Personal Computer", Peter Duffett-Smith   * "Astronomy on the Personal Computer", Oliver Montenbruck et al.   * "Textbook on Spherical Astronomy", W. M. Smart   * "The Astronomy and Astrophysics Encyclopedia", Stephen P. Maran, ed. ----------------------------------------------------------------------------- 1.2. Scope The authors define the scope of this HOWTO as primarily an index to Linux tools applicable in some fashion to the pursuit of Astronomy. It is NOT our intention to list WWW astronomy references in general. Our own interests tend more towards the technology than the pure science and so we welcome contributions from others who have found Linux tools which contribute in other ways to Astronomy. Please contact us at the address above. ----------------------------------------------------------------------------- 1.3.
Disclaimer No liability for the contents of this document can be accepted. Use the concepts, examples and other content at your own risk. As this is a new edition of this document, there may be errors and inaccuracies that may of course be damaging to your system. Proceed with caution; although damage is highly unlikely, the author(s) do not take any responsibility for it. All copyrights are held by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark. Naming of particular products or brands should not be seen as endorsements. You are strongly recommended to take a backup of your system before major installation and backups at regular intervals. ----------------------------------------------------------------------------- 1.4. Version $Revision: 1.48 $ $Date: 2004/02/07 17:58:56 $ The latest version of this document is always available on the [http://astronomy.net/] Astronomy Net at [http://howto.astronomy.net/howto/] Astronomy HOWTO. We eagerly accept suggestions from you. Send them to [mailto:howto@astronomy.net] Astronomy HOWTO Editors. ----------------------------------------------------------------------------- 1.5. Copyright Copyright 2000-2003 by Elwood Downey and John Huggins. This document may be distributed only subject to the terms and conditions set forth in the LDP License except that this document must not be distributed in modified form without the author's consent. A verbatim copy may be reproduced or distributed in any medium physical or electronic without permission of the author. Translations are similarly permitted without express permission if it includes a notice on who translated it. Commercial redistribution is allowed and encouraged; however please notify authors of any such distributions.
Excerpts from the document may be used without prior consent provided that the derivative work contains the verbatim copy or a pointer to a verbatim copy. Permission is granted to make and distribute verbatim copies of this document provided the copyright notice, the list of authors and this permission notice are preserved on all copies. In short, we wish to promote dissemination of this information through as many channels as possible. However, we wish to retain copyright on this HOWTO document, and would like to be notified of any plans to redistribute this HOWTO. For information about translations of this document, please see below. ----------------------------------------------------------------------------- 1.6. Contributions As we pursue the goals of the Astronomy HOWTO, we will recognize the contributions of folks who provide us with data here.   * Progga - Helped us get this document into modern times by converting the older linuxdoc to docbook. ----------------------------------------------------------------------------- 1.7. Translations Since Astronomy is very much an international effort, we encourage translation of this HOWTO into any language. We only ask the following:   * If you are a translator, please contact us at the above address so we may give proper credit here. This way, readers will immediately see what translations are available and see where to get them.   * Please obtain the latest copy of the Astronomy HOWTO from its home at [http://howto.astronomy.net/] Astronomy Net before you begin your translation effort. We thank the following for their translation efforts:   * [http://w3studi.informatik.uni-stuttgart.de/~moltenml/downloads.html] German Translation courtesy of Michael Moltenbrey   * [http://www.linux.or.jp/JF/JFdocs/Astronomy-HOWTO.html] Japanese Translation courtesy of Shouhei Nagaoka ----------------------------------------------------------------------------- 1.8.
About the authors Elwood Downey has over two decades of experience in software engineering for various astronomy projects. Learn more about Elwood at [http://www.clearskyinstitute.com/resumes/ecdowney/resume.html] Clear Sky Institute. John Huggins has over fifteen years of experience in hardware engineering, including eight years associated with an astronomy project. Learn more at [http://www.johnhuggins.com/resume/] John's Site. ----------------------------------------------------------------------------- 2. Software Software Section ----------------------------------------------------------------------------- 2.1. Collections Here are some links to collections and other indexes of Linux astronomy software.   * [http://www.randomfactory.com/lfa.html] The Linux for Astronomy CDROM   * [http://SAL.KachinaTech.COM/Z/4/index.shtml] Scientific Applications on Linux (SAL), Physics and Astronomy   * [http://home.xnet.com/~blatura/linapp3.html#science] Linux Applications and Utilities Page, Science and Math   * [http://bima.astro.umd.edu/nemo/linuxastro/astromake/] AstroMake is a utility intended to make installations of some common astronomical packages (in binary form) easy.   * The linuxastro mailing list also contains a list of applications and packages. For more information, see [http://bima.astro.umd.edu/nemo/linuxastro] linuxastro.   * [http://sourceforge.net/softwaremap/trove_list.php?form_cat=134] Astronomy at sourceforge.net If the above does not meet your needs, these links may help:   * [http://www.google.com/search?q=Astronomy+Software+Linux] Linux Astronomy Software from the Google Search Engine   * [http://dir.yahoo.com/Science/Astronomy/Software/] Astronomy Software from the Yahoo Listings ----------------------------------------------------------------------------- 2.2. Planetarium Programs Here is a discussion of programs which run on Linux for use in finding objects, natural and man-made, in the sky.  
* [http://clearskyinstitute.com/xephem/] XEphem has been the pet project of one of us (Downey) for the past 15-odd years. It has grown to become one of the more capable interactive tools for the computation of astronomical ephemerides.   * [http://www.astrotrf.net:8080/xsky_blurb.html] XSky is by Terry R. Friedrichsen, terry@venus.sunquest.com. XSky is essentially an interactive sky atlas.   * [http://edu.kde.org/kstars/] KStars is a Desktop Planetarium for KDE.   * [http://tdc-www.harvard.edu/software/skymap.html] Skymap is an astronomical mapping program written in Fortran and C for unix workstations by Doug Mink of the Smithsonian Astrophysical Observatory Telescope Data Center.   * [http://www.astroarts.com/products/xplns/] Xplns reproduces the real starry sky on your X Window System display.   * [http://www.lsw.uni-heidelberg.de/~rwichman/Nightfall.html] Nightfall is an astronomy application for fun, education, and science. It can produce animated views of eclipsing binary stars, calculate synthetic lightcurves and radial velocity curves, and eventually determine the best-fit model for a given set of observational data of an eclipsing binary star system.   * [http://nova.sourceforge.net] NOVA is a free Integrated Observational Environment for astronomers. ----------------------------------------------------------------------------- 2.3. Portable and Handheld Applications The advance of palm computers has taken hold. Linux has made its way to this realm.   * Clear Sky Institute brings us the [http://www.clearskyinstitute.com/psc/] Personal Sky Chart for the Sharp Zaurus PDA. ----------------------------------------------------------------------------- 2.4. Simulators Programs that classify themselves as simulators.  
* [http://www.shatters.net/celestia/] Celestia Real-time visual simulation of space for Windows and Unix (Linux)   * [http://openuniverse.sourceforge.net/] OpenUniverse Simulates the Solar System bodies in 3D in Windows and Linux ----------------------------------------------------------------------------- 2.5. Image Processing   * Astronomical Information Processing System (AIPS) is the heavy iron used by professional astronomers. [http://aips2.nrao.edu/docs/aips++.html] AIPS++ is the place to find out more, but note that [http://www.aoc.nrao.edu/aips/] AIPS Classic also exists and is actively maintained.   * Good ol' [http://www.gimp.org/] GNU Image Manipulation Program (GIMP) is a fine program to use for processing of digital images of all kinds and can prove useful for astro images as well. ----------------------------------------------------------------------------- 2.6. Sun and Moon A surprising number of applications deal with just the Sun and Moon.   * [http://nis-www.lanl.gov/~mgh/WindowMaker/DockApps.shtml] wmMoonClock shows lunar ephemeris to fairly high accuracy and is listed at this web site along with several other interesting programs.   * [http://www.paganlink.org/downloads/astronomy/xvmoontool.html] XVMoontool is an XView application which displays information about the Moon in real time.   * [http://www.flaterco.com/xtide/] XTide is a Harmonic tide clock and tide predictor. ----------------------------------------------------------------------------- 2.7. Libraries This section discusses bits and pieces of software that can be used to form the basis for specialized projects.   * [http://rlspc5.bnsc.rl.ac.uk/star/docs/sun67.htx/sun67.html#xref_] SLALIB, part of the [http://star-www.rl.ac.uk] Starlink Project, is a complete library of subroutines for astrometric computations.   * [http://ascl.net] Astrophysics Source Code Library is a collection of links to numerical astrophysical process models.  
* [http://people.ne.mediaone.net/moshier/index.html] Astronomy and numerical software source codes is a collection of C codes related to astronomy.   * [http://hem.passagen.se/pausch/comp/ppcomp.html] How to compute planetary positions.   * [http://dimensional.com/~ashe/ccd-astro.html] CCD Astronomy on Linux. A library of routines that help control SBIG cameras. ----------------------------------------------------------------------------- 2.8. Games Yes, games.   * [http://www.head-crash.com/orbit/] Orbit - Be a space fighter pilot in Windows or Linux. ----------------------------------------------------------------------------- 2.9. Other Every list needs a miscellaneous section, and this is it for Software.   * [http://iraf.noao.edu] IRAF is a gigantic but exceptionally capable astronomical analysis system, shepherded over the past 20-odd years by Doug Tody, formerly at NOAO. It has accumulated innumerable authoritative contributions from leading astronomers in all areas of astronomical data analysis. If you have a serious interest in astronomical data reduction and significant time to invest, this system will reward you mightily.   * [http://www.lsw.uni-heidelberg.de/~rwichman/Nightfall.html] Nightfall Eclipsing Binary Star Program   * [http://xplanet.sourceforge.net] Xplanet Very realistic rendering program for Earth and other planets and moons. Uses X Windows and OpenGL.   * [http://www.princeton.edu/~kmccarty/starplot.html] StarPlot A 3-Dimensional Star Chart Viewer for Linux. Uses C++ and Gtk+. ----------------------------------------------------------------------------- 3. Online Tools I know we said we would not start listing Web sites, but here are a few links to sites which offer fully operational tools running online that we feel are especially useful or interesting, from a browser on any platform. ----------------------------------------------------------------------------- 3.1.
Traditional Form Based Programs   * [http://aa.usno.navy.mil/data/docs/RS_OneYear.html] Sun and Moon Rise and Set calculator   * [http://aa.usno.navy.mil/data/docs/WebMICA_2.html] Web version of MICA   * [http://ssd.jpl.nasa.gov/cgi-bin/eph] JPL Ephemeris Generator   * [http://space.jpl.nasa.gov] Solar System Simulator   * [http://www.cleardarksky.com/csk/] Clear Sky Clock will show at a glance when we might expect clear and dark skies for one particular observing site.   * The [http://simbad.harvard.edu/cgi-bin/WSimbad.pl] Simbad astronomical database provides basic data, cross-identifications and bibliography for astronomical objects outside the solar system. ----------------------------------------------------------------------------- 3.2. Java Applets   * [http://www.sweethome.de/giesen/GeoAstro/GeoAstro.html] GeoAstro Applet Collection by Juergen Giesen   * [http://aladin.u-strasbg.fr] Aladin Interactive Sky Atlas   * [http://www.astro.queensu.ca/~dursi/dm-tutorial/cluster-sim.html] Cluster simulator   * [http://www.phys.vt.edu/~jhs/SIP] Sky Image Processor   * [http://liftoff.msfc.nasa.gov/RealTime/JTrack/3d/JTrack3D.html] J-Track 3D - Satellite Tracking ----------------------------------------------------------------------------- 4. Astronomical Images over the web Much effort has gone into allowing access to astronomical image file types such as FITS from any web browser. Here are some pointers. ----------------------------------------------------------------------------- 4.1. List The folks at Harvard have a list of Image Servers and Image Browsers.   * [http://tdc-www.harvard.edu/astro.image.html] Astronomical Images Over the Web ----------------------------------------------------------------------------- 5.
Organizations   * The yearly [http://hea-www.harvard.edu/adass] Astronomical Data Analysis Software and Systems, ADASS, Conference Series provides a forum for scientists and computer specialists concerned with algorithms, software and operating systems in the acquisition, reduction and analysis of astronomical data. The program includes invited talks, contributed papers and poster sessions as well as user group meetings and special interest meetings ("BOFs"). All these activities aim to encourage communication between software specialists and users, and also to stimulate further development of astronomical software and systems.   * The linuxastro mailing list, linuxastro@majordomo.cv.nrao.edu, is for people who are interested in porting astronomical software to Linux. For more information, see [http://bima.astro.umd.edu/nemo/linuxastro] linuxastro. ----------------------------------------------------------------------------- 6. Hardware Control More folks are using Linux to control equipment. Users range from amateur astronomers in the field to professional observatories. ----------------------------------------------------------------------------- 6.1. Telescope Control   * [http://ktelescope.sourceforge.net/] KTelescope is a robust Client/Server control library for Meade's LX200 based telescopes. It uses the Instrument Neutral Distributed Interface (INDI) protocol.   * [http://sourceforge.net/projects/observatory] Talon, formerly [http://www.clearskyinstitute.com/Company/History.html] OCAAS, is a complete observatory control and astronomical analysis system for Linux.   * [http://clearskyinstitute.com/xephem/] XEphem has the capability to communicate with a telescope control daemon process. ----------------------------------------------------------------------------- 6.2. CCD Camera Control   * [http://www.apogee-ccd.com/software.html] Apogee Instruments Inc supports their line of professional CCD cameras under Linux.  
* [http://www.fli-cam.com/] Finger Lakes Instrumentation manufactures CCD cameras and filter wheels and includes drivers for Linux.   * [http://www.sbig.com/sbwhtmls/linux_announcement.htm] SBIG offers some assistance with operating their ST7 and ST8 CCD cameras under Linux.   * [http://dimensional.com/~ashe/ccd-astro.html] CCD Astronomy on Linux These pages describe a number of facets of using astronomical CCD cameras for image acquisition and processing under Linux.   * [http://home.earthlink.net/~dschmenk] Gccd is a gnome-based CCD camera and filter wheel control program. ----------------------------------------------------------------------------- 7. Installation Help You need to know what you're doing with Linux and installing programs, but help is available for some programs. Here are some ways to make life easier.   * [http://bima.astro.umd.edu/nemo/linuxastro/astromake/] AstroMake is a utility intended to make installations of some common astronomical packages (in binary form) easy.   * XEphem requires several elements to exist on your machine. Life is much simpler with the CDROM version of the program as it contains an installation script which loads the appropriate precompiled binary for most systems and places all auxiliary files in the correct spots. See [http://www.clearskyinstitute.com/ecommerce/xephem/order.html] XEphem CDROM ----------------------------------------------------------------------------- 8. Projects using Linux Here is a list of astronomy projects using Linux for all or part of their instrumentation:   * [http://www.chara.gsu.edu/CHARA/index.html] The CHARA Array is an optical interferometer project using Linux in their control system.   * [http://www.eso.org/projects/caos] CAOS Club of Amateurs in Optical Spectroscopy. ----------------------------------------------------------------------------- 9. Revision History In an effort to record a history of the evolution of this document, we maintain it within a CVS repository.
What follows are the steps that led to today's document.

+-----------------------------------------------------------------------------------------------------------------------------------------+
|$Log: Astronomy-HOWTO.sgml,v $ |
|Revision 1.48 2004/02/07 17:58:56 jhuggins |
|Another ulink fix removing expired attribute 'name.' |
| |
|Revision 1.47 2004/02/07 17:55:24 jhuggins |
|Fixed a ulink issue with the Talon link. |
| |
|Revision 1.46 2004/02/05 01:22:32 ecdowney |
| |
| |
|Add entry for Talon now that it is GPL |
| |
|Revision 1.45 2004/01/20 13:51:17 jhuggins |
|Changed URL of German translation and changed the copyright date. |
| |
|Revision 1.44 2003/04/21 11:44:17 jhuggins |
|Adjusted the name of a contributor. |
| |
|Revision 1.43 2003/04/21 01:58:59 jhuggins |
|Wholesale changes including several new links, several new sections and a |
|few corrections to previous information. |
| |
|Revision 1.42 2003/04/20 19:26:09 jhuggins |
|Testing CVS keywords in docbook tags. |
|Revision has no : at the end. |
| |
|Revision 1.41 2003/04/20 15:45:12 jhuggins |
|Placed the CVS Log keyword within the screen parameter to avoid troubles. |
| |
|Revision 1.40 2003/04/20 15:42:13 jhuggins |
|Added a revision history to the tail end of the document to avoid it cluttering the top. |
| |
|revision 1.39 |
|date: 2003/04/20 03:58:00; author: jhuggins; state: Exp; lines: +540 -264 |
|First conversion to Docbook. No content was changed, only the tags. |
| |
|revision 1.38 |
|date: 2001/08/27 20:45:52; author: astro; state: Exp; lines: +4 -4 |
|Added Michael Moltenbrey's german translation to the list of translators. |
| |
|revision 1.37 |
|date: 2001/08/19 18:24:25; author: ecdowney; state: Exp; lines: +8 -3 |
|Add gccd |
| |
|revision 1.36 |
|date: 2001/07/25 19:58:54; author: astro; state: Exp; lines: +26 -7 |
|Added Translator information and fixed a few text format lines. |
| |
|revision 1.35 |
|date: 2001/06/18 18:48:42; author: astro; state: Exp; lines: +5 -3 |
|Fixed a few more sgml bugs. |
| |
|revision 1.34 |
|date: 2001/06/18 18:43:42; author: astro; state: Exp; lines: +12 -5 |
|More errors fixed. |
| |
|revision 1.33 |
|date: 2001/06/18 18:23:11; author: astro; state: Exp; lines: +6 -5 |
|Fixed a few bugs in 1.32. John |
| |
|revision 1.32 |
|date: 2001/06/18 18:11:39; author: astro; state: Exp; lines: +19 -4 |
|I added a simulation and games section with a few new links. I also corrected a few spacing issues. John. |
| |
|revision 1.31 |
|date: 2001/06/15 13:37:58; author: astro; state: Exp; lines: +13 -6 |
|Change the working of the Online Tools section, divided up the tools into form based and Java Applet based and added a few new listings. |
| |
|revision 1.30 |
|date: 2001/06/14 20:17:26; author: astro; state: Exp; lines: +4 -4 |
|Added a space between link for CAOS and its name. |
| |
|revision 1.29 |
|date: 2001/06/14 20:07:08; author: astro; state: Exp; lines: +5 -5 |
|Removed the word "Linux" from the Yahoo link as this is just Astronomy Software. |
| |
|revision 1.28 |
|date: 2001/06/14 20:03:18; author: astro; state: Exp; lines: +4 -4 |
|Fixed missing quote in the Yahoo Astronomy Software link. |
| |
|revision 1.27 |
|date: 2001/06/14 19:59:33; author: astro; state: Exp; lines: +5 -4 |
|Added link to the Yahoo Astronomy Software directory. |
| |
|revision 1.26 |
|date: 2001/06/14 19:34:32; author: ecdowney; state: Exp; lines: +16 -9 |
|*** empty log message *** |
| |
|revision 1.25 |
|date: 2001/06/14 18:48:10; author: astro; state: Exp; lines: +47 -49 |
|Changed htmlurl to url so the links appear in the txt file along with the html files. htmlurl suppresses the links in the text. |
| |
|revision 1.24 |
|date: 2001/06/14 18:29:25; author: astro; state: Exp; lines: +3 -10 |
|Fixed a few links. |
| |
|revision 1.23 |
|date: 2001/06/14 18:19:24; author: astro; state: Exp; lines: +8 -4 |
|Added a few more general search engine links for specific queries about Astronomy Software Linux |
| |
|revision 1.22 |
|date: 2001/06/14 18:14:34; author: astro; state: Exp; lines: +13 -3 |
|Added some general search engine links for specific queries about Astronomy Software Linux |
| |
|revision 1.21 |
|date: 2001/06/13 22:06:47; author: ecdowney; state: Exp; lines: +4 -4 |
|*** empty log message *** |
| |
|revision 1.20 |
|date: 2001/06/13 18:11:27; author: ecdowney; state: Exp; lines: +5 -5 |
|*** empty log message *** |
| |
|revision 1.19 |
|date: 2001/06/13 18:05:05; author: ecdowney; state: Exp; lines: +42 -4 |
|*** empty log message *** |
| |
|revision 1.18 |
|date: 2001/06/13 16:49:06; author: astro; state: Exp; lines: +4 -4 |
|Changed the Copyright to include the year 2001. |
| |
|revision 1.17 |
|date: 2001/04/10 21:47:17; author: astrohowto; state: Exp; lines: +4 -4 |
|Changed www.astronomy.net to astronomy.net. |
| |
|revision 1.16 |
|date: 2001/04/10 21:43:43; author: astrohowto; state: Exp; lines: +0 -2 |
|Removed log message. |
| |
|revision 1.15 |
|date: 2001/04/10 21:42:56; author: astrohowto; state: Exp; lines: +5 -3 |
|Added Log information. |
| |
|revision 1.14 |
|date: 2001/04/10 21:40:14; author: astrohowto; state: Exp; lines: +4 -4 |
|Changed main HOWTO web site to howto.astronomy.net. |
| |
|revision 1.13 |
|date: 2000/11/28 15:23:37; author: astrohowto; state: Exp; lines: +6 -4 |
|Revised the Author information. |
| |
|revision 1.12 |
|date: 2000/11/21 22:00:45; author: astrohowto; state: Exp; lines: +16 -4 |
|Added Projects section and added CHARA to it. |
| |
|revision 1.11 |
|date: 2000/11/21 21:39:11; author: astrohowto; state: Exp; lines: +14 -7 |
|Added several links and removed one bad one for AstrHorloge. |
| |
|revision 1.10 |
|date: 2000/11/07 17:22:16; author: astrohowto; state: Exp; lines: +220 -225 |
|Removed a few more text line ending problems. |
| |
|revision 1.9 |
|date: 2000/09/21 15:55:48; author: astrohowto; state: Exp; lines: +4 -4 |
|Changed the link to XSky after receiving an email from terry Friedrichsen. |
| |
|revision 1.8 |
|date: 2000/08/14 18:33:47; author: astrohowto; state: Exp; lines: +12 -3 |
|Added Nightfall to plantarium programs. |
| |
|revision 1.7 |
|date: 2000/08/14 18:16:28; author: astrohowto; state: Exp; lines: +38 -62 |
|Removed line feeds from several of Elwood's paragraphs. |
|Also added a few suggestions from emails received. |
| |
|revision 1.6 |
|date: 2000/05/03 22:01:25; author: astrohowto; state: Exp; lines: +15 -3 |
|Added Copyright text. |
| |
|revision 1.5 |
|date: 2000/05/02 11:59:19; author: astrohowto; state: Exp; lines: +8 -4 |
|Added Linux to the Title and added some contact information. JSH. |
| |
|revision 1.4 |
|date: 2000/05/02 09:05:20; author: astrohowto; state: Exp; lines: +224 -110 |
|Elwoods additions. |
| |
|revision 1.3 |
|date: 2000/04/30 15:14:25; author: astrohowto; state: Exp; lines: +4 -3 |
|More RCS |
| |
|revision 1.2 |
|date: 2000/04/30 14:45:16; author: astrohowto; state: Exp; lines: +5 -0 |
|Added some RCS keywords. |
| |
|revision 1.1 |
|date: 2000/04/30 14:43:43; author: astrohowto; state: Exp; |
|Initial revision |
+-----------------------------------------------------------------------------------------------------------------------------------------+

Linux ATA RAID HOWTO

Murty Rompalli

           murty@solar.m u r t y.n e t

April 26, 2002

Revision History
Revision 2.0    2002-05-10    Revised by: mr
Major enhancements
Revision 1.3    2002-05-07    Revised by: jyg
format fixes
Revision 1.2    2002-04-30    Revised by: mr
Minor fixes
Revision 1.1    2002-04-28    Revised by: ldl
Some minor changes and sgml-improvements
Revision 1.0    2002-04-26    Revised by: mr
Initial Release

RAID is no longer limited to expensive SCSI disks, as more and more motherboard manufacturers are introducing motherboards with onboard RAID support for inexpensive IDE disks, known as ATA RAID. Promise Technology and HighPoint are two companies that dominate this ATA RAID market. This HOWTO document explains how to install Linux on an Intel Pentium compatible computer with an ATA RAID Controller (onboard chip or separate card), single or multiple processors, and at least two hard disks. Currently, this document covers installing RedHat Linux 7.2 with the Promise FastTrack ATA RAID Controller only.

-----------------------------------------------------------------------------
Table of Contents
1. Introduction
    1.1. Copyright Information
    1.2. Disclaimer
    1.3. New Versions
    1.4. Credits
    1.5. Feedback
    1.6. Translations
2. Requirements
3. Prepare Promise Driver Floppy
4. Preparing RedHat 7.2 CDs
5. Installing Red Hat 7.2
6. Installing Native Linux RAID
7. Installing on an existing Linux system
    7.1. Append Line
    7.