CSM & xCAT Coexistence HOWTO
Version: 1.3
Last revised: 5/23/2003
Author: Vallard Benincosa - vallard@us.ibm.com
As always, comments and suggestions are appreciated and welcomed.
Change History
5/23/2003 1.3 added switch, updated sections
2/19/2003 1.2 added remote bios instructions (6.1)
1/25/2003 1.1 initial release.
1. Introduction
xCAT and CSM can coexist on the same cluster, and in fact can be used at
the same time. This document will not go into the details of installing
xCAT, nor into the details of installing CSM. What it will cover is what
to do to make the transition from one to the other. This will include a
description of the tools found in the csm.ect rpm. So if you need to install
xCAT, please go to http://xcat.org and follow
the appropriate links. If you would like to install CSM, please go to
http://www-1.ibm.com/servers/eserver/clusters/library/csmsetup.html.
This document assumes that you have installed either xCAT or CSM on
your system and that it is functioning properly.
Also, these tools (as of today 1/20/2003) have only been tested on
RedHat machines and may not work on SUSE, SLES, or other distributions
of Linux.
2. Install xcatcsm tools
Assuming you have xCAT or CSM installed, you should now obtain the csm.ect
rpm. This is available on the IBM alphaworks site at http://alphaworks.ibm.com/tech/ect4linux.
Make sure that you have version 1.3.1-14 or greater. Now just install this
like you would any other RPM.
# rpm -i /root/csm.ect-1.3.1-14.i386.rpm
You will need perl-DBI 1.21 installed, as csm.ect requires it. Also,
csm.ect will only work with CSM 1.3.1.
(The csm.ect RPM itself does not require CSM to be installed. This is so
that xcat2csm will work on a system that does not have CSM.)
3. Installing CSM on an xCAT cluster
So you have xCAT running on your system and it works perfectly. Now, you
want to put CSM on the same cluster so that you can use event monitoring
and some of the other CSM tools. If you are used to installing nodes via
xCAT then you should do so. Define all the attributes in the /opt/xcat/etc/
tables and install the nodes like you would normally do, if you haven't
done so already.
3.1 Run xcat2csm --migrate
The first thing to do is run xcat2csm --migrate. This tool is packaged
inside the csm.ect rpm. By default, it is installed under /opt/csm/ect/bin:
[root@imaster4 RedHat]# /opt/csm/ect/bin/xcat2csm --migrate
Let's see if CSM is installed...................1.3.1.0
Migrating xCAT configuration files to CSM...
Would you like to link xCAT install files for CSM to use?
(yes is recommended) [y/n] y
linking files under /install/rh73
linking files under /install/rh73
linking files under /install/rh73/cdrom
linking files under /install/rh73/cdrom
linking files under /install/rh73/cdrom/RedHat
...
linking files under /install/rh73/RedHat
linking files under /install/rh73/RedHat/base
Migration Complete. You can now install CSM
Running this command does a few things to make your system ready for
a CSM install. Here is what it does:
-
It makes a directory called /tmp/bak.
-
It makes a "hard" link from /install/ to the directories under /csminstall
that CSM requires for full installs. This is so that you can still do xCAT
linux installs as well as CSM linux installs. The reason it's a hard link
and not a soft link is because if you were to NFS mount one of them, then
the links would fail.
-
It checks to see if you have conserver and atftpd running.
-
If you do (as you should if xCAT is installed properly), then the files
atftpd and conserver from the /etc/rc.d/init.d directory are moved
to /tmp/bak.
-
It shuts down the atftp and conserver services. This means that the xCAT
remote console functions (rcons, wcons, etc.) will not work, nor will many
of the installation functions, such as PXE boot.
You may be asking, why do I want to shut down those services? I need those.
The reason is that CSM installs both of those services itself, and its
installation trips up if they are already installed. But don't worry: once
CSM is installed, those services will be installed again.
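For readers unfamiliar with the hard-link approach mentioned in the list above, here is a minimal shell sketch of mirroring a directory tree with hard links. This is not the code that xcat2csm actually runs; the function name link_tree and the example paths are assumptions made for illustration. Keep in mind that hard links only work when both trees live on the same filesystem.

```shell
# link_tree SRC DST: recreate SRC's directory layout under DST and
# hard-link every regular file into it.
# (Hypothetical helper for illustration; not part of xcat2csm.)
link_tree() {
    lt_src=$1
    lt_dst=$2
    # Create the directory skeleton first.
    (cd "$lt_src" && find . -type d) | while IFS= read -r lt_d; do
        mkdir -p "$lt_dst/$lt_d"
    done
    # Then link each regular file; ln without -s makes a hard link.
    (cd "$lt_src" && find . -type f) | while IFS= read -r lt_f; do
        ln "$lt_src/$lt_f" "$lt_dst/$lt_f"
    done
}

# Example (paths are placeholders):
#   link_tree /install/rh73 /tmp/rh73-links
```

Because both names point at the same data on disk, the linked copy takes no extra space, and neither tree depends on the other's path being reachable.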
I should note here that the atftp that will be installed by CSM will
be version 0.3. CSM installs an RPM while xCAT uses a tarball. The xCAT
version will probably be at 0.6 and you may decide that you would rather
have 0.6 running instead of 0.3. That's fine. Wait until CSM is installed,
then after it is installed do the following:
# vi /etc/xinetd.d/tftp
Add the "disable = yes" line inside the braces:
service tftp
{
...
disable = yes
...
}
# service xinetd restart
# cp /tmp/bak/atftpd /etc/rc.d/init.d/
# service atftpd restart
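Before restarting the xCAT atftpd, it can help to confirm that the CSM-installed tftp service really is disabled in xinetd. The following is a small sketch, not something shipped with xCAT or CSM; the function name tftp_disabled is hypothetical, and it only looks for a "disable = yes" line in whatever file it is given.

```shell
# tftp_disabled FILE: succeed if FILE contains a "disable = yes" line.
# (Hypothetical helper; assumes the xinetd service file layout shown above.)
tftp_disabled() {
    grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*yes[[:space:]]*$' "$1"
}

# Example use on the management server:
#   tftp_disabled /etc/xinetd.d/tftp && echo "xinetd tftp is disabled"
```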
It will be to your benefit to use the CSM conserver. This is because, when
new nodes are added to CSM, the conserver.cf file is automatically updated.
3.2 Install CSM Management Server
Follow the directions in the CSM Installation guides. Do not define the
nodes yet; xcat2csm will do that in section 3.3.
* Note: Make sure that you accept the CSM license before moving
on. To accept the "try and buy" license, which is only good for a few weeks,
run:
# /opt/csm/bin/csmconfig -L
If you have the full license, then run the same command with the path to
your license file (shown here as a placeholder):
# /opt/csm/bin/csmconfig -L <license_file>
3.3 Run xcat2csm
This time, don't specify any arguments and the nodes will automatically
be put in the CSM database:
[root@imaster4 xcat]# lsnode # this is the CSM command - no nodes
[root@imaster4 xcat]# nodels # this is the xCAT command - 1 node
i3
[root@imaster4 xcat]# /opt/csm/ect/bin/xcat2csm
Let's see if CSM is installed...................1.3.1.0
Let's see if there are xCAT tables in place......Yep.
Is c5cn20 an ELS or CPS console server? [els | cps]
els
Getting Current CSM configuration...
Defining i3 in CSM database....
[root@imaster4 xcat]# lsnode
i3.clusters.com
In the above example, I only had one node defined in xCAT and no nodes
defined in CSM. I ran xcat2csm and it took data from the xCAT tables and
magically put it into the CSM database.
You should now go through and check the CSM database to make sure that
the attributes are the way you like them.
3.4 Other Uses For xcat2csm
You do not need CSM installed to take xCAT tables and put them into a CSM
nodedef file. If you run xcat2csm with no arguments when CSM is not installed,
it will create a file: /tmp/nodedef that can be used by CSM's definenode
command to define all of the nodes. It also creates a file called /tmp/nodegrpdef
that takes all your nodegroups in xCAT and moves them to CSM's nodegroup.
You have to use the nodegrp command to do this.
Here's how:
# definenode -f /tmp/nodedef
# nodegrp -f /tmp/nodegrpdef
You may also want to make a delta file to see what has changed in CSM:
# xcat2csm -d
(puts the file in /tmp/nodedef)
You can also use the -f option to specify where you want the nodedef file
to go. (the nodegrp file will still go to /tmp/nodegrpdef)
# xcat2csm -f /home/csm/csmnodedef
3.5 Verify Conserver and ATFTP work
You should be able to run CSM's rconsole and xCAT's rcons, wcons, etc. If
you can't, then compare /opt/xcat/etc/conserver.tab with
/etc/opt/conserver/conserver.cf.
You'll notice that CSM does not put "localhost" as an allowed client. It
is probably not worth your time to add "localhost" to that file, since CSM
rewrites it whenever a new node is added or Console attributes are changed.
Instead, you should update /opt/xcat/etc/conserver.tab to use
the actual host name of the management server.
You can verify that atftp is working by going to /tmp and doing the
following:
[root@imaster4 tmp]# atftp localhost
tftp> get pxelinux.0
tftp> quit
4. Installing xCAT on a CSM cluster
There are many advantages to adding xCAT tools to a CSM cluster. These
advantages include additional functionality such as remote BIOS flash,
many hardware setup functions, and HPC stack tools.
4.1 Get and install xCAT - well maybe...
Go to http://xcat.org for information on
how to do this. But before you do, read on so that you don't install unnecessary
things. ECT will begin shipping xCAT tools that it uses, so in many cases
you will not have to install xCAT separately.
4.2 What not to install
You don't need to install atftp or conserver, since CSM has already installed
them. You will still have all the same functionality any other xCAT server
would have. If you wish, however, you can use the xCAT atftp, since it
is a later version and some of the bugs have been worked out. See section
3.1 for information on that.
4.3 Run csm2xcat
csm2xcat will convert the CSM database into xCAT tables. csm2xcat is not
an onto function, since there are more xCAT tables than there is data in the
CSM database to fill them. For this reason, you should verify the tables and
add any that you need once csm2xcat is done.
[root@imaster4 xcat]# csm2xcat
Writing tables in /opt/xcat/etc
Let's see if CSM is installed...................1.3.1.0
Getting Current CSM configuration...
csm2xcat complete
5. Installing a node with CSM or xCAT
If you started out with xCAT on your system and then followed the instructions
in section 3 on migrating to CSM, then you may already be set up to do
the installation of a node. You may use xCAT or CSM.
5.1 Installing a node via xCAT with CSM installed
Follow the standard xCAT procedures for installing a compute node. Once
you are done, then you need to make that node part of the CSM cluster.
You will find that the node is in PreManaged state in CSM:
[root@imaster4 bin]# lsnode -a InstallStatus
i3: PreManaged
To make this node part of the CSM cluster, run updatenode. Before doing
so, check that the RedHat CDs have been copied (or linked via xcat2csm --migrate)
to the appropriate directory under /csminstall. You can run csmsetupks
to do this if the CDs haven't been copied there yet.
[root@imaster4 RedHat]# csmsetupks -n i3
Copying Red Hat Images from /mnt/cdrom.
Insert Red Hat Linux 7.3 disk 1.
Press Enter to continue.
...
Now that the CDs are in place run updatenode -P to put all the premanaged
nodes in managed state. This will install the CSM client software on the
nodes.
[root@imaster4 bin]# updatenode -P
i3.clusters.com: Setting Management Server to imaster4.
i3.clusters.com: Updating RPMs
i3.clusters.com: Node Install - Successful.
[root@imaster4 RedHat]# lsnode -a InstallStatus
i3: Managed
If you had problems with this command, check that the following attributes
have been set in the CSM database:
InstallPkgArchitecture, InstallOSName, ManagementServer, InstallDistribution*,
and InstallCSMVersion.
5.2 Installing a node via CSM with xCAT installed
There really shouldn't be any interference with xCAT when trying to do
a full CSM install.
Provided the CDs were copied to the right directory, you should be
able to accomplish the full install with 2 commands: csmsetupks and installnode.
You should note, however, that xCAT and CSM both overwrite the /etc/dhcpd.conf
file. Both programs have a way of backing it up, but you may want to
copy yours to a safe place before running csmsetupks (CSM) or makedhcp
(xCAT).
[root@imaster4 etc]# csmsetupks -xn i3
Setting up PXE.
Generating /etc/dhcpd.conf file for MAC address collection.
Setting up Kickstart.
10684 blocks
Adding nodes to /etc/dhcpd.conf file for Kickstart install: i3.clusters.com.
[root@imaster4 etc]# installnode -n i3
...
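The advice above about saving /etc/dhcpd.conf before either tool rewrites it can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name backup_file is hypothetical, neither CSM nor xCAT ships it, and the timestamp suffix is just one possible naming scheme.

```shell
# backup_file FILE: copy FILE to FILE.<timestamp> and print the backup's name.
# (Hypothetical helper; neither CSM nor xCAT provides it.)
backup_file() {
    bf_copy="$1.$(date +%Y%m%d%H%M%S)"
    cp -p "$1" "$bf_copy" && printf '%s\n' "$bf_copy"
}

# Example, before running csmsetupks or makedhcp:
#   backup_file /etc/dhcpd.conf
```

Keeping the copy next to the original with a timestamp makes it easy to diff the file after csmsetupks or makedhcp has regenerated it.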
6. Common CSM/xCAT tasks
6.1 CSM remote BIOS flash
Before I get more into this, let me mention that there is some work underway
to make this process much easier. That should be out 3rd Quarter 2003.
There is an excellent remote flash document included in the xcat-dist-core
tarball. Let me repeat what that says here, just so there is no mistake:
Remote Flashing is the ability to perform BIOS and Firmware upgrades
remotely. xCAT support for remote flashing is very limited and should be
considered experimental. Use at your own risk.
The same goes for CSM and the xcat2csm tools.
Provided you have read the warning, let's go through an example of remotely
flashing the BIOS of an x345 node that is currently a CSM Managed node.
In this example, CSM has been completely installed but xCAT has not. Here
is how to proceed:
-
Get the latest xCAT distributions. Untar these files so that /opt/xcat
is the top directory of the unpacked files:
[root@devmstr opt]# cd /opt/
[root@devmstr opt]# tar zxvf xcat-dist-core-1.1.8.tgz
xcat/bin/
xcat/bin/rcad
xcat/bin/rpower
xcat/bin/nodeset
...
xcat/windows/sid.cmd
xcat/windows/reboot.cmd
[root@devmstr opt]# tar zxvf xcat-dist-ibm-1.1.8.tgz
xcat/flash/basefs.dos/command.com
xcat/flash/tools/sr-cmdr.exe
...
xcat/i686/sbin/hawkname
[root@devmstr opt]# tar zxvf xcat-dist-oss-1.1.8.tgz
...
[root@devmstr opt]# tar zxvf xcat-dist-intel-1.1.8.tgz
...
-
Get the latest csm.ect that contains xcat2csm and csm2xcat. This is available
off the alphaworks website:
http://www.alphaworks.ibm.com/tech/ect4linux
Install this by running
rpm -i csm.ect-<version>.rpm
-
Fill in the HWModel attribute. You can do this by running
chnode -n node HWModel=hw-model-type
e.g:
[root@devmstr bin]# chnode node1-node3 HWModel=x345
[root@devmstr bin]# lsnode -a HWModel
node1: x345
node2: x345
node3: x345
-
Run csm2xcat
[root@devmstr bin]# ./csm2xcat
csm2xcat: env XCAT not defined!
csm2xcat: XCATROOT does not exist
It doesn't appear that you have xCAT installed
We will use "/opt/xcat" as XCATROOT
Writing tables in /opt/xcat/etc
Let's see if CSM is installed...................1.3.1.0
Getting Current CSM configuration...
csm2xcat complete
-
Now take a look at the files that csm2xcat created in /opt/xcat/etc. There
are several attributes that need to be filled in or checked so that remote
flash will work.
-
Look at nodemodel.tab: is your node listed with the correct model?
e.g.
node1.clusters.com x345
- mpa.tab
- mp.tab
- Make a flash directory for your nodes. Since I'm doing this for an x345,
here's what I do:
[root@devmstr flash]# cd /opt/xcat/flash
[root@devmstr flash]# cp -r x340 x345
[root@devmstr flash]# cd x345
[root@devmstr x345]# echo " " > bios.post.bat
(The bios.post.bat file tells the node to reboot when the BIOS update is
complete.
For the x345s we don't want to reboot after they are done, because
they may go into an infinite loop. The reason for this is that we have no way
to update the /tftpboot/pxelinux.cfg/<IP in HEX> file. The same is
true for all machines that are not using the e100pro that the standard x340s,
x342s, and x330s use.)
- Make the floppy into a dd file: Go to
www.pc.ibm.com/support and
get the latest BIOS update images, then make a floppy using the directions
given on the web page.
Take the floppy, insert it into your linux machine, and run:
[root@vallard root]# dd if=/dev/fd0 of=/tmp/345.bios.2.05
Now copy the 345.bios.2.05 file into the flash directory, link it, and
edit autoexec.bat:
cp /tmp/345.bios.2.05 /opt/xcat/flash/x345
cd /opt/xcat/flash/x345
ln -s 345.bios.2.05 bios.dd
ln -s 345.bios.2.05 cmos.dd
cd ../basefs.dos
vi autoexec.bat
Make it print "DONE,DONE,DONE" when it finishes,
and remove the lines below (or put #'s in front of them):
#e100bpkt.com 0x60 16
#sshdos -P -S -s #FLASHPASS# #FLASHUSER# #MIP#
#boot e100bpkt.com -u
-
Now run the following commands to update the BIOS
[root@devmstr bin]# /opt/xcat/flash/mkflash
[root@devmstr bin]# ./rflash node2.clusters.com bios
rflash can render a very large number of machines brain dead and useless.
Are you sure you know what you are doing? (YES to go on)
YES
node2.clusters.com: flash x345-bios
node2: ping to 176.60.22.8 failed
[root@devmstr bin]#
The ping fails because we have not yet set up xCAT fping. This is fine and
you can ignore the ping message. All that remains is to reboot
the node. If there were any other errors, then you should check the xCAT tables
listed above.
The xCAT documentation says that you shouldn't watch the reboot
through the rconsole... I accidentally did this one time, and was pleased
that it still functioned. This was only on an x345, however. I dare you to live
dangerously, but don't come to me if you break something. This procedure is
not warranted and we're not responsible if you break your machine.
When the bios update is complete, you will see this on the screen:
Thanks for using the POST/BIOS Update Utility
A:\>cmosutil.exe /r a:\cmos
NS417 chipset detected
CMOS settings restored from file
A:\>echo "DONE,DONE,DONE"
"DONE,DONE,DONE"
A:\>
A:\>
A:\>
Now, you need to remove the hex file so that the next time the node boots up it
doesn't try to update its BIOS again:
[root@devmstr pxelinux.cfg]# cd /tftpboot/pxelinux.cfg/
[root@devmstr pxelinux.cfg]# ls -ltr | tail -1
-rwxrwxrwx 1 root root 119 Feb 11 13:40 B03C320C
[root@devmstr pxelinux.cfg]# cp default B03C320C
cp: overwrite `B03C320C'? y
[root@devmstr pxelinux.cfg]#
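The file name B03C320C in the listing above is the node's IP address written as eight uppercase hexadecimal digits, which is how pxelinux names per-node configuration files. Rather than relying on the newest file in the directory, the name can be computed from the address. A minimal sketch follows; the function name ip_to_hex is an assumption, and 176.60.50.12 is simply the address that B03C320C decodes to. It expects plain decimal octets without leading zeros.

```shell
# ip_to_hex ADDR: print a dotted-quad IPv4 address as 8 uppercase hex digits.
# (Hypothetical helper; not an xCAT or CSM command.)
ip_to_hex() {
    ih_old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$ih_old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_hex 176.60.50.12    # prints B03C320C
```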
Finally, reboot the node:
rpower -n node2 reboot
Unfortunately, at this point the rconsole output will be lost. This is because
the BIOS has been set to defaults by the upgrade. Upon boot, the node will hang at
a BIOS screen indicating errors. You should now go into the BIOS and configure
console redirection, as well as put the startup sequence back to floppy,
CDROM, Network, Hard Disk. This will get rid of the "126" configuration errors
that were seen before. I believe the reason for this error message is just
to show the user that the BIOS has been upgraded, but I am not sure about
that.
- All done. After you have tested this a few times with one node and
are ready to do the rest of your nodes (presuming they are all at the same BIOS
level), run rflash against a range of nodes and reboot them.
Once they have completed (and they all say "DONE,DONE,DONE"), copy
/tftpboot/pxelinux.cfg/default over each HEX file in the
/tftpboot/pxelinux.cfg directory (or you can just remove them all).
Then reboot them again, reconfigure the BIOS settings, and all should come out
correctly.
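When many nodes have been flashed, copying the default file over each HEX file by hand gets tedious. The loop below is one way to sketch it, assuming the HEX files are named with exactly eight uppercase hexadecimal digits as in the example above; the function name reset_pxe_configs is hypothetical and this is not an xCAT tool. Try it on a copy of the directory first.

```shell
# reset_pxe_configs DIR: copy DIR/default over every file in DIR whose name
# is exactly eight uppercase hex digits; all other files are left alone.
# (Hypothetical helper for illustration; not part of xCAT.)
reset_pxe_configs() {
    for rp_file in "$1"/*; do
        case "${rp_file##*/}" in
            [0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F])
                cp -f "$1/default" "$rp_file"
                ;;
        esac
    done
}

# Example:
#   reset_pxe_configs /tftpboot/pxelinux.cfg
```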
6.2 Collecting MAC addresses via the switch
Collecting MAC (Media Access Control) addresses via the switch is the process
of displaying the MAC address of a node by querying the switch that its NIC is
attached to. This is nice in that we don't have to go look at the back of each
ethernet card for the MAC address, which may or may not be visible.
There are several tasks that must be done first for the xcat/csm tools to be
able to collect the mac addresses:
-
Install the csm.ect rpm. It is available at
http://alphaworks.ibm.com/tech/ect4linux.
- There are only a few tables that need to be filled in by you.
- /opt/xcat/etc/nodelist and /opt/xcat/etc/sitetab
If you have CSM installed, then you can run csm2xcat as explained above in this document to fill in
these two tables. Otherwise use the xCAT template sitetab, as found in
/opt/xcat/samples/etc/sitetab and fill in the nodelist with your nodes.
-
/opt/xcat/etc/passwd.tab The scripts will need to telnet into
the switch in order to query the MAC addresses. Make sure that the switch
is set up to accept telnet logins. Put the enable password in this table as:
cisco password
Regardless of the name or type of your cisco switch, the first field should be
cisco and the second field should be your password.
-
/opt/xcat/etc/<switchtype>.tab
where switchtype can be: cisco[3500 | 3524 | 3548 | 3550 | 2950] or extreme.
The table should be filled out as follows:
<node> <switch>,<port number>
node1 mgcisco1,1
node2 mgcisco1,2
node3 mgcisco1,3
node4 mgcisco1,4
...
-
As long as these tables are all set up, then you should be able to run
/opt/csm/bin/switch_mac -n node1-node3 to get the MAC addresses
of those three nodes. The example below lists four nodes by name instead:
[root@devmstr etc]# /opt/csm/bin/switch_mac -n node1,node2,node3,node4
MACDATA:node1 eth0 NA 00:02:55:7b:07:6a NA NA
MACDATA:node2 eth0 NA 00:02:55:7b:06:8e NA NA
MACDATA:node3 eth0 NA 00:02:55:7b:06:5a NA NA
MACDATA:node4 eth0 NA 00:02:55:7b:05:c0 NA NA
You'll notice that the output is a bit strange. There are several fields.
The important ones are the node name (2nd field) and the MAC address (5th
field), counting the leading MACDATA tag as the 1st field. The output is
laid out this way for future implementations of getadapters, which will be
out in the May '03 release of CSM. The other fields should be ignored.
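If you only want the node names and MAC addresses, the MACDATA lines are easy to trim down with awk. The filter below is a sketch based solely on the sample output shown above; the function name macdata_to_pairs is hypothetical. Since awk splits on whitespace, the node name arrives glued to the MACDATA: tag in the first column and the MAC address is in the fourth column.

```shell
# macdata_to_pairs: read switch_mac output on stdin, print "node MAC" pairs.
# (Hypothetical helper; assumes the MACDATA line layout shown above.)
macdata_to_pairs() {
    awk '/^MACDATA:/ { sub(/^MACDATA:/, "", $1); print $1, $4 }'
}

# Example:
#   /opt/csm/bin/switch_mac -n node1,node2 | macdata_to_pairs
```

Lines that do not start with MACDATA: are dropped, so any stray messages in the output do not end up in the result.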