NOTE: The LC Compaq Clusters are in the process of being phased out. TC2K was removed from service on 12/8/04. All references to TC2K herein are historical only. Also, because the Compaqs are being phased out, this tutorial is no longer being maintained as of 3/05.
LC Compaq Cluster Systems |
There are three Compaq cluster systems within LC: two in the OCF
and one in the SCF. Their general configuration information is shown below. For
additional details, see:
Hardware Overview |
Alpha 21264 Architecture:
Basic Chip | Processor |
---|---|
21064 | EV4 - EV4x |
21164 | EV5 - EV5x |
21264 | EV6 - EV6x |
21364 | EV7 |
Model Series | AlphaServer System | Base Processor/Chip | Max CPUs | Max Clock Speed |
---|---|---|---|---|
GS Series | GS1280 | EV7/21364 | 16 | 1.15 GHz |
GS Series | GS320 | EV6/21264 | 32 | 1.25 GHz |
GS Series | GS160 | EV6/21264 | 16 | 1.25 GHz |
GS Series | GS80 | EV6/21264 | 8 | 1.25 GHz |
ES Series | ES80 | EV7/21364 | 8 | 1 GHz |
ES Series | ES47 | EV7/21364 | 4 | 1 GHz |
ES Series | ES45 | EV6/21264 | 4 | 1 or 1.25 GHz |
ES Series | ES40 | EV6/21264 | 4 | 667 or 833 MHz |
DS Series | DS25 | EV6/21264 | 2 | 1 GHz |
DS Series | DS20L | EV6/21264 | 2 | 833 MHz |
DS Series | DS20E | EV6/21264 | 2 | 833 or 667 MHz |
DS Series | DS10L | EV6/21264 | 1 | 600 MHz |
DS Series | DS10 | EV6/21264 | 1 | 600 MHz |
SC Series | SC20 | EV6/21264 | 256 | 833 MHz |
SC Series | SC45 | EV6/21264 | 4096 | 1.25 GHz |
Alpha Processor Design:
C-chip: Responsible for control of the I/O and memory subsystem. Issues commands and addresses to the D-chips and P-chips, which are then responsible for the actual data transfer.

D-chips: Responsible for all data movement to memory and I/O. They implement the crossbar switch, which allows multiple concurrent operations, and contain internal queues that allow extensive pipelining of operations. Each D-chip contains multiple interfaces, including memory bus data ports, CPU data ports, P-chip data ports, and a C-chip interface.

P-chips: Interface with both the C-chip and the D-chips and provide a 64-bit, 33 MHz, PCI 2.1 compliant interface. Each PCI bus provides 256 MB/s of I/O bandwidth. This 64-bit interface fully supports 32-bit PCI operation and all 32-bit PCI devices. The C-chip controls the P-chips, and all data transfers to or from a P-chip are done through the D-chips. The P-chips support PIO, DMA and PCI-to-PCI (PTP) transfers, providing maximum performance and the ability to transfer data without CPU involvement.
Primary components:
Topology:
Features:
Software and Development Environment |
The software and development environment for the Compaq clusters is
similar to what is generally described in the Introduction to LC Resources
tutorial. Items specific to the Compaq clusters are discussed below.
Tru64 Operating System:

    setenv JAVA_HOME /usr/opt/java122
    set path = ($JAVA_HOME/bin $path)

The Compaq Extended Math Library (CXML) is located in /usr/lib. Link with -lcxml. See man dxml for details, or download the "Compaq Extended Math Library Reference Guide" from the Tru64 UNIX Online Documentation web page: http://h30097.www3.hp.com/docs.
Batch System:
User Filesystems:
Compilers |
Available Compilers:
Compiler Invocation Commands:
Language | Command | Description |
---|---|---|
C | cc | Tru64 C |
C | gcc | GNU C |
C | guidec | KAI Guide C |
C++ | cxx | Tru64 C++ |
C++ | g++ | GNU C++ |
C++ | guidec++ | KAI Guide C++ |
C++ | KCC | KAI C++ |
Fortran | f77, f90, f95 | Tru64 Fortran |
Fortran | g77 | GNU Fortran |
Fortran | guidef77, guidef90 | KAI Guide Fortran |
Parallel Usage:
Compiler | Pthreads | OpenMP | MPI |
---|---|---|---|
cc | Yes | Yes | Yes |
gcc | Yes | No | Yes |
guidec | Yes | Yes | Yes |
cxx | Yes | Yes | Yes |
g++ | Yes | No | Yes |
guidec++ | Yes | Yes | Yes |
KCC | Yes | No | Yes |
f77, f90, f95 | No | Yes | Yes |
g77 | No | No | Yes |
guidef77, guidef90 | No | Yes | Yes |
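As a minimal sketch of what the OpenMP column means in practice, the C function below parallelizes a loop with a standard OpenMP reduction. It assumes an OpenMP-enabled compile (cc -omp with the Tru64 compilers); a compiler without OpenMP support ignores the pragma and runs the loop serially with the same result. The name sum_squares is hypothetical. The number of threads is normally taken from the standard OMP_NUM_THREADS environment variable.

```c
/* Sum of squares of a[0..n-1], parallelized with a standard OpenMP
   reduction. Built with an OpenMP-enabled compiler (cc -omp on Tru64),
   the iterations are split across threads and each thread's partial
   sum is combined at the end. Without OpenMP support the pragma is
   ignored and the loop simply runs serially. */
double sum_squares(const double *a, long n)
{
    double total = 0.0;
    long i;

#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += a[i] * a[i];

    return total;
}
```

Called from a small driver and run with OMP_NUM_THREADS set to 4, such a loop could use up to four CPUs on one node. For a hybrid MPI+OpenMP job on TC2K, prun's -c option reserves that many CPUs for each process.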
Useful Tru64 Compiler Options:
Option | Description |
---|---|
-annotations [option] | Fortran 90 only. Adds annotations to the source listing that explain optimizations. See the man page for options. |
-arch [target] | Specifies the version of the Alpha processor to generate instructions for. The default is generic. Using host allows processor-specific instructions, with possible performance improvement. |
-automatic, -recursive | Fortran only. Essentially the same: places local variables on the run-time stack and compiles subroutines/functions for possible recursion. The defaults are -noautomatic -norecursive. |
-check_bounds | Performs run-time checking of array subscripts and character substrings. |
-convert [option] | Fortran only. Converts unformatted data to various big endian / little endian formats. |
-fast | Recommended for performance. This option is shorthand for several other options; see the compiler man page for the list. |
-fpe, -fpe[n] | Fortran only. Specifies how to handle different floating-point exceptions. -fpe is the same as -fpe0 and is recommended. |
-g0, -g1, -g, -g2, -g3 | Debugging levels. -g0 turns off all debugging information. -g1 generates traceback information only (the default). -g and -g2 (equivalent) generate traceback and symbolic debugging information and set -O0. -g3 generates more debugging information than -g2 without setting -O0. |
-I | For include paths, as usual. |
-ieee | C, C++ only. Supports all portable features of the IEEE Standard for Binary Floating-Point Arithmetic. |
-L, -l | For library paths and names, as usual. |
-lmpi | C, C++: link with MPI. |
-lfmpi -lmpi | Fortran: link with MPI. |
-O[n] | Various degrees of optimization; -O0 is no optimization. |
-omp | Turns on OpenMP. |
-p, -pg | Turns on prof (-p) or gprof (-pg) profiling. |
-pthread | Link with the POSIX threads library. |
-real_size [n], -r[n], -i[n], -integer_size [n], -double_size [n] | Fortran only. Various ways to specify the size of data types. |
-source_listing | Produces a source code listing file. |
-V, -version | Prints compiler version information: -V for C, -version for Fortran. |
-v | Verbose. |
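The -convert option applies only to Fortran unformatted I/O. A C program running on little-endian Alpha that reads binary data written on a big-endian machine has to reorder the bytes itself. A minimal sketch, assuming the input holds 32-bit big-endian integers; read_be32 is a hypothetical helper, not a library routine:

```c
/* Decode a 32-bit unsigned integer stored big-endian (most significant
   byte first), e.g. a value read with fread() from a file written on a
   big-endian machine. Building the result with shifts gives the same
   answer on little-endian Alpha as on any other host, so no byte-order
   #ifdef is needed. unsigned long is used because it holds at least
   32 bits on every C implementation. */
unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |
            (unsigned long)p[3];
}
```

Decoding with shifts, rather than casting the buffer to an integer pointer, also sidesteps the unaligned-access problems discussed later in this tutorial.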
Compiler Documentation:
MPI |
What's Available?
MPI Implementation | Machines | Comments |
---|---|---|
Compaq MPI shared memory | GPS, SC | Native MPI. On-node communications only. Not thread-safe. |
Quadrics MPI | TC2K | Native MPI. Uses shared memory for on-node communications and message passing over the Quadrics switch for inter-node communications. Not thread-safe. Recommended. |
MPICH shared memory | GPS, SC, TC2K | On-node communications only. Not thread-safe. |
MPICH P4 | GPS, SC, TC2K | Inter-node communications. Not thread-safe. Not recommended due to poor performance. |
Compiling with Compaq's MPI
C Examples | cc code.c -lmpi; gcc code.c -lmpi |
C++ Examples | cxx code.C -lmpi; g++ code.C -lmpi; KCC code.C -lmpi |
Fortran Examples | f90 code.F -lfmpi -lmpi; f77 code.f -lfmpi -lmpi |

Note: Compaq recommends the -pthread flag. If -pthread is used, it should be included on both the compile and the load (link) commands, since it automatically adds the appropriate thread-safe options for both the compiler and the loader. If your application is not threaded, you may omit -pthread, but this is not recommended, because the thread-safe mode is also safe for single-threaded applications. In addition to using -pthread as a compile and load option, KCC requires -pthread (alternatively, --thread_safe) when building a library archive (combining .o files into a .a file).
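To make the -pthread note concrete, here is a minimal POSIX threads sketch: each thread sums one slice of an array and the calling thread joins them and adds the partial results. With the Tru64 compilers it would be compiled and linked with cc -pthread, as the note recommends. The names slice_t, sum_slice, threaded_sum and the MAX_THREADS limit are illustrative only, not part of any LC library.

```c
#include <pthread.h>

#define MAX_THREADS 16

/* Work description for one thread: sum a[lo..hi-1]. */
typedef struct {
    const double *a;
    long lo, hi;
    double partial;
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = (slice_t *)arg;
    long i;

    s->partial = 0.0;
    for (i = s->lo; i < s->hi; i++)
        s->partial += s->a[i];
    return NULL;
}

/* Sum a[0..n-1] using up to MAX_THREADS POSIX threads. Each thread
   handles one contiguous slice; the caller joins them and adds the
   partial sums. If a thread cannot be created, the caller does that
   slice itself. Returns 0.0 if nthreads is out of range. */
double threaded_sum(const double *a, long n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    int started[MAX_THREADS];
    long chunk;
    double total = 0.0;
    int t;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return 0.0;
    chunk = (n + nthreads - 1) / nthreads;       /* ceiling division */

    for (t = 0; t < nthreads; t++) {
        long lo = t * chunk, hi = (t + 1) * chunk;
        slice[t].a = a;
        slice[t].lo = lo < n ? lo : n;
        slice[t].hi = hi < n ? hi : n;
        started[t] = (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) == 0);
        if (!started[t])
            sum_slice(&slice[t]);                /* fall back: do it here */
    }
    for (t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }
    return total;
}
```

A program that runs, say, 4 such threads per process could be launched on TC2K with prun's -c 4 option, so that each process is given one CPU per thread.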
Compiling with MPICH
C Example | mpicc code.c |
C++ Examples | mpiCC code.C; mpicxx code.C* |
Fortran Examples | mpif77 code.f; mpif90 code.F |
* Note that mpicxx has been created as an alternative to mpiCC. mpicxx invokes the vendor C++ compiler, whereas mpiCC invokes g++. All other scripts use the native compiler.
mpif77 - version 1.1.2
old_mpif77 - version 1.2.1
new_mpif77 - version 1.2.4

    man -P/usr/local/mpi/man mpiCC
    man -P/usr/local/new_mpi/man mpiCC
    man -P/usr/local/old_mpi/man mpiCC

Environment Variable | Description |
---|---|
MPICH_CC | Alternate C compiler |
MPICH_CLINKER | Alternate C linker |
MPICH_F77 | Alternate Fortran 77 compiler |
MPICH_F77LINKER | Alternate Fortran 77 linker |
MPICH_F90 | Alternate Fortran 90 compiler |
MPICH_F90LINKER | Alternate Fortran 90 linker |
MPICH_CCC | Alternate C++ compiler |
MPICH_CCLINKER | Alternate C++ linker |
These environment variables have been used successfully in the past to change the MPICH compiler/linker commands to work with the KCC, Guide and GNU compilers on the Compaqs. However, depending upon your application's specific requirements, there may be problems with creating incompatible object files due to compilation options, include file dependencies, or libraries.
Compiling MPICH P4
mpicc_p4, mpiCC_p4, mpicxx_p4, mpif77_p4, mpif90_p4, mpirun_p4
Running on Compaq Clusters |
GPS and SC | TC2K |
---|---|
Primarily designated for serial and single node parallel jobs. | Designated as a parallel resource exclusively. |
Configured very heterogeneously. There are different types of nodes with different memory, CPUs, time and job limits. | Configured very homogeneously. All compute nodes are similar if not identical in configuration. |
The concept of a "pool" of nodes is not defined. Users can select nodes individually, or by characteristics such as processor type, memory or how long a job may run. | All nodes fall into one of two clearly defined pools: pdebug or pbatch. Selection of individual nodes or selection by node characteristics is not applicable. |
Nodes are shared with other users with most nodes being used simultaneously for interactive and batch jobs. | Nodes are not shared with other users (except for the two login nodes). When your job runs, the allocated nodes are dedicated to you. |
Job Limits:
GPS/SC:

    dmpirun [option list] [executable] [args]
    mpirun [option list] [executable] [args]

For example:

    dmpirun -np 4 mycode
    mpirun -np 4 mycode

would run 4 mycode MPI processes. Note that the syntax is picky: there must be a space between -np and 4, and all dmpirun/mpirun options must appear before the executable name.
TC2K:

    prun [option list] [executable] [args]

Note that prun options must precede your executable.
Example | Description |
---|---|
prun -n4 -ppdebug my_app | 4-process job run interactively in the pdebug partition |
prun -n2 -c2 my_threaded_app | 2-process job with 2 threads per process |
prun -N8 my_app | Request that 8 nodes be used for the job (a total of 32 CPUs) |
prun -n4 -o my_app.out my_app | 4-process job that redirects stdout to the file my_app.out |
prun -n4 -ppdebug -i all my_app | 4-process interactive job; each process accepts input from stdin |
Option | Description |
---|---|
-c [#cpus/task] | The number of CPUs used by each MPI process. Use this option if each process in your code spawns multiple POSIX or OpenMP threads. |
-h | Print list of options. |
-i [file], -o [file] | Redirect input/output to the file specified. |
-I | Allocate CPUs immediately or fail. By default, prun blocks until resources become available. |
-m block\|cyclic | Specifies whether to use block (the default) or cyclic distribution of processes over nodes. |
-n [#processes] | Number of processes job requires. |
-N [#nodes] | Number of nodes on which to run job. |
-p [partition] | Specify an RMS partition on which to run job. |
-s | Print usage stats as job exits. |
-t | Prefix each line of output with process number. |
-v -vv -vvv | Increasing levels of verbosity. |
TC2K:

    tckk14{joeuser}214: rinfo
    MACHINE CONFIGURATION
    tckk production

    PARTITION  CPUS     STATUS   TIME         TIMELIMIT  NODES
    root       512                                       tckk[0-127]
    pdebug     0/40     running  15:01:59:39             tckk[16-25]
    pbatch     368/384  running  12:23:56:13             tckk[26-31,34-63,66-95,98-127]

    RESOURCE      CPUS  STATUS     TIME      USERNAME  NODES
    pbatch.20258  64    allocated  09:36:32  joek      tckk[38-39,48-49,52-54,56,58-59,63,66,72,87,89,110]
    pbatch.20263  96    allocated  08:35:52  yo3ng     tckk[40-47,50-51,55,60-62,92,111,113-116,123-126]
    pbatch.20270  16    allocated  07:44:04  user23    tckk[29-31,34]
    pbatch.20271  16    allocated  07:40:34  user23    tckk[67-70]
    pbatch.20299  8     allocated  05:17:23  bl4ee     tckk[93-94]
    pbatch.20328  8     allocated  03:34:52  we44born  tckk[81-82]
    pbatch.20329  8     allocated  03:29:54  we44born  tckk[79-80]
    pbatch.20330  8     allocated  03:24:54  we44born  tckk[117-118]
    pbatch.20353  120   allocated  02:11:13  r9mirez3  tckk[36-37,71,73-78,83-86,88,90-91,95,98-100,105-109,112,119-122]
    pbatch.20538  24    allocated  02:13     wtweis    tckk[26-28,35,57,101]

    JOB           CPUS  STATUS     TIME      USERNAME  NODES
    pbatch.20989  64    running    09:36:29  joek      tckk[38-39,48-49,52-54,56,58-59,63,66,72,87,89,110]
    pbatch.20994  96    running    08:35:51  yo3ng     tckk[40-47,50-51,55,60-62,92,111,113-116,123-126]
    pbatch.21001  16    running    07:44:02  user23    tckk[29-31,34]
    pbatch.21002  16    running    07:40:33  user23    tckk[67-70]
    pbatch.21029  8     running    05:17:21  bl4ee     tckk[93-94]
    pbatch.21058  7     running    03:34:51  we44born  tckk[81-82]
    pbatch.21059  7     running    03:29:53  we44born  tckk[79-80]
    pbatch.21060  7     running    03:24:53  we44born  tckk[117-118]
    pbatch.21083  120   running    02:11:11  r9mirez3  tckk[36-37,71,73-78,83-86,88,90-91,95,98-100,105-109,112,119-122]
    pbatch.21268  24    running    02:12     wtweis    tckk[26-28,35,57,101]
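The NODES column uses a compressed host-list notation: a name prefix followed by a bracketed, comma-separated list of node numbers and lo-hi ranges. Counting the names is a quick consistency check on the listing: tckk[26-28,35,57,101] names 6 nodes, which at 4 CPUs per node matches the 24 CPUs shown for pbatch.20538. A minimal C sketch of that count, assuming a single bracket group per entry; hostlist_count is a hypothetical helper, not an RMS command:

```c
#include <stdlib.h>
#include <string.h>

/* Count the hosts named by an RMS-style NODES entry, e.g.
   "tckk[26-28,35,57,101]" -> 6 or "epcra127" -> 1.
   Handles a single [...] group of numbers and lo-hi ranges.
   Returns -1 if an element inside the brackets is not a number. */
long hostlist_count(const char *spec)
{
    const char *p = strchr(spec, '[');
    long count = 0;

    if (p == NULL)
        return 1;                          /* a single host name */
    p++;                                   /* step past '[' */

    while (*p != '\0' && *p != ']') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p)
            return -1;                     /* malformed element */
        if (*end == '-')                   /* range such as 26-28 */
            hi = strtol(end + 1, &end, 10);
        count += hi - lo + 1;
        p = end;
        if (*p == ',')
            p++;                           /* move to the next element */
    }
    return count;
}
```

Applied to the root partition entry tckk[0-127], the same count gives 128 nodes, which matches the 512 CPUs listed at 4 CPUs per node.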

    tckk15{joeuser}214: spjstat
    Scheduling pool data:
    --------------------------------------------------------
    Pool     Memory  Cpus  Nodes  Usable  Free
    --------------------------------------------------------
    pbatch   3400Mb  4     96     95      1
    pdebug   3400Mb  4     10     10      10

    Running job data:
    -------------------------------------------------------
    Resource ID  User Name  Nodes  Pool    Status
    -------------------------------------------------------
    60999        draarer    8      pbatch  allocated
    61055        puiles     4      pbatch  allocated
    61053        ewsgmann   7      pbatch  allocated
    61046        oidner10   2      pbatch  allocated
    61045        ffyong     1      pbatch  allocated
    61032        98ais      6      pbatch  allocated
    61031        uayis      6      pbatch  allocated
    61030        ipoiad     8      pbatch  allocated
    61025        8ah9u      2      pbatch  allocated
    61020        ewqqu      2      pbatch  allocated
    61019        nvccero    16     pbatch  allocated
    61017        ppotsu     6      pbatch  allocated
    61014        yeshsch    15     pbatch  allocated
    61007        con9nell   11     pbatch  allocated
Killing Interactive Jobs:
rcontrol kill resource [resource name] signal 9
where resource name is given by the rinfo command's RESOURCE section. For example:

    [jjj@epcra0 ~]$ rinfo
    MACHINE CONFIGURATION
    epcra all

    PARTITION  CPUS     STATUS   TIME   TIMELIMIT  NODES
    root       258                                 epcra[0-127] epcrai
    pbatch     241/252  running  07:31             epcra[2-127]

    RESOURCE     CPUS  STATUS     TIME   USERNAME  NODES
    pbatch.2415  1     allocated  00:46  jjj       epcra127

    JOB          CPUS  STATUS     TIME   USERNAME  NODES
    pbatch.2677  1     running    00:46  jjj       epcra127
Submitting Batch Jobs:

    # Sample LCRM script to be submitted with psub
    # These commands are for LCRM
    #PSUB -c tc2k,pbatch            # explicitly say where to run
    #PSUB -r t2d22                  # sets job name
    #PSUB -tM 1:00                  # sets maximum total CPU time
    #PSUB -b micphys                # sets bank account
    #PSUB -ln 2                     # uses 2 nodes
    #PSUB -x                        # export current env var settings
    #PSUB -o /home/db/t2d22.log     # sets output log name
    #PSUB -e /home/db/t2d22.err     # sets error log name
    #PSUB -nr                       # do NOT rerun job after system reboot
    #PSUB -ro                       # write stdout immediately (no spooling)
    #PSUB -re                       # write stderr immediately (no spooling)
    #PSUB -mb                       # send email at execution start
    #PSUB -me                       # send email at execution finish
    #PSUB                           # no more psub commands
    # Shell commands start here
    set echo
    echo LCRM job id = $PSUB_JOBID
    cd /p/ba2/db/t2d22
    prun -n 8 ./my_mpiprog
    echo 'ALL DONE'

psub run.sh
GPS/SC: Specifying Memory (and Other) Constraints:
#PSUB -c 15000Mb,gps

    #PSUB -c 'tc06|tc07'
    #PSUB -c gps08
    #PSUB -c ev67

Note that the LCRM constraint specification is rather fussy. For example, in -c 15000Mb,gps there is no space between "15000" and "Mb", nor on either side of the comma. In addition, units can only be given in megabytes. See the psub man page for additional rules and tips regarding constraints.
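A minimal sketch for turning an estimate of a job's memory footprint into a constraint string that follows these rules (a whole number of megabytes, no spaces). It assumes that one Mb means 1,048,576 bytes, which is an illustration only; check the psub man page for how LCRM actually interprets the unit. The function names bytes_to_mb and memory_constraint are hypothetical.

```c
#include <stdio.h>

/* Assumption for illustration: one "Mb" is 2^20 bytes. */
#define BYTES_PER_MB 1048576UL

/* Round a byte count up to whole megabytes. */
unsigned long bytes_to_mb(unsigned long bytes)
{
    return (bytes + BYTES_PER_MB - 1) / BYTES_PER_MB;
}

/* Build a constraint string such as "123Mb,gps" in buf: an integer
   number of megabytes, "Mb" with no space before it, and no spaces
   around the comma. Returns snprintf's result (the string length, or
   a negative value on error). */
int memory_constraint(char *buf, size_t size, unsigned long bytes,
                      const char *machine)
{
    return snprintf(buf, size, "%luMb,%s", bytes_to_mb(bytes), machine);
}
```

For example, a single 4000 x 4000 array of 8-byte doubles occupies 128,000,000 bytes, which bytes_to_mb rounds up to 123, so memory_constraint produces 123Mb,gps for use as #PSUB -c 123Mb,gps. A real job needs additional headroom for the rest of the program.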
Quick Summary of Common Batch Commands:
Command | Description |
---|---|
psub | Submits a job to LCRM |
pstat | LCRM job status command |
rinfo | RMS job status command. |
prm | Remove a running or queued job |
phold | Place a queued job on hold |
prel | Release a held job |
palter | Modify job attributes (limited subset) |
lrmmgr | Show host configuration information |
pshare | Queries the LCRM database for bank share allocations, usage statistics, and priorities. |
defbank | Set default bank for interactive sessions |
newbank | Change interactive session bank |

    #PSUB -ln 4
    prun -n16 a.out

    #PSUB -ln 4
    prun -n4 -c4 a.out

    Task     Block   Cyclic
    ------   -----   ------
    task 0   node0   node0
    task 1   node0   node1
    task 2   node0   node0
    task 3   node0   node1
    task 4   node1   node0
    task 5   node1   node1
    task 6   node1   node0
    task 7   node1   node1
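The two distributions in the table above can be written as a simple mapping from task number to node. A minimal sketch, assuming the number of tasks is an exact multiple of the number of nodes (as in the 8-task, 2-node example); task_node is a hypothetical helper, not part of prun:

```c
/* Node (0-based) on which task number 'task' runs when ntasks tasks are
   spread over nnodes nodes. Assumes ntasks is an exact multiple of
   nnodes.
     block  (cyclic == 0): fill one node before moving to the next;
     cyclic (cyclic != 0): deal tasks out to the nodes round-robin. */
int task_node(int task, int ntasks, int nnodes, int cyclic)
{
    if (cyclic)
        return task % nnodes;
    return task / (ntasks / nnodes);   /* ntasks / nnodes = tasks per node */
}
```

With 16 tasks on 4 nodes (#PSUB -ln 4 with prun -n16), task 6 lands on node 1 under the default block distribution (6 / 4) and on node 2 under prun -m cyclic (6 % 4).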

    #PSUB -ln 20
    prun -N2 -n8 myjob
    prun -N3 -n12 myjob
    prun -N4 -n16 myjob
    ....
    prun -N20 -n80 myjob

GPS/SC: Effectively Utilizing CPUs:
#PSUB -cpn 4
Error/Problem | Action |
---|---|
MPI aborts and leaves running processes consuming CPU time | Use ps -u [userid] to identify the processes and then kill -9 [pid] to terminate them. You may need to check all hosts that your job used. |
MPI aborts and leaves shared memory segments still allocated but unusable | GPS/SC: Run the ipcs command to determine whether your shared memory segments are still allocated. If so, run mpiclean to remedy the problem. You may need to do this on all hosts that your job used. |
MPI_INIT : MPIRUN chose the wrong device | GPS/SC: You compiled with DMPI, but ran with mpirun (MPICH). |
MPI_Comm_size reports np=1 | GPS/SC: This happens even when you run with, for example, dmpirun -np 4. Cause: you compiled with MPICH, but ran with dmpirun (DMPI). |
ump_init failure | GPS/SC: Not enough shared memory available. Run mpiclean. If problem persists, call the LC Hotline. |
Can't run multi-host MPI jobs. Processes never start on remote hosts. | Cause: ssh configuration problems. To verify, type: ssh -v [remote-host] uname -n |
sc_sec_sec_creds failed: Requested key is unavailable (dce / sec), or prun: Error: scsec: keytab not found, please run scgenkeytab | TC2K only. DCE/DFS related. The simplest way to work around these problems, provided you don't need DCE credentials, is to set the environment variable SCSECDISABLED to 1 (in csh: setenv SCSECDISABLED 1). This tells prun not to try to propagate your credentials to the processes it initiates on remote nodes. Most errors starting with sc_sec_sec can be fixed by setting this environment variable. Note: scgenkeytab does not work with OTP authentication. |
Unaligned access pid=12504 | Unaligned accesses can significantly degrade performance. When a fault is generated, the operating system must realign the data. Common causes of unaligned accesses include: 1) incorrectly passing pointers; 2) passing a real to a function/subroutine that is expecting an integer; 3) intermixing character and integer*4 types with real*8 types within a common block. To track them down, use uac: uac p nofix noprint sigbus. Running uac this way causes unaligned accesses to generate a SIGBUS, which debuggers interpret as a breakpoint. Then run your code under TotalView; it will stop automatically whenever an unaligned access occurs. Examine the source where the fault occurred and fix the problem. See the uac man page for details. |
Job killed for using too much memory | On GPS/SC, jobs are killed if they exceed the memory limit, so it helps to find out how much memory your job actually requires. For further study of memory utilization, you may use available memory tools such as Third Degree and Electric Fence. |
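As a minimal C sketch of one common source of unaligned accesses (a pointer cast into the middle of a byte buffer) and the usual fix: reading a double through (double *)(buf + offset) is unaligned whenever buf + offset is not a multiple of 8, so each such access forces the operating system to realign the data as described in the table. Copying the bytes into a properly aligned local variable with memcpy avoids the fault. The function name read_double_at is hypothetical.

```c
#include <string.h>

/* Fetch a double stored at an arbitrary byte offset in a buffer, e.g. a
   record parsed out of a binary file. Dereferencing a cast pointer,
     value = *(const double *)(buf + offset);
   is an unaligned access whenever buf + offset is not 8-byte aligned
   (and undefined behavior in C). Copying the bytes with memcpy into a
   local double, which the compiler aligns correctly, avoids that. */
double read_double_at(const unsigned char *buf, size_t offset)
{
    double value;

    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

The same idea applies in Fortran terms to cause 3): ordering common block members from largest to smallest type keeps real*8 items on 8-byte boundaries.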
Debugging |
Available Debuggers:
Using TotalView on the Compaq Clusters:

    ssh-agent /bin/csh
    ssh-add

totalview prun -a -n 8 -p pdebug a.out

    mpirun -tv -np 4 myprog

For Compaq MPI programs, use TotalView's -a option. For example:

    totalview dmpirun -a -np 4 myprog

What happens from here is similar to TC2K.
This completes the tutorial.