Abstract |
This tutorial is intended to be an introduction to using LC's IA32 Linux
clusters. It begins by providing a brief historical background of Linux clusters
at LC, noting their success and adoption as a production, high performance
computing platform. The primary hardware components of an IA32 cluster are then
presented, including Intel's IA32 Xeon processor and the Quadrics
interconnect switch. The hardware configuration for each of LC's production Linux clusters completes the hardware-related information.
After covering the hardware-related topics, software topics are discussed, including the LC development environment, compiling with the Intel and PGI compilers, and how to run both batch and interactive parallel jobs. Special attention is paid to IA32 issues in each of these areas. Available debuggers and performance-related tools/topics are briefly discussed; however, detailed usage is beyond the scope of this 1.5-hour presentation. A lab exercise using one of LC's IA32 Linux clusters follows the presentation.
Level/Prerequisites: Intended for those who are new to developing
parallel programs in LC's Intel IA32 cluster environment. A basic
understanding of parallel programming in C or Fortran is assumed.
The material covered by EC3501 - Introduction to Livermore Computing Resources would
also be useful.
Background of Linux Clusters at LLNL |
The Linux Project:
Alpha Linux Clusters:
PCR Clusters:
MCR Cluster...and More:
Cluster | Facility | Size |
---|---|---|
ALC | OCF | 960 nodes |
ILX | OCF | 67 nodes |
PVC | OCF | 64 nodes |
SPHERE | OCF | 84 nodes |
LILAC | SCF | 768 nodes |
ACE | SCF | 160 nodes |
GVIZ | SCF | 64 nodes |
Which Led To...
Hardware Overview |
IA32 Xeon Design Facts/Features: (circa LC's
clusters)
Adapted from a similar diagram in the
"Overview of Recent Supercomputers" whitepaper
by Aad J. van der Steen, Utrecht University, Netherlands, 2003.
Floating Point Unit:
SIMD Vector Units:
Hyper-Threading:
Chipsets, Memory and Performance:
Example: Intel E7500 Chipset Xeon System |
Hardware Overview |
Racks:
Node / Rack Configurations:
Hardware Overview |
Primary components:
Topology:
Features:
Performance:
LC Linux Cluster Systems |
ALC:
PVC:
SPHERE:
ILX:
PENGRA:
SAN PLANS:
ACE:
QUEEN:
GVIZ:
Software and Development Environment |
NOTE: The software and development environment for LC's
IA32 Linux clusters is similar to what is described in the
Introduction to LC Resources
tutorial. Only a summary or items specific to the Linux clusters are
discussed below.
CHAOS Operating System:
Batch System:
File Systems:
Compilers:
MKL - Intel Math Kernel Library
Debuggers and Performance Analysis Tools:
Man Pages:
IA32 Compilers |
Compiler Invocation Commands:
Command | Description |
---|---|
icc | serial/OpenMP C |
icpc | serial/OpenMP C++ |
ifort | serial/OpenMP Fortran 77 and 90 |
mpiicc | script for C with Quadrics MPI |
mpiicpc | script for C++ with Quadrics MPI |
mpiifort | script for Fortran with Quadrics MPI |
Versions:
Compiler | Shell | Command |
---|---|---|
C/C++ | bsh/ksh | . /usr/local/intel/compiler81/bin/iccvars.sh |
C/C++ | csh/tcsh | source /usr/local/intel/compiler81/bin/iccvars.csh |
Fortran | bsh/ksh | . /usr/local/intel/compiler81/bin/ifortvars.sh |
Fortran | csh/tcsh | source /usr/local/intel/compiler81/bin/ifortvars.csh |
Common / Useful Options:
Option | Description | C/C++ | Fortran |
---|---|---|---|
-align keyword | Align data as specified by keyword. See man page for details. |  | ✓ |
-ansi_alias, -no-ansi_alias | Can help performance. Directs the compiler to assume that the program follows the language standard's aliasing rules; see the man page for the specific assumptions. C/C++ default = -no-ansi_alias (off). Fortran default = -ansi_alias (on). | ✓ | ✓ |
-assume keyword, -assume buffered_io | Specifies assumptions made by the compiler. One option that may improve I/O performance is buffered_io, which causes sequential file I/O to be buffered rather than being written to disk immediately. See the ifort man page for details. |  | ✓ |
-auto, -automatic, -nosave | Places variables, except those declared as SAVE, on the run-time stack. The default is -auto_scalar (local scalars of types INTEGER, REAL, COMPLEX, or LOGICAL are automatic). However, if you specify -recursive or -openmp, the default is -auto. |  | ✓ |
-save | Places variables, except those declared as AUTOMATIC, in static memory. However, if you specify -recursive or -openmp, the default is -auto. |  | ✓ |
-autodouble | Defines real variables to be REAL(KIND=8). Same as specifying -r8. |  | ✓ |
-c | Stop the compilation after an object file has been produced - creates a *.o file and does not link. | ✓ | ✓ |
-check keyword | Enable runtime error checking actions according to keyword. |  | ✓ |
-convert keyword | Specifies the format for unformatted files, such as big endian, little endian, IBM 370, Cray, etc. |  | ✓ |
-Dname[=value] | Defines a macro name and associates it with a specified value. Equivalent to a #define preprocessor directive. | ✓ | ✓ |
-fast | Shorthand for several combined optimization options: -O3, -ipo, -static | ✓ | ✓ |
-fpp, -cpp | Invoke Fortran preprocessor. -fpp and -cpp are equivalent. |  | ✓ |
-fpe[n] | Fortran compilers later than version 8.1: specifies the run-time floating-point exception handling behavior. |  | ✓ |
-g | Build with debugging symbols. Note that -g does not imply -O0 in the Intel compilers; -O0 must be specified explicitly to turn all optimizations off. | ✓ | ✓ |
-help | Print compiler options summary | ✓ | ✓ |
-Idirectory | Add directory to include file search path | ✓ | ✓ |
-ip | Enable single-file interprocedural optimizations. | ✓ | ✓ |
-ipo | Enable multi-file interprocedural optimizations. | ✓ | ✓ |
-Ldirectory | Add directory to library search path | ✓ | ✓ |
-mcpu=pentium4, -march=pentium4 | Optimize for Pentium 4 / Xeon processor (default) | ✓ |  |
-module directory | Specifies the directory where module (.mod) files should be placed when created and where they should be searched for in USE statements. |  | ✓ |
-mp | 'Maintain precision' - favor conformance to IEEE 754 standards for floating-point arithmetic. | ✓ | ✓ |
-mp1 | Improve floating-point precision - less speed impact than -mp. | ✓ | ✓ |
-o name | Name the output file (executable or object file) name. | ✓ | ✓ |
-O0 | Turn off optimizer - recommended if using -g for debugging. | ✓ | ✓ |
-O, -O1, -O2, -O3 | Optimization levels (-O, -O1, and -O2 are essentially equivalent). -O3 is the most aggressive optimization level. Note that Intel compilers perform optimization by default. | ✓ | ✓ |
-openmp | Turns on OpenMP. Supports OpenMP 2.0. | ✓ | ✓ |
-opt_report, -opt_report_file filename, -opt_report_level [min\|med\|max], -openmp_report[0\|1\|2], -par_report[0\|1\|2\|3] | Various reporting options on optimization, OpenMP, or auto-parallelization. See man pages for more information. | ✓ | ✓ |
-p | Enables function profiling with the gprof tool. Same as -qp. | ✓ | ✓ |
-parallel | Enable auto-parallelizer to generate multi-threaded code for eligible loops. | ✓ | ✓ |
-prof_gen, -prof_file, -prof_use | Used for profile guided optimization. | ✓ | ✓ |
-pthread, -lpthread | Link with Pthreads library | ✓ |  |
-r8, -r16, -real_size 64, -real_size 128 | Different ways to specify the default size of real and/or double-precision numbers. |  | ✓ |
-recursive | Compiles all functions for possible recursion. |  | ✓ |
-reentrancy keyword | Specifies how to compile for multithreaded code. |  | ✓ |
-shared | Create a shared object (.so) | ✓ | ✓ |
-static | Link statically: prevents linking with shared libraries (.so), so static libraries (.a) are used instead. | ✓ | ✓ |
-tpp7 | Optimize for Pentium 4 / Xeon (default) | ✓ | ✓ |
-unroll[n] | Set maximum number of times to unroll loops. Omit n to use default heuristics. Use n=0 to disable loop unroller. | ✓ | ✓ |
-V | Display compiler version information | ✓ | ✓ |
-w | Disable all warning messages | ✓ | ✓ |
-w[0\|1\|2] | Increasing levels of warning message reporting. Default=1 | ✓ | ✓ |
-Wall (C/C++), -warn (Fortran) | Enable all warning messages | ✓ | ✓ |
-xW, -axW | Utilize SSE2 instructions - turns on auto-vectorizer for the Pentium 4 / Xeon. -axW is used only if you intend to run your code on an IA32 architecture other than Pentium 4 / Xeon. | ✓ | ✓ |
-Zp[n] | Align structures at n (1,2,4,8,16) byte boundaries. | ✓ | ✓ |
GNU Compatibility:
Caveats:
Integer divide by zero:

    #include <stdio.h>
    int main()
    {
        int i = 2;
        i /= 0;
        printf("i = %d\n",i);
    }

Floating-point divide by zero:

    #include <stdio.h>
    int main()
    {
        float i = 2;
        i /= 0;
        printf("i = %f\n",i);
    }
IA32 Compilers |
Command | Comments |
---|---|
. /usr/local/pgi6/linux86/6.0/bin/startpgi.sh | Version 6.0, for bash/ksh/sh users |
source /usr/local/pgi6/linux86/6.0/bin/startpgi.csh | Version 6.0, for tcsh/csh users |
Option | Description |
---|---|
-fast | Turn on optimizations |
-g | Generate symbolic debug information |
-help | Display help |
-Kieee | Force IEEE 754 arithmetic |
-Ktrap=[fp,inv,denorm,divz,ovf,unf,inexact] | Unmask FPU exceptions |
-mp | Turn on OpenMP |
-Mvect=[prefetch,sse] | Enable prefetch, SSE |
-O, -O1, -O2 | Optimization levels |
-pc32, -pc64, -pc80 | Set x87 FPU precision control to single (32-bit), double (64-bit), or extended (80-bit) precision respectively |
-V | Display version information |
-v | Verbose mode |
-w | Suppress warning messages |
MPI |
MPI Build Scripts:
Language | Script Name | Underlying Compiler |
---|---|---|
C | mpicc | gcc |
C | mpiicc | icc |
C | mpipgcc | pgcc |
C++ | mpiCC | g++ |
C++ | mpiicpc | icpc |
C++ | mpipgCC | pgCC |
Fortran | mpif77 | g77 |
Fortran | mpiifort | ifort |
Fortran | mpipgf77 | pgf77 |
Fortran | mpipgf90 | pgf90 |
Static Linking:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/rms/lib/dbg
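The export line above appends a directory to LD_LIBRARY_PATH. If the variable starts out empty, that form leaves a leading colon, and an empty entry is generally treated by the dynamic loader as the current directory. The following is a minimal sketch for bash/ksh/sh users that avoids the empty entry; the helper name `append_lib_path` is hypothetical and not part of the LC environment.

```shell
# Hypothetical helper: append a directory to LD_LIBRARY_PATH without leaving an
# empty entry when the variable starts out unset or empty.
append_lib_path() {
    dir=$1
    # ${VAR:+text} expands to text only when VAR is set and non-empty.
    LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir"
    export LD_LIBRARY_PATH
}

# Same directory as in the export line above.
append_lib_path /usr/lib/rms/lib/dbg
```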
Libelan Environment Variables:
Performance:
Running on IA32 Clusters |
Quadrics Switch | No Switch |
---|---|
Most Linux Clusters | ILX, ACE, QUEEN |
Designated as a parallel resource. | Primarily designated for serial and single node parallel jobs. |
All nodes fall into clearly defined pools, such as pdebug and pbatch. Additional pools may be present. Selection of individual nodes or selection by node characteristics is not applicable. | The concept of a "pool" of nodes is not defined. Users can select nodes individually, or by characteristics such as processor type, memory or how long a job may run. |
Nodes are not shared with other users (except for the login nodes). When your job runs, the allocated nodes are dedicated to you. | Nodes can be shared with other users |
Job Limits:
Running on IA32 Clusters |
Batch Jobs:
psub myjobscript
    # Sample LCRM script to be submitted with psub
    #PSUB -c mcr,pbatch            # explicitly say where to run
    #PSUB -r t2d22                 # sets job name
    #PSUB -tM 1:00                 # sets maximum total CPU time
    #PSUB -b micphys               # sets bank account
    #PSUB -ln 2                    # uses 2 nodes
    #PSUB -x                       # export current env var settings
    #PSUB -o /home/db/t2d22.log    # sets output log name
    #PSUB -e /home/db/t2d22.err    # sets error log name
    #PSUB -nr                      # do NOT rerun job after system reboot
    #PSUB -ro                      # write stdout immediately (no spooling)
    #PSUB -re                      # write stderr immediately (no spooling)
    #PSUB -mb                      # send email at execution start
    #PSUB -me                      # send email at execution finish
    #PSUB                          # no more psub commands
    # job commands start here

    # Display job information for possible diagnostic use
    set echo
    hostname
    echo LCRM job id = $PSUB_JOBID
    sinfo
    squeue

    # Run info
    cd /p/ba2/db/t2d22
    srun -n 4 ./my_mpiprog
    echo 'ALL DONE'
Quick Summary of Common Batch Commands:
Command | Description |
---|---|
psub | Submits a job to LCRM |
pstat | LCRM job status command |
prm | Remove a running or queued job |
phold | Place a queued job on hold |
prel | Release a held job |
palter | Modify job attributes (limited subset) |
lrmmgr | Show host configuration information |
pshare | Queries the LCRM database for bank share allocations, usage statistics, and priorities. |
defbank | Set default bank for interactive sessions |
newbank | Change interactive session bank |
Running on IA32 Clusters |
ssh mcr
ssh alc
ssh pengra
srun [option list] [executable] [args]
Note that srun options must precede your executable.
Command | Description |
---|---|
srun -n4 -ppdebug my_app | 4 process job run interactively in pdebug partition |
srun -n2 -c2 my_threaded_app | 2 process job with 2 threads per process. Assumes pbatch partition. |
srun -N8 my_app | Request that 8 nodes be used for job (total of 16 CPUs). Assumes pbatch partition. |
srun -n4 -o my_app.out my_app | 4 process job that redirects stdout to file my_app.out. Assumes pbatch partition. |
srun -n4 -ppdebug -i my.inp my_app | 4 process interactive job; each process accepts input from a file called my.inp instead of stdin |
Option | Description |
---|---|
-c [#cpus/task] | The number of CPUs used by each process. Use this option if each process in your code spawns multiple POSIX or OpenMP threads. |
--core=light | Specifies creation of lightweight core files. May be useful for very large process jobs which are crashing and filling disk space with core files. Note double dashes before "core" in this option. The default is --core=normal, which may actually be limited by your shell corefilesize setting. |
-d | Specify a debug level - integer value between 0 and 5 |
-i [file], -o [file] | Redirect input/output to file specified |
-I | Allocate CPUs immediately or fail. By default, srun blocks until resources become available. |
-J | Specify a name for the job |
-l | Label - prepend task number to lines of stdout/err |
-m block\|cyclic | Specifies whether to use block (the default) or cyclic distribution of processes over nodes |
-n [#processes] | Number of processes that the job requires |
-N [#nodes] | Number of nodes on which to run job |
-O | Overcommit - srun will refuse to allocate more than one process per CPU unless this option is also specified |
-p [partition] | Specify a partition on which to run job |
-s | Print usage stats as job exits |
-v, -vv, -vvv | Increasing levels of verbosity |
-V | Display version information |
Clusters Without a Switch: (ACE, QUEEN, ILX)
mpirun [option list] [executable] [args]
For example:
mpirun -np 2 mycode
would run a 2 process MPI job called "mycode". Note that the syntax is picky: there must be a space between -np and 2, and all mpirun options must appear before the executable name. Use the mpirun -h command for more information.
Running on IA32 Clusters |
    mcr36% squeue | grep test1
    24688  pbatch  test110  qmtang  R  1:49:09  25  mcr[373-397]
    68865  pbatch  test1    blaise  R     0:11   1  mcr[563]
    mcr36% scancel 68865

    mcr36% pstat
    25156  t1.cmd  blaise  000000  cs  RUN      mcr  N
    25157  t1.cmd  blaise  000000  cs  STAGING  mcr  N
    mcr36% prm 25156
    remove running job 25156 (blaise, 000000, cs)? [y/n] y
    mcr36%
Clusters Without a Switch: (ACE, QUEEN, ILX)
Running on IA32 Clusters |
    mcr36% ju
    Partition   total   down   used   avail   cap   Jobs
    pdebug         64      0     55       9   86%   bbnliu-16, pmorris-1, rreed-8....
    pbatch       1048      5    818     225   78%   ggk-1, vo4-320, griinman-25, ...
    mcr38% spjstat

    Scheduling pool data:
    --------------------------------------------------------
    Pool      Memory   Cpus   Nodes   Usable   Free
    --------------------------------------------------------
    pbatch    3300Mb      2    1048     1047     39
    pdebug    3300Mb      2      64       64     48

    Running job data:
    -------------------------------------------------------
    Job ID   User Name   Nodes   Pool     Status
    -------------------------------------------------------
    28309    kkio          360   pbatch   Running
    27132    nnning         32   pbatch   Running
    26937    wwwook        256   pbatch   Running
    28359    pyrota          8   pbatch   Running
    27515    sssuru        256   pbatch   Running
    28479    qupder         96   pbatch   Running
    66340    wickris        16   pdebug   Running
    mcr36% sinfo
    PARTITION  AVAIL  TIMELIMIT  NODES  STATE   NODELIST
    pdebug     up     30:00         60  alloc   mcr[40-55,57-98,102-103]
    pbatch*    up     infinite     835  alloc   mcr[104-144,152-204,...]
    pdebug     up     30:00          4  idle    mcr[56,99-101]
    pbatch*    up     infinite       3  drain*  mcr[310,422,519]
    pbatch*    up     infinite       2  drain   mcr[421,514]
    pbatch*    up     infinite     208  idle    mcr[145-151,205,218...]
    mcr38% squeue
    JOBID  PARTITION  NAME      USER     ST  TIME      NODES  NODELIST
    25232  pbatch     ynffrno2  wwilta   R   11:42:55      8  mcr[464,473-474,479,887-890]
    28309  pbatch     cpmd      ikuo     R    8:08:06    360  mcr[104-463]
    27132  pbatch     thana800  nonini   R    7:45:06     32  mcr[518-525,527-533,793,796 ...]
    26937  pbatch     isj       awwwok   R    7:27:16    256  mcr[481-509,534-757,792,794-795]
    28359  pbatch     ynffrno2  wwilta   R    7:09:36      8  mcr[808,812-818]
    27515  pbatch     mcr.psub  treuru   R    4:48:07    256  mcr[758-791,802-807,809-811 ...]
    28479  pbatch     ge87h76g  hopder   R    1:55:36     96  mcr[465,467,469,472,477,526 ...]
    66335  pdebug     mgmuuyqN  zxxnnuc  R      18:30     13  mcr[53-65]
    mcr36% pstat -m mcr
    JID    NAME             USER      ACCOUNT  BANK      STATUS     EXEHOST  CL
    8942   mo_108.0         3ood      000000   squeeze   *WCPU      mcr      N
    8949   u2.psub          uyang     477530   micphys   *WCPU      mcr      N
    16346  do800            kers      000000   micphys   *WCPU      mcr      N
    17873  b4f              lkwggner  000000   micphys   *WCPU      mcr      N
    17874  b4f              lkwggner  000000   micphys   *DEPEND    mcr      N
    22678  valduc3d01       kbbbta    000000   cms       *MULTIPLE  mcr      N
    22684  ExpandingTube-3  jwen      529004   axcode    RUN        mcr      N
    22685  ExpandingTube-3  jwen      529004   axcode    *DEPEND    mcr      N
    22879  mo4.psub         uyang     477530   squeeze   *WCPU      mcr      N
    22991  vlcc_8.10.16     m55rath5  000000   fph2o     *DEPEND    mcr      N
    24640  test.thunder     lirin     530001   clchange  RUN        mcr      N
    24653  do90             kers      000000   micphys   RUN        mcr      N
    24655  do70             kers      000000   micphys   RUN        mcr      N
    ...    ...              ...
    24656  amr100gp         kgitsu    000000   chemd     RUN        mcr      N
    24839  rh315_100gp      kgitsu    000000   micphys   RUN        mcr      N
    24840  origi            htang     530001   lines     *WCPU      mcr      N
    24841  rh315_100gp      kgitsu    000000   micphys   *TOOLONG   mcr      N
    24842  dpd              ggee      000000   cms       RUN        mcr      N
    24873  methanol         bmundy    000000   fph2o     RUN        mcr      N
    24879  new-ExpandingTu  jwen      529004   axcode    RUN        mcr      N
    24880  amr100gp         kgitsu    000000   chemd     *TOOLONG   mcr      N
    43344  sspex_test2.ksh  qitera1   000000   folding   *DEPEND    mcr      N
    43345  sspex_test2.ksh  qitera1   000000   folding   *DEPEND    mcr      N
    43346  sspex_test2.ksh  qitera1   000000   folding   *DEPEND    mcr      N
Clusters Without a Switch: (ACE, QUEEN, ILX)
Running on IA32 Clusters |
Interactive | Batch |
---|---|
srun -n16 -ppdebug a.out | #PSUB -ln 8 |
 | srun -n16 a.out |
Interactive | Batch |
---|---|
srun -n20 -c2 -ppdebug a.out | #PSUB -ln 20 |
 | srun -n20 -c2 a.out |
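The node counts in the two tables above follow from dividing the CPUs a job needs by the CPUs available on each node. The following is a minimal sketch of that arithmetic, assuming 2 CPUs per node (consistent with the srun -N8 example, where 8 nodes give 16 CPUs); the helper name `nodes_needed` is hypothetical and not an LCRM or SLURM command.

```shell
# Hypothetical helper: nodes to request with "#PSUB -ln" for a given srun line.
# Usage: nodes_needed <tasks (-n)> <cpus per task (-c)> [cpus per node]
# Assumption: 2 CPUs per node unless a third argument is given.
nodes_needed() {
    tasks=$1; cpus_per_task=$2; cpus_per_node=${3:-2}
    total_cpus=$(( tasks * cpus_per_task ))
    # Round up so that a partially used node is still counted.
    echo $(( (total_cpus + cpus_per_node - 1) / cpus_per_node ))
}

nodes_needed 16 1    # srun -n16      -> prints 8
nodes_needed 20 2    # srun -n20 -c2  -> prints 20
```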
    Task     Block    Cyclic
    ------   -----    ------
    task 0   node0    node0
    task 1   node0    node1
    task 2   node1    node2
    task 3   node1    node3
    task 4   node2    node0
    task 5   node2    node1
    task 6   node3    node2
    task 7   node3    node3
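The block and cyclic columns above can be reproduced with simple integer arithmetic. This sketch is a simplified model for illustration only, not SLURM's actual placement logic: it assumes tasks are spread evenly across the nodes, and the helper name `task_node` is hypothetical.

```shell
# Hypothetical helper (not part of SLURM): print the node index that a task
# lands on under a simplified block or cyclic distribution.
# Usage: task_node <block|cyclic> <task> <ntasks> <nnodes>
task_node() {
    dist=$1; task=$2; ntasks=$3; nnodes=$4
    case "$dist" in
        block)
            # Tasks per node, rounded up; consecutive tasks share a node.
            per_node=$(( (ntasks + nnodes - 1) / nnodes ))
            echo $(( task / per_node ))
            ;;
        cyclic)
            # Tasks are dealt out round-robin across the nodes.
            echo $(( task % nnodes ))
            ;;
        *)
            echo "unknown distribution: $dist" >&2
            return 1
            ;;
    esac
}

# Print the mapping for 8 tasks on 4 nodes, as in the table above.
t=0
while [ "$t" -lt 8 ]; do
    echo "task $t  block=node$(task_node block "$t" 8 4)  cyclic=node$(task_node cyclic "$t" 8 4)"
    t=$(( t + 1 ))
done
```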
    #PSUB -ln 20
    srun -N10 -n20 myjob
    srun -N11 -n22 myjob
    srun -N12 -n24 myjob
    ....
    srun -N20 -n40 myjob
Clusters Without a Quadrics Switch:
#PSUB -np 2
This will prevent LCRM from over-allocating the node.
Running on IA32 Clusters |
Compiler Data Size Limit:
C/C++:

    #include <stdio.h>
    #define N 2147483647
    int main(int argc, char *argv[])
    {
        long int i;
        static char A[N];
        for (i=0; i<N; i++)
            A[i] = 'x';
        printf("Sample result = %c \n",A[N-1]);
    }

Fortran:

          program testit
          integer arraysize
          parameter(arraysize=2147483647)
          character A(arraysize)
          integer i
          do i=1, arraysize
            A(i) = "x"
          enddo
          write(*,*)'Sample result A= ', A(i-1)
          end
C/C++ | error: array is too large |
Fortran | A common block or variable may not exceed 2147483647 bytes |
Malloc Limits:
Shell Stacksize Limits:
csh/tcsh:

    cputime         unlimited
    filesize        unlimited
    datasize        unlimited
    stacksize       unlimited
    coredumpsize    16 kbytes
    memoryuse       unlimited
    vmemoryuse      unlimited
    descriptors     1024
    memorylocked    4 kbytes
    maxproc         1024

bash/ksh/sh:

    time(cpu-seconds)       unlimited
    file(blocks)            unlimited
    coredump(blocks)        32
    data(kbytes)            unlimited
    stack(kbytes)           unlimited
    lockedmem(kbytes)       4
    memory(kbytes)          unlimited
    nofiles(descriptors)    1024
    processes               1024
Program Stack Limits:
Pthreads Stack Limits:
#Threads | Approx. Max. Size (MB) |
---|---|
2 | 1072 |
4 | 712 |
8 | 352 |
16 | 182 |
32 | 92 |
OpenMP Stack Limits:
setenv KMP_STACKSIZE 12000000
#Threads | Approx. Max. Size (MB) |
---|---|
2 | 2145 |
4 | 900 |
8 | 400 |
16 | 190 |
32 | 100 |
Miscellaneous
                 total       used       free     shared    buffers     cached
    Mem:          4052       1614       2438          0         44       1250
    -/+ buffers/cache:        319       3733
    Swap:         6145          0       6145

                 total       used       free     shared    buffers     cached
    Mem:          2026       1781        244          0         68       1412
    -/+ buffers/cache:        300       1725
    Swap:         4096          5       4091
In Conclusion:
Running on IA32 Clusters |
Environment Variable | Description |
---|---|
LIBELAN_WAITTYPE | Default value is POLL. Sets wait type to polling for Elan communications; POLL should always be used for Quadrics MPI jobs. |
MALLOC_MMAP_MAX_ | Default value of 0. Forces malloc to use sbrk() rather than mmap() to allocate memory. Improves performance of MPI collectives because it prevents aggressive reclaiming of pages mapped on the Elan card DMA memory. |
MALLOC_TRIM_THRESHOLD_ | Default value of -1. Used in conjunction with MALLOC_MMAP_MAX_ (see above) |
LIBELAN_GALLOC_EBASE | Default value of 0xb0000000. With LIBELAN_GALLOC_MBASE and LIBELAN_GALLOC_SIZE, used to resize the Elan global memory heap for MPI collective operations. EBASE refers to a pointer to a base virtual address in Elan memory to be used for the global heap. |
LIBELAN_GALLOC_MBASE | Default value of 0xb0000000. Refers to a pointer to the main memory base for resizing the Elan global memory heap. |
LIBELAN_GALLOC_SIZE | Default value of 16777216. The size, in bytes, of the Elan global memory heap. Cannot exceed 32 MB due to amount of memory on Elan cards. |
MPI_USE_LIBELAN | By default, not set, which equates to a value of 1. If set to 0, turns off Elan library optimizations. Use only for debugging the Elan libraries if problems within these libraries are suspected. |
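The table states that LIBELAN_GALLOC_SIZE cannot exceed 32 MB. As a minimal sketch for bash/ksh/sh users, the setting could be guarded before a job is launched. The helper name `set_galloc_size` is hypothetical, and the sketch assumes that 32 MB means 32 × 1024 × 1024 bytes, in line with the 16777216-byte default.

```shell
# Upper bound stated in the table, assumed here to be 32 * 1024 * 1024 bytes.
ELAN_MAX_BYTES=$(( 32 * 1024 * 1024 ))

# Hypothetical helper: export LIBELAN_GALLOC_SIZE only if the value is usable.
set_galloc_size() {
    size=$1
    if [ "$size" -le 0 ] || [ "$size" -gt "$ELAN_MAX_BYTES" ]; then
        echo "LIBELAN_GALLOC_SIZE must be between 1 and $ELAN_MAX_BYTES bytes" >&2
        return 1
    fi
    export LIBELAN_GALLOC_SIZE="$size"
}

# Double the 16777216-byte default, which is the largest value allowed.
set_galloc_size 33554432
```

Resizing the heap also involves LIBELAN_GALLOC_EBASE and LIBELAN_GALLOC_MBASE, which this sketch leaves at their defaults.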
Compiler Hints:
#pragma vector aligned - Indicates that memory references in a vectorizable loop are aligned.
#pragma novector - Do not vectorize the loop that follows the directive.
#pragma unroll(n) - Unroll the subsequent for loop n times.
#pragma nounroll - Do not unroll the subsequent for loop.
#pragma loop count (n) - Indicates that the loop count for a given loop is likely to be n.
Web Documentation:
Debugging |
Available Debuggers:
Debugger |
---|
TotalView |
DDT |
GDB |
DDD |
PGDBG |
IDB |
TotalView:
totalview srun -a -n processes -ppdebug prog [prog args]
DDT:
ddt prog
srun -ppdebug -N4 ddt prog
GDB:
gdb a.out
gdb a.out core.1234
gdb a.out 12345
Command | Action |
---|---|
b,break N | Set breakpoint at line N |
b,break funcname|line | Set breakpoint at function named funcname or at specified line |
bt | Print a stack backtrace |
c,cont | Continue after breakpoint |
h,help | Print list of help topics |
i,info registers | Show registers |
i,info float | Show floating-point registers |
l,list N | List N lines of code (default is 10) |
n,next | Execute next program line; step over function calls |
q,quit | Quit |
r,run | Run program |
s,step | Execute next program line; step into function |
DDD:
ddd a.out
ddd a.out core.1234
ddd a.out 12345
PGDBG:
pgdbg a.out
pgdbg -dbx -text a.out
pgdbg -core corefilename a.out
IDB:
idb -gdb a.out
idb a.out core.1234
idb -gdb a.out -pid 12345
set path = ($path /usr/local/intel/idb_80/bin)
setenv IDB_HOME /usr/local/intel/idb_80/bin
Debugging in Batch: batchxterm:
batchxterm display machine #nodes #minutes
Where:
cd ~/projects
totalview srun -a -n8 myprog
A Few Additional Useful Debugging Hints:
srun -N12 -x "mcr40 mcr41" -ppdebug myjob
Shell | Command |
---|---|
csh/tcsh | limit coredumpsize 64 |
ksh/bsh | ulimit -c 64 |
Tools |
We Need a Book!
setenv GMON_OUT_PREFIX 'gmon.out.'`/bin/uname -n`
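The setenv line above uses csh/tcsh syntax. For bash/ksh/sh users, an equivalent might look like the following sketch. With the GNU C library, the process ID is typically appended to the prefix, so each process of a parallel job should write its own profile file; treat that naming detail as an assumption to verify on the target system.

```shell
# bash/ksh/sh equivalent of the csh setenv line above.
# uname -n prints the node's host name, so each node gets its own file prefix.
GMON_OUT_PREFIX="gmon.out.$(uname -n)"
export GMON_OUT_PREFIX
echo "profile data prefix: $GMON_OUT_PREFIX"
```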
Known Problems/Issues |
A Partial List:
References and More Information |
This completes the tutorial.