IA32 Linux Clusters Overview

Table of Contents

  1. Abstract
  2. Background of Linux Clusters at LLNL
  3. Hardware Overview
    1. IA32 Xeon Processor
    2. Nodes, Racks, and Configurations
    3. Quadrics Interconnect
  4. LC IA32 Linux Cluster Systems
  5. Software and Development Environment
  6. Compilers
    1. Intel Compilers
    2. PGI Workstation
  7. MPI
    1. Quadrics MPI
    2. MPICH
  8. Running on IA32 Clusters
    1. Overview
    2. Batch Versus Interactive
    3. Starting Jobs
    4. Terminating Jobs
    5. Displaying Queue and Job Status Information
    6. Optimizing CPU Usage
    7. Memory Constraints
    8. Performance Considerations
  9. Debugging
  10. Tools
  11. Known Problems/Issues
  12. References and More Information
  13. Exercise



Abstract


This tutorial is intended to be an introduction to using LC's IA32 Linux clusters. It begins with a brief historical background of Linux clusters at LC, noting their success and adoption as a production, high performance computing platform. The primary hardware components of an IA32 cluster are then presented, including Intel's IA32 Xeon processor and the Quadrics interconnect switch. The hardware configuration of each of LC's production Linux clusters completes the hardware-related information.

After covering the hardware-related topics, the tutorial turns to software: the LC development environment, compiling with the Intel and PGI compilers, and running both batch and interactive parallel jobs. Special attention is paid to IA32 issues in each of these areas. Available debuggers and performance-related tools/topics are briefly discussed; however, detailed usage is beyond the scope of this 1.5-hour presentation. A lab exercise using one of LC's IA32 Linux clusters follows the presentation.

Level/Prerequisites: Intended for those who are new to developing parallel programs in LC's Intel IA32 cluster environment. A basic understanding of parallel programming in C or Fortran is assumed. The material covered by EC3501 - Introduction to Livermore Computing Resources would also be useful.



Background of Linux Clusters at LLNL


The Linux Project:

Alpha Linux Clusters:

PCR Clusters:

MCR Cluster...and More:

Which Led To...

  • In September 2003, the RFP for LC's first IA-64 cluster was released. The proposal from California Digital Corporation, a small local company, was accepted.

  • 1024-node system composed of 4-CPU Itanium 2 "Madison Tiger4" nodes

  • Thunder debuted as #2 on the Top500 Supercomputers list in June, 2004.

  • Thunder will not be discussed in this tutorial, because it is an IA-64 architecture machine. It is discussed in more detail in the "Using Thunder" tutorial.

  • For more information see: http://www.llnl.gov/linux/thunder.

  • Oh yeah...did we mention PELOTON? Stay tuned...



Hardware Overview

IA32 Xeon Processor

Basics:

IA32 Xeon Design Facts/Features: (circa LC's clusters)

Floating Point Unit:

SIMD Vector Units:

Hyper-Threading:

Chipsets, Memory and Performance:



Hardware Overview

Nodes, Racks, and Configurations

Nodes:

Racks:

Node / Rack Configurations:



Hardware Overview

Quadrics Interconnect

Primary components:

Topology:

Features:

Performance:



LC Linux Cluster Systems

OCF

MCR:

ALC:

PVC:

SPHERE:

ILX:

PENGRA:

SAN PLANS:

LC OCF SAN Configuration


SCF

LILAC:

ACE:

QUEEN:

GVIZ:



Software and Development Environment


NOTE: The software and development environment for LC's IA32 Linux clusters is similar to what is described in the Introduction to LC Resources tutorial. Only a summary, plus items specific to the Linux clusters, is given below.

CHAOS Operating System:

Batch System:

File Systems:

Compilers:

MKL - Intel Math Kernel Library:

Debuggers and Performance Analysis Tools:

Man Pages:



IA32 Compilers

Intel Compilers

Features:

Compiler Invocation Commands:

Versions:

Common / Useful Options:

GNU Compatibility:

Caveats:



IA32 Compilers

PGI Workstation



MPI

Quadrics MPI

General Info:

MPI Build Scripts:

Static Linking:

Libelan Environment Variables:

Performance:



MPI

MPICH

General Info:



Running on IA32 Clusters

Overview

Big Differences:

Job Limits:



Running on IA32 Clusters

Batch Versus Interactive

Interactive Jobs:

Batch Jobs:

Quick Summary of Common Batch Commands:



Running on IA32 Clusters

Starting Jobs

Clusters with a Quadrics Switch:

Clusters Without a Switch: (ACE, QUEEN, ILX)



Running on IA32 Clusters

Terminating Jobs

Clusters with a Quadrics Switch:

Clusters Without a Switch: (ACE, QUEEN, ILX)



Running on IA32 Clusters

Displaying Queue and Job Status Information

Clusters with a Quadrics Switch:

Clusters Without a Switch: (ACE, QUEEN, ILX)



Running on IA32 Clusters

Optimizing CPU Usage

Clusters with a Quadrics Switch:

Clusters Without a Quadrics Switch:



Running on IA32 Clusters

Memory Constraints

32-bit Architecture Limit:

Compiler Data Size Limit:

Malloc Limits:

Shell Stacksize Limits:

Program Stack Limits:

Pthreads Stack Limits:

OpenMP Stack Limits:

Miscellaneous:

In Conclusion:



Running on IA32 Clusters

Performance Considerations

Environment Variables:

Compiler Hints:

Web Documentation:



Debugging

Available Debuggers:

TotalView:

DDT:

GDB:

DDD:

PGDBG:

IDB:

Debugging in Batch: batchxterm:

A Few Additional Useful Debugging Hints:



Tools


We Need a Book!



Known Problems/Issues


A Partial List:

  1. Run-time malloc problems related to Elan environment variable settings. Log in and see news malloc.

  2. Incomplete error messages from the Fortran compiler, related to the setting of your NLS environment variable. Log in and see /usr/local/docs/linux.basics, then search for "NLSPATH" to read the details. Affects Fortran 8.0 only.

  3. Support for large files: how to support file sizes larger than 2 GB. Log in and see /usr/local/docs/Large_Files.txt or Large_Files.pdf.

  4. Lustre parallel file system issues and problems: log in and see /usr/local/docs/lustre.basics. You may also want to see luster_purge.

  5. Ignore references to the /usr/local/docs/memory_constraints document; its information has been out of date since the CHAOS 2.0 upgrade in 2004.

  6. MPI related issues (cross platform): http://www.llnl.gov/computing/mpi/changes.html
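Item 3 above points to the local Large_Files documents for the authoritative details. As a rough illustration only, here is a minimal C sketch of the usual glibc approach on 32-bit Linux, assuming GCC or a compatible compiler: defining _FILE_OFFSET_BITS to 64 before any system header is included (equivalently, passing -D_FILE_OFFSET_BITS=64 on the compile line) widens off_t to 64 bits, so fseeko/ftello can address offsets beyond the 2 GB (2^31 - 1 byte) limit of a signed 32-bit offset. The helper names large_file_offsets_enabled and seek_past_2gb are hypothetical, not part of any LC library.

```c
/* Both macros must appear before ANY system header is included.
   _FILE_OFFSET_BITS=64 asks glibc to make off_t 64 bits wide and to map
   fseeko/ftello/lseek/open etc. onto their 64-bit implementations. */
#define _POSIX_C_SOURCE 200112L   /* exposes fseeko/ftello */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if off_t can hold offsets past 2 GB. */
int large_file_offsets_enabled(void)
{
    return sizeof(off_t) * CHAR_BIT >= 64;
}

/* Hypothetical helper: seeks an open stream to 3 GB (beyond the 32-bit
   2^31 - 1 byte limit) and confirms the resulting position. Seeking past
   end of file is legal and writes no data. Returns 0 on success, -1 on error. */
int seek_past_2gb(FILE *fp)
{
    const off_t target = (off_t)3 * 1024 * 1024 * 1024;  /* 3 GB */

    if (fseeko(fp, target, SEEK_SET) != 0)
        return -1;
    return (ftello(fp) == target) ? 0 : -1;
}
```

On a 64-bit host off_t is already 64 bits wide, so the define is harmless there; it only changes behavior for 32-bit builds such as those on the IA32 clusters.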


References and More Information


This completes the tutorial.

Evaluation Form: Please complete the online evaluation form, unless you are doing the exercise, in which case please complete it at the end of the exercise.
