Using Thunder

presented by

Blaise Barney
Livermore Computing


Table of Contents

  1. Abstract
  2. Background of Linux Clusters at LLNL
  3. Hardware Overview
    1. Thunder Configuration
    2. Itanium 2 Processor
    3. E8870 Chipset
    4. Quadrics Interconnect
    5. SAN Architecture
  4. Accounts and Access
  5. Software and Development Environment
  6. Intel Compilers
  7. Quadrics MPI
  8. Running on Thunder
    1. Batch Versus Interactive
    2. Starting and Terminating Jobs
    3. Displaying Queue and Job Status Information
    4. Optimizing CPU Usage
    5. Memory Constraints
    6. Performance Considerations
  9. Debugging
  10. Tools
  11. Known Problems/Issues
  12. References and More Information
  13. Exercise



Abstract


This tutorial is intended to be an introduction to using LC's IA64 Thunder Linux cluster. It begins by providing a brief historical background of Linux clusters at LC, noting their success and adoption as a production, high performance computing platform. The primary hardware components of Thunder are then presented, including a summary of Thunder's overall configuration, Intel's IA64 Itanium 2 processor, the E8870 Chipset and the Quadrics interconnect switch.

After the hardware topics, a brief discussion of how to obtain an account and access Thunder follows. Software topics are then discussed, including the LC development environment, compiling with the Intel compilers, Quadrics MPI and how to run both batch and interactive parallel jobs. Special attention is paid to IA64 issues in each of these areas where relevant. Available debuggers and performance related tools/topics are briefly discussed; however, detailed usage is beyond the scope of this presentation. The tutorial concludes with a brief listing of known issues and problems and where to go for more information. A lab exercise using the IA64 Thunder Linux cluster follows the presentation.

Level/Prerequisites: Intended for those who are new to developing parallel programs in LC's Intel IA64 cluster environment. A basic understanding of parallel programming in C or Fortran is assumed. The material covered by EC3501 - Introduction to Livermore Computing Resources would also be useful.



Background of Linux Clusters at LLNL


The Linux Project:

Alpha Linux Clusters:

PCR Clusters:

MCR Cluster...and More:

Which Led To...

  • In September 2003, the RFP for LC's first IA-64 cluster was released. The proposal from California Digital Corporation, a small local company, was accepted.

  • Thunder is a 1024-node system composed of 4-CPU Itanium 2 "Madison Tiger4" nodes.

  • Thunder debuted as #2 on the Top500 Supercomputers list in June, 2004. At the time of this writing (4/05), Thunder ranks as #5 in the Top500 list. See www.top500.org for details.

  • LC's Thunder web pages are located at: http://www.llnl.gov/linux/thunder.



Hardware Overview

Thunder Configuration

Summary:

Additional Details:



Hardware Overview

Itanium 2 Processor

Background:

Itanium 2 Block Diagram:

Description:



Hardware Overview

E8870 Chipset

Block Diagram:

Components:



Hardware Overview

Quadrics Interconnect

Primary components:

Topology:

Features:

Performance:



Hardware Overview

SAN Architecture


LC OCF SAN Configuration

Accounts and Access


Accounts:

Access:



Software and Development Environment


Note: Like the IA32 Linux clusters, Thunder's software and development environment is very similar to that described in the Introduction to LC Resources tutorial. Only highlights and items specific to Thunder are discussed below.

CHAOS Operating System:

Batch System:

File Systems:

Compilers:

MKL - Intel Math Kernel Library

Debuggers and Performance Analysis Tools:

Man Pages:



Intel Compilers

Optimizing Compilers:

Compiler Invocation Commands:

Versions:

Common / Useful Options:

GNU Compatibility:

Caveats:



Quadrics MPI


Quadrics MPI:

MPI Build Scripts:

Static Linking:

Libelan Environment Variables:

Performance:

Known Problems:



Running on Thunder

Batch Versus Interactive

A Few General Notes First:

Interactive Jobs:

Batch Jobs:

Quick Summary of Common Batch Commands:



Running on Thunder

Starting and Terminating Jobs

Invoking the Executable:

Terminating Jobs:



Running on Thunder

Displaying Queue and Job Status Information



Running on Thunder

Optimizing CPU Usage



Running on Thunder

Memory Constraints

What Constraints?

Process Stack vs. Heap Memory:

Pthreads Stack Limits:

OpenMP Stack Limits:

In Conclusion:



Running on Thunder

Performance Considerations

Quadrics Environment Variables:

Compiler Hints:

Local MPI Test Results on Thunder:

Web Documentation:



Debugging

Available Debuggers:

TotalView:

DDT:

GDB:

DDD:

IDB:

Debugging in Batch: batchxterm:

A Few Additional Useful Debugging Hints:



Tools


We Need a Book!



Known Problems/Issues


Just Getting Started...

What We Have So Far:

  1. ELAN_EXCEPTION @ --: 6 (Initialisation error)
    Failed elan4_attach(6000000000005850, 6000000000004f00) 16: Device or resource busy.
    The job crashes during startup. Log in and see news elan4_errors.

  2. Large file support: how to work with files larger than 2 GB. Log in and see /usr/local/docs/Large_Files.txt or Large_Files.pdf.

  3. Lustre parallel file system issues and problems: log in and see /usr/local/docs/lustre.basics. You may also want to see luster_purge.

  4. Pthreads and OpenMP threads need at least twice the stack space that the code actually requires. This is discussed in the Memory Constraints section of this tutorial.

  5. Assorted Thunder gotchas: log in and see /usr/local/docs/thunder_pitfalls.

  6. Intel parallel compiler commands give a warning message:
    /usr/local/intel/compiler81/lib/libimf.so.6: warning: log2l is not implemented and will always fail

  7. Disruptions and downtimes due to the equipment move to the new TSF building. Stay tuned to your login banners, news items and the latest news from the LC Customer meetings. Materials are available online (LLNL internal).

  8. MPI related issues (cross platform): http://www.llnl.gov/computing/mpi/changes.html

  9. Porting from IA32 (or other 32-bit) systems to IA64. There is much to be said, but the material has yet to be collected. Try the web, especially Intel's web site intel.com, and search with relevant keywords. Just one important example for now:

    Type             IA32 Data Size   IA64 Data Size
    int              4 bytes          4 bytes
    unsigned int     4 bytes          4 bytes
    long             4 bytes          8 bytes
    unsigned long    4 bytes          8 bytes
    *int             4 bytes          8 bytes
    float            4 bytes          4 bytes
    *float           4 bytes          8 bytes
    double           8 bytes          8 bytes
    *double          4 bytes          8 bytes
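
One way to check the data sizes listed above on a given system is a small C sketch like the following, which prints what the compiler reports for each type. The function names are hypothetical and introduced only for this example. The last function assumes that long is as wide as a pointer, which holds for the IA32 and IA64 sizes listed above but is not guaranteed on every platform.

```c
#include <assert.h>
#include <stdio.h>

/* Print the size of each type from the list above, as seen by the
   compiler and platform this file is built on. */
void print_data_sizes(void)
{
    printf("int=            %lu bytes\n", (unsigned long)sizeof(int));
    printf("unsigned int=   %lu bytes\n", (unsigned long)sizeof(unsigned int));
    printf("long=           %lu bytes\n", (unsigned long)sizeof(long));
    printf("unsigned long=  %lu bytes\n", (unsigned long)sizeof(unsigned long));
    printf("*int=           %lu bytes\n", (unsigned long)sizeof(int *));
    printf("float=          %lu bytes\n", (unsigned long)sizeof(float));
    printf("*float=         %lu bytes\n", (unsigned long)sizeof(float *));
    printf("double=         %lu bytes\n", (unsigned long)sizeof(double));
    printf("*double=        %lu bytes\n", (unsigned long)sizeof(double *));
}

/* Returns 1 if a pointer fits in an int. With the IA32 sizes above this
   is true; with the IA64 sizes it is false, so code that stores a
   pointer in an int loses the upper half of the address. */
int pointer_fits_in_int(void)
{
    return sizeof(void *) <= sizeof(int);
}

/* Round-trips a pointer through unsigned long and returns 1 if the
   pointer survives. Assumes long is at least pointer-sized; the assert
   stops the program on a platform where that is not the case. */
int roundtrip_through_long(void *p)
{
    unsigned long bits;

    assert(sizeof(unsigned long) >= sizeof(void *));
    bits = (unsigned long)p;
    return (void *)bits == p;
}
```

Calling print_data_sizes() from main on both an IA32 and an IA64 system shows the differences directly. A common porting fix suggested by the sizes above is to replace int with long wherever a pointer value is stored in an integer.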
    

  10. A memory leak occurs in the elan library when creating/destroying communicators. It causes eventual slowdown in codes that create/destroy large numbers of MPI communicators.

  11. A number of changes have been made to the Quadrics Elan and MPI libraries. If you encounter a problem while running, please try the following:


References and More Information


This completes the tutorial.

Evaluation Form: Please complete the online evaluation form - unless you are doing the exercise, in which case please complete it at the end of the exercise.
