Livermore Computing Resource Management System (LCRM)

Table of Contents

  1. Abstract
  2. LCRM Overview
  3. Resource Allocation & Control System (RAC)
  4. LCRM Bank Structure
  5. Bank Shares
  6. User RAC Utilities
    1. defbank
    2. newbank
    3. pshare
    4. bac
    5. brlim
    6. pquota
    7. lrmusage
    8. LCRM Usage GUI
  7. Production Workload Scheduler (PWS)
  8. LCRM Job Scheduling
  9. Batch Job Limits
  10. Building a Job Control Script
  11. Optimizing CPU Usage
  12. Batch Utilities and Commands
    1. psub: Submitting a Job
    2. pstat, spjstat, ju: Displaying Job Status
    3. prm: Cancelling a Job
    4. phold, prel: Holding and Releasing Jobs
    5. palter: Changing a Job's Attributes
    6. pexp: Expediting a Job
    7. phist: Job Memory Statistics and History
    8. phstat: Showing a Host's Attributes
    9. plim: Showing a Machine's Job Limits
    10. lrmmgr: Obtaining Configuration Information
  13. Batch Debugging, I/O and Miscellaneous Considerations
  14. References and More Information
  15. Exercise



Abstract


The Livermore Computing Resource Management System (LCRM) is a product of the LLNL Livermore Computing Center (LC). Its primary purpose is to allocate computer resources, according to resource delivery goals, for LC's production computer systems. It is the batch system that LC users use to submit, monitor, and interact with their production computing jobs.

This tutorial begins with a brief overview of LCRM and its two primary functional components, the Resource Allocation and Control System and the Production Workload Scheduler. Each of these components is then further explored, with a practical focus on describing commands and utilities that are provided for the user's interaction with LCRM. Building job command scripts, running parallel jobs, and job scheduling policies are also included. The lecture is followed by a lab exercise.

Note: LCRM was formerly known as the Distributed Production Control System (DPCS).

Level/Prerequisites: Beginner. The material covered by the following tutorials would also be useful:
EC3501: Introduction to Livermore Computing Resources
EC3503: IBM POWER Systems Overview
EC3516: IA32 Linux Clusters Overview



LCRM Overview

Resource Delivery Goals:

Architecture:



Resource Allocation & Control System (RAC)



LCRM Bank Structure

LCRM bank structure example


Bank Shares



User RAC Utilities

The following commands let you query or set Resource Allocation & Control System (RAC) parameters. Only a brief description of each is provided; detailed information is available in each command's man page.

defbank


newbank


pshare


bac


brlim


pquota


lrmusage / pcsusage


LCRM Usage GUI



Production Workload Scheduler (PWS)



LCRM Job Scheduling


Fair Share with Half-Life Decay of Usage:

Other Considerations:



Batch Job Limits


In General:

How Do I Find Out What the Limits Are?

  1. The most up-to-date information can be found by logging in and issuing the command news job.lim.[system]. For example:
    news job.lim.thunder    news job.lim.white
    news job.lim.mcr        news job.lim.gps
    news job.lim.ilx        news job.lim.um

    If you're not sure of the actual command to use, try news job.limits - it usually provides helpful hints.

  2. Job limits can also be found by consulting the "LC Job Limits for All OCF Production Machines" (LLNL internal) web page. Essentially, this is a listing of the job.lim output for all OCF machines on a single page.

  3. Use the tables below. They cover every LC production machine, with the caveat that they are not necessarily as up to date as the previous two methods. The tables below reflect job limits as of 6/05.
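
The first method above can be scripted when you want the limits for several machines at once. The following is a minimal sketch, not an LC-provided tool: the function name job_lim_cmd is hypothetical, and the system names are simply the ones used in the examples above. It only prints the news commands, so it can be tried anywhere; the news items themselves exist only on LC machines.

```shell
#!/bin/sh
# Hypothetical helper (not part of LCRM): print the news command
# that displays the job limits for one system.
job_lim_cmd() {
    printf 'news job.lim.%s\n' "$1"
}

# System names taken from the examples above; adjust as needed.
for sys in thunder white mcr gps ilx um; do
    job_lim_cmd "$sys"
done
```

On an LC machine, each printed line can be run as is to display that system's job limits.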


ASC IBM Systems:


Intel Systems:


Compaq Systems:



Building a Job Control Script

LCRM Job Control Options:

-tM versus -tW?

Other Notes:



Optimizing CPU Usage


ASC IBMs:

Note that for threaded processes, having "unused" CPUs is actually the right thing to do, since the threads will need to execute on them.

Linux clusters with a Quadrics switch:

Compaqs and Linux clusters without a Quadrics switch:



Batch Utilities and Commands


LCRM provides the following utilities and commands for managing your batch jobs. A brief description of each is provided; detailed information is available in each command's man page.

psub


pstat


spjstat & spj


ju


prm


phold & prel


palter


pexp


phist


phstat


plim


lrmmgr



Batch Debugging, I/O and Miscellaneous Considerations


Batch Debugging

I/O Issues

Miscellaneous


This completes the tutorial.




References and More Information