Administration

There are two types of allocations of time on Summit:

  1. Initial
  2. Project

Initial
All new accounts receive an Initial allocation of 50,000 SUs. The Initial allocation expires after one year.

Project
If users exhaust their Initial allocation of 50,000 SUs but still require more time, they can apply for a Project allocation.
To apply for a Project allocation, please submit the Summit Project Allocation Request Form.
The Summit Management and Allocations Review Committee will review and assess Project applications and respond within one week on the status of the request.

Service Units (SU):

Allocations of time on Summit are based on Service Units (SUs), defined in terms of CPU core-hours as follows:

1 SU = 1 CPU core-hour (i.e. a single fully utilized CPU core on 1 compute node for 1 hour)
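
For example, a job that fully utilizes 2 CPU nodes for 12 hours consumes:

2 nodes * 24 cores/node * 12 hrs = 576 SU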

SUs are distributed across CPU nodes, GPU nodes and HiMem nodes as shown below.  KnL-F nodes will be included when they are installed (expected delivery date is May 2017).  GPU and HiMem node SUs are scaled up from CPU node SUs, based on the higher costs of GPU and HiMem nodes.

380 CPU nodes
24 cores/node * 24 hrs/day * 365 days/yr = 210,240 CPU core-hrs/node/yr = 210,240 SU/node/yr
380 CPU nodes * 210,240 SU/node/yr = 79,891,200 SU/yr

10 GPU nodes
2.5X CPU node SU
2.5 * 210,240 SU/node/yr = 525,600 SU/node/yr
10 GPU nodes * 525,600 SU/node/yr = 5,256,000 SU/yr

5 HiMem nodes
12X CPU node SU
12 * 210,240 SU/node/yr = 2,522,880 SU/node/yr
5 HiMem nodes * 2,522,880 SU/node/yr = 12,614,400 SU/yr

Total SUs
79,891,200 SU/yr + 5,256,000 SU/yr + 12,614,400 SU/yr = 97,761,600 SU/yr

The total annual SUs available on Summit are distributed among the three NSF grant participants as shown below:

RMACC = 10% of Total = 0.10 * 97,761,600 SU/yr = 9,776,160 SU/yr
CSU = 25% of non-RMACC share = 0.25 * (97,761,600 SU/yr – 9,776,160 SU/yr) = 21,996,360 SU/yr
UCB = 75% of non-RMACC share = 0.75 * (97,761,600 SU/yr – 9,776,160 SU/yr) = 65,989,080 SU/yr

The Summit system will support a Condominium Computing Model (“Condo Model”) for researchers who choose this method of participation.

In the Condo Model, costs are split between researchers and Central IT. Researchers purchase their own compute nodes and Central IT provides the hosting environment and support services for those nodes.

Researchers purchase:

  • CPU compute nodes
  • GPU accelerators (if applicable)
  • KnL-F accelerators (if applicable; available Q2 2017)
  • Memory
  • Disk storage

Central IT provides:

  • Data center facility
  • Shared service nodes (i.e. login nodes)
  • Shared OmniPath interconnect switches and cables
  • Ethernet management switches and cables
  • Shared scratch storage
  • Server racks
  • Power
  • Cooling
  • Security
  • Equipment purchasing, ordering and installation
  • OS installation
  • System administration
  • Software application installation assistance

Through academic discounts and volume purchase agreements, Central IT can negotiate on behalf of researchers to get reduced prices for compute node resources.

The Summit Condo Model Financials spreadsheet below lists the three types of compute nodes presently available to researchers: Haswell CPU nodes, GPU nodes and HiMem nodes. The CPU nodes use Intel Haswell CPUs. The GPU nodes include two Nvidia Tesla K80 GPU accelerators. The HiMem nodes include 2 TB of RAM. (Note: Intel Knights Landing Phi (KnL-F) nodes will become available in Q2 2017.)

The spreadsheet highlights the cost per node for CSU researchers (i.e. the “CSU/CU subsidized cost / node”). Currently, limited funds are available on a first-come, first-served basis to assist researchers with the purchase of compute nodes.

Grant & Proposal Information

The following statement may be used for Summit Acknowledgements:

“This work utilized the RMACC Summit supercomputer, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder and Colorado State University. The RMACC Summit supercomputer is a joint effort of the University of Colorado Boulder and Colorado State University.”

The following document includes information that may be used for grants, RFP’s and other solicitations:

Information for Grants and Proposals

Technical

There are several methods for remote login to the Summit system, as shown below.


SSH


Method 1 (recommended):

ssh csu_eID@colostate.edu@login.rc.colorado.edu
csu_password,push

where “csu_password” is your regular CSU eID password and “push” is the literal string push (case-insensitive; see the DUO Two-Factor Authentication instructions). Don’t forget the comma “,”.


Method 2:

ssh csu_eID@colostate.edu@login.rc.colorado.edu
csu_password,DUO_key

where “csu_password” is your regular CSU eID password and “DUO_key” is the two-factor key obtained from your cell phone DUO app (see the DUO Two-Factor Authentication instructions). Don’t forget the comma “,”.


NOTE: The DUO_key mentioned above cycles every 15 seconds. Once you tap the key icon in your cell phone DUO app and generate a 6-digit DUO_key, you must use that key within 15 seconds; otherwise it will expire and you’ll have to generate another one.


SSH Client Software

Here are some popular client software applications for ssh:

  • Apple OS X Terminal
  • Windows PuTTY
  • Linux Terminal

There are several methods for file transfers to/from the Summit system, as shown below.

SFTP

NOTE: The DUO_key mentioned below cycles every 15 seconds. Once you tap the key icon in your cell phone DUO app and generate a 6-digit DUO_key, you must use that key within 15 seconds; otherwise it will expire and you’ll have to generate another one.

Method 1

sftp csu_eID@colostate.edu@login.rc.colorado.edu
csu_password,push

where “csu_eID” is your regular CSU eID, “csu_password” is your regular CSU password and “push” is the literal string push (case-insensitive; see the DUO Two-Factor Authentication instructions). Don’t forget the comma “,”.

Method 2

sftp csu_eID@colostate.edu@login.rc.colorado.edu
csu_password,DUO_key

where “csu_eID” is your regular CSU eID, “csu_password” is your regular CSU password and “DUO_key” is the two-factor key obtained from your cell phone DUO app (see the DUO Two-Factor Authentication instructions). Don’t forget the comma “,”.

Method 3

Install FileZilla on your workstation.

Launch FileZilla, then choose “File -> Site Manager…”

Choose “New Site”, enter a name (e.g. Summit) in the My Sites list, and then enter the following information:

  • Host: login.rc.colorado.edu
  • Protocol: choose “SFTP – SSH File Transfer Protocol”
  • Logon Type: Normal
  • User: csu_eID@colostate.edu
  • Password: csu_password,push

where “csu_eID” is your regular CSU eID, “csu_password” is your regular CSU password and “push” is the literal string push (case-insensitive; see the DUO Two-Factor Authentication instructions). Don’t forget the comma “,”. Then click OK. This step saves the profile information for Summit.

To use FileZilla, choose “File -> Site Manager…”. In the My Sites list choose the profile name for Summit (e.g. Summit), and click Connect. The DUO application on your cell phone should immediately prompt you to “Approve” the request. You will then be connected to Summit in sftp mode. A list of the files in your Summit home directory should display in the “Remote Site” area of FileZilla. At this point you can drag-and-drop files to/from Summit.

Method 4

For Windows OS only.

Download and install WinSCP on your workstation.

Configure WinSCP as follows:

Host: login.rc.colorado.edu
Protocol: SFTP
Username: csu_eID@colostate.edu
Password: csu_password,DUO_key

where “csu_eID” is your regular CSU eID, “csu_password” is your regular CSU password and “DUO_key” is the two-factor key obtained from your cell phone DUO app (see the DUO Two-Factor Authentication instructions). Don’t forget the comma “,”.

Globus

Efforts are underway to add Globus as a file transfer method.  We’ll post updates here on progress with Globus.

There are several directories available for user accounts as shown below.  csu_eName is your CSU eName.

Directory          Path                                      Size     File retention                  Backups
Home               /home/csu_eName@colostate.edu             2 GB     Permanent                       Daily incremental
Projects           /projects/csu_eName@colostate.edu         250 GB   Permanent                       Daily incremental
Scratch (global)   /scratch/summit/csu_eName@colostate.edu   1 PB     Purged after 90 days            None
Scratch (local)    /scratch/local (SSD on each node)         200 GB   Purged when the job finishes    None

Summit uses GPFS (General Parallel File System) for fast parallel I/O.

Summit includes the Lmod environment module system to simplify shell configuration and software application management.  Some common module commands are shown below.

Command syntax                            Purpose
module list                               Show modules that are currently loaded
module avail                              Show all modules that are available to be loaded
module spider                             Show all modules that are available to be loaded (more extensive list)
module load module_name                   Load module module_name
module unload module_name                 Unload module module_name
module swap module_name_1 module_name_2   Unload module_name_1 and load module_name_2
module show module_name                   Show information about module_name
module help module_name                   Show information about module_name
module help                               Describe module commands and parameters

In the Lmod system, most software is not accessible by default. Instead, it must be loaded into your Linux shell environment using the module commands shown above. This system allows users and system administrators to manage multiple versions of software concurrently and to switch easily between versions.

“Loading a module” sets or modifies a user’s environment variables to enable access to the software package provided by that module. For instance, the $PATH variable might be updated so that appropriate executables for that package can be used.
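
For example (gcc is used here purely as an illustration), you can observe the effect of loading a module on your environment:

module load gcc     # the module’s bin directory is added to $PATH
which gcc           # now resolves to the module-provided gcc
echo $PATH          # shows the updated search path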

The Lmod environment module system is hierarchical.  There are five layers to support programs built with compiler and library consistency requirements. Modules can only be loaded once their dependencies have been satisfied; this prevents accidental loading of modules that are inconsistent with each other.  For example, in order to load an MPI-dependent program, it’s first necessary to load a compiler (e.g. Intel) and then an MPI implementation (e.g. Intel MPI) consistent with that compiler, as shown in the sketch after the list below.

The five Lmod layers are:

  • Independent programs
  • Compilers
  • Compiler dependent programs
  • MPI implementations
  • MPI dependent programs
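
A minimal sketch of the hierarchy in action (petsc is used purely as an illustration of an MPI-dependent package; run module spider to see which MPI-dependent modules are actually installed):

module load intel   # layer 2: compiler
module load impi    # layer 4: MPI implementation consistent with that compiler
module load petsc   # layer 5: MPI-dependent program (illustrative)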

Summit supports a variety of compilers, interpreters and languages as shown below.  There are often several versions of each item in the list.  To see which versions are available on Summit enter:

module avail

OR

module spider

Name        Description
icc         Intel C compiler
icpc        Intel C++ compiler
ifort       Intel Fortran compiler
gcc         GNU C compiler
g++         GNU C++ compiler
gfortran    GNU Fortran compiler
pgcc        PGI C compiler
pgCC        PGI C++ compiler
pgfortran   PGI Fortran compiler
mpicc       Intel MPI C compiler (after module load intel and module load impi)
mpicxx      Intel MPI C++ compiler (after module load intel and module load impi)
mpif90      Intel MPI Fortran compiler (after module load intel and module load impi)
mpicc       OpenMPI C compiler (after module load openmpi)
mpicxx      OpenMPI C++ compiler (after module load openmpi)
mpif90      OpenMPI Fortran compiler (after module load openmpi)
nvcc        Nvidia CUDA compiler
python      Python interpreter
perl        Perl interpreter
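
As a quick sketch of compiling on Summit (hello.c is an illustrative file name), an MPI program can be built with the Intel toolchain as follows:

module load intel        # load the Intel compiler
module load impi         # load the matching Intel MPI implementation
mpicc -o hello hello.c   # compile the MPI program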

Software Compilation

If you need to compile your own software applications, here is additional information about how to do it on Summit.


SLURM is the batch queueing system used on Summit. The configuration of the batch queue system is shown below.

Partitions

Partition (short name)   Partition (long name)   Compute node type                            Default time   Max time          QoS
shas                     summit-haswell          Haswell CPU nodes (380 nodes)                4 hr           24 hr             N, D, C
sgpu                     summit-gpu              Nvidia K80 GPU nodes (10 nodes)              4 hr           24 hr             N, D, C
sknl                     summit-knl              Intel Knights-Landing Phi nodes (20 nodes)   4 hr           24 hr             N, D, C
smem                     summit-himem            High-memory nodes (5 nodes)                  24 hr          168 hr (7 days)   N, D, L, C

QoS (quality of service)

QoS          Description                                             Limits
Normal (N)   Default                                                 Partition max; normal priority
Debug (D)    Quick turnaround for testing                            1 hr; 1 job per user; 32 nodes max; priority boost
Long (L)     For jobs with long runtimes                             168 hr (7 days); normal priority
Condo (C)    For users who purchased compute nodes ("condo model")   168 hr (7 days); normal priority

Slurm Commands

Command name   Purpose                                        Examples
sinfo          View information about nodes and partitions   sinfo
squeue         View information about jobs                   squeue; squeue --account=acctname
sbatch         Submit a batch script for later execution     sbatch scriptfile
scancel        Terminate a running job                       scancel --name=jobname
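
A minimal batch script sketch for the Haswell partition (the job name, resource requests and my_program are placeholders to adapt):

#!/bin/bash
# request all 24 cores on one Haswell node for 4 hours
#SBATCH --job-name=test_job
#SBATCH --partition=shas
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --time=04:00:00

# load a compiler and MPI stack, then launch the program
module load intel
module load impi
mpirun -np 24 ./my_program

Submit the script with sbatch and monitor it with squeue, as shown in the table above.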

Condo users can use Summit as described here.

Condo jobs have the following privileges:

  • can request longer run times (up to 168 hr (7 days))
  • get a queue priority boost (equivalent to a 1-day boost)
  • can access all compute nodes

To properly activate Condo shares, Condo users should send the following info to richard.casey@colostate.edu:

  • full name
  • csu_eName
  • condo group ID (see table below)

Note: If you are unsure about which condo group ID to use, submit the info anyway.  We’ll determine the right condo group ID for you.

You will receive an email note when your condo group ID assignment is complete.  You’ll then be able to submit jobs using your Condo allocation.

The table below shows the condo group ID that has been assigned to each principal investigator and their department affiliation.

PI                  Dept.                    Condo group ID
Michael Antolin     Biology                  bio
Wolfgang Bangerth   Mathematics              mat
Asa Ben-Hur         Computer Science         hal
Stephen Guzik       Mechanical Engineering   cfd
Tony Rappe          Chemistry                akr
Chris Weinberger    Mechanical Engineering   crw
Ander Wilson        Statistics               fhw

Condo Job Submission

To submit jobs using your Condo allocation, include the following lines in your Slurm batch job file:

#SBATCH --qos condo
#SBATCH -A csu-summit-xxx

where “xxx” is your condo group ID from the table above.  Note the double dash for the “qos” parameter. When you submit a Slurm batch job file with these parameters, the job will run with the additional privileges described above.
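
Putting it together, a condo job file might look like this minimal sketch (the bio group ID and my_program are placeholders; substitute your own group ID and executable):

#!/bin/bash
# the condo QoS allows run times up to 168 hours
#SBATCH --qos condo
#SBATCH -A csu-summit-bio
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --time=72:00:00

module load intel
module load impi
mpirun -np 24 ./my_program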

The Summit system architecture is designed as shown below:

  • 9,120 Haswell CPU cores; 128 GB RAM / compute node
  • 99,840 GPU cores; Nvidia K80 GPU cards
  • 1,440 Knights-Landing Phi cores
  • 5 HiMem compute nodes with 2TB RAM / node
  • 1 Petabyte DDN SFA14K scratch storage
  • 100 Gb/s Omni-Path interconnect
  • 500 TFLOPS peak performance
  • Schematic rack layout
  • System components

R – The R Project for Statistical Computing

R is available on Summit.  Here are instructions to load and execute R.  After logging in to Summit enter:

ssh scompile
ml R
R

This will connect you to a compile node, load the R module (ml is the Lmod shorthand for module load) and open an R session, where you can run most R commands.
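
To run R non-interactively through Slurm, a minimal batch sketch follows (my_analysis.R is an illustrative file name):

#!/bin/bash
# one core on the Haswell partition for 1 hour
#SBATCH --partition=shas
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

module load R
Rscript my_analysis.R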
