September 18, 2017

Summit Workshop:

An introductory workshop “Getting Started on Summit” will be offered on Wednesday, Sept. 20, 10am – noon, in the Library Events Hall, Morgan Library, Room #167.  The workshop is open to anyone currently using, or interested in using, the CSU/CU Summit high-performance computing system (hpc.colostate.edu).     


July 5, 2017

Summit Users,

The 2017 RMACC Symposium
https://www.rmacc.org/HPCSymposium
will be held Aug. 15-17 at CU Boulder.  If you would be interested in giving a short presentation about your research using the Summit supercomputer, please let me know (richard.casey@colostate.edu) by July 10.

Richard Casey


June 18, 2017

Change to memory available to compute jobs on Summit nodes.

Over the last few months we have been able to collect more data about memory usage and node performance on Summit, and as a result have determined that it is necessary to allot slightly more memory to system processes and less to running jobs.  Thus, immediately following the July 5, 2017, planned maintenance downtime, the amount of memory available to compute jobs will be reduced by 10 GB per node.

The resulting maximum memory that Slurm will allow a job to use will then be:

Partition            per-core limit (MiB)   per-node limit (MiB)
shas (Haswell)             4,944                  118,658
sgpu (GPU)                 4,944                  118,658
smem (High-memory)        42,678                2,048,544

If your jobs are tuned to use just under the existing memory limits, you will probably need to reduce their memory usage slightly to avoid job failures.  You can check the maximum memory usage of a completed job by running "sacct --job=Your-JobID --format=jobid,maxrss".  The maxrss field is the job's maximum memory usage ("resident set size").  Note that Slurm samples a job's resource usage only a few times per minute, so the number reported by sacct may not capture brief memory spikes.
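The check above can be scripted. Below is a small sketch (not an official Research Computing tool) that compares a MaxRSS value, as printed by sacct, against the new 118,658 MiB shas/sgpu per-node limit; the K/M/G suffix handling, and treating an unsuffixed number as bytes, are assumptions about sacct's output units.

```shell
#!/bin/sh
# Hypothetical helper: does a sacct MaxRSS value fit under the new
# per-node limit for the shas/sgpu partitions (118,658 MiB)?
maxrss_fits() {
    value="$1"        # e.g. "104857600K" as printed by sacct
    limit_mib=118658  # per-node limit from the table above
    # Convert the suffixed value to MiB (unsuffixed assumed to be bytes).
    mib=$(echo "$value" | awk '
        /K$/ { printf "%d", $0 / 1024; next }
        /M$/ { printf "%d", $0 + 0;    next }
        /G$/ { printf "%d", $0 * 1024; next }
             { printf "%d", $0 / 1048576 }')
    if [ "$mib" -le "$limit_mib" ]; then
        echo "OK: ${mib} MiB is within the ${limit_mib} MiB limit"
    else
        echo "TOO BIG: ${mib} MiB exceeds the ${limit_mib} MiB limit"
    fi
}

# Typical use, pasting the MaxRSS printed by
# "sacct --job=Your-JobID --format=jobid,maxrss":
maxrss_fits "104857600K"   # 100 GiB, within the limit
maxrss_fits "125G"         # 128,000 MiB, over the limit
```

Remember that, because of Slurm's coarse sampling, a value that is only just under the limit may still mask a brief spike above it.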

This change targets a failure mode observed on Summit in which user applications starve system processes for memory. In particular, we have observed GPFS (which serves /scratch/summit/) running out of memory, resulting in job failures due to GPFS being killed.


June 18, 2017

Research Computing will perform regularly-scheduled planned maintenance
Wednesday, 5 July 2017.

July’s activities include:

* Installation of an Uninterruptible Power System (UPS) in the HPCF
supporting Summit. This is the first of three outages toward the
deployment of the UPS, with future outages 7/19-7/20 and 8/2-8/4.
* Update to Summit Slurm resource management to allocate 10GiB of system
memory for the OS and scratch file system.
* Testing and deployment of Slurm Prolog/Epilog scripts to automate the
creation and removal of job-specific node-local scratch directories.
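As an illustration of the prolog/epilog mechanism mentioned in the last item, here is a minimal sketch of what such scripts might look like. The /local/scratch base path and the reliance on Slurm's SLURM_JOB_ID and SLURM_JOB_USER environment variables are illustrative assumptions, not Research Computing's actual scripts.

```shell
#!/bin/sh
# Hypothetical sketch of job-specific node-local scratch management.
SCRATCH_BASE="${SCRATCH_BASE:-/local/scratch}"

# Prolog: create a private, job-specific scratch directory at job start.
prolog() {
    jobdir="$SCRATCH_BASE/job_$SLURM_JOB_ID"
    mkdir -p "$jobdir"
    chown "$SLURM_JOB_USER" "$jobdir"
    chmod 700 "$jobdir"
}

# Epilog: remove the directory when the job ends.
epilog() {
    rm -rf "$SCRATCH_BASE/job_$SLURM_JOB_ID"
}
```

In a real deployment these would be wired into slurm.conf via the Prolog and Epilog options; the sketch only shows the directory lifecycle.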

Maintenance is scheduled to take place between 07:00 and 19:00, though service
will be restored as soon as all activities have concluded. During the
maintenance period no jobs will run on Summit or Crestone resources, and
Summit scratch will be unavailable during the installation of the UPS in the
HPCF.

More information on the upcoming multi-day Summit outages towards the
deployment of a supporting UPS will be provided following the July PM.

If you have any questions or concerns, please contact rc-help@colorado.edu.


May 26, 2017

Planned maintenance Wednesday, 7 June 2017

Research Computing will perform regularly-scheduled planned maintenance
Wednesday, 7 June 2017. This month’s activities include:

* A complete HPCF power-down to support work in CINC
* An upgrade of one of the network interface cards (NIC) in a DTN server to
bring both to 40GbE
* An upgrade of PetaLibrary/archive GPFS
* An upgrade of the GitLab software
* Final decommissioning of the Janus compute environment, including
Janus-era GPU and himem resources

Maintenance is scheduled to take place between 07:00 and 19:00, though service
will be restored as soon as all activities have concluded. During the
maintenance period no jobs will run on Summit, and Janus and Summit scratch
will be unavailable during the HPCF power-down. Access to PetaLibrary/archive
and GitLab will be momentarily interrupted during their respective upgrades.  If you have
any questions or concerns, please contact rc-help@colorado.edu.


May 9, 2017

The University of Colorado Research Computing group is offering a workshop on Parallel Programming in Matlab, Python and R.  CSU researchers are welcome to attend.


May 1, 2017

Here is the slideset for the Getting Started on Summit workshop that was presented on May 1, 2017.


April 28, 2017

Planned maintenance Wednesday, 3 May 2017

Research Computing will perform regularly-scheduled planned maintenance
Wednesday, 3 May 2017. This month’s activities include:
* update of software on Summit compute nodes and Omnipath switches;
* update of GitLab;
* replacement of failed parts in Janus Scratch;
* testing larger datagram MTU for Omnipath network on Summit;
* performance validation of the Janus compute environment.

Maintenance is scheduled to take place between 07:00 and 19:00, though
service will be restored as soon as all activities have concluded. During the
maintenance period no jobs will run on Summit resources. If you have
any questions or concerns, please contact rc-help@colorado.edu.


April 10, 2017

The workshop “Getting Started on Summit” will be held Monday, May 1, 10-11am, Library Events Hall.


April 4, 2017

Summit will be down 0600 – 1900, Wed., April 5 for scheduled maintenance.


April 3, 2017

The Summit Supercomputer Training Workshop on April 4 has been cancelled and will be rescheduled for sometime later in April.


March 15, 2017

The Summit Supercomputer Training Workshop has been rescheduled from Monday, March 20 to Tuesday, April 4, 10-11am, in the Library Event Hall.



February 2, 2017

The Summit High-Performance Computing System is now running in full production mode.  You may create an account by following instructions on the Accounts page.


January 25, 2017

The new Summit High-Performance Computing System will be available to the CSU community on February 2, 2017.


May 5, 2016

ISTeC Seminar on Intel HPC Directions

This announcement is for an ISTeC-sponsored seminar on the directions Intel has planned for its high-performance computing technology. This technology is important to CSU, as the nodes in the shared HPC system to be installed in June at CU Boulder are the latest Intel nodes. In addition, phase two of that installation will be a smaller system based on Intel's next-generation Knights Landing chip, which includes the on-chip Intel OmniPath interconnect.

Come hear about Intel's directions in HPC technology and how they relate to the two systems to be installed at CU Boulder.

Title of Seminar: ISTeC seminar – Intel’s HPC Directions
Date: Thursday, May 12, 2016
Time: 3-4:30 PM
Location: Room 203 of Morgan Library


March 21, 2016

HPC Wire

The new Summit Supercomputer was featured in an HPC Wire press release. Check out the press release to see what’s coming to CSU/CU in June.


October 09, 2015

High-powered supercomputer to boost Rocky Mountain research via Colorado State University Source.