SCC uses the Grid Engine (GE) queuing system (Son of Grid Engine 8.1.8) for simulation job management (please see the GE tutorial). The GE can be used in text mode or via the graphical interface (qmon) on the frontend server. All nodes can submit and execute jobs.
The following GE queues are currently available:
Queue | Slots | Default/max. run time | Usage | User |
---|---|---|---|---|
scc | 3892 | 7 days/10 days | All SCC nodes | all |
long | 2736 | 7 days/120 days | Long running jobs | all |
old | 488 | 7 days/30 days | Older Nodes (AMD and Intel) | all |
pc | 616 | 7 days/7 days | Workstations | @theophys |
gpu | 308 | 7 days/10 days | GPU nodes | all |
You can select a single queue or let GE decide by specifying the needed resources (see below), but keep in mind that requesting very high values may negatively impact your job's scheduling.
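As a minimal sketch (job name, resource values and executable are placeholders), a serial job that lets GE pick a suitable queue could look like this:

```bash
#!/bin/bash
#$ -N my_serial_job            # job name (placeholder)
#$ -l h_rt=24:00:00            # 1 day run time (see the resource table below)
#$ -l h_vmem=2G                # 2 GB memory for the single slot
# -cwd is already a default option, so the job starts in the submission directory

./my_program input.dat         # placeholder executable
```

Submit it with `qsub job.sh`.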
All queues support serial and parallel jobs. For parallel jobs, use the appropriate parallel environment depending on whether your job uses shared memory (e.g. OpenMP) or distributed memory (e.g. MPI); example scripts follow the table below.
Parallel Environment | Usage | Max. Slots | Example |
---|---|---|---|
smp/openmp | Shared Memory (single node) | 16-64 | -pe smp 20 |
mpi | Distributed Memory | all | -pe mpi 42 |
mpi-20 | Distributed Memory (exclusive nodes) | n x 20 | -pe mpi-20 160 |
mpi-8/mpi-12/mpi-16/mpi-24 | Distributed Memory (exclusive nodes) | n x 8/12/16/24 | -pe mpi-8 16 |
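Hedged sketches of a shared-memory and a distributed-memory job script (job names and executables are placeholders; the exact mpirun call depends on your MPI module):

```bash
#!/bin/bash
#$ -N omp_job                  # placeholder job name
#$ -pe smp 20                  # 20 slots on a single node
#$ -l h_rt=48:00:00
# OMP_NUM_THREADS is set to NSLOTS by the starter method (see the settings list below)
./my_openmp_program
```

```bash
#!/bin/bash
#$ -N mpi_job                  # placeholder job name
#$ -pe mpi-20 160              # 160 slots = 8 exclusive nodes with 20 cores each
#$ -l h_rt=48:00:00
#$ -l ib                       # Infiniband interconnect for fast MPI communication
mpirun -np $NSLOTS ./my_mpi_program
```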
The GE slots refer to real CPU cores. To use hyper-threading you need to specify the number of cores to use explicitly in your job script (see the GE tutorial).
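A hedged sketch of how this might look for an OpenMP job that runs two hyper-threads per physical core (the factor 2 is an assumption about the nodes' SMT configuration):

```bash
#!/bin/bash
#$ -pe smp 16                          # 16 GE slots = 16 physical cores
# override the default OMP_NUM_THREADS=NSLOTS to run 2 hyper-threads per core
export OMP_NUM_THREADS=$((2 * NSLOTS))
./my_openmp_program                    # placeholder executable
```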
All queues are configured for fair scheduling (ticket-based job priority) and reservation (handling serial and parallel jobs at the same time) to treat all users and jobs fairly. The available resources per user depend on the contribution of the user's group to SCC.
For GPU-jobs please check the GPU-page.
Resources can be requested with the -l option; h_vmem and h_rt in particular are relevant for most jobs (an example command follows the table). All available resources are:
Resource | Example (qsub option) | Explanation |
---|---|---|
h_vmem | -l h_vmem=4G | request 4 GB memory PER SLOT for the job (default 1 GB, max 768 GB) |
h_rt | -l h_rt=48:00:00 | request 2 days run time (default 7 days, max 120 days) |
infiniband | -l ib | request Infiniband interconnect (fast network) |
exclusive | -l ex | request exclusive usage of a single node (use only for MPI jobs on single nodes, add "-w w" if job is rejected) |
max10 | -l m10 | limit your number of used slots by all jobs to 10 (useful if you don't want to fill the group quota) |
max100 | -l m100=2 | limit your number of used slots by all jobs to 100/2 = 50 (the value acts as a divisor) |
max1000 | -l m1000=2.5 | limit your number of used slots by all jobs to 1000/2.5 = 400 |
cputype | -l p="haswell|ivybridge" | request CPU type (epyc3, epyc, cascadelake, skylake, broadwell, haswell, ivybridge, sandybridge, phi, corei7, core2, core2duo) |
epyc3, epyc, cl, sl, bw, hw, ivy, sandy, phi, corei7, core2, core2duo | -l hw | request exact CPU type (EPYC 3 / EPYC / Cascadelake / Skylake / Broadwell / Haswell / Ivy Bridge / Sandy Bridge / Phi / Core i7 / Core 2 Quad / Core 2 Duo) |
avx | -l avx | Only nodes supporting AVX |
avx2 | -l avx2 | Only nodes supporting AVX2 (Haswell and higher) |
avx512 | -l avx512 | Only nodes supporting AVX512 (Skylake and Cascadelake) |
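For example, a job that needs 4 GB per slot, two days of run time, and a node with AVX2 support could be submitted like this (the qsub options are taken from the table above; the script name is a placeholder):

```bash
qsub -l h_vmem=4G -l h_rt=48:00:00 -l avx2 -pe smp 8 job.sh
```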
The queues are configured with the following settings:
- default GE options: -cwd -q scc,old,pc,long -R y
- work in current working directory
- default queues: scc, old, pc and long (see the -q option above)
- reservation with back filling for parallel jobs
- fair scheduling (ticket based job priority)
- multiple queues per node without oversubscription
- starter method that sets OMP_NUM_THREADS to NSLOTS and initializes the Modules environment (see the sketch below)
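As a small hedged illustration of the starter method, a job script can rely on OMP_NUM_THREADS being pre-set and on the Modules environment being available (module and program names are placeholders):

```bash
#!/bin/bash
#$ -pe smp 8
module load gcc                # Modules environment is initialized by the starter method (module name is a placeholder)
echo "Running with $OMP_NUM_THREADS threads ($NSLOTS slots)"
./my_openmp_program            # placeholder executable
```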
You can use qquota to see the limitations that apply to you. If you have a lot of jobs, please consider using array jobs (see the sketch below).
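A hedged sketch of an array job that processes 100 input files with a single qsub call (the file naming scheme is an assumption):

```bash
#!/bin/bash
#$ -N my_array_job             # placeholder job name
#$ -t 1-100                    # 100 tasks; SGE_TASK_ID runs from 1 to 100
#$ -l h_rt=12:00:00
./my_program input_${SGE_TASK_ID}.dat   # one input file per task (naming is an assumption)
```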
Please do not send jobs to single nodes like "qsub -q scc@scc042". The best choice is almost always to use any queue and let the queuing system decide based on the resources you specify. You may limit the selection of nodes to certain hostgroups, e.g. "-q scc@@scc-ivy-64GB" (see the example after the table).
Hostgroup | Specification | Nodes |
---|---|---|
@scc-epyc3-256GB | AMD EPYC 3 CPU, 256 GB RAM | 4 |
@scc-cascadelake-192GB | Cascadelake CPU, 192 GB RAM | 8 |
@scc-skylake-192GB | Skylake CPU, 192 GB RAM | 28 |
@scc-broadwell-128GB | Broadwell CPU, 128 GB RAM | 4 |
@scc-broadwell-512GB | Broadwell CPU, 512 GB RAM | 12 |
@scc-haswell-64GB | Haswell CPU, 64 GB RAM | 8 |
@scc-haswell-256GB | Haswell CPU, 256 GB RAM | 4 |
@scc-ivy-64GB | Ivy Bridge CPU, 64 GB RAM | 16 |
@scc-ivy-256GB | Ivy Bridge CPU, 256 GB RAM | 35 |
@scc-sandy-64GB | Sandy Bridge CPU, 64 GB RAM | 2 |
@scc-sandy-128GB | Sandy Bridge CPU, 128 GB RAM | 5 |
@scc-sandy-256GB | Sandy Bridge CPU, 256 GB RAM | 3 |
@scc-gpu | Ivy Bridge CPU, Tesla K20 GPU | scc066 |
@scc-gpu2 | Haswell CPU, Tesla K80 GPU | scc116,scc117 |
@scc-gpu3 | Silver 4114 CPU, 4 NVIDIA V100 GPU | scc146 |
@scc-gpu-epyc | AMD EPYC 7401P, 8 RTX 2080TI | scc195-scc199 |
 | AMD EPYC 7713, 4 NVIDIA L40 | scc192 |
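For example, to restrict a job to the Ivy Bridge nodes with 256 GB RAM (hostgroup taken from the table above; the script name is a placeholder):

```bash
qsub -q scc@@scc-ivy-256GB -l h_vmem=12G job.sh
```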