Job Scheduling

This information applies to IITKGP internal users. Other users (NSMapp, NSMext, etc.) are requested to contact CDAC for information on storage, charging policy, and queuing priority.

Scheduler

PARAM Shakti uses Slurm 22.05.09 (open-source) as the workload manager for the HPC facility. The following partitions (queues) have been defined to meet different computational requirements.

Partition | Min-Max Cores/Nodes per Job | Max Walltime | Priority | Comments
----------|-----------------------------|--------------|----------|---------
Shared    | 1-36 cores                  | 3 days       | 200      | Compute nodes only. Node sharing is allowed between different jobs.
Medium    | 1-10 nodes                  | 3 days       | 200      | Compute and high-memory nodes. Nodes are allocated exclusively; node sharing is not allowed.
Large     | 1-10 nodes                  | 7 days       | 10       | Compute and high-memory nodes. Intended for long-running jobs; node sharing is not allowed.
GPU       | —                           | —            | —        | GPU nodes only. Configuration remains unchanged.

Partition Details
  1. Shared Partition: Designed for serial and OpenMP jobs. Users may request a minimum of 1 core and a maximum of 36 cores. The maximum walltime is 3 days.
  2. Medium Partition: Suitable for single-node and multi-node jobs. Entire nodes are allocated exclusively. Users may request between 1 and 10 nodes. The parameter --ntasks-per-node=40 must not be changed. Maximum walltime is 3 days.
  3. Large Partition: Similar to the medium partition but intended for longer jobs with a maximum walltime of 7 days. Due to its lower priority, this partition should be used only when longer walltime is required.
  4. GPU Partition: Dedicated to GPU-based workloads. Configuration remains unchanged.
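
As an illustration, a minimal submission script for an OpenMP job in the shared partition might look like the following sketch. The job name, walltime, thread count, and executable are placeholders, and the lowercase partition name `shared` is an assumption; adjust them to your application and the site's actual partition names.

```shell
#!/bin/bash
#SBATCH --job-name=omp_test        # placeholder job name
#SBATCH --partition=shared         # shared partition: 1-36 cores, max 3 days
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8          # OpenMP threads (must not exceed 36 on shared)
#SBATCH --time=1-00:00:00          # walltime of 1 day (3-day maximum on shared)

cd /scratch/$USER                  # jobs must be submitted from /scratch/$USER
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_app                    # placeholder executable
```

Submit the script with `sbatch job.sh` and monitor it with `squeue -u $USER`.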

Types of Nodes
  1. Compute Nodes (cn): The system comprises 384 compute nodes, labelled cn001 to cn384. These nodes are distributed across the shared and medium/large partitions based on historical usage and job queue demand.
  2. High-Memory Nodes (hm): A total of 36 high-memory nodes, labelled hm001 to hm036, are available only in the medium and large partitions. These nodes are intended for memory-intensive workloads.

    To request more than 4.3 GB of memory per core, users may specify either of the following in the job submission script:
    • #SBATCH --exclude=cn[085-384]
    • #SBATCH --mem-per-cpu=AAG (replace AA with a value between 4 and 18)
    This ensures allocation of the required memory (in GB) per CPU core.
  3. GPU Nodes (gpu): The system includes 22 GPU nodes, labelled gpu001 to gpu022. These nodes are accessible exclusively through the GPU partition. To request a GPU, include:
    #SBATCH --gres=gpu:1
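
Putting the directives above together, a minimal GPU job script might look like the following sketch. The job name, walltime, module name (`cuda`), and executable are placeholders, and the lowercase partition name `gpu` is an assumption; check `module avail` and the site's partition names before use.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_test        # placeholder job name
#SBATCH --partition=gpu            # GPU partition (gpu001-gpu022)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1               # request one GPU, as described above
#SBATCH --time=02:00:00            # 2-hour walltime (placeholder)

cd /scratch/$USER                  # jobs must be submitted from /scratch/$USER
module load cuda                   # module name is an assumption; verify with `module avail`
./my_cuda_app                      # placeholder executable
```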

Storage Policy

⚠️ Important

Users must submit jobs from the /scratch/$USER directory.

File System | Quota                                                    | Retention Period
------------|----------------------------------------------------------|-----------------
/home       | Soft Limit: 40 GB; Hard Limit: 50 GB (*except UG students) | Unlimited
/scratch    | Soft Limit: 2 TB                                         | 60 days; files older than the retention period will be deleted

Users can check their /home and /scratch quota using the myquota command.

Usage Instructions
  • /home/<username>/: Used for installing applications.
  • /scratch/<username>/: Used for project/research data.

Users are advised to maintain local backups of their data. Storage and charging policies are subject to change.


Data Safety

Users are advised to keep a local backup of their data. Once the project/research work is complete, transfer the data from PARAM Shakti to your local system using the commands shown in the File Transfers section. User data stored in the scratch directory on PARAM Shakti is NOT backed up or archived by the system administration, and the administration is not responsible for restoring damaged or lost files. Backing up and archiving data is the responsibility of the user and their Project Guide/Adviser.
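
The transfer itself can be done with standard tools such as `scp` or `rsync`, run from your local machine. In the sketch below, `username` and `<login-node>` are placeholders; substitute your account name and the actual login node address.

```shell
# Copy a results directory from PARAM Shakti scratch to the local machine.
# "username" and "<login-node>" are placeholders.
scp -r username@<login-node>:/scratch/username/results ./results

# rsync shows progress (-P) and skips files that were already copied,
# so interrupted transfers can be resumed by rerunning the same command.
rsync -avP username@<login-node>:/scratch/username/results/ ./results/
```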
