Slurm HPC executor (evopt.slurm_executor)
Warning
This module is part of the internal implementation of evopt and is not intended for direct use by end users. Its API may change without notice.
SLURM-based parallel execution for evolutionary optimization tasks.
This module provides a concurrent.futures.Executor implementation that submits tasks to a SLURM cluster for parallel execution. It handles job submission, monitoring, and result retrieval on HPC clusters running the SLURM workload manager.
The SlurmExecutor integrates with the Python concurrent.futures API, allowing code written for ProcessPoolExecutor to run on SLURM clusters with minimal changes.
Example
Basic usage submitting a function to SLURM:
>>> from evopt.slurm_executor import SlurmExecutor
>>> executor = SlurmExecutor(max_workers=4, cores_per_worker=2, memory_gb=8)
>>> future = executor.submit(my_function, arg1, arg2)
>>> result = future.result() # Blocks until the job completes
>>> executor.shutdown()
Using with concurrent.futures API:
>>> with SlurmExecutor(max_workers=10) as executor:
...     results = list(executor.map(my_function, my_args))
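Because SlurmExecutor follows the concurrent.futures.Executor interface, calling code can be written against that interface and handed whichever executor is available. The following is a minimal sketch of that idea, using the standard library's ThreadPoolExecutor as a local stand-in so it runs without a cluster. The names run_all and square are hypothetical and not part of evopt.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def square(x):
    return x * x


def run_all(executor: Executor, fn, items):
    """Map fn over items using whichever Executor is supplied."""
    return list(executor.map(fn, items))


# Local stand-in: on a cluster, a SlurmExecutor instance would be passed instead.
with ThreadPoolExecutor(max_workers=2) as pool:
    squares = run_all(pool, square, [1, 2, 3])
```

Keeping the executor as a parameter means the same function can be exercised locally before it is pointed at SLURM.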
- class evopt.slurm_executor.SlurmExecutor(max_workers: int = 1, cores_per_worker: int = 1, memory_gb: float = 4, wall_time: str = '1:00:00', qos: str | None = None)[source]
Bases:
Executor
Executor implementation that submits tasks as SLURM jobs.
This class provides a concurrent.futures.Executor interface for executing tasks on a SLURM cluster. It handles job submission, monitoring, and result retrieval, allowing for efficient parallel execution across multiple compute nodes.
The executor maintains a pool of worker slots limited by max_workers, submitting new jobs as slots become available and collecting results asynchronously.
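The slot-limiting behaviour described above can be illustrated with a semaphore. This is only a sketch of the general technique, not the evopt implementation: it uses a local thread pool in place of SLURM jobs, and MAX_WORKERS, task, and submit_when_slot_free are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2  # plays the role of max_workers
slots = threading.BoundedSemaphore(MAX_WORKERS)
lock = threading.Lock()
state = {"running": 0, "peak": 0}  # tracks how many tasks run at once


def task(i):
    with lock:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
    time.sleep(0.01)  # simulated work
    with lock:
        state["running"] -= 1
    return i * 2


def submit_when_slot_free(pool, fn, *args):
    slots.acquire()  # block until one of the MAX_WORKERS slots is free
    future = pool.submit(fn, *args)
    # Free the slot once the task has finished, whether it succeeded or failed.
    future.add_done_callback(lambda _f: slots.release())
    return future


# The pool itself is larger than MAX_WORKERS; the semaphore enforces the limit.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [submit_when_slot_free(pool, task, i) for i in range(6)]
    results = [f.result() for f in futures]
```

In this sketch the submitting thread blocks while all slots are busy. An executor that queues pending tasks internally would return futures immediately instead.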
- max_workers
Maximum number of concurrent SLURM jobs.
- Type:
int
- cores_per_worker
Number of CPU cores to request per job.
- Type:
int
- memory_gb
Memory in GB to request per job.
- Type:
float
- wall_time
Maximum wall time for each job in format “HH:MM:SS”.
- Type:
str
- qos
Quality of Service to request from SLURM.
- Type:
str or None
- job_manager
Manager for SLURM job operations.
- Type:
SlurmJobManager
- base_dir
Temporary directory for job files and results.
- Type:
str
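The resource attributes above correspond naturally to standard sbatch options (--cpus-per-task, --mem, --time, --qos). The sketch below shows one plausible mapping. Which directives evopt actually writes is not stated here, so treat sbatch_directives as a hypothetical helper and "normal" as a placeholder QOS name.

```python
def sbatch_directives(cores_per_worker=1, memory_gb=4, wall_time="1:00:00", qos=None):
    """Return #SBATCH lines for one job (hypothetical helper, not part of evopt)."""
    lines = [
        f"#SBATCH --cpus-per-task={cores_per_worker}",
        # memory_gb may be fractional, so convert to whole megabytes
        f"#SBATCH --mem={int(memory_gb * 1024)}M",
        f"#SBATCH --time={wall_time}",
    ]
    if qos is not None:
        # Only request a QOS when one was given
        lines.append(f"#SBATCH --qos={qos}")
    return lines


header = sbatch_directives(cores_per_worker=2, memory_gb=8, qos="normal")
```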
Examples
Submit multiple tasks and process results as they complete:
>>> import concurrent.futures
>>> executor = SlurmExecutor(max_workers=4)
>>> futures = [executor.submit(process_data, data_chunk) for data_chunk in data]
>>> for future in concurrent.futures.as_completed(futures):
...     try:
...         result = future.result()
...         print(f"Task completed with result: {result}")
...     except Exception as e:
...         print(f"Task failed: {e}")
>>> executor.shutdown()
- shutdown(wait: bool = True)[source]
Shut down the executor and clean up resources.
This method marks the executor as shut down, cancels any running SLURM jobs, and prevents new jobs from being submitted. If wait is True, it does not return until all currently executing jobs have completed.
- Parameters:
wait – Whether to wait for running jobs to complete before returning. Default is True.
- Returns:
None
Example
>>> # Shutdown and cancel all running jobs
>>> executor.shutdown(wait=False)
>>>
>>> # Shutdown and wait for all jobs to complete
>>> executor.shutdown(wait=True)
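The rule that no new work is accepted after shutdown is part of the general concurrent.futures contract. It can be observed with a standard-library executor, used here as a stand-in. This sketch shows only the standard behaviour, not SLURM job cancellation.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(pow, 2, 5)
pool.shutdown(wait=True)  # returns once already-submitted work has finished

finished = future.done()
try:
    pool.submit(pow, 2, 6)  # submitting after shutdown is rejected
    rejected = False
except RuntimeError:
    rejected = True
```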
- submit(fn, *args, **kwargs)[source]
Submit a function for execution as a SLURM job.
This method submits a function and its arguments to be executed as a SLURM job. It creates a temporary directory for the job, pickles the function and arguments, generates a Python script to run in the SLURM environment, and submits the job.
The method returns a Future object that can be used to check the status and retrieve the result of the job.
- Parameters:
fn (callable) – The function to execute.
*args – Positional arguments to pass to the function.
**kwargs – Keyword arguments to pass to the function.
- Returns:
A Future representing the execution of the function.
- Return type:
concurrent.futures.Future
- Raises:
RuntimeError – If the executor has been shut down.
Exception – Any exception raised during job submission is set on the future.
Example
>>> def my_task(x, y, z=1):
...     return x * y * z
>>>
>>> executor = SlurmExecutor(max_workers=4)
>>> future = executor.submit(my_task, 2, 3, z=4)
>>> result = future.result()  # Wait for the job to complete
>>> print(result)  # Should print 24
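The steps described for submit (a temporary directory, a pickled function with its arguments, a generated script, and a result read back) can be sketched as a file-based round trip. This is an illustration under stated assumptions, not the evopt code: the script is launched with a local subprocess instead of sbatch, and run_via_files, WORKER_SCRIPT, and the file names are hypothetical. Plain pickle stores functions by reference, so the sketch uses importable functions from the math module.

```python
import math
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Script run on the "worker" side: load the task, call it, store the outcome.
WORKER_SCRIPT = """\
import pickle, sys
from pathlib import Path

job_dir = Path(sys.argv[1])
fn, args, kwargs = pickle.loads((job_dir / "task.pkl").read_bytes())
try:
    outcome = ("ok", fn(*args, **kwargs))
except Exception as exc:
    outcome = ("error", exc)
(job_dir / "result.pkl").write_bytes(pickle.dumps(outcome))
"""


def run_via_files(fn, *args, **kwargs):
    """Run fn in a separate process, exchanging data through pickle files."""
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = Path(tmp)
        (job_dir / "task.pkl").write_bytes(pickle.dumps((fn, args, kwargs)))
        script = job_dir / "worker.py"
        script.write_text(WORKER_SCRIPT)
        # Assumption: a local subprocess stands in for a SLURM job submission.
        subprocess.run([sys.executable, str(script), str(job_dir)], check=True)
        status, value = pickle.loads((job_dir / "result.pkl").read_bytes())
    if status == "error":
        raise value  # re-raise the worker's exception in the caller
    return value


distance = run_via_files(math.hypot, 3, 4)
```

Storing the outcome as a tagged pair lets a failure inside the task travel back to the caller, much as an exception is set on a Future.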