Resource Sharing Proof of Concept

The ERN Resource Sharing Proof of Concept uses the SLURM scheduler to launch high-performance computing (HPC) jobs on clusters at multiple sites. The prototype enabled jobs to be launched across federated clusters at nine cooperating campuses in five states, plus Google Cloud Platform. It is now being put into production for a cooperative effort between Rutgers and Penn State.

The prototype demonstrates that both containerized (Singularity) jobs and traditional HPC jobs can be launched through the SLURM scheduler from any site to any other site, including Google Cloud.
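As an illustration only, here is a minimal Python sketch of what a containerized cross-site submission could look like. It is not the ERN project's tooling: the helper functions, the image name `hello.sif`, and the cluster names `site-a` and `gcp` are hypothetical. It relies on two standard pieces: `singularity exec IMAGE COMMAND` runs a command inside a container image, and `sbatch --clusters=A,B` lists candidate clusters, with SLURM submitting the job to the one offering the earliest expected start. The sketch only builds the script text and the command line; it does not call `sbatch`.

```python
import shlex


def build_batch_script(image: str, command: list[str], job_name: str = "ern-demo") -> str:
    """Return a minimal sbatch script that runs `command` inside a Singularity image.

    `job_name` and the single-task layout are illustrative assumptions.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --ntasks=1",
        # `singularity exec` runs one command inside the container image;
        # quoting keeps unusual paths or arguments intact in the shell script.
        "singularity exec " + shlex.quote(image) + " " + shlex.join(command),
    ]
    return "\n".join(lines) + "\n"


def build_sbatch_argv(clusters: list[str], script_path: str) -> list[str]:
    """Return the sbatch argument list targeting one or more named clusters.

    The cluster names are hypothetical; SLURM picks the listed cluster with
    the earliest expected start time.
    """
    return ["sbatch", "--clusters=" + ",".join(clusters), script_path]


if __name__ == "__main__":
    print(build_batch_script("hello.sif", ["python3", "hello.py"]))
    print(shlex.join(build_sbatch_argv(["site-a", "gcp"], "job.sh")))
```

A traditional (non-containerized) job would differ only in dropping the `singularity exec` prefix from the final script line.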

Future Enhancements

We are working with Google, Cisco, Internet2, SchedMD (SLURM), and OSG on several possible enhancements:

  • InCommon authentication and authorization
  • Policy-based federation of computing resources
  • Reserving slices of the network
  • Cloud bursting
  • OSG federation with containerized SLURM (slurmd)
  • Testing different file system and data sharing approaches

Current Deployment

Our GitHub repository contains details on the current deployment.