Optimally determine the number of cores to use to set up a new cluster, based on:
the number of cores available (see note);
the amount of free memory available on the local machine;
the number of cores requested vs. the number available: if more cores are requested than are available, the number of cores used is reduced so that jobs run in approximately even-sized batches. (E.g., if 16 cores are available but 50 are needed, running 3 batches of 16 plus a single batch of 2, i.e., 4 batches total, takes about the same time as running 4 batches of at most 13.)
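The batch-balancing arithmetic in the example above can be sketched as follows. This is a minimal illustration only: `evenBatchCores` is a hypothetical helper, not a function exported by the package, and the package's actual implementation may differ.

```r
# Hypothetical helper (not part of the package): choose a core count that
# keeps the number of batches unchanged while evening out their sizes.
evenBatchCores <- function(needed, available) {
  if (needed <= available) {
    return(as.integer(needed))              # everything fits in a single batch
  }
  nBatches <- ceiling(needed / available)   # batches required at full capacity
  as.integer(ceiling(needed / nBatches))    # fewest cores giving that many batches
}

evenBatchCores(50, 16)  # 4 batches are needed either way, so 13 cores suffice
evenBatchCores(8, 16)   # fewer jobs than cores: use 8
```

With 50 jobs and 13 cores, the batches are 13, 13, 13 and 11, rather than 16, 16, 16 and 2.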
optimalClusterNumGeneralized(
memRequiredMB = 500,
maxNumClusters = parallel::detectCores(),
NumCoresAvailable = parallel::detectCores(),
availMem = pemisc::availableMemory()/1e+06
)
optimalClusterNum(
memRequiredMB = 500,
maxNumClusters = parallel::detectCores()
)
memRequiredMB: The amount of memory needed, in MB.
maxNumClusters: The number of nodes needed (requested).
NumCoresAvailable: The number of cores available on the local machine (see note).
availMem: The amount of free memory (RAM) available to use.
An integer specifying the number of cores to use.
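One way to picture how free memory could bound the result is the sketch below. It is an assumption-laden illustration, not the package's code: `memLimitedCores` and its argument `availMemMB` are hypothetical names, both memory values are assumed to be in MB, and the floor of one core is an assumption made here.

```r
# Hypothetical sketch (not the package's implementation): cap the requested
# number of cores by how many workers fit in free memory.
memLimitedCores <- function(memRequiredMB, availMemMB, requested) {
  byMemory <- floor(availMemMB / memRequiredMB)   # workers that fit in free RAM
  as.integer(max(1, min(requested, byMemory)))    # assumed: never fewer than one core
}

memLimitedCores(memRequiredMB = 500, availMemMB = 4000, requested = 16)  # memory allows 8
memLimitedCores(memRequiredMB = 500, availMemMB = 64000, requested = 16) # request is the limit
```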
R hardcodes the maximum number of socket connections it can use (currently set to 128 in R 4.1). Three of these are reserved for the main R process, so in practice a user can create at most 125 connections, e.g., when creating a cluster. See https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28.
We limit this a bit further here just in case the user already has open connections.
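The connection headroom described in this note can be estimated in a running session roughly as follows. This is a sketch under stated assumptions: `maxUsableConnections` is a hypothetical helper, the defaults of 128 and 3 are taken from the note above, and the exact margin the package applies is not shown here.

```r
# Hypothetical sketch: estimate how many connections remain for a new cluster,
# given the hard limit and the connections already open in this session.
maxUsableConnections <- function(hardLimit = 128L, reserved = 3L) {
  alreadyOpen <- nrow(showConnections())        # user-opened connections only
  max(0L, hardLimit - reserved - alreadyOpen)   # never report a negative count
}

maxUsableConnections()
```

`showConnections()` is base R and lists only user-created connections, so the three reserved ones are subtracted separately.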