Parallel Computing with MPI
Vayesta can construct and solve multiple quantum embedding problems (as defined by the fragmentation) in parallel over distributed memory architecture, using the Message Passing Interface (MPI) and the Python bindings provided by mpi4py.
Not all functions have been tested in combination with MPI. It is always advised to perform a smaller test run first, to verify that parallel and serial execution yield the same results. Please open an issue on the GitHub page to report any bugs or unexpected behavior.
mpi4py can be installed using pip:
[~]$ pip install mpi4py
Running an MPI Job
Running a calculation in parallel is as simple as executing
[~]$ mpirun -np N python jobscript.py
in the console, where
N is the desired number of MPI processes.
For the best possible parallelization, use as many MPI processes as there are fragments
(for example three for an atomic fragmentation of a water molecule), for which the scaling
with number of MPI processes should be favourable.
When it is necessary to use fewer MPI processes than fragments, the processes will then
calculate their assigned set of embedding problems sequentially.
It is never advised to use more MPI processes than there are fragments; the additional processes
will simply be idle.
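The effect of the process count can be illustrated with a small sketch. It assumes a simple round-robin assignment of fragments to ranks; the function name distribute_fragments and the assignment scheme are hypothetical illustrations, and Vayesta's actual scheduling may differ.

```python
def distribute_fragments(n_fragments, n_ranks):
    """Hypothetical round-robin assignment of fragment indices to MPI ranks."""
    return {rank: list(range(rank, n_fragments, n_ranks)) for rank in range(n_ranks)}


# Water with atomic fragmentation: three fragments (O, H, H)
for n_ranks in (2, 3, 4):
    assignment = distribute_fragments(3, n_ranks)
    # Ranks that received no fragment have nothing to do
    idle = [rank for rank, frags in assignment.items() if not frags]
    print(n_ranks, "ranks:", assignment, "idle ranks:", idle)
```

With two ranks, one rank has to solve two embedding problems in sequence; with four ranks, one rank receives no fragment at all.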
Note that if a multithreaded BLAS library is linked, the embedding problems assigned to each MPI rank will in general still be solved in a multithreaded fashion. Therefore, a good strategy is to assign the MPI ranks to separate nodes (ideally with as many ranks as there are fragments) and use a multithreaded library (e.g. OpenBLAS) with multiple threads over the cores of each node for the solution and manipulation of each embedding problem.
Many multithreaded libraries do not scale well beyond 16 or so threads for typical problem sizes. For modern CPUs the number of cores can be significantly higher than this and, unless memory is a bottleneck, it can be beneficial to assign multiple MPI ranks to each node.
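As a rough illustration of this trade-off, the sketch below picks a number of MPI ranks per node such that no rank uses more than a chosen thread cap. The helper name ranks_per_node and the default cap of 16 threads are assumptions for illustration only; the best values depend on the problem size, the BLAS library, and the available memory per node.

```python
import math


def ranks_per_node(cores_per_node, max_threads=16):
    """Return (ranks, threads_per_rank) with threads_per_rank <= max_threads."""
    ranks = math.ceil(cores_per_node / max_threads)
    threads = cores_per_node // ranks
    return ranks, threads


for cores in (16, 64, 128):
    ranks, threads = ranks_per_node(cores)
    print(f"{cores} cores: {ranks} rank(s) per node, {threads} threads each")
```

The resulting thread count would typically be passed to the BLAS library through an environment variable such as OMP_NUM_THREADS before launching the job; which variable applies depends on the library in use.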
While any job script should in principle also work in parallel,
there are some additional considerations, which mainly concern file IO and logging.
They are demonstrated in the following example, which can be found at
 1  # Run with: mpirun -n 3 python 90-mpi.py
 2  import pyscf
 3  import pyscf.gto
 4  import pyscf.scf
 5  import pyscf.cc
 6  import vayesta
 7  import vayesta.ewf
 8  from vayesta.mpi import mpi
 9
10  mol = pyscf.gto.Mole()
11  mol.atom = """
12  O 0.0000 0.0000 0.1173
13  H 0.0000 0.7572 -0.4692
14  H 0.0000 -0.7572 -0.4692
15  """
16  mol.basis = "cc-pVDZ"
17  mol.output = "pyscf-mpi%d.out" % mpi.rank
18  mol.build()
19
20  # Hartree-Fock
21  mf = pyscf.scf.RHF(mol)
22  mf = mf.density_fit()
23  mf = mpi.scf(mf)
24  mf.kernel()
25
26  # Embedded CCSD
27  emb = vayesta.ewf.EWF(mf, bath_options=dict(threshold=1e-6))
28  emb.kernel()
29
30  # Reference full system CCSD
31  if mpi.is_master:
32      cc = pyscf.cc.CCSD(mf)
33      cc.kernel()
34
35      print("E(HF)= %+16.8f Ha" % mf.e_tot)
36      print("E(CCSD)= %+16.8f Ha" % cc.e_tot)
37      print("E(Emb. CCSD)= %+16.8f Ha" % emb.e_tot)
Vayesta will generate a separate logging file for each MPI rank, but PySCF does not. To avoid chaotic logging, it is advised to give the
mol object of each MPI process a unique output name (see line 17).
PySCF does not support MPI by default. The mean-field calculation will thus simply be performed on each MPI process individually, and Vayesta will discard all solutions except the one obtained on the master process (rank 0). To save electricity, the function
vayesta.mpi.mpi.scf(mf) can be used to restrict the mean-field calculation to the master process from the beginning (see line 23). Alternatively, for a more efficient workflow in situations with significant mean-field overheads, the initial mean-field calculation can be performed separately, and the mean-field read in. See the PySCF documentation for how this can be done.
Output should only be printed on the master process. The property
mpi.is_master (equivalent to mpi.rank == 0) can be used to check if the current MPI process is the master process (see line 31).
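The same pattern can be written directly with mpi4py. The following is a minimal sketch, not Vayesta's implementation; the serial fallback and the variable names is_master and output_name are assumptions for illustration.

```python
try:
    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()
except ImportError:
    # Serial fallback (an assumption of this sketch) so it also runs without mpi4py
    rank, size = 0, 1

# The master process is the one with rank 0
is_master = rank == 0

# One output file name per rank, following the naming used in the example above
output_name = "pyscf-mpi%d.out" % rank

if is_master:
    # Printed once, rather than once per MPI process
    print("Running on %d MPI process(es)" % size)
```

When launched with mpirun, each process sees its own rank and therefore writes to its own file, while the message is printed only once.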