Get Started
If you prefer to use the Makefile provided in the scripts directory, copy it into the src directory and follow the instructions below.
If you prefer to use autotools instead, follow the instructions in MPI-Rockstar - Autotools Build Guide.
Requirements and Dependencies
| Software | Notes |
|---|---|
| C/C++ compiler (GCC / Intel / Clang) | Must support OpenMP |
| MPI implementation (Open MPI / MPICH, etc., >= MPI-2.2) | |
| libtirpc + development headers | Required when |
| HDF5 library (optional, >= 1.8.0) | Required only when mpi_rockstar_hdf5 is compiled |
Set the compiler you are using in the Makefile before building.
Compiling and Running
You can compile MPI-Rockstar by running this command from the root directory of the code:
make mpi-rockstar -C src
If you need HDF5 support for file I/O, set the appropriate paths for HDF5_INCLUDE and HDF5_LIB in the Makefile, then run this command:
make mpi-rockstar_hdf5 -C src
If compilation fails because of wrong paths or settings, fix the issue in the Makefile and then run make clean before recompiling:
make clean
In some environments, a compatibility issue may cause this error or a similar one:
/usr/include/tirpc/rpc/xdr.h:111:52: error: unknown type name 'u_int'
You may be able to solve it by removing -I/usr/include/tirpc from the .c.o: rule in the Makefile.
You can then run MPI-Rockstar as follows:
mpiexec -n <the number of MPI processes> mpi-rockstar -c <path to a configuration file>
You can find example configuration files in the /examples directory. The number of OpenMP threads can be set with:
export OMP_NUM_THREADS=<the number of OpenMP threads>
When you use a batch job system, follow the instructions for that system.
If MPI-Rockstar is terminated for any reason, it is easy to restart it where it left off. Simply run:
mpiexec -n <the number of MPI processes> mpi-rockstar -c OUTBASE/restart.cfg
Then, MPI-Rockstar will resume analysis from the last incomplete snapshot. restart.cfg is automatically generated by the code.
Here, OUTBASE is the output directory where MPI-Rockstar writes all of its data products. You can optionally specify it in your configuration file:
OUTBASE = "/desired/output/path" # default is current directory
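The restart procedure above can be wrapped in a small helper that decides which configuration file to pass. This is only a sketch: select_config is a hypothetical function, not part of MPI-Rockstar, and it assumes that an existing restart.cfg in OUTBASE means a previous run was interrupted and should be resumed. The process count, output directory, and configuration file name in the example are placeholders.

```shell
# Choose which configuration file to pass to mpi-rockstar.
# Hypothetical helper: assumes that an existing restart.cfg in OUTBASE
# means a previous run was interrupted and should be resumed.
select_config() {
    outbase=$1       # output directory (OUTBASE)
    initial_cfg=$2   # configuration file used for a fresh run
    if [ -f "$outbase/restart.cfg" ]; then
        echo "$outbase/restart.cfg"
    else
        echo "$initial_cfg"
    fi
}

# Dry run: print the launch command instead of executing it.
echo "mpiexec -n 4 mpi-rockstar -c $(select_config ./output parallel_256.cfg)"
```

If a finished run also leaves restart.cfg behind, this check would resume unnecessarily, so verify the behaviour on your system before relying on it in job scripts.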
Another important variable to set in your configuration file is the force resolution of the simulation:
FORCE_RES = <force res. of sim., in Mpc/h; default 0.003>
Halos whose centers are closer than FORCE_RES are usually noise, and are subject to stricter removal tests than other halos.
In terms of memory usage, MPI-Rockstar will use at most about 60 bytes per particle in total for a cosmological simulation. Thus, if you have a 1024^3 particle simulation and 2 GB of memory available per processor, you should plan on using at least 32 CPUs in parallel.
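The memory estimate can be reproduced with a few lines of shell arithmetic. This is a rough sketch, not a tool shipped with MPI-Rockstar: min_processes is a hypothetical helper, it treats 1 GB as 2^30 bytes, and it applies the 60 bytes per particle figure as a flat upper bound.

```shell
# Estimate the minimum number of MPI processes from the memory budget.
# Assumptions (not taken from the MPI-Rockstar source): 1 GB = 2^30 bytes,
# and 60 bytes/particle is used as a flat upper bound.
min_processes() {
    n_particles=$1      # total particle count, e.g. 1024^3
    bytes_per_part=$2   # peak bytes per particle (about 60)
    gb_per_proc=$3      # memory available per process, in GB
    total_bytes=$((n_particles * bytes_per_part))
    bytes_per_proc=$((gb_per_proc * 1024 * 1024 * 1024))
    # Ceiling division, so a partially filled process counts as a whole one.
    echo $(( (total_bytes + bytes_per_proc - 1) / bytes_per_proc ))
}

# 1024^3 particles, 60 bytes each, 2 GB per process
min_processes $((1024 * 1024 * 1024)) 60 2
```

For the 1024^3 case this prints 30, which is consistent with planning for at least 32 processes once some headroom is allowed.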
Test Dataset
For tests, we provide tiny (\(N=256^3\)) and small (\(N=1024^3\)) particle datasets around \(z\sim0\) from cosmological N-body simulations here. The former consists of a single file (one snapshot), and the latter consists of 640 files per snapshot (2 snapshots are provided). After downloading these files, you can analyze the data as follows, using the configuration files in the /examples directory.
mpiexec -n XXX mpi-rockstar -c parallel_256.cfg
mpiexec -n XXX mpi-rockstar -c parallel_1024.cfg
mpiexec -n XXX mpi-rockstar_hdf5 -c parallel_1024.cfg #with HDF5 output and new configuration options
Abolished Configuration options from the Original Rockstar
MPI-Rockstar can no longer run as a single process, so only PARALLEL_IO=1 (the default value) is accepted. The numbers of writer and reader processes are set automatically from the number of MPI processes, so NUM_WRITERS, NUM_READERS, FORK_READERS_FROM_WRITERS, and FORK_PROCESSORS_PER_MACHINE have been abolished.
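When porting a configuration file from the original Rockstar, it may help to check whether any of the abolished options are still set. The sketch below uses a hypothetical helper, find_abolished_options, and assumes options are written one per line as NAME = value with # starting a comment, as in the OUTBASE and FORCE_RES examples.

```shell
# List abolished options that are still set in a configuration file.
# Assumption: options are written as "NAME = value", one per line,
# and lines starting with '#' are comments.
find_abolished_options() {
    grep -nE '^[[:space:]]*(NUM_WRITERS|NUM_READERS|FORK_READERS_FROM_WRITERS|FORK_PROCESSORS_PER_MACHINE)[[:space:]]*=' "$1"
}

# Usage (the file name is a placeholder):
#   find_abolished_options my_old_rockstar.cfg || echo "no abolished options set"
```

Each matching line is printed with its line number. grep exits with a non-zero status when nothing matches, which is what the usage line relies on.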
Performance Tips
The optimal MPI rank / OpenMP thread configuration depends strongly on the target system, especially on cores per node, NUMA layout, memory bandwidth, and interconnect performance (latency, bandwidth, topology). In general:
- For small-to-moderate runs (a relatively small total number of MPI ranks), communication is less likely to dominate, so a good starting point is to use more MPI ranks with fewer threads (often OMP_NUM_THREADS=1).
- For very large runs (a very large number of MPI ranks), or when the memory footprint becomes a concern, using fewer MPI ranks with more threads can help reduce MPI overhead and memory consumption per rank.
Since this depends on the target system, we recommend running a short benchmark over a few configurations at a fixed total core count (vary ranks-per-node and OMP_NUM_THREADS while keeping their product equal to the number of available physical cores) and selecting the best one for the target system.
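One way to set up such a benchmark is to enumerate the rank/thread combinations first. The following is a minimal sketch: list_hybrid_configs is a hypothetical helper, the 2-node, 8-cores-per-node layout and the configuration file name are placeholders, and the commands are only printed, not executed. How ranks are distributed across nodes is launcher-specific and is not shown here.

```shell
# Print every (ranks-per-node, threads) pair whose product equals the
# number of physical cores per node.
list_hybrid_configs() {
    cores=$1
    ranks=$cores
    while [ "$ranks" -ge 1 ]; do
        if [ $((cores % ranks)) -eq 0 ]; then
            echo "$ranks $((cores / ranks))"
        fi
        ranks=$((ranks - 1))
    done
}

# Dry run for a hypothetical system with 2 nodes and 8 physical cores per node.
nodes=2
list_hybrid_configs 8 | while read -r ranks threads; do
    echo "OMP_NUM_THREADS=$threads mpiexec -n $((nodes * ranks)) mpi-rockstar -c parallel_256.cfg"
done
```

Timing each printed command on a short run and comparing the wall-clock times gives a starting point for choosing the configuration.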