Get Started

To build with the Makefile provided in the scripts directory, copy it into the src directory and follow the instructions below. To build with autotools instead, follow the instructions in MPI-Rockstar - Autotools Build Guide.

Requirements and Dependencies

| Software | Notes |
| --- | --- |
| C/C++ compiler (GCC / Intel / Clang) | Must support OpenMP |
| MPI implementation (Open MPI / MPICH, etc., >= MPI-2.2) | `mpicc` / `mpicxx` wrappers must be available |
| libtirpc + development headers | Required when `<rpc/xdr.h>` is absent (TIPSY I/O) |
| HDF5 library (optional, >= 1.8.0) | Required only when `mpi-rockstar_hdf5` is compiled |
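To see whether the libtirpc requirement applies on a given machine, a quick check along the following lines may help. This is only a sketch: `find_xdr_header` is a hypothetical helper, and the two directories searched are assumptions (`/usr/include/tirpc` is the path that appears in the compile error shown later in this section; other systems may install the headers elsewhere).

```shell
#!/usr/bin/env bash
# Print the first include directory (from the arguments) that contains rpc/xdr.h.
# Returns non-zero if none of them does.
find_xdr_header() {
    local dir
    for dir in "$@"; do
        if [ -f "${dir}/rpc/xdr.h" ]; then
            echo "${dir}"
            return 0
        fi
    done
    return 1
}

# Assumed search locations; adjust them for your system.
if dir=$(find_xdr_header /usr/include /usr/include/tirpc); then
    echo "rpc/xdr.h found under ${dir}"
else
    echo "rpc/xdr.h not found; install libtirpc and its development headers"
fi
```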

Set the compiler you are using in the Makefile.

Compiling and Running

Compile MPI-Rockstar by running this command from the root directory of the code:

make mpi-rockstar -C src

If you need HDF5 support for file I/O, set HDF5_INCLUDE and HDF5_LIB in the Makefile to the appropriate paths, then run:

make mpi-rockstar_hdf5 -C src

If compilation fails because of wrong paths or settings, fix the issue in the Makefile, then run make clean before recompiling:

make clean

In some environments, a compatibility issue may produce this error or a similar one:

/usr/include/tirpc/rpc/xdr.h:111:52: error: unknown type name 'u_int'

You may be able to resolve it by removing -I/usr/include/tirpc from the .c.o: rule in the Makefile.

You can then run MPI-Rockstar as follows:

mpiexec -n <the number of MPI processes> mpi-rockstar -c <path to a configuration file>

Example configuration files are in the /examples directory. Set the number of OpenMP threads with:

export OMP_NUM_THREADS=<the number of OpenMP threads>

If you use a batch job system, follow that system's instructions.

If MPI-Rockstar is terminated for any reason, you can restart it from where it left off:

mpiexec -n <the number of MPI processes> mpi-rockstar -c OUTBASE/restart.cfg

MPI-Rockstar then resumes the analysis from the last incomplete snapshot. The code generates restart.cfg automatically. Here, OUTBASE is the output directory where MPI-Rockstar writes all of its data products; you can optionally set it in your configuration file:

OUTBASE = "/desired/output/path" # default is current directory
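For job scripts that may be resubmitted after an interruption, the restart step could be wrapped roughly as follows. This is a sketch under assumptions: `pick_config` is a hypothetical helper, the paths are placeholders, and the mere presence of restart.cfg is taken as the signal to resume (it does not check whether the previous run had already finished).

```shell
#!/usr/bin/env bash
# Choose OUTBASE/restart.cfg when it exists, otherwise the initial configuration file.
pick_config() {
    local outbase=$1       # output directory (OUTBASE in the configuration file)
    local initial_cfg=$2   # configuration file used for a fresh run
    if [ -f "${outbase}/restart.cfg" ]; then
        echo "${outbase}/restart.cfg"
    else
        echo "${initial_cfg}"
    fi
}

# Dry run: print the launch command instead of executing it.
CFG=$(pick_config "/desired/output/path" "parallel_256.cfg")
echo "mpiexec -n <the number of MPI processes> mpi-rockstar -c ${CFG}"
```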

Another important variable to set in your configuration file is the force resolution of the simulation:

FORCE_RES = <force res. of sim., in Mpc/h; default 0.003>

Halos whose centers are closer than FORCE_RES are usually noise, and are subject to stricter removal tests than other halos.

MPI-Rockstar uses at most about 60 bytes per particle in total for a cosmological simulation. For a 1024^3 particle simulation with 2 GB of memory available per processor, plan on using at least 32 CPUs in parallel.
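The arithmetic behind this estimate can be reproduced with a few lines of shell. The sketch below assumes that "2GB" means 2 GiB (2 × 1024^3 bytes); `min_procs` is a hypothetical helper name, and the result is only a lower bound that ignores memory needed by the operating system or anything else on the node.

```shell
#!/usr/bin/env bash
# Lower bound on the process count implied by the ~60 bytes/particle figure.
min_procs() {
    local particles=$1        # total number of particles
    local mem_per_proc=$2     # usable memory per process, in bytes
    local bytes_per_particle=60
    local total=$(( particles * bytes_per_particle ))
    # Ceiling division: smallest count whose combined memory holds `total` bytes.
    echo $(( (total + mem_per_proc - 1) / mem_per_proc ))
}

# 1024^3 particles, 2 GiB per process (assumed reading of "2GB")
min_procs $(( 1024 ** 3 )) $(( 2 * 1024 ** 3 ))   # prints 30
```

With these assumptions the bare bound is 30 processes, so the figure of 32 quoted above sits slightly above it.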

Test Dataset

For testing, we provide tiny (\(N=256^3\)) and small (\(N=1024^3\)) particle datasets around \(z\sim0\) from cosmological N-body simulations here. The former consists of a single file (one snapshot), and the latter consists of 640 files per snapshot (two snapshots are provided). After downloading these files, you can analyze the data as follows, using the configuration files in the /examples directory:

mpiexec -n XXX mpi-rockstar -c parallel_256.cfg
mpiexec -n XXX mpi-rockstar -c parallel_1024.cfg
mpiexec -n XXX mpi-rockstar_hdf5 -c parallel_1024.cfg  #with HDF5 output and new configuration options

Abolished Configuration options from the Original Rockstar

MPI-Rockstar can no longer run as a single process, so only PARALLEL_IO=1 (the default value) is accepted. The numbers of writer and reader processes are set automatically from the number of processes, so NUM_WRITERS, NUM_READERS, FORK_READERS_FROM_WRITERS, and FORK_PROCESSORS_PER_MACHINE have been abolished.
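When reusing a configuration file written for the original Rockstar, it may be convenient to scan it for the abolished options first. The following is a minimal sketch: `find_abolished_options` is a hypothetical helper, it assumes the `KEY = value` syntax with `#` comments shown earlier, and it makes no claim about how MPI-Rockstar itself reacts when these options are present.

```shell
#!/usr/bin/env bash
# Print (with line numbers) any uncommented settings of the abolished options.
# Returns non-zero when none are found, following grep's exit status.
find_abolished_options() {
    local cfg=$1
    grep -nE '^[[:space:]]*(NUM_WRITERS|NUM_READERS|FORK_READERS_FROM_WRITERS|FORK_PROCESSORS_PER_MACHINE)[[:space:]]*=' "${cfg}"
}

# Example call (hypothetical file name):
#   find_abolished_options old_rockstar.cfg
```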

Performance Tips

The optimal MPI rank / OpenMP thread configuration depends strongly on the target system, especially on cores per node, NUMA layout, memory bandwidth, and interconnect performance (latency, bandwidth, topology). In general:

For small-to-moderate runs (a relatively small total number of MPI ranks), communication is less likely to dominate, so a good starting point is to use more MPI ranks with fewer threads (often OMP_NUM_THREADS=1).
For very large runs (a very large number of MPI ranks), or when memory footprint becomes a concern, using fewer MPI ranks with more threads can reduce MPI overhead and memory consumption per rank.

Because the best choice depends on the target system, we recommend running a short benchmark over a few configurations at a fixed total core count (vary ranks-per-node and OMP_NUM_THREADS while keeping their product equal to the number of available physical cores) and selecting the best-performing one for the target.
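One way to enumerate the configurations for such a benchmark is sketched below. The helper `list_configs` and the variables `NODES`, `CORES_PER_NODE`, and `CONFIG` are hypothetical, and the values are placeholders. The script only prints candidate command lines (a dry run); how ranks are placed on nodes depends on your MPI launcher or batch system, and it does not check whether MPI-Rockstar accepts every rank count.

```shell
#!/usr/bin/env bash
# Print every "ranks-per-node threads" pair whose product equals the given core count.
list_configs() {
    local cores=$1
    local threads
    for (( threads = 1; threads <= cores; threads++ )); do
        if (( cores % threads == 0 )); then
            echo "$(( cores / threads )) ${threads}"
        fi
    done
}

# Placeholder values: a 2-node job with 8 physical cores per node.
NODES=2
CORES_PER_NODE=8
CONFIG=parallel_256.cfg

# Dry run: print one candidate launch line per configuration.
list_configs "${CORES_PER_NODE}" | while read -r ranks threads; do
    echo "OMP_NUM_THREADS=${threads} mpiexec -n $(( ranks * NODES )) mpi-rockstar -c ${CONFIG}"
done
```

Timing each printed command on the same snapshot, for example with the shell's `time` keyword, then gives a like-for-like comparison at fixed total core count.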