
This page lists LAMMPS performance on several benchmark problems, run on various machines, both in serial and parallel and on GPUs. Note that input and sample output files for many of these benchmark tests are provided in the bench directory of the LAMMPS distribution.

The benchmark results are organized as follows:

- One processor = relative CPU cost of the following 5 benchmarks
- Lennard Jones = atomic fluid with Lennard-Jones potential
- Polymer = bead-spring polymer melt of 100-mer chains
- Protein = rhodopsin protein in solvated lipid bilayer
- GPU (Kepler) and Intel Xeon Phi benchmarks using all accelerator packages
- Accelerator packages: GPU, KOKKOS, OPT, USER-CUDA, USER-INTEL, USER-OMP
- Oct 2016, CPU vs GPU vs KNL performance
- Sept 2014, GPU cluster = dual 8-core Sandy Bridge Xeons with 2 Kepler GPUs
- GPU (Fermi) benchmarks using the GPU and USER-CUDA packages
- Aug 2012, Desktop system = dual hex-core Xeons with 2 Tesla GPUs
- Aug 2012, Supercomputer = Titan development machine at ORNL
- Jun 2012, Potentials = relative CPU cost of different interatomic potentials

Thanks to the following individuals for running the various benchmarks:

- Sikandar Mashayak (UIUC), Kokkos and other accelerator results on GPU and Phi clusters
- Carl Ponder (NVIDIA), Keeneland results
- Christian Trott (U of Technology Ilmenau), Lincoln results
- Paul Crozier (Sandia), IBM BG/L results
- Fiona Reid (Edinburgh Parallel Computing Centre), IBM p690+ results
- Courtenay Vaughan (Sandia), Cray XT3 results

These are the parallel machines for which benchmark data is given below; see the Kokkos, Intel, and GPU sections for machine specifications for those GPU and Phi platforms. The "Processors" column is the largest number of processors on each machine that the benchmarks were run on. Network bandwidth and latency are given in units of Mb/sec and microsecs at the MPI level, i.e. what an application code sees. More information on machine characteristics, including their "birth" year, is given at the bottom of the page.

For each of the 5 benchmarks, fixed- and scaled-size timings are shown. Fixed-size means the same problem with 32,000 atoms was run on varying numbers of processors. Scaled-size means that when run on P processors, the number of atoms in the simulation was P times larger than the one-processor run. Thus a scaled-size 64-processor run is for 2,048,000 atoms, and a 32K-processor run is for roughly a billion atoms. All listed CPU times are in seconds for 100 timesteps.

Parallel efficiencies refer to the ratio of ideal to actual run time. For example, if perfect speed-up would have given a run time of 10 seconds and the actual run time was 12 seconds, then the efficiency is 83%. Note that some of these benchmarks were run on production machines while other jobs were running, which can sometimes slow them down.
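To make these definitions concrete, here is a minimal Python sketch of the arithmetic. Only the 32,000-atom base size and the 10-vs-12-second example come from the text above; the function names are illustrative:

    # Scaled-size atom counts and parallel efficiency as defined above.
    BASE_ATOMS = 32_000  # one-processor size of the standard benchmarks

    def scaled_atoms(procs):
        """Atom count of a scaled-size run on `procs` processors."""
        return BASE_ATOMS * procs

    def parallel_efficiency(ideal_time, actual_time):
        """Ratio of ideal to actual run time."""
        return ideal_time / actual_time

    print(scaled_atoms(64))                 # 2048000
    print(scaled_atoms(32_768))             # 1048576000, ~1 billion on 32K procs
    print(parallel_efficiency(10.0, 12.0))  # 0.8333... -> 83%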


The files needed to run these benchmarks are part of the LAMMPS distribution. If your platform is sufficiently different from the machines listed, you can send your timing results and machine info so they can be added to this page. Note that the CPU time (in seconds) for a run is what appears in the "Loop time" line of the output log file, e.g.

    Loop time of 3.89418 on 8 procs for 100 steps with 32000 atoms

These benchmarks are meant to span a range of simulation styles and computational expense for interaction forces. Since the cost of a run scales roughly linearly in the number of atoms simulated, you can use the timing and parallel efficiency data to estimate the CPU cost for problems you want to run on a given number of processors. As the data below illustrates, fixed-size problems generally have parallel efficiencies of 50% or better so long as the atoms/processor count is a few hundred or more, and scaled-size problems generally have parallel efficiencies of 80% or more across a wide range of processor counts.
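To collect such a timing yourself, a short Python sketch along these lines launches one of the bench-directory inputs and pulls the per-atom-per-timestep cost out of the log. The executable name lmp_mpi and the 8-process mpirun launch are assumptions that depend on your build and MPI stack:

    # Minimal sketch: run the LJ benchmark input and extract the "Loop time"
    # line. Assumes an MPI build of LAMMPS named lmp_mpi on PATH, run from
    # the bench directory of the distribution; adjust for your installation.
    import re
    import subprocess

    subprocess.run(["mpirun", "-np", "8", "lmp_mpi", "-in", "in.lj",
                    "-log", "log.lj"], check=True)

    pattern = re.compile(r"Loop time of (\S+) on (\d+) procs "
                         r"for (\d+) steps with (\d+) atoms")
    match = None
    with open("log.lj") as log:
        for line in log:
            hit = pattern.search(line)
            if hit:
                match = hit
    if match is None:
        raise RuntimeError("no Loop time line found in log.lj")

    secs = float(match.group(1))
    steps = int(match.group(3))
    atoms = int(match.group(4))
    print(secs / (atoms * steps))  # wall-clock secs per atom per timestep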
This is a summary of single-processor LAMMPS performance in CPU secs per atom per timestep for the 5 benchmark problems which follow. It was measured on a Dell Precision T7500 desktop Red Hat Linux box with dual hex-core 3.47 GHz Intel Xeon processors, using the Intel 11.1 icc compiler. The ratios indicate that if the atomic LJ system has a normalized cost of 1.0, the bead-spring chain and granular systems run several times faster, since they compute many fewer pairwise interactions per atom.
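Turning a per-atom-per-timestep cost into a run-time estimate is then a one-line calculation. In this sketch the 2.0e-6 secs/atom/step figure and the 80% efficiency are illustrative placeholders, not measured values from this page:

    # Estimate wall-clock hours for a planned run from benchmark data.
    def estimated_hours(cost_per_atom_step, atoms, steps, procs, efficiency):
        """cost_per_atom_step: one-processor CPU secs per atom per timestep;
        efficiency: expected parallel efficiency (0 < efficiency <= 1)."""
        secs = cost_per_atom_step * atoms * steps / (procs * efficiency)
        return secs / 3600.0

    # e.g. a 1,000,000-atom system run for 1,000,000 steps on 64 processors
    # at an assumed 80% parallel efficiency: about 10.9 hours
    print(estimated_hours(2.0e-6, 1_000_000, 1_000_000, 64, 0.80))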
