The Scalasca Trace Tools are a collection of trace-based performance analysis tools that have been specifically designed for use on large-scale systems featuring hundreds of thousands of CPU cores, but are also suitable for smaller HPC platforms. A distinctive feature of the Scalasca Trace Tools is their scalable automatic trace-analysis component, which identifies wait states that occur, for example, as a result of unevenly distributed workloads. Especially when scaling communication-intensive applications to large process counts, such wait states can present severe challenges to achieving good performance. Besides identifying wait states, the trace analyzer can also pinpoint their root causes (i.e., delays) and identify the activities on the critical path of the target application, highlighting the routines that determine the length of the program execution and therefore constitute the best candidates for optimization.
Downloading Scalasca
Please find the latest tarballs here:
https://perftools.pages.jsc.fz-juelich.de/cicd/scalasca/
Getting in contact
If you have any comments or questions regarding the use or installation of Scalasca, or want to report a bug you discovered, please email scalasca@fz-juelich.de.
Staying up-to-date
You can also sign up to the Scalasca News mailing list to receive the latest news about new releases, tutorials, workshops, and other Scalasca-related events.
Citing Scalasca
If you find the Scalasca Trace Tools helpful for your research, please mention them in your publications. To cite the Scalasca Trace Tools in general, please use the following two publications:
- Zhukov, I. et al. (2015). Scalasca v2: Back to the Future. In: Niethammer, C., Gracia, J., Knüpfer, A., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2014, pp. 1-24. Springer, Cham. https://doi.org/10.1007/978-3-319-16012-2_1
- Geimer, M. et al. (2009). A scalable tool architecture for diagnosing wait states in massively parallel applications. Parallel Computing 35(7), pp. 375-388. https://doi.org/10.1016/j.parco.2009.02.003
When referring to specific topics, please use one (or more) of the following:
- Parallel wait-state search: Geimer, M. et al. (2009). A scalable tool architecture for diagnosing wait states in massively parallel applications. Parallel Computing 35(7), pp. 375-388. https://doi.org/10.1016/j.parco.2009.02.003
- Root-cause analysis: Böhme, D. et al. (2010). Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. Proc. 39th International Conference on Parallel Processing (ICPP), pp. 90-100. https://doi.org/10.1109/ICPP.2010.18
- Critical-path analysis: Böhme, D. et al. (2012). Scalable Critical-Path Based Performance Analysis. Proc. IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), pp. 1330-1340. IEEE. https://doi.org/10.1109/IPDPS.2012.120
- Trace event timestamp correction: Becker, D. et al. (2013). Extending the scope of the controlled logical clock. Cluster Computing 16, pp. 171-189. https://doi.org/10.1007/s10586-011-0181-8
Acknowledgements
This work is supported by BMBF, DFG, Helmholtz POF, EU (FP7, Horizon 2020, ITEA-2), EuroHPC JU, US DOE, Siemens AG, and Intel.