Scholarly and Creative Works
2026
- Islam, T. Z., Zaeed, M., Schutte, H., & Marathe, A. (2026). Data-Driven Analysis for Understanding the Impact of Optimizations on a Multi-objective Space. International Journal of High Performance Computing and Applications (IJHPCA). Retrieved from https://arxiv.org/html/2408.10143v1
2025
- Ahmed, A. N., Banday, B., Jones, T., & Islam, T. Z. (2025). Attention-Informed Surrogates for Navigating Power-Performance Trade-offs in HPC. In Workshop on ML for Systems in Conjunction with NeurIPS.
- Dey, A., Antony, N., Dhakal, A. R., Thopalli, K., Thiagarajan, J. J., Patki, T., … Islam, T. Z. (2025). ModelX: A Novel Transfer Learning Approach Across Heterogeneous Datasets. In The 34th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC).
- Zaeed, M., Islam, T. Z., & Inđić, V. (2025). Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs. In arXiv. Retrieved from https://arxiv.org/abs/2510.00932
- Fefey, E. G., & Islam, T. Z. (2025). Optimizing Deep Learning Inference on Heterogeneous Devices on the Edge: An ILP-Based Rate Monotonic Scheduling Approach. In 2025 IEEE Cloud Summit (pp. 23–28). https://doi.org/10.1109/Cloud-Summit64795.2025.00011
- Lahiry, A., Banday, B., & Islam, T. Z. (2025). WANDER: An Explainable Decision-Support Framework for HPC. In arXiv. Retrieved from https://arxiv.org/abs/2506.04049
- Islam, T. Z., Marathe, A., Schutte, H., & Zaeed, M. (2025). Data-Driven Analysis to Understand GPU Hardware Resource Usage of Optimizations. Retrieved from https://arxiv.org/abs/2408.10143
- Ramadan, T., Pinnow, N., Phelps, C. L., Thiagarajan, J. J., & Islam, T. Z. (2025). Structure-Aware Representation Learning for Effective Performance Prediction. Concurrency and Computation: Practice and Experience, 37(9–11), e70046. https://doi.org/10.1002/cpe.70046
- Sirvent, R., Carratalá-Saez, R., Gueroudji, A., Islam, T. Z., Pouchard, L., & Taufer, M. (2025). Reproducibility for HPC and Distributed Environments: Committees, Nondeterminism, Performance and Workflows. In Proceedings of the 3rd ACM Conference on Reproducibility and Replicability (pp. 29–40). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3736731.3746141
- Lahiry, A., Pokharel, A., Banday, B., Ockerman, S., Gueroudji, A., Zaeed, M., … Pouchard, L. (2025). A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces. Retrieved from https://arxiv.org/abs/2510.18300
- McInnes, L. C., Arnold, D., Balaprakash, P., Bernhardt, M., Cerny, B., Dubey, A., … Wu, L. (2025). Report of the 2025 Workshop on Next-Generation Ecosystems for Scientific Computing: Harnessing Community, Software, and AI for Cross-Disciplinary Team Science. Retrieved from https://arxiv.org/abs/2510.03413
- Maiterth, M., Brewer, W. H., Kuruvella, J. S., Dey, A., Islam, T. Z., Kabir, R., … Wang, F. (2025). HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and their Impact on Power and Cooling. In Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1959–1969). ACM. https://doi.org/10.1145/3731599.3767559
- Kelly, C., Xu, W., Pouchard, L. C., Van Dam, H., Islam, T. Z., Yoo, S., & Kleese Van Dam, K. (2025). Performance analysis and data reduction for exascale scientific workflows. The International Journal of High Performance Computing Applications, 39(4), 553–578. https://doi.org/10.1177/10943420251316253
- Dhakal, A. R., Islam, T. Z., Dey, A., Nichols, D., Bhatele, A., Patki, T., … Yeom, J.-S. (2025). xAMM: “Attention” to Details Improves Cross-Platform Prediction Accuracy. In 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (pp. 1–10). IEEE. https://doi.org/10.1109/ccgrid64434.2025.00067
- Banday, B., Thopalli, K., Islam, T. Z., & Thiagarajan, J. J. (2025). On The Role of Prompt Construction In Enhancing Efficacy and Efficiency of LLM-Based Tabular Data Generation. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1–5). IEEE. https://doi.org/10.1109/icassp49660.2025.10888077
2024
- Fefey, E. G., & Islam, T. Z. (2024). Toward Efficient Deep Learning Inference: On-Node Heterogeneous Scheduling in Edge-Cloud Infrastructure. In 2024 IEEE Cloud Summit (pp. 73–78). https://doi.org/10.1109/Cloud-Summit61220.2024.00019
- Gueroudji, A., Phelps, C., Islam, T. Z., Carns, P., Snyder, S., Dorier, M., … Pouchard, L. C. (2024). Performance Characterization and Provenance of Distributed Task-based Workflows on HPC Platforms. In SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 2032–2039). IEEE. https://doi.org/10.1109/scw63240.2024.00254
- Phelps, C., Lahiry, A., Islam, T. Z., & Pouchard, L. C. (2024). Reimagine Application Performance as a Graph: Novel Graph-Based Method for Performance Anomaly Classification in High-Performance Computing. In 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (Vol. 12, pp. 240–245). IEEE. https://doi.org/10.1109/compsac61105.2024.00041
- Dey, A., Dhakal, A., Islam, T. Z., Yeom, J.-S., Patki, T., Nichols, D., … Bhatele, A. (2024). Relative Performance Prediction Using Few-Shot Learning. In 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (pp. 1764–1769). IEEE. https://doi.org/10.1109/compsac61105.2024.00278
- Banday, B. H., Islam, T. Z., & Marathe, A. (2024). PERFGEN: A Synthesis and Evaluation Framework for Performance Data using Generative AI. In 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (Vol. 46, pp. 188–197). IEEE. https://doi.org/10.1109/compsac61105.2024.00035
- Zaeed, M., Islam, T. Z., & Inđić, V. (2024). Characterize and Compare the Performance of Deep Learning Optimizers in Recurrent Neural Network Architectures. In 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (Vol. 12, pp. 39–44). IEEE. https://doi.org/10.1109/compsac61105.2024.00016
- Fefey, E., & Islam, T. Z. (n.d.). Toward Efficient Deep Learning Inference: On-Node Heterogeneous Scheduling in Edge-Cloud Infrastructure.
- Islam, T. Z., & Alhajjar, E. (2024, June 3). SIAM Task Force Anticipates Future Directions of Computational Science. Retrieved from https://www.siam.org/publications/siam-news/articles/siam-task-force-anticipates-future-directions-of-computational-science/
2023
- Phelps, C. L., & Islam, T. Z. (2023). Automatic Parallelization of Cellular Automata for Heterogeneous Platforms. In 47th Annual Computers, Software, and Applications Conference (COMPSAC).
- Dey, A., Phelps, C. L., Islam, T. Z., & Kelly, C. (2023). Signal Processing Based Method for Real-Time Anomaly Detection in High-Performance Computing. In 47th Annual Computers, Software, and Applications Conference (COMPSAC).
- Ramadan, T., Lahiry, A., & Islam, T. Z. (2023). Novel Representation Learning Technique Using Graphs for Performance Analytics. In 2023 International Conference on Machine Learning and Applications (ICMLA). IEEE. https://doi.org/10.1109/icmla58977.2023.00198
- Nicolae, B., Islam, T. Z., Ross, R., Van Dam, H., Assogba, K., Shpilker, P., … Pouchard, L. C. (2023). Building the I (Interoperability) of FAIR for Performance Reproducibility of Large-Scale Composable Workflows in RECUP. In 2023 IEEE 19th International Conference on e-Science (e-Science) (Vol. 11, pp. 1–7). IEEE. https://doi.org/10.1109/e-science58273.2023.10254808
2022
- Schutte, H., Phelps, C. L., Marathe, A., & Islam, T. Z. (2022). libNVCD: An Extendable and User-friendly Multi-GPU Performance Measurement Tool. In 46th Annual Computers, Software, and Applications Conference (COMPSAC).
- Guite, A., Islam, T. Z., Kelly, C., & Xu, W. (2022). Interactive Visual Analysis Tool for Anomaly Provenance Data. IEEE. Retrieved from https://sc22.supercomputing.org/proceedings/tech_poster/poster_files/rpost159s3-file2.pdf
- Zaeed, M., Islam, T. Z., Cho, Y., Li, S., Luo, H., & Liu, Y. (2022). Analysis and Visualization of Important Performance Counters To Enhance Interpretability of Autotuner Output. IEEE. Retrieved from https://sc22.supercomputing.org/proceedings/tech_poster/poster_files/rpost183s3-file2.pdf
- Pouchard, L., Islam, T. Z., Nicolae, B., & Ross, R. (n.d.). A (Meta)data Framework for Reproducing Hybrid Workflows with FAIR.
- Pouchard, L., Islam, T. Z., & Nicolae, B. (2022). RECUP: A (meta)data framework for reproducing hybrid workflows with FAIR. Retrieved from https://works-workshop.org/files/works22_pouchard.pdf
- Daw, C., Barragan-Cruz, B., Majeske, N., Jagodzinski, F., Islam, T. Z., & Hutchinson, B. (2022). Chapter 5: Low Rank Sampling Methods for Identifying Impactful Pairwise Protein Mutations. In Part of the Computational Biology book series (COBO). Springer Nature. https://doi.org/10.1007/978-3-031-05914-8_4
- Islam, T. Z., & Zaeed, M. (2022, October). Dashing enabled GPTune Autotuner. Public git repository. Retrieved from https://gptune.lbl.gov
- Dey, A., & Islam, T. Z. (2022). Performance Modeling Across Heterogeneous Domains Using Few-Shot Learning.
- Fefey, E., & Islam, T. Z. (2022). Characterization of Deep Learning Inference Workloads.
- Zaeed, M., & Islam, T. Z. (2022). Analysis and Visualization of Important Performance Counters to Enhance Interpretability of Autotuner Output.
- Pouchard, L., Islam, T. Z., & Nicolae, B. (2022). Challenges for Implementing FAIR Digital Objects with High Performance Workflows. Retrieved from https://riojournal.com/article/94835/instance/8003496/
- Islam, T. Z., & Zaeed, M. (2022, September). libNVCD: An easy-to-use performance measurement and analysis tool for NVIDIA GPUs. Public git repository. Retrieved from https://gptune.lbl.gov
- Schutte, H., Islam, T. Z., Phelps, C., & Marathe, A. (2022). libNVCD: An Extendable and User-friendly Multi-GPU Performance Measurement Tool (pp. 73–82). IEEE. https://doi.org/10.1109/COMPSAC54236.2022.00019
- Islam, T. Z., Schutte, H., Phelps, C., & Marathe, A. (n.d.). GPUPD: An Extendable and User-friendly Multi-GPU Performance Measurement Tool. IEEE.
2021
- Islam, T. Z., & Phelps, C. (2021). HPC@SCALE: A Hands-on Approach for Training Next-Gen HPC Software Architects. In 2021 IEEE 28th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW) (pp. 29–34). IEEE. https://doi.org/10.1109/hipcw54834.2021.00011
- Jensen, Q., Jagodzinski, F., & Islam, T. Z. (2021). FILCIO: Application Agnostic I/O Aggregation to Scale Scientific Workflows. IEEE. https://doi.org/10.1109/COMPSAC51774.2021.00236
- Islam, T. Z., Wu Liang, P., Sweeney, F., Pragner, C., Thiagarajan, J. J., Sharmin, M., & Ahmed, S. (2021). College Life is Hard! - Shedding Light on Stress Prediction for Autistic College Students using Data-Driven Analysis. In 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC) (Vol. 329, pp. 428–437). IEEE. https://doi.org/10.1109/compsac51774.2021.00066
- Ramadan, T., Islam, T. Z., Phelps, C., Pinnow, N., & Thiagarajan, J. J. (2021). Comparative Code Structure Analysis using Deep Learning for Performance Prediction. In 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (pp. 151–161). IEEE. https://doi.org/10.1109/ispass51385.2021.00032
2020
- Stratton, J., Albert, M., Jensen, Q., Ismailov, M., Jagodzinski, F., & Islam, T. Z. (2020). Towards Aggregation Based I/O Optimization for Scaling Bioinformatics Applications (pp. 1250–1255). IEEE. https://doi.org/10.1109/COMPSAC48688.2020.00-85
- Islam, T. Z. (2020). Future Directions of the Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Program. In NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI). Retrieved from https://arxiv.org/abs/2010.15584
- Islam, T. Z., & Zaeed, M. (2020, August). Dashing: An extendable and programmable toolbox of interpretable ML models. Public git repository. Retrieved from https://gptune.lbl.gov
- Islam, T. Z. (2020). Performance characterization data for AMReX applications developed by the DOE Exascale Computing Project (ECP). https://doi.org/10.5281/zenodo.3403037
2019
- Patki, T., Thiagarajan, J. J., Ayala, A., & Islam, T. Z. (2019). Performance optimality or reproducibility: that is the question (pp. 1–30). ACM/IEEE.
- Islam, T. Z., Ayala, A., Jensen, Q., & Ibrahim, K. (2019). Toward a Programmable Analysis and Visualization Framework for Interactive Performance Analytics. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC) (pp. 70–77). IEEE. https://doi.org/10.1109/ProTools49597.2019.00015
- Islam, T. Z. (2019, March). SCR: Scalable Checkpoint / Restart (SCR) Library. Retrieved from https://github.com/LLNL/scr
- Patki, T., Thiagarajan, J. J., Ayala, A., & Islam, T. Z. (2019). Performance optimality or reproducibility. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1–30). ACM. https://doi.org/10.1145/3295500.3356217
- Islam, T. Z. (2019). On-node scaling dataset on HPC systems. https://doi.org/10.5281/zenodo.4315003
- Islam, T. Z., & Phelps, C. (2019, March). PyPerfdump. Retrieved from https://github.com/RECUP-DOE/pyperfdump
2018
- Thiagarajan, J. J., Anirudh, R., Kailkhura, B., Jain, N., Islam, T. Z., Bhatele, A., … Gamblin, T. (2018). PADDLE: Performance Analysis using a Data-driven Learning Environment (pp. 784–793). IEEE. https://doi.org/10.1109/IPDPS.2018.00088
- Islam, T. Z., Majeske, N., Jagodzinski, F., & Hutchinson, B. (2018). Low Rank Smoothed Sampling Methods for Identifying Impactful Pairwise Mutations (pp. 681–686). ACM. https://doi.org/10.1145/3233547.3233714
- Moody, L., Pinnow, N., Lam, M. O., Menon, H., Schordan, M., Lloyd, G. S., & Islam, T. Z. (2018). Automatic Generation of Mixed-Precision Programs. Retrieved from https://sc18.supercomputing.org/proceedings/tech_poster/tech_poster_pages/post219.html
2017
- Islam, T. Z., Yu, W., Sato, K., Mohror, K., Zhu, Y., Moody, A., & Wang, T. (2017). MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffer (pp. 1174–1183). https://doi.org/10.1109/IPDPS.2017.39
- Banerjee, T., Hackl, J., Shringarpure, M., Islam, T. Z., Balachandar, S., Jackson, T., & Ranka, S. (2017). A new proxy application for compressible multiphase turbulent flows. Sustainable Computing: Informatics and Systems, 16, 11–24.
2016
- Islam, T. Z., Banerjee, T., Hackl, J., Shringarpure, M., Balachandar, S., Jackson, T., & Ranka, S. (2016). CMT-Bone — A Proxy Application for Compressible Multiphase Turbulent Flows (pp. 173–182).
- Islam, T. Z., Mohror, K., & Schulz, M. (2016). Exploring the Capabilities of the New MPI_T Interface. The International Journal of High Performance Computing Applications (IJHPCA), 30(2), 212–222. https://doi.org/10.1145/2642769.2642781
- Banerjee, T., Hackl, J., Shringarpure, M., Islam, T. Z., Balachandar, S., Jackson, T., & Ranka, S. (2016). CMT-Bone — A Proxy Application for Compressible Multiphase Turbulent Flows. In IEEE 23rd International Conference on High Performance Computing (HiPC) (pp. 173–182). https://doi.org/10.1109/HiPC.2016.029
- Islam, T. Z., Thiagarajan, J. J., Bhatele, A., Schulz, M., & Gamblin, T. (2016). A Machine Learning Framework for Performance Coverage Analysis of Proxy Applications. In SC16: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 538–549). IEEE. https://doi.org/10.1109/sc.2016.45
- Islam, T. Z., Mohror, K., Rountree, B., Schulz, M., de Supinski, B. R., … Savoie, L. (2016). I/O Aware Power Shifting (pp. 740–749).
2015
- Fang, A., Laguna, I., Sato, K., Islam, T. Z., & Mohror, K. (2015). Fault Tolerance Assistant (FTA): An Exception Handling Approach for MPI Programs. ExaMPI15 Exascale MPI at Supercomputing 2015 (SC15). IEEE.
2014
- Ni, X., Kale, L., Islam, T. Z., Mohror, K., & Moody, A. (2014). Lossy Compression for Checkpointing: Fallible or Feasible? ACM/IEEE. Retrieved from http://sc14.supercomputing.org/sites/all/themes/sc14/files/archive/tech_poster/tech_poster_pages/post271.html
- Islam, T. Z., Rodgers, G. P., Hacker, T., & Anup, A. (2014). Batchsubmit: A high-volume Batch Submission System for Earthquake Engineering Simulation, 26, 2240–2252.
- Islam, T. Z., Bagchi, S., & Eigenmann, R. (2014). Reliable and Efficient Distributed Checkpointing System for Grid Environments (Vol. 12, pp. 593–613).
- Islam, T. Z., Tramm, J., Siegel, A., & Schulz, M. (2014). XSBench - The Development and Verification of a Performance Abstraction for Monte Carlo Reactor Analysis. Retrieved from https://www.mcs.anl.gov/papers/P5064-0114.pdf
- Islam, T. Z. (2014, May). Gyan: Performance Measurement Tool for MPI implementations. Retrieved from https://github.com/LLNL/mpi-tools
- Islam, T. Z. (2014). Reliable and Efficient Checkpoint/Recovery in Shared Grid Environments. Journal of Grid Computing, 12, 593–613. https://doi.org/10.1007/s10723-014-9297-4
2012
- Islam, T. Z., Mohror, K., Bagchi, S., Moody, A., de Supinski, B. R., & Eigenmann, R. (2012). MCRENGINE: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression (10 pages). https://doi.org/10.1109/SC.2012.77
2009
- Islam, T. Z., Bagchi, S., & Eigenmann, R. (2009). FALCON: A System for Reliable Checkpoint Recovery in Shared Grid Environments (pp. 1–12). ACM. https://doi.org/10.1145/1654059.1654110
- Hossain, M. S., Islam, T. Z., Bagchi, S., & Raghunathan, V. (2009). Fast and Collaborative Interference Avoidance for Wireless Medical Devices.
2007
- Islam, T. Z., Hossain, H., Ahmed, M., Al-Nayeem, A., & Akbar, M. M. (2007). gpNoCSim - A General Purpose Simulator for Network-On-Chip (pp. 254–257). IEEE. https://doi.org/10.1109/ICICT.2007.375388
