Scholarly and Creative Works
2025
- Vyas, S., Vyas, C., Sharotry, A., Jimenez, J., Qasem, A. M., & Mendez, F. A. (n.d.). Human Activity Recognition in MMH: Pilot Study on Lifting and Lowering Classification.
2024
- Ali, M., & Qasem, A. M. (2024). Alleviating dataset constraints through synthetic data generation in machine learning driven power modeling (pp. 52–58).
2023
- Novoa Ramirez, C. M., & Qasem, A. M. (2023). GPU-accelerated Parallel Solutions to the Quadratic Assignment Problem. ArXiv, 1–25. Retrieved from https://arxiv.org/abs/2307.11248
2022
- Bunde, D., Ahmed, K., Ayloo, S., Brown-Gaines, T., Fuentes, J., Jatala, V., … Yeh, T. (2022). Adopting Heterogeneous Computing Modules: Experiences from a ToUCH Summer Workshop. IEEE.
- Qasem, A. M., Ayguade, E., Cahill, K., Ostasz, M., Panda, D. K., & Tomko, K. (2022). Lightning Talks of EduHPC 2022. IEEE.
- Rafi, M. E. H., Williams, K. B., & Qasem, A. (2022). Raptor: Detecting CPU-GPU False Sharing Under Unified Memory Systems. https://doi.org/10.1109/IGSC55832.2022.9969376
- Rafi, M. E. H., & Qasem, A. M. (2022). Optimal Launch Bound Selection in CPU-GPU Hybrid Graph Applications with Deep Learning. https://doi.org/10.1109/IGSC55832.2022.9969364
- Girolamo, J. D., Hope, J., & Qasem, A. (2022). Uncovering Input-Sensitive Energy Bottlenecks in Oversubscribed GPU Workloads. Sustainable Computing: Informatics and Systems, 35. https://doi.org/10.1016/j.suscom.2022.100654
- Qasem, A. M. (2022). YODA: A pedagogical tool for teaching systems concepts (Vol. 1, pp. 613–618).
- Qasem, A. M., & Bunde, D. (2022). Heterogeneous computing for undergraduates: Introducing the ToUCH module repository (Vol. 2).
2021
- Hope, J., Gjergji, M., DiGirolamo, J., Alvarez, M., & Qasem, A. (2021). Characterizing Input-sensitivity in Tightly-Coupled Collaborative Graph Algorithms (pp. 287–296). https://doi.org/10.1109/CCGrid51090.2021.00038
- Ford, B., Qasem, A., Tesic, J., & Zong, Z. (2021). Migrating Software from x86 to ARM Architecture: An Instruction Prediction Approach.
- Ford, B. W., Qasem, A. M., Tesic, J., & Zong, Z. (2021). Migrating Software from x86 to ARM Architecture: An Instruction Prediction Approach. In 2021 IEEE International Conference on Networking, Architecture and Storage (NAS) (pp. 1–6). IEEE. https://doi.org/10.1109/nas51552.2021.9605443
- Bunde, D., Schielke, P., & Qasem, A. M. (2021). Short Modules for Introducing Heterogeneous Computing. Journal of Computing Sciences in Colleges, 36(8), 95–96. https://doi.org/10.5555/3470135.3470145
- Bunde, D. P., Qasem, A., & Schielke, P. (2021). Short Modules for Introducing Heterogeneous Computing: Workshop.
- Bunde, D., Schielke, P., & Qasem, A. M. (2021). Teaching About Heterogeneity (SIGCSE21).
- Qasem, A. M., Bunde, D. P., & Schielke, P. (2021). A Module-based Introduction to Heterogeneous Computing in Core Courses. Journal of Parallel and Distributed Computing, 158, 56–66. https://doi.org/10.1016/j.jpdc.2021.07.011
2020
- Sultana, T., Allen, B., & Qasem, A. M. (2020). Intelligent Data Placement on Discrete GPU Nodes with Unified Memory (PACT20) (pp. 139–151). New York, NY: ACM. https://doi.org/10.1145/3410463.3414651
2019
- Qasem, A. (2019). A Gentle Introduction to Heterogeneous Computing for CS1 Students (pp. 10–16). IEEE. https://doi.org/10.1109/EduHPC49559.2019.00007
- Hope, J., Nag, T., & Qasem, A. (2019). Energy-efficient GPU graph processing with on-demand page migration. IEEE Computer Society.
- Aslan, S., Kellington, J. W., Sefat, M. S., & Qasem, A. M. (2019). Accelerating HotSpots in Deep Neural Networks on a CAPI-Based FPGA (pp. 248–256). https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00048
2018
- Qasem, A. M., Novoa Ramirez, C. M., **Kolla, C. S., & *Coyle, S. (2018). High-Accuracy Scalable Solutions to the Dynamic Facility Layout Problem. Super Computing (SC18) - The International Conference for High Performance Computing, Networking, Storage, and Analysis Proceedings, 1–2. Retrieved from https://sc18.supercomputing.org/proceedings/tech_poster/poster_files/post169s2-file3.pdf
- Qasem, A. M. (2018). Modules for Teaching Parallel Performance Concepts. In Topics in Parallel and Distributed Computing: Enhancing the Undergraduate Curriculum: Performance, Concurrency, and Programming on Modern Platforms (pp. 59–77). Springer. https://doi.org/10.1007/978-3-319-93109-8
- Sefat, M. S., Aslan, S., & Qasem, A. M. (2018). Hardware Acceleration of CNNs with Coherent FPGAs.
- Qasem, A. M. (2018). Modules for Teaching Parallel Performance Concepts. In Topics in Parallel and Distributed Computing: Introducing Concurrency in Undergraduate Courses (Vol. 2). Springer.
- Qasem, A. M., Aji, A. M., & Chu, M. L. (2018). Investigating data layout transformations in Chapel (pp. 915–924).
2017
- Saha, B. K., Rahman, S., Connors, T., & Qasem, A. M. (2017). A Machine Learning Approach to Automatic Creation of Architecture-sensitive Performance Heuristics.
- Connors, T., & Qasem, A. M. (2017). Automatically Selecting Profitable Thread Block Sizes for Accelerated Kernels.
- Qasem, A. M., & Teich, S. (2017). Evaluating the Impact of Data Layout and Placement on the Energy Efficiency of Heterogeneous Applications.
- Teich, S., & Qasem, A. M. (2017). Mitigating Register Pressure in GPU Kernels for Improved Energy Efficiency.
- Qasem, A. M., Aji, A., & Rodgers, G. (2017). Characterizing data organization effects on heterogeneous memory architectures.
2015
- Qasem, A. M., Burtscher, M., & Taheri, S. (2015). A tool for automatically suggesting source-code optimizations for complex GPU kernels.
- Qasem, A. M., Rahman, S., Burtscher, M., & Zong, Z. (2015). Maximizing hardware prefetch effectiveness with machine learning.
- Qasem, A. M., Novoa, C., & Chaparala, A. (2015). A SIMD tabu search implementation for solving the quadratic assignment problem with GPU acceleration.
- Qasem, A. M., Burtscher, M., Peng, W., Shi, H., Tamir, D., & Thiry, H. (2015). A module-based approach to adopting the 2013 ACM curricular recommendations on parallel computing.
- Novoa Ramirez, C. M., Qasem, A. M., & **Chaparala, A. (2015). A SIMD Tabu Search Implementation for Solving the Quadratic Assignment Problem with GPU Acceleration. In Proceedings of the 2015 Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE ’15) (pp. 1–8). https://doi.org/10.1145/2792745.2792758
- Qasem, A. M., Novoa Ramirez, C. M., **Chaparala, A., & *Rishu, N. (2015). Autotuning GPU-accelerated Quadratic Assignment Problems (QAP) Solvers for Power and Performance. In Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications (HPCC) (pp. 1–6).
- Qasem, A. M., Gutierrez, M., Rahman, S., & Tamir, D. (2015). Neural network methods for fast and portable prediction of CPU power consumption.
- Qasem, A. M., Alvarado, C., & Tamir, D. (2015). Realizing energy-efficient thread affinity configurations with supervised learning.
- Qasem, A. M., & Connors, T. (2015). Power-performance analysis of metaheuristic search algorithms on the GPU.
- Qasem, A. M., Saha, B. K., & Rahman, S. (2015). MLTUNE: A tool-chain for automating the workflow of machine-learning based performance tuning (extended abstract).
- Qasem, A. M., & Rahman, S. (2015). Investigating prefetch potential on the Xeon Phi with autotuning (extended abstract).
- Qasem, A. M., Gutierrez, M., & Tamir, D. (2015). Evaluating neural network methods for PMC-based CPU power prediction.
- Qasem, A. M., Alvarado, C., & Tamir, D. (2015). Energy-efficient thread migration via dynamic characterization of resource utilization.
- Qasem, A. M., Chaparala, A., & Novoa, C. (2015). Autotuning GPU-accelerated QAP solvers for power and performance.
2014
- Qasem, A. M. (2014). Exposing undergraduates to parallel performance concepts with a three-module sequence.
- Qasem, A. M., Shankar, S., Lakomski, G., Alvarado, C., Hay, R., Hyatt, C., & Tamir, D. (2014). Power aware work stealing in homogeneous multicore systems.
- Qasem, A. M., Hyatt, C., Lakomski, G., Alvarado, C., Hay, R., & Tamir, D. (2014). Power aware task matching and migration in heterogeneous processing environments.
- Qasem, A. M., Alvarado, C., & Tamir, D. (2014). Dynamic feedback-driven thread migration for energy-efficient execution of multithreaded workloads.
- Qasem, A. M., Novoa, C., & Chaparala, A. (2014). A SIMD solution for the quadratic assignment problem with GPU acceleration.
- **Chaparala, A., Novoa Ramirez, C. M., & Qasem, A. M. (2014). A SIMD Solution for the Quadratic Assignment Problem with GPU Acceleration. In Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE ’14) (pp. 1–8). https://doi.org/10.1145/2616498.2616521
2013
- Qasem, A. M., & Magee, J. (2013). Improving TLB performance on current chip multiprocessor architectures through demand-driven Superpaging. Software Practice and Experience (SPE), 43(6), 705–729.
- Qasem, A. M., Burtscher, M., Peng, W., Shi, H., Tamir, D., & Thiry, H. (2013). Integrating parallel computing into the undergraduate curriculum at Texas State University: Experiences from the first year.
- Qasem, A. M., Hyatt, C., Lakomski, G., & Tamir, D. (2013). Power aware task matching and migration in heterogeneous processing environments.
- Qasem, A. M., Holt, J., Bazzera, G., Miller, J., & Hoffman, H. (2013). A pattern language for adaptive parallel software.
- Qasem, A. M., Shankar, S., & Tamir, D. (2013). Towards an operating system based framework for energy-efficient scheduling of parallel workloads.
- Qasem, A. M., Rashid, H., Hay, R., & Novoa, C. (2013). Algorithmic choice in optimization problems: A performance study (extended poster abstract).
- Qasem, A. M., Rahman, S., & Hay, R. (2013). Enhancing learning-based autotuning with composite and diagnostic feature vectors (extended poster abstract).
- Qasem, A. M., Burtscher, M., Peng, W., Shi, H., & Tamir, D. (2013). Preparing computer science students for an increasingly parallel world: Teaching parallel computing early and often (extended poster abstract).
- **Rashid, H., Novoa Ramirez, C. M., Hay, R., & Qasem, A. M. (2013). Algorithmic Choice in Optimization Problems: A Performance Study. Super Computing (SC13) - The International Conference for High Performance Computing, Networking, Storage, and Analysis Proceedings, 1. Retrieved from http://sc13.supercomputing.org/sites/default/files/PostersArchive/tech_posters/post133s2-file3.pdf
2012
- Qasem, A. M., & Sarangkar, S. (2012). MATS: A Model-driven Adaptive Tuning System for Parallel Workloads. Journal of Parallel and Cloud Computing (JPCC), 1, 50–64.
- Qasem, A. M. (2012). High-Level Language Extensions For Fast Execution Of Pipeline-Parallelized Code On Current Chip Multi-Processor Systems. International Journal of Programming Languages and Applications (IJPLA), 2, 1–12.
- Qasem, A. M. (2012). Architectural Considerations for Compiler-guided Unroll-and-Jam of CUDA Kernels. American Journal of Computer Architecture, 1, 12–20.
- Qasem, A. M. (2012). Autotuning Strategies For Reducing Synchronization Costs In Multithreaded Kernels. Journal of Systems and Software (JSYS), 2, 152–165.
- **Rashid, H., Novoa Ramirez, C. M., McKenney, M., & Qasem, A. M. (2012). Efficient Parallel Solutions to the Integral Knapsack Problem on Current Chip-multiprocessor Systems. International Journal of Parallel, Emergent and Distributed Systems, 27(1), 19–44.
- Qasem, A. M., Cade, M. J., & Tamir, D. (2012). Improved Energy Efficiency For Multithreaded Kernels Through Model-Based Autotuning (pp. 1–6).
- Qasem, A. M., Unkule, S., & Shaltz, C. (2012). Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality (pp. 21–40).
- Qasem, A. M. (2012). Efficient Execution of Time-step Computations with Pipelined Parallelism and Inter-thread Data Locality Optimizations (pp. 27–35).
- Qasem, A. M., & Tamir, D. (2012). Memory Performance Diagnosis Through Feedback Synthesis (pp. 5–10).
- Qasem, A. M., & Chen, S. (2012). Using Macro Features in Learning Algorithms for Optimizing Dense-matrix Computations (Vol. Technical Report CS-TR-2012-21). Department of Computer Science, Texas State University.
2011
- Qasem, A. M., Rashid, H., Novoa, C., & McKenney, M. (2011). Efficient Parallel Solutions to the Integral Knapsack Problem on Current Chip-multiprocessor Systems. International Journal of Parallel, Emergent and Distributed Systems (IJPEDS), 27, 19–44.
- Qasem, A. M., & Unkule, S. (2011). Register Pressure Aware Code Transformations On GPU (Extended Abstract), 19–20.
- Qasem, A. M., Rahman, F., & Yi, Q. (2011). Understanding Stencil Code Performance On Multicore Architecture (pp. 30–45).
- Qasem, A. M., Novoa, C., Rashid, H., & McKenney, M. (2011). Dynamic Programming Solutions for the Integral Knapsack Problem on Multicore Architectures (Extended Abstract).
2010
- **Rashid, H., Novoa Ramirez, C. M., & Qasem, A. M. (2010). An Evaluation of Parallel Knapsack Algorithms on Multicore Architectures. In Proceedings of the 2010 International Conference on Super Computing (CSC’10) (pp. 230–235).
- Qasem, A. M., & Sarangkar, S. (2010). Intelligent Feedback For Fast and Effective Autotuning (Extended Abstract).
- Qasem, A. M., Yi, Q., & Guo, J. (2010). Evaluating the Role of Optimization-Specific Search Heuristics in Effective Autotuning (short paper).
- Qasem, A. M., Guo, J., Rahman, F., & Yi, Q. (2010). Exposing Tunable Parameters in Multithreaded Numerical Code (pp. 46–60).
- Qasem, A. M. (2010). Locality-Conscious Superpaging for Improved TLB Behavior of Stencil Computations.
- Qasem, A. M., Yi, Q., & Sarangkar, S. (2010). Improving Autotuning Efficiency And Portability Through Feedback Diagnostics.
- Qasem, A. M., Rashid, H., & Novoa, C. (2010). An Evaluation Of Parallel Knapsack Algorithms On Multicore Architectures (pp. 230–235).
- Qasem, A. M., & Sarangkar, S. (2010). Restructuring Parallel Loops to Curb False Sharing on Multicore Architectures (pp. 1–7).
2009
- Qasem, A. M., & Cade, J. (2009). Balancing Data Locality And Parallelism on Shared-cache Multi-Core Systems (pp. 188–195).
- Qasem, A. M., & Magee, J. (2009). A Case for Compiler-driven Superpage Allocation.
2008
- Qasem, A. M., & Kennedy, K. (2008). Model-guided Empirical Tuning of Loop Fusion. International Journal of High Performance Systems Architecture (IJHPSA), 1, 183–198.
- Qasem, A. M., & Yi, Q. (2008). Exploring the optimization space of dense linear algebra kernels (pp. 343–355).
- Qasem, A. M. (2008). Evaluating an Early Stop Criterion and a Statistical Pruning Strategy of the Optimization Search Space (pp. 506–510).
2007
- Qasem, A. M., & Kennedy, K. (2007). Pruning the Optimization Search Space Using Architecture-aware Cost Models.
2006
- Qasem, A. M., Kennedy, K., & Mellor-Crummey, J. (2006). Automatic Tuning of Whole Applications Using Direct Search and a Performance-based Transformation System. The Journal of Supercomputing, 36, 183–196.
- Qasem, A. M., & Kennedy, K. (2006). Profitable Loop Fusion and Tiling Using Model-driven Empirical Search (pp. 249–258).
2005
- Qasem, A. M., & Kennedy, K. (2005). A Cache-conscious Profitability Model for Empirical Tuning of Loop Fusion (pp. 106–120).
- Qasem, A. M., & Kennedy, K. (2005). Evaluating a Model for Cache Conflict Miss Prediction (Vol. Technical Report CS-TR05-457). Department of Computer Science, Rice University.
2004
- Qasem, A. M., Kennedy, K., & Mellor-Crummey, J. (2004). Automatic Tuning of Whole Applications Using Direct Search and a Performance-based Transformation System.
2003
- Qasem, A. M., Jin, G., & Mellor-Crummey, J. (2003). Improving Performance with Integrated Program Transformations (Vol. Technical Report CS-TR03-419). Department of Computer Science, Rice University.
2002
- Qasem, A. M., Fowler, R., Mellor-Crummey, J., & Jin, G. (2002). A Source-to-source Loop Transformation Tool. Extended poster abstract.
2001
- Qasem, A. M., Whalley, D., Yuan, X., & van Engelen, R. (2001). Using a Swap Instruction to Coalesce Loads and Stores (pp. 235–240).
- Qasem, A. M., & Whalley, D. (2001). Using a Swap Instruction to Coalesce Loads and Stores (Vol. TR-010403). Department of Computer Science, Florida State University.