Dr. Martin Burtscher

Professor, Department of Computer Science, College of Science & Engineering

Scholarly and Creative Works

2025

Mongandampulath Akathoott, A., & Burtscher, M. (n.d.). A Bidirectional GPU Algorithm for Computing Maximum Matchings in Bipartite Graphs.
Fallin, W. A., Azami, N., Di, S., Cappello, F., & Burtscher, M. (n.d.). Fast and Effective Lossy Compression on GPUs and CPUs with Guaranteed Error Bounds.
Fallin, W. A., Azami, N., & Burtscher, M. (2025). Efficient Lossless Compression of Scientific Floating-Point Data on CPUs and GPUs. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (pp. 395–409). ACM. https://doi.org/10.1145/3669940.3707280
Ruys, W., Lee, H., You, B., Talati, S., Park, J., Almgren-Bell, J., … Biros, G. (n.d.). Performance Characterization of Python Runtimes for Multi-Device Task Parallel Programming. International Journal of Parallel Programming.

2024

Liu, Y., Azami, N., VanAusdal, A. R., & Burtscher, M. (2024). Indigo3: A Parallel Graph Analytics Benchmark Suite for Exploring Implementation Styles and Common Bugs. Retrieved from https://github.com/burtscher/Indigo3Suite/
Liu, Y., VanAusdal, A. R., & Burtscher, M. (2024). ECL-Suite: Data-race-free High-performance Graph Analytics Codes for GPUs. Retrieved from https://github.com/burtscher/ECL-Suite/
Burtchell, B. A., & Burtscher, M. (2024). Codes to Measure the Execution Time of a Suite of Single Synchronization Primitives from CUDA and OpenMP. Retrieved from https://github.com/burtscher/SyncPerformance/
Azami, N., Fallin, W. A., & Burtscher, M. (2024). FPcompress: Efficient Lossless GPU and CPU Compressors for Scientific Floating-Point Data. Retrieved from https://github.com/burtscher/FPcompress/
Jacobson, J., Burtscher, M., & Gopalakrishnan, G. (2024). HiRace: Accurate and Fast Data Race Checking for GPU Programs. In SC24: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1–14). IEEE. https://doi.org/10.1109/sc41406.2024.00042
Burtchell, B. A., & Burtscher, M. (2024). Characterizing CUDA and OpenMP Synchronization Primitives. In 2024 IEEE International Symposium on Workload Characterization (IISWC) (pp. 295–308). IEEE. https://doi.org/10.1109/iiswc63097.2024.00034
Liu, Y., VanAusdal, A., & Burtscher, M. (2024). Performance Impact of Removing Data Races from GPU Graph Analytics Programs. In 2024 IEEE International Symposium on Workload Characterization (IISWC) (pp. 320–331). IEEE. https://doi.org/10.1109/iiswc63097.2024.00036
Liu, Y., Azami, N., VanAusdal, A. R., & Burtscher, M. (2024). Indigo3: A Parallel Graph Analytics Benchmark Suite for Exploring Implementation Styles and Common Bugs. ACM Transactions on Parallel Computing, 11.
Liu, Y., Azami, N., VanAusdal, A., & Burtscher, M. (2024). Sapphire: a Tool for Teaching Parallel Programming in Hundreds of Different Ways. In Proceedings of the 16th Annual International Conference on Education and New Learning Technologies. https://doi.org/10.21125/edulearn.2024.1136
Fallin, W. A., & Burtscher, M. (2024). Lessons Learned on the Path to Guaranteeing the Error Bound in Lossy Quantizers.
Rodriguez, A., Azami, N., & Burtscher, M. (2024). Adaptive Per-File Lossless Compression of Floating-Point Data.
Ruys, W., Lee, H., You, B., Talati, S., Park, J., Almgren-Bell, J., … Gligoric, M. (2024). A Deep Dive into Task-Based Parallelism in Python (pp. 1147–1149).
Burtchell, B. A., & Burtscher, M. (2024). Using Machine Learning to Predict Effective Compression Algorithms for Heterogeneous Datasets. In Proceedings of the 2024 Data Compression Conference. IEEE Computer Society.
Azami, N., Lawson, R., & Burtscher, M. (2024). LICO: An Effective, High-Speed, Lossless Compressor for Images. In Proceedings of the 2024 Data Compression Conference. IEEE Computer Society.

2023

Burtscher, M. (2023). ECL-SCC: A Strongly Connected Components Code for GPUs. Retrieved from https://cs.txstate.edu/~burtscher/research/ECL-SCC/
Fallin, W. A., & Burtscher, M. (2023). ECL-MST: A Minimum Spanning Tree Code for GPUs. Retrieved from https://cs.txstate.edu/~burtscher/research/ECL-MST/
Liu, Y., Azami, N., VanAusdal, A. R., & Burtscher, M. (2023). Indigo2: A Benchmark Suite of Hundreds of Parallel Implementations of 6 Graph Algorithms Written in CUDA, OpenMP, and C++ Threads. Retrieved from https://cs.txstate.edu/~burtscher/research/Indigo2Suite/
Azami, N., & Burtscher, M. (2023). LICO: A Fast Lossless Image Compressor. Retrieved from https://github.com/burtscher/LICO/
Alabandi, G. A. H., Sands, W., Biros, G., & Burtscher, M. (2023). A GPU Algorithm for Detecting Strongly Connected Components. https://doi.org/10.1145/3581784.3607071
Fallin, W. A., Gonzalez, A., Seo, J., Cornell, R., & Burtscher, M. (2023). A High-Performance MST Implementation for GPUs. https://doi.org/10.1145/3581784.3607093
Liu, Y., Azami, N., VanAusdal, A. R., & Burtscher, M. (2023). Choosing the Best Parallelization and Implementation Styles for Graph Analytics Codes: Lessons Learned from 1106 Programs. San Marcos, United States. https://doi.org/10.1145/3581784.3607038

2022

Burtscher, M., Kothari, A., & Fallin, W. A. (2022). SFP: A Fast Rectilinear Steiner Minimum Tree (RSMT) Heuristic. Retrieved from http://cs.txstate.edu/~burtscher/research/SFP/
Liu, Y., & Burtscher, M. (2022). ECL-APSP: An All-Pairs-Shortest-Paths CUDA Implementation of the Floyd-Warshall Algorithm. Retrieved from http://cs.txstate.edu/~burtscher/research/ECL-APSP/
Liu, Y., Azami, N., VanAusdal, A. R., & Burtscher, M. (2022). Indigo: A Suite of Thousands of Correct and Buggy Parallel Code Patterns. Retrieved from https://cs.txstate.edu/~burtscher/research/IndigoSuite/
Azami, N., & Burtscher, M. (2022). MPLG: Massively Parallel Log Graphs. Retrieved from https://cs.txstate.edu/~burtscher/research/MPLG/
Fallin, W. A., & Burtscher, M. (2022). Bit-flip Minimization. Retrieved from http://cs.txstate.edu/~burtscher/research/bit-flips/
Lee, H., Ruys, W., Yan, Y., Stephens, S., You, B., Fingler, H., … Biros, G. (2022). Parla: A High-level Orchestration System for Heterogeneous Architectures (pp. 1–15).
Azami, N., & Burtscher, M. (2022). Compressed In-memory Graphs for Accelerating GPU-based Analytics. In Proceedings of the 12th SC Workshop on Irregular Applications: Architectures and Algorithms.
Fallin, A., & Burtscher, M. (2022). Reducing Memory-Bus Energy Consumption of GPUs via Software-Based Bit-Flip Minimization. In Proceedings of the Workshop on Memory Centric High-Performance Computing.
Alabandi, G., & Burtscher, M. (2022). Improving the Speed and Quality of Parallel Graph Coloring. ACM Transactions on Parallel Computing, 9.
Liu, Y., Azami, N., & Burtscher, M. (2022). The Indigo Program-Verification Microbenchmark Suite of Irregular Parallel Code Patterns (pp. 24–34). Austin, United States.
Fallin, A., Kothari, A., He, J., Yanez, C., Pingali, K., Manohar, R., & Burtscher, M. (2022). A Simple, Fast, and GPU-friendly Steiner-Tree Heuristic. In Proceedings of the 23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing. Austin, United States.

2021

Alabandi, G. A. H., & Burtscher, M. (2021). graphB+: A Scalable Balancing Algorithm for Signed Social Network Graphs. Retrieved from http://cs.txstate.edu/~burtscher/research/graphB/
Alabandi, G. A. H., Tesic, J., Rusnak, L. J., & Burtscher, M. (2021). Discovering and balancing fundamental cycles in large signed graphs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1–17). ACM. https://doi.org/10.1145/3458817.3476153
Maleki, S., Agarwal, U., Burtscher, M., & Pingali, K. (2021). BiPart: A Parallel and Deterministic Hypergraph Partitioner.

2020

Alabandi, G. A. H., Powers, E., & Burtscher, M. (2020). ECL-GC: A Fast Graph-Coloring Algorithm with Shortcutting for GPUs. Retrieved from http://cs.txstate.edu/~burtscher/research/ECL-GC/
Alabandi, G. A. H., Powers, E., & Burtscher, M. (2020). Increasing the Parallelism of Graph Coloring via Shortcutting (pp. 262–275).

2019

He, J., Burtscher, M., Manohar, R., & Pingali, K. (2019). SPRoute: A Scalable Parallel Negotiation-based Global Router (pp. 1–8).
Taheri, S., Briggs, I., Burtscher, M., & Gopalakrishnan, G. (2019). DiffTrace: Efficient Whole-Program Trace Analysis and Diffing for Debugging (pp. 1–12).
He, J., Burtscher, M., Manohar, R., & Pingali, K. (2019). SPRoute: A Scalable Parallel Negotiation-based Global Router.

2018

Burtscher, M., & Pingali, K. (2018). An Efficient CUDA Implementation of the Tree-based Barnes Hut n-Body Algorithm. Retrieved from http://cs.txstate.edu/~burtscher/research/ECL-BH/
Maleki, S., & Burtscher, M. (2018). PLR: An Automatic Parallelizer and Code Generator for Linear Recurrences. Retrieved from http://cs.txstate.edu/~burtscher/research/PLR/
Burtscher, M., Devale, S., Azimi, S., Jaiganesh, J., & Powers, E. (2018). A High-Quality and Fast Maximal Independent Set Implementation for GPUs. ACM Transactions on Parallel Computing, 5.
Taheri, S., Devale, S., Gopalakrishnan, G., & Burtscher, M. (2018). ParLoT: Efficient Whole-Program Call Tracing for HPC Applications. In Seventh Workshop on Extreme-Scale Programming Tools.
Burtscher, M. (2018). Computing a Movie of Zooming into a Fractal. In Workshop on Education for High-Performance Computing.
Lu, Y.-S., Ataei, S., He, J., Hua, W., Maleki, S., Yang, Y., … Manohar, R. (2018). Parallel Tools for Asynchronous VLSI Systems. In Workshop on Open-Source EDA Technology.
Jaiganesh, J., & Burtscher, M. (2018). A High-Performance Connected Components Implementation for GPUs (pp. 92–104).
Maleki, S., & Burtscher, M. (2018). Automatic Hierarchical Parallelization of Linear Recurrences (pp. 128–138).
Claggett, S., Azimi, S., & Burtscher, M. (2018). SPDP: An Automatically Synthesized Lossless Compression Algorithm for Floating-Point Data (pp. 335–344).

2017

Burtscher, M., & Devale, S. (2017). ECL-MIS: A Fast and High-Quality MIS Algorithm for GPUs. Retrieved from http://cs.txstate.edu/~burtscher/research/ECL-MIS/
Jaiganesh, J., & Burtscher, M. (2017). ECL-CC: A Fast Connected-Components Algorithm for GPUs. Retrieved from http://cs.txstate.edu/~burtscher/research/ECL-CC/

2016

Maleki, S., Yang, A., & Burtscher, M. (2016). SAM: A GPU Prefix-Scan Template that Supports Higher Orders and Tuple Values. Retrieved from http://cs.txstate.edu/~burtscher/research/SAM/
Claggett, S., & Burtscher, M. (2016). SPDP: A Compression Algorithm for Single- and Double-Precision Floating-Point Data. Retrieved from http://cs.txstate.edu/~burtscher/research/SPDPcompressor/
Burtscher, M. (2016). Scientific IEEE 754 32-Bit Single-Precision Floating-Point Datasets. Retrieved from http://cs.txstate.edu/~burtscher/research/datasets/FPsingle/
Burtscher, M. (2016). FPcrush Real-Time Floating-Point Compressor Generator. Retrieved from http://cs.txstate.edu/~burtscher/research/FPcrush/
Burtscher, M., Hesaaraki, F., Mukka, H., & Yang, A. (2016). Real-Time Synthesis of Compression Algorithms for Scientific Data (pp. 264–275).
Dzhagaryan, A., Milenkovic, A., & Burtscher, M. (2016). Improving the Effectiveness of Data Transfers in Mobile Computing Using Lossless Compression Utilities. In Advances in Computer Communications and Networks (pp. 181–221). River Publishers.
Coplin, J., & Burtscher, M. (2016). Energy and Power Considerations of GPUs. In Advances in GPU Research and Practice (pp. 509–541). Elsevier.
Yang, A., Coplin, J., Mukka, H., Hesaaraki, F., & Burtscher, M. (2016). MPC: An Effective Floating-Point Compression Algorithm for GPUs. In Advances in GPU Research and Practice (pp. 327–347). Elsevier.
Coplin, J., Yang, A., Poppe, A., & Burtscher, M. (2016). Increasing Telemetry Throughput Using Customized and Adaptive Data Compression.
Maleki, S., Yang, A., & Burtscher, M. (2016). Higher-Order and Tuple-Based Massively-Parallel Prefix Sums (pp. 539–552).
Coplin, J., & Burtscher, M. (2016). Energy, Power, and Performance Characterization of GPGPU Benchmark Programs. In Twelfth Workshop on High-Performance, Power-Aware Computing.
Goodarzi, B., Burtscher, M., & Goswami, D. (2016). Parallel Graph Partitioning on a CPU-GPU Architecture. In Twenty Fifth International Heterogeneity in Computing Workshop.
Szczyrba, I., Szczyrba, R., & Burtscher, M. (2016). Geometric Representations of the n-anacci Constants and Generalizations Thereof. Journal of Integer Sequences, 19. Retrieved from https://cs.uwaterloo.ca/journals/JIS/VOL19/Szczyrba/sz4.pdf

2015

Claggett, S., & Burtscher, M. (2015). SPDP: A Compression Filter for HDF5. Retrieved from http://cs.txstate.edu/~burtscher/research/SPDP/
Yang, A., & Burtscher, M. (2015). MPC: a GPU-accelerated compressor/decompressor for single- and double-precision floating-point data. Retrieved from http://cs.txstate.edu/~burtscher/research/MPC/
Yang, A., Mukka, H., Hesaaraki, F., & Burtscher, M. (2015). MPC: A Massively Parallel Compression Algorithm for Scientific Data.
Rahman, S., Burtscher, M., Zong, Z., & Qasem, A. (2015). Maximizing Hardware Prefetch Effectiveness with Machine Learning.
Dzhagaryan, A., Milenković, A., & Burtscher, M. (2015). Quantifying Benefits of Lossless Compression Utilities on Modern Smartphones.
Taheri, S., Qasem, A., & Burtscher, M. (2015). A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels.
Szczyrba, I., Szczyrba, R., & Burtscher, M. (2015). Analytic Representations of the n-anacci Constants and Generalizations Thereof. Journal of Integer Sequences, 18. Retrieved from https://cs.uwaterloo.ca/journals/JIS/VOL18/Szczyrba/sz3.pdf
Burtscher, M., Peng, W., Qasem, A., Shi, H., Tamir, D., & Thiry, H. (2015). A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing.
Li, B., Lu, Y., Li, C., Godil, A., Schreck, T., Aono, M., … Zou, C. (2015). A Comparison of 3D Shape Retrieval Methods based on a Large-Scale Benchmark Supporting Multimodal Queries. Computer Vision and Image Understanding, 131, 1–27.
Coplin, J., & Burtscher, M. (2015). Effects of Source-Code Optimizations on GPU Performance and Energy Consumption. In Eighth Workshop on General Purpose Processing Using GPUs.
O’Neil, M. A., & Burtscher, M. (2015). Rethinking the Parallelization of Random-Restart Hill Climbing. In Eighth Workshop on General Purpose Processing Using GPUs.

2014

Burtscher, M. (2014). K20Power: Automatic Correction of the Power Profile from K20 and K40 Compute GPUs. Retrieved from http://www.cs.txstate.edu/~burtscher/research/K20power/
Burtscher, M. (2014). TSP_GPU2: A Fast GPU-based Solver for Large TSP Problems. Retrieved from http://www.cs.txstate.edu/~burtscher/research/TSP_GPU/
Coplin, J., & Burtscher, M. (2014). Power Characteristics of Irregular GPGPU Programs. In 2014 International Workshop on Green Programming, Computing, and Data Processing.
O’Neil, M. A., & Burtscher, M. (2014). Microarchitectural Performance Characterization of Irregular GPU Kernels.
Ge, R., Feng, X., Burtscher, M., & Zong, Z. (2014). PEACH: Performance and Energy Aware Cooperative Hybrid Computing.
Rabeti, H., & Burtscher, M. (2014). Feature Selection by Tree Search of Correlation-Adjusted Class Distances.
Uzelac, V., Milenkovic, A., Milenkovic, M., & Burtscher, M. (2014). Using Branch Predictors and Variable Encoding for On-the-Fly Program Tracing. IEEE Transactions on Computers, 63, 1008–1020.
Li, B., Lu, Y., Li, C., Godil, A., Schreck, T., Aono, M., … Zou, C. (2014). SHREC’14 Track: Extended Large Scale Sketch-Based 3D Shape Retrieval. In Eurographics Workshop on 3D Object Retrieval.
Rocki, K., Burtscher, M., & Suda, R. (2014). The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both? (pp. 886–893).
Burtscher, M., Zecena, I., & Zong, Z. (2014). Measuring GPU Power with the K20 Built-in Sensor. In Seventh Workshop on General Purpose Processing Using GPUs (pp. 28–36).
Rocki, K., & Burtscher, M. (2014). The Future of Accelerator Programming. HPC Wire. Retrieved from http://www.hpcwire.com/2014/01/09/future-accelerator-programming/

2013

Burtscher, M., & Rabeti, H. (2013). A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches. Retrieved from http://www.cs.txstate.edu/~burtscher/research/ILCS/
Nasre, R., Burtscher, M., Mendez-Lojo, M., & Pingali, K. (2013). The LonestarGPU Benchmark Suite. Retrieved from http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu
Zecena, I., Burtscher, M., Jin, T., & Zong, Z. (2013). Evaluating the Performance and Energy Efficiency of N-Body Codes on Multi-Core CPUs and GPUs (pp. 1–8).
Rocki, K., Burtscher, M., & Suda, R. (2013). The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?, 442–443.
Burtscher, M., Peng, W., Qasem, A., Shi, H., & Tamir, D. (2013). Preparing Computer Science Students for an Increasingly Parallel World: Teaching Parallel Computing Early and Often.
Burtscher, M., Shi, H., Peng, W., Tamir, D., Qasem, A., & Thiry, H. (2013). Integrating Parallel Computing into the Undergraduate Curriculum at Texas State University: Experiences from the First Year. In Workshop on Parallel, Distributed, and High-Performance Computing in Undergraduate Curricula.
Ge, R., Vogt, R., Majumder, J., Alam, A., Burtscher, M., & Zong, Z. (2013). Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU. In 2nd International Workshop on Power-aware Algorithms, Systems, and Architectures (pp. 826–833).
Milenkovic, A., Dzhagaryan, A., & Burtscher, M. (2013). Performance and Energy Consumption of Lossless Compression/Decompression Utilities on Mobile Computing Platforms (pp. 254–263).
Burtscher, M., & Rabeti, H. (2013). GPU Acceleration of a Genetic Algorithm for the Synthesis of FSM-based Bimodal Predictors.
Burtscher, M., & Rabeti, H. (2013). A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches (pp. 1289–1298).
Nasre, R., Burtscher, M., & Pingali, K. (2013). Data-driven versus Topology-driven Irregular Computations on GPUs (pp. 463–474).
Dzhagaryan, A., Milenkovic, A., & Burtscher, M. (2013). Energy Efficiency of Lossless Data Compression on a Mobile Device: An Experimental Evaluation.
Burtscher, M., Peng, W., Qasem, A., Shi, H., & Tamir, D. (2013). Preparing Computer Science Students for the Multicore Era: Teaching Parallel Computing in the Undergraduate Curriculum.
Nasre, R., Burtscher, M., & Pingali, K. (2013). Atomic-free Irregular Computations on GPUs. In Sixth Workshop on General Purpose Processing Using GPUs (pp. 96–107).
Nasre, R., Burtscher, M., & Pingali, K. (2013). Morph Algorithms on GPUs (pp. 147–156).

2012

Burtscher, M., Nasre, R., & Pingali, K. (2012). A Quantitative Study of Irregular Programs on GPUs (pp. 141–151).
Szczyrba, I., Burtscher, M., & Szczyrba, R. (2012). Validating Critical Limits of the Universal Brain Injury Criterion (pp. 199–205).
Ratanaworabhan, P., Burtscher, M., Kirovski, D., & Zorn, B. (2012). Hardware Support for Enforcing Isolation in Lock-Based Parallel Programs (pp. 301–310).
Ratanaworabhan, P., Burtscher, M., Kirovski, D., Zorn, B., Nagpal, R., & Pattabiraman, K. (2012). Efficient Runtime Detection and Toleration of Asymmetric Races. IEEE Transactions on Computers, 61, 548–562.
Mendez-Lojo, M., Burtscher, M., & Pingali, K. (2012). A GPU Implementation of Inclusion-based Points-to Analysis (pp. 107–116).

2011

O’Neil, M. A., & Burtscher, M. (2011). TSP_GPU: A Fast GPU-based Solver for the Traveling Salesman Problem. Retrieved from http://www.cs.txstate.edu/~burtscher/research/TSP_GPU/
Sopeju, O. A., & Burtscher, M. (2011). AutoSCOPE: Automatic Suggestions for Code Optimizations Using PerfExpert. Retrieved from http://www.cs.txstate.edu/~burtscher/research/PerfExpert/AutoSCOPE/
O’Neil, M. A., & Burtscher, M. (2011). GFC: A GPU-based Compressor for 64-Bit Floating-Point Data. Retrieved from http://www.cs.txstate.edu/~burtscher/research/GFC/
Burtscher, M., Kim, B.-D., Diamond, J., McCalpin, J., Koesterke, L., & Browne, J. (2011). PerfExpert: An Automated HPC Performance Measurement and Analysis Tool with Optimization Recommendations. Retrieved from http://www.tacc.utexas.edu/perfexpert/
Milenkovic, A., Uzelac, V., Milenkovic, M., & Burtscher, M. (2011). Caches and Predictors for Real-time, Unobtrusive, and Cost-Effective Program Tracing in Embedded Systems. IEEE Transactions on Computers, 60, 992–1005.
Sopeju, O. A., Burtscher, M., Rane, A., & Browne, J. (2011). AutoSCOPE: Automatic Suggestions for Code Optimizations Using PerfExpert (pp. 19–25).
O’Neil, M. A., Tamir, D., & Burtscher, M. (2011). A Parallel GPU Version of the Traveling Salesman Problem.
Szczyrba, I., Burtscher, M., & Szczyrba, R. (2011). Computer Modeling of Diffuse Axonal Injury Mechanisms (pp. 401–407).
Pingali, K., Nguyen, D., Kulkarni, M., Burtscher, M., Hassaan, M. A., Kaleem, R., … Sui, X. (2011). The Tao of Parallelism in Algorithms (pp. 12–25).
Diamond, J., Burtscher, M., McCalpin, J., Kim, B.-D., Keckler, S., & Browne, J. (2011). Evaluation and Optimization of Multicore Performance Bottlenecks in Supercomputing Applications (pp. 32–43).
O’Neil, M. A., & Burtscher, M. (2011). Floating-Point Data Compression at 75 Gb/s on a GPU. In Fourth Workshop on General Purpose Processing Using GPUs (pp. 7:1–7:7).
Hassaan, A., Burtscher, M., & Pingali, K. (2011). Ordered vs. Unordered: a Comparison of Parallelism and Work-efficiency in Irregular Algorithms (pp. 3–12).
Burtscher, M., & Pingali, K. (2011). An Efficient CUDA Implementation of the Tree-based Barnes Hut n-Body Algorithm. In GPU Computing Gems Emerald Edition (pp. 75–92).

2010

Arora, M., Burtscher, M., Deo, M., Hassaan, A., Kaleem, R., Kulkarni, M., … Sui, X. (2010). Galois: A System to Automatically Parallelize Irregular Programs at Runtime. Retrieved from http://iss.ices.utexas.edu/galois/
Livshits, B., Zorn, B., Burtscher, M., & Sinha, G. (2010). JSZap: Compressing JavaScript Source Code. Retrieved from http://research.microsoft.com/en-us/projects/jszap/
Burtscher, M., Kim, B.-D., Diamond, J., McCalpin, J., Koesterke, L., & Browne, J. (2010). PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications (pp. 1–11).
Uzelac, V., Milenkovic, A., Burtscher, M., & Milenkovic, M. (2010). Real-time Unobtrusive Program Execution Trace Compression Using Branch Predictor Events (pp. 97–106).
Sui, X., Nguyen, D., Burtscher, M., & Pingali, K. (2010). Parallel Graph Partitioning on Multicore Architectures. In Languages and Compilers for Parallel Computing 23rd Annual Workshop (pp. 240–260).
Hassaan, A., Burtscher, M., & Pingali, K. (2010). Less is More: Trading off Work-efficiency for Scalability in Irregular Programs, 539–540.
Burtscher, M., Livshits, B., Sinha, G., & Zorn, B. (2010). JSZap: Compressing JavaScript Code.
Burtscher, M., & Ratanaworabhan, P. (2010). gFPC: A Self-Tuning Compression Algorithm.
Mendez-Lojo, M., Nguyen, D., Prountzos, D., Sui, X., Hassaan, M. A., Kulkarni, M., … Pingali, K. (2010). Structure-driven Optimizations for Amorphous Data-parallel Programs (pp. 3–14).

2009

Burtscher, M., & Ratanaworabhan, P. (2009). gFPC: A Self-Tuning Compression Algorithm. Retrieved from http://users.ices.utexas.edu/~burtscher/research/gFPC/
Burtscher, M., Hassaan, A., Kulkarni, M., Mathews, A., Mendez-Lojo, M., Nguyen, D., … Sui, X. (2009). ParaMeter: A Tool to Measure the Amorphous Data Parallelism of Programs. Retrieved from http://iss.ices.utexas.edu/parameter/
Burtscher, M., & Ratanaworabhan, P. (2009). pFPC: A Parallel Compressor for 64-Bit Floating-Point Data. Retrieved from http://users.ices.utexas.edu/~burtscher/research/pFPC/
Uzelac, V., Milenkovic, A., Milenkovic, M., & Burtscher, M. (2009). Real-time, Unobtrusive, and Efficient Program Execution Tracing with Stream Caches and Last Stream Predictors (pp. 173–178).
Burstedde, C., Burtscher, M., Ghattas, O., Stadler, G., Tu, T., & Wilcox, L. C. (2009). ALPS: A Framework for Parallel Adaptive PDE Solution. Journal of Physics: Conference Series, 180. Retrieved from http://www.iop.org/EJ/toc/1742-6596/180/1
Diamond, J., Kim, B.-D., Burtscher, M., Keckler, S., Pingali, K., & Browne, J. (2009). Multicore Optimization for Ranger.
Kulkarni, M., Burtscher, M., Cascaval, C., & Pingali, K. (2009). Lonestar: A Suite of Parallel Irregular Programs (pp. 65–76).
Burtscher, M., & Ratanaworabhan, P. (2009). pFPC: A Parallel Compressor for Floating-Point Data (pp. 43–52).
Ratanaworabhan, P., Burtscher, M., Kirovski, D., Nagpal, R., Pattabiraman, K., & Zorn, B. (2009). Detecting and Tolerating Asymmetric Races (pp. 173–184).
Kulkarni, M., Burtscher, M., Inkulu, R., Cascaval, C., & Pingali, K. (2009). How Much Parallelism is There in Irregular Applications? (pp. 3–14).
Burtscher, M., & Ratanaworabhan, P. (2009). FPC: A High-Speed Compressor for Double-Precision Floating-Point Data. IEEE Transactions on Computers, 58, 18–31.

2008

Szczyrba, I., Burtscher, M., & Szczyrba, R. (2008). Traumatic Brain Injury Simulations. Retrieved from http://www.funiosoft.com/brain/
Burtscher, M. (2008). CSC: A C-Subset to 3-Address-Code Compiler for Teaching Optimizing Compiler Courses.
Burtscher, M., Carribault, P., Cascaval, C., Kulkarni, M., Pingali, K., & von Praun, C. (2008). The Lonestar Benchmark Suite. Retrieved from http://iss.ices.utexas.edu/lonestar/
Ratanaworabhan, P., & Burtscher, M. (2008). A Miss-Triggered Program Phase Detector based on Critical Basic Block Transitions. Retrieved from http://www.csl.cornell.edu/~paruj/cbbt.html
Burtscher, M., & Ke, J. (2008). Scientific IEEE 754 64-Bit Double-Precision Floating-Point Datasets. Retrieved from http://www.csl.cornell.edu/~burtscher/research/FPC/datasets.html
Szczyrba, I., Burtscher, M., & Szczyrba, R. (2008). On the Role of a Nonlinear Stress-Strain Relation in Brain Trauma (pp. 265–271). CSREA Press.
Burtscher, M., Kulkarni, M., Prountzos, D., & Pingali, K. (2008). On the Scalability of an Automatically Parallelized Irregular Application. In Languages and Compilers for Parallel Computing 21st Annual Workshop. Springer Verlag.
Ratanaworabhan, P., & Burtscher, M. (2008). Program Phase Detection based on Critical Basic Block Transitions (pp. 11–21).

2007

Szczyrba, I., Burtscher, M., & Szczyrba, R. (2007). Computational Modeling of Brain Dynamics during Repetitive Head Motions (pp. 143–149). CSREA Press.
Szczyrba, I., Burtscher, M., & Szczyrba, R. (2007). A Proposed New Brain Injury Tolerance Criterion Based on the Exchange of Energy between the Skull and the Brain.
Burtscher, M., & Ratanaworabhan, P. (2007). High Throughput Compression of Double-Precision Floating-Point Data (pp. 293–302).
Milenkovic, M., Milenkovic, A., & Burtscher, M. (2007). Algorithms and Hardware Structures for Unobtrusive Real-Time Compression of Instruction and Data Address Traces (pp. 283–292).

2006

Burtscher, M., & Ratanaworabhan, P. (2006). FPC: A Lossless High-Throughput Compression Algorithm for 64-Bit Floating-Point Data. Retrieved from http://www.csl.cornell.edu/~burtscher/research/FPC/
Ratanaworabhan, P., & Burtscher, M. (2006). Load-Optimized Source Code of the BioPerf Benchmark Suite. Retrieved from http://www.bioperf.org/RB06-BioPerf-source.tar.bz2
Ganusov, I., & Burtscher, M. (2006). Future Execution: A Prefetching Mechanism that Uses Multiple Cores to Speed up Single Threads. ACM Transactions on Architecture and Code Optimization, 3, 424–449.
Ratanaworabhan, P., & Burtscher, M. (2006). Load Instruction Characterization and Acceleration of the BioPerf Programs (pp. 71–79).
Ganusov, I., & Burtscher, M. (2006). Efficient Emulation of Hardware Prefetchers via Event-Driven Helper Threading (pp. 144–153).
Burtscher, M., & Szczyrba, I. (2006). Computational Simulation and Visualization of Traumatic Brain Injuries (pp. 101–107).
Ratanaworabhan, P., Ke, J., & Burtscher, M. (2006). Fast Lossless Compression of Scientific Floating-Point Data (pp. 133–142).
Jackson, S. J., & Burtscher, M. (2006). Self Optimizing Finite State Machines for Confidence Estimators. In 2006 Workshop on Introspective Architecture.

2005

Burtscher, M., & Sam, N. B. (2005). TCgen: A Flexible Tool to Automatically Generate Effective Trace Compressors out of User-Provided Trace Format Specifications. Retrieved from http://www.csl.cornell.edu/~burtscher/research/TCgen/
Liu, C. C., Ganusov, I., Burtscher, M., & Tiwari, S. (2005). Bridging the Processor-Memory Performance Gap with 3D IC Technology. IEEE Design & Test of Computers, 22, 556–564.
Burtscher, M., Ganusov, I., Jackson, S. J., Ke, J., Ratanaworabhan, P., & Sam, N. B. (2005). The VPC Trace-Compression Algorithms. IEEE Transactions on Computers, 54, 1329–1344.
Liu, C. C., Ganusov, I., Burtscher, M., & Tiwari, S. (2005). Improving Microprocessor Performance through 3D IC Technology. In Semiconductor Research Corporation’s TECHCON 2005 Conference.
Ganusov, I., & Burtscher, M. (2005). Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors (pp. 350–360).
Ke, J., Burtscher, M., & Speight, E. (2005). Tolerating Message Latency through the Early Release of Blocked Receives (pp. 19–29).
Ke, J., Burtscher, M., & Speight, E. (2005). Reducing Communication Time through Message Prefetching (pp. 557–563). CSREA Press.
Burtscher, M., & Szczyrba, I. (2005). On the Role of the Brain’s Geometry in Closed Head Injuries.
Burtscher, M., & Szczyrba, I. (2005). Numerical Modeling of Brain Dynamics in Traumatic Situations - Impulsive Translations (pp. 205–211). CSREA Press.
Ganusov, I., & Burtscher, M. (2005). On the Importance of Optimizing the Configuration of Stream Prefetchers. In 3rd Annual ACM SIGPLAN Workshop on Memory Systems Performance (pp. 54–61).
Sam, N. B., & Burtscher, M. (2005). Complex Load-Value Predictors: Why We Need Not Bother. In Fourth Annual Workshop on Duplicating, Deconstructing, and Debunking (pp. 16–24).
Sam, N. B., & Burtscher, M. (2005). On the Energy-Efficiency of Speculative Hardware (pp. 361–370). ACM Press.
Burtscher, M., & Sam, N. B. (2005). Automatic Generation of High-Performance Trace Compressors (pp. 229–240). IEEE Computer Society.

2004

Burtscher, M., Sam, N. B., Ganusov, I., Jackson, S. J., Ratanaworabhan, P., & Ke, J. (2004). VPC3: A High-Performance Trace Compression Algorithm. Retrieved from http://www.csl.cornell.edu/~burtscher/research/tracecompression/
Burtscher, M., & Ganusov, I. (2004). Automatic Synthesis of High-Speed Processor Simulators (pp. 55–66).
Ke, J., Burtscher, M., & Speight, E. (2004). Runtime Compression of MPI Messages to Improve the Performance and Scalability of Parallel Applications (pp. 59–65).
Sam, N. B., & Burtscher, M. (2004). Exploiting Type Information in Load-Value Predictors. In Second Value-Prediction and Value-Based Optimization Workshop (pp. 32–39).
Burtscher, M. (2004). VPC3: A Fast and Effective Trace-Compression Algorithm (pp. 167–176).

2003

Burtscher, M., & Jeeradit, M. (2003). Compressing Extended Program Traces Using Value Predictors (pp. 159–169).
Szczyrba, I., & Burtscher, M. (2003). On the Role of Ventricles in Diffuse Axonal Injuries (pp. 147–148).

2002

Jeeradit, M., & Burtscher, M. (2002). A Value Predictor Based Compression Algorithm for Program Traces. Retrieved from http://www.csl.cornell.edu/~burtscher/research/tracefilecompression/
Hoy, J., Burtscher, M., & Gehrke, J. (2002). A Data Miner to Find Frequent Sequential Patterns. Retrieved from http://www.csl.cornell.edu/~burtscher/research/patternmining/
Burtscher, M., & Zorn, B. G. (2002). Hybrid Load-Value Predictors. IEEE Transactions on Computers, 51, 759–774.
Speight, E., & Burtscher, M. (2002). Delphi: Prediction-Based Page Prefetching to Improve the Performance of Shared Virtual Memory Systems (pp. 49–55).
Burtscher, M., Diwan, A., & Hauswirth, M. (2002). Static Load Classification for Improving the Value Predictability of Data-Cache Misses (pp. 222–233).

2000

Burtscher, M., & Zorn, B. G. (2000). Hybridizing and Coalescing Load Value Predictors (pp. 81–92).

1999

Burtscher, M., & Zorn, B. G. (1999). Exploring Last n Value Prediction (pp. 66–76).
Burtscher, M., & Zorn, B. G. (1999). Prediction Outcome History-Based Confidence Estimation for Load Value Prediction. Journal of Instruction-Level Parallelism, 1. Retrieved from http://www.jilp.org/vol1/
Burtscher, M., & Zorn, B. (1999). Explaining and Exploiting Predictable Load Values.

1998

Burtscher, M., & Zorn, B. G. (1998). Profile-Supported Confidence Estimation for Load-Value Prediction. In PACT’98 Workshop on Profile and Feedback-Directed Compilation.