Select Publications
Journal articles
, 2002, 'Space-Time Equations for Non-Unimodular Mappings', International Journal of Computer Mathematics, 79, pp. 555 - 572, http://dx.doi.org/10.1080/00207160210953
, 2002, 'Time-Minimal Tiling When Rise Is Larger Than Zero', Parallel Computing, pp. 915 - 936
, 2000, 'Generating efficient tiled code for distributed memory machines', Parallel Computing, 26, pp. 1369 - 1410, http://dx.doi.org/10.1016/S0167-8191(00)00040-5
, 1999, 'Partitioning and Scheduling Loops on NOWs', Computer Communications, pp. 1017 - 1033
, 1998, 'Reuse-Driven Tiling for Improving Data Locality', International Journal of Parallel Programming, 26, http://dx.doi.org/10.1023/A:1018734612524
, 1997, 'On tiling as a loop transformation', Parallel Processing Letters, 7, pp. 409 - 424, http://dx.doi.org/10.1142/S0129626497000401
, 1997, 'Communication-Minimal Tiling of Uniform Dependence Loops', Journal of Parallel and Distributed Computing, 42, pp. 42 - 59, http://dx.doi.org/10.1006/jpdc.1997.1310
, 1997, 'Unimodular transformations of non-perfectly nested loops', Parallel Computing, 22, pp. 1621 - 1645, http://dx.doi.org/10.1016/S0167-8191(96)00063-4
, 1996, 'Generalising the unimodular approach to restructure imperfectly nested loops', Parallel Processing Letters, 6, pp. 401 - 414, http://dx.doi.org/10.1142/S0129626496000388
, 1996, 'Transformations of nested loops with non-convex iteration spaces', Parallel Computing, 22, pp. 339 - 368, http://dx.doi.org/10.1016/0167-8191(95)00069-0
, 1995, 'Closed-form mapping conditions for the synthesis of linear processor arrays', Journal of VLSI signal processing systems for signal, image and video technology, 10, pp. 181 - 199, http://dx.doi.org/10.1007/BF02407035
, 1994, 'Automating non-unimodular loop transformations for massive parallelism', Parallel Computing, 20, pp. 711 - 728, http://dx.doi.org/10.1016/0167-8191(94)90002-7
, 1992, 'A systolic array for pyramidal algorithms', Journal of VLSI Signal Processing, 4, pp. 89, http://dx.doi.org/10.1007/BF00930620
, 1992, 'On the Loading, Recovery and Access of Stationary Data in Systolic Arrays', Lecture Notes in Computer Science, 634, pp. 259 - 264, https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:A1992KQ20400031&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=891bb5ab6ba270e68a29b250adbe88d1
, 1992, 'The synthesis of control signals for one-dimensional systolic arrays', Integration, the VLSI Journal, 14, pp. 1 - 32, http://dx.doi.org/10.1016/0167-9260(92)90008-M
, 1991, 'A systolic array for pyramidal algorithms', Journal of VLSI signal processing systems for signal, image and video technology, 3, pp. 237 - 257, http://dx.doi.org/10.1007/BF00925834
, 1991, 'Specifying Control Signals for Systolic Arrays by Uniform Recurrence Equations', Parallel Processing Letters, 01, pp. 83 - 93, http://dx.doi.org/10.1142/S0129626491000033
, 1988, 'A new data structure for representing cell hierarchy in layout design', Computers & Graphics, 12, pp. 341 - 348, http://dx.doi.org/10.1016/0097-8493(88)90055-6
Conference Papers
, 2026, 'ATLAS: Efficient Dynamic GNN System Through Abstraction-Driven Incremental Execution', in Lecture Notes in Computer Science, pp. 17 - 33, http://dx.doi.org/10.1007/978-981-95-1021-4_2
, 2026, 'CeDMA: Enhancing Memory Efficiency of Heterogeneous Accelerator Systems Through Central DMA Controlling', in Lecture Notes in Computer Science, pp. 129 - 144, http://dx.doi.org/10.1007/978-981-95-1021-4_10
, 2026, 'TopServe: Task-Operator Co-scheduling for Efficient Multi-DNN Inference Serving on GPUs', in Lecture Notes in Computer Science, pp. 292 - 305, http://dx.doi.org/10.1007/978-3-031-99857-7_21
, 2025, 'Diff-MoE: Efficient Batched MoE Inference with Priority-Driven Differential Expert Caching', in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, pp. 1951 - 1965, presented at SC '25: The International Conference for High Performance Computing, Networking, Storage and Analysis, http://dx.doi.org/10.1145/3712285.3759903
, 2025, 'TENSORMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential', in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, pp. 1631 - 1645, presented at SC '25: The International Conference for High Performance Computing, Networking, Storage and Analysis, http://dx.doi.org/10.1145/3712285.3759844
, 2025, 'MetaHG: Enhancing HGNN Systems Leveraging Advanced Metapath Graph Abstraction', in EuroSys 2025 Proceedings of the 2025 20th European Conference on Computer Systems, pp. 492 - 506, http://dx.doi.org/10.1145/3689031.3717492
, 2025, 'ReSBM: Region-based Scale and Minimal-Level Bootstrapping Management for FHE via Min-Cut', in International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS, pp. 924 - 939, http://dx.doi.org/10.1145/3669940.3707276
, 2025, 'ANT-ACE: An FHE Compiler Framework for Automating Neural Network Inference', in CGO 2025 Proceedings of the 23rd ACM IEEE International Symposium on Code Generation and Optimization, pp. 193 - 208, http://dx.doi.org/10.1145/3696443.3708924
, 2025, 'Qiwu: Exploiting Ciphertext-Level SIMD Parallelism in Homomorphic Encryption Programs', in CGO 2025 Proceedings of the 23rd ACM IEEE International Symposium on Code Generation and Optimization, pp. 523 - 537, http://dx.doi.org/10.1145/3696443.3708917
, 2025, 'Stack Filtering: Elevating Precision and Efficiency in Rust Pointer Analysis', in CGO 2025 Proceedings of the 23rd ACM IEEE International Symposium on Code Generation and Optimization, pp. 331 - 346, http://dx.doi.org/10.1145/3696443.3708921
, 2025, 'VEGA: Automatically Generating Compiler Backends using a Pre-trained Transformer Model', in CGO 2025 Proceedings of the 23rd ACM IEEE International Symposium on Code Generation and Optimization, pp. 90 - 106, http://dx.doi.org/10.1145/3696443.3708931
, 2025, 'MeHyper: Accelerating Hypergraph Neural Networks by Exploring Implicit Dataflows', in Proceedings International Symposium on High Performance Computer Architecture, pp. 920 - 933, http://dx.doi.org/10.1109/HPCA61900.2025.00073
, 2025, 'UnsafeCop: Towards Memory Safety for Real-World Unsafe Rust Code with Practical Bounded Model Checking', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 307 - 324, http://dx.doi.org/10.1007/978-3-031-71177-0_19
, 2024, 'A CFL-Reachability Formulation of Callsite-Sensitive Pointer Analysis with Built-In On-The-Fly Call Graph Construction', in Leibniz International Proceedings in Informatics (LIPIcs), http://dx.doi.org/10.4230/LIPIcs.ECOOP.2024.18
, 2024, 'Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization', in International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS, pp. 797 - 812, http://dx.doi.org/10.1145/3620665.3640390
, 2024, 'A Context-Sensitive Pointer Analysis Framework for Rust and Its Application to Call Graph Construction', in CC 2024 Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pp. 60 - 72, http://dx.doi.org/10.1145/3640537.3641574
, 2024, 'A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs', in Proceedings of the Annual International Symposium on Microarchitecture (MICRO), pp. 437 - 450, http://dx.doi.org/10.1109/MICRO61859.2024.00040
, 2024, 'Correction-based Defense Against Adversarial Video Attacks via Discretization-Enhanced Video Compressive Sensing', in Proceedings of the 33rd USENIX Security Symposium, pp. 3603 - 3620
, 2024, 'Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing', in Proceedings International Symposium on Computer Architecture, pp. 382 - 395, http://dx.doi.org/10.1109/ISCA59077.2024.00036
, 2023, 'Statistical Type Inference for Incomplete Programs', in ESEC/FSE 2023 Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 720 - 732, http://dx.doi.org/10.1145/3611643.3616283
, 2023, 'Automatic Generation and Reuse of Precise Library Summaries for Object-Sensitive Pointer Analysis', in Proceedings 2023 38th IEEE ACM International Conference on Automated Software Engineering ASE 2023, Institute of Electrical and Electronics Engineers (IEEE), LUXEMBOURG, Echternach, pp. 736 - 747, presented at 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), LUXEMBOURG, Echternach, 11 September 2023 - 15 September 2023, http://dx.doi.org/10.1109/ASE56229.2023.00039
, 2023, 'Merge-Replay: Efficient IFDS-Based Taint Analysis by Consolidating Equivalent Value Flows', in Proceedings 2023 38th IEEE ACM International Conference on Automated Software Engineering ASE 2023, Institute of Electrical and Electronics Engineers (IEEE), LUXEMBOURG, Echternach, pp. 319 - 331, presented at 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), LUXEMBOURG, Echternach, 11 September 2023 - 15 September 2023, http://dx.doi.org/10.1109/ASE56229.2023.00027
, 2023, 'Reducing the Memory Footprint of IFDS-Based Data-Flow Analyses using Fine-Grained Garbage Collection', in Just R; Fraser G (eds.), ISSTA 2023 Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Association for Computing Machinery (ACM), WA, Seattle, pp. 101 - 113, presented at Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, WA, Seattle, 17 July 2023 - 21 July 2023, http://dx.doi.org/10.1145/3597926.3598041
, 2023, 'Hybrid Inlining: A Framework for Compositional and Context-Sensitive Static Analysis', in ISSTA 2023 Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 114 - 126, http://dx.doi.org/10.1145/3597926.3598042
, 2023, 'Accelerating Personalized Recommendation with Cross-level Near-Memory Processing', in Proceedings International Symposium on Computer Architecture, pp. 924 - 936, http://dx.doi.org/10.1145/3579371.3589101
, 2023, 'Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores', in International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS, pp. 483 - 497, http://dx.doi.org/10.1145/3582016.3582046
, 2023, 'AFaVS: Accurate Yet Fast Version Switching for Graph Processing Systems', in Proceedings International Conference on Data Engineering, pp. 53 - 66, http://dx.doi.org/10.1109/ICDE55515.2023.00012
, 2023, 'RSFuzzer: Discovering Deep SMI Handler Vulnerabilities in UEFI Firmware with Hybrid Fuzzing', in Proceedings IEEE Symposium on Security and Privacy, pp. 2155 - 2169, http://dx.doi.org/10.1109/SP46215.2023.10179421
, 2023, 'Two Birds with One Stone: Multi-Derivation for Fast Context-Free Language Reachability Analysis', in Proceedings 2023 38th IEEE ACM International Conference on Automated Software Engineering ASE 2023, pp. 624 - 636, http://dx.doi.org/10.1109/ASE56229.2023.00118
, 2022, 'A Dynamic Analysis Tool for Memory Safety Based on Smart Status and Source-Level Instrumentation', in Proceedings International Conference on Software Engineering, pp. 6 - 10, http://dx.doi.org/10.1109/ICSE-Companion55297.2022.9793834