Select Publications

By Mr Xiangyu Zhang

Book Chapters

Zhang X, 2023, 'Convolutional neural networks and architectures', in Handbook of Face Recognition, pp. 37 - 65, http://dx.doi.org/10.1007/978-3-031-43567-6_2

Journal Articles

Yang G; Zhou Y; Zhang X; Cheng W; Liu K; Chen X; Zhuo TY; Chen T, 2026, 'Less is more: Towards green code large language models via unified structural pruning', Information Processing and Management, 63, http://dx.doi.org/10.1016/j.ipm.2025.104580

Liu H; Zhang X; Zhang H; Garcia-Perera LP; Khong AWH; Chng ES; Watanabe S, 2025, 'Aligning Speech to Languages to Enhance Code-Switching Speech Recognition', IEEE Transactions on Audio Speech and Language Processing, 33, pp. 4712 - 4725, http://dx.doi.org/10.1109/TASLPRO.2025.3629290

Chen M; Zhang Q; Wang M; Zhang X; Liu H; Ambikairajah E; Chen D, 2025, 'Selective State Space Model for Monaural Speech Enhancement', IEEE Transactions on Consumer Electronics, 71, pp. 5414 - 5424, http://dx.doi.org/10.1109/TCE.2024.3523297

Zhang X; Zhang Q; Liu H; Xiao T; Qian X; Ahmed B; Ambikairajah E; Li H; Epps J, 2025, 'Mamba in Speech: Towards an Alternative to Self-Attention', IEEE Transactions on Audio Speech and Language Processing, 33, pp. 1933 - 1948, http://dx.doi.org/10.1109/TASLPRO.2025.3566210

Zhang X; Zhou Y; Yang G; Han T; Chen T, 2024, 'Context-aware code generation with synchronous bidirectional decoder', Journal of Systems and Software, 214, http://dx.doi.org/10.1016/j.jss.2024.112066

Han C; Yang J; Sun J; Ge Z; Dong R; Zhou H; Mao W; Peng Y; Zhang X, 2024, 'Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception', IEEE Robotics and Automation Letters, 9, pp. 6544 - 6551, http://dx.doi.org/10.1109/LRA.2024.3401172

Yang G; Zhou Y; Chen X; Zhang X; Zhuo TY; Chen T, 2024, 'Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models', IEEE Transactions on Software Engineering, 50, pp. 2437 - 2457, http://dx.doi.org/10.1109/TSE.2024.3440503

Yang G; Zhou Y; Chen X; Zhang X, 2024, 'CodeScore-R: An automated robustness metric for assessing the functional correctness of code synthesis', Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 61, pp. 291 - 306, http://dx.doi.org/10.7544/issn1000-1239.202330715

Yang G; Zhou Y; Chen X; Zhang X; Xu Y; Han T; Chen T, 2023, 'A syntax-guided multi-task learning approach for Turducken-style code generation', Empirical Software Engineering, 28, http://dx.doi.org/10.1007/s10664-023-10372-1

Shi X; Zhang X; Tang R; Yang J, 2023, 'Solve High-Dimensional Reflected Partial Differential Equations by Neural Network Method', Mathematical and Computational Applications, 28, http://dx.doi.org/10.3390/mca28040079

Yang G; Zhou Y; Chen X; Zhang X; Han T; Chen T, 2023, 'ExploitGen: Template-augmented exploit code generation based on CodeBERT', Journal of Systems and Software, 197, http://dx.doi.org/10.1016/j.jss.2022.111577

Chen Y; Zhang P; Kong T; Li Y; Zhang X; Qi L; Sun J; Jia J, 2023, 'Scale-Aware Automatic Augmentations for Object Detection With Dynamic Training', IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, pp. 2367 - 2383, http://dx.doi.org/10.1109/TPAMI.2022.3166905

Shu H; Liang R; Li Z; Goodridge A; Zhang X; Ding H; Nagururu N; Sahu M; Creighton FX; Taylor RH; Munawar A; Unberath M, 2023, 'Twin-S: a digital twin for skull base surgery', International Journal of Computer Assisted Radiology and Surgery, 18, pp. 1077 - 1084, http://dx.doi.org/10.1007/s11548-023-02863-9

Qi L; Wang Y; Chen Y; Chen YC; Zhang X; Sun J; Jia J, 2022, 'PointINS: Point-Based Instance Segmentation', IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, pp. 6377 - 6392, http://dx.doi.org/10.1109/TPAMI.2021.3085295

Li Y; Liu Z; Wu W; Yao H; Zhang X; Zhang C; Yin B, 2022, 'Weight-Dependent Gates for Network Pruning', IEEE Transactions on Circuits and Systems for Video Technology, 32, pp. 6941 - 6954, http://dx.doi.org/10.1109/TCSVT.2022.3175762

Liu Z; Zhang X; Shen Z; Wei Y; Cheng KT; Sun J, 2021, 'Joint Multi-Dimension Pruning via Numerical Gradient Update', IEEE Transactions on Image Processing, 30, pp. 8034 - 8045, http://dx.doi.org/10.1109/TIP.2021.3112041

Ren S; He K; Girshick R; Zhang X; Sun J, 2017, 'Object detection networks on convolutional feature maps', IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, pp. 1476 - 1481, http://dx.doi.org/10.1109/TPAMI.2016.2601099

Zhang X; Zou J; He K; Sun J, 2016, 'Accelerating Very Deep Convolutional Networks for Classification and Detection', IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, pp. 1943 - 1955, http://dx.doi.org/10.1109/TPAMI.2015.2502579

He K; Zhang X; Ren S; Sun J, 2015, 'Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, pp. 1904 - 1916, http://dx.doi.org/10.1109/TPAMI.2015.2389824

Conference Papers

Wu D; Han W; Liu Y; Wang T; Xu CZ; Zhang X; Shen J, 2025, 'Language Prompt for Autonomous Driving', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8359 - 8367, http://dx.doi.org/10.1609/aaai.v39i8.32902

Huang B; Wen Y; Zhao Y; Hu Y; Liu Y; Jia F; Mao W; Wang T; Zhang C; Chen CW; Chen Z; Zhang X, 2025, 'SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3617 - 3625, http://dx.doi.org/10.1609/aaai.v39i4.32376

Zhang X; Liu D; Xiao T; Xiao C; Szalay T; Shahin M; Ahmed B; Epps J, 2025, 'Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4263 - 4267, http://dx.doi.org/10.21437/Interspeech.2025-17

Zhang X; Zhou Y; Yang G; Cheng W; Chen T, 2025, 'Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation', in Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp. 6157 - 6172

Yu L; Zha J; Yang T; Xie T; Zhang X; Chan SHG; Zhang C, 2025, 'Continuous Semi-Implicit Models', in Proceedings of Machine Learning Research, pp. 73375 - 73400

Li S; Zhou Y; Zhang X; Han T, 2025, 'Defending LLMs Against Jailbreak Prompts Through Key Information Protection and Selective Compression', in IEEE International Conference on Software Quality Reliability and Security QRS, pp. 58 - 67, http://dx.doi.org/10.1109/QRS65678.2025.00017

Xie B; Liu Y; Wang T; Cao J; Zhang X, 2025, 'GLAD: A Streaming Scene Generator for Autonomous Driving', in 13th International Conference on Learning Representations ICLR 2025, pp. 101163 - 101180

Zhang Q; Chen M; Song Z; Liu H; Zhang X; Li H, 2025, 'Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study', in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, http://dx.doi.org/10.1109/WASPAA66052.2025.11230983

Yu E; Zhao L; Wei Y; Yang J; Wu D; Kong L; Wei H; Wang T; Ge Z; Zhang X; Tao W, 2025, 'Merlin: Empowering Multimodal LLMs with Foresight Minds', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 425 - 443, http://dx.doi.org/10.1007/978-3-031-73235-5_24

Zafar MA; Zhang X; Shahin M; Ahmed B, 2025, 'Multi-Class Dementia Detection Using Acoustic Features - ICASSP-2025 PROCESS Challenge', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49660.2025.10889847

Kou G; Jia F; Mao W; Liu Y; Zhao Y; Zhang Z; Yoshie O; Wang T; Li Y; Zhang X, 2025, 'PADriver: Towards Personalized Autonomous Driving', in Proceedings of the International Joint Conference on Neural Networks, http://dx.doi.org/10.1109/IJCNN64981.2025.11228638

Wang H; Zheng A; Zhao Y; Wang T; Ge Z; Zhang X; Zhang Z, 2025, 'Reconstructive Visual Instruction Tuning', in 13th International Conference on Learning Representations ICLR 2025, pp. 15001 - 15026

Zhang X; Ma J; Shahin M; Ahmed B; Epps J, 2025, 'Rethinking Mamba in Speech Processing by Self-Supervised Models', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49660.2025.10889111

Zhang X; Liu H; Zhang Q; Ahmed B; Epps J, 2025, 'SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information', in Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp. 10019 - 10030, http://dx.doi.org/10.18653/v1/2025.findings-acl.521

Wang S; Jia F; Mao W; Liu Y; Zhao Y; Chen Z; Wang T; Zhang C; Zhang X; Zhao F, 2025, 'Stream Query Denoising for Vectorized HD-Map Construction', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 203 - 220, http://dx.doi.org/10.1007/978-3-031-72655-2_12

Wei H; Kong L; Chen J; Zhao L; Ge Z; Yang J; Sun J; Han C; Zhang X, 2025, 'Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 408 - 424, http://dx.doi.org/10.1007/978-3-031-73235-5_23

Chen J; Kong L; Wei H; Liu C; Ge Z; Zhao L; Sun J; Han C; Zhang X, 2024, 'OneChart: Purify the Chart Structural Extraction via One Auxiliary Token', in MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia, pp. 147 - 155, http://dx.doi.org/10.1145/3664647.3681167

Zhu K; Zhao L; Ge Z; Zhang X, 2024, 'Self-Supervised Visual Preference Alignment', in MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia, pp. 291 - 300, http://dx.doi.org/10.1145/3664647.3680993

Liang R; Zhang X; Li Q; Wei L; Liu H; Kumar A; Leadingham KMK; Punnoose J; Garcia LP; Manbachi A, 2024, 'Unidirectional Brain-Computer Interface: Artificial Neural Network Encoding Natural Images to fMRI Response in the Visual Cortex', in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2024, IEEE, Seoul, South Korea, pp. 1851 - 1855, presented at 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seoul, South Korea, 14 April 2024 - 19 April 2024, http://dx.doi.org/10.1109/ICASSP48485.2024.10446366

Tan H; Li J; Zhou Y; Wan J; Lei Z; Zhang X, 2024, 'Compound Text-Guided Prompt Tuning via Image-Adaptive Cues', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 5061 - 5069, http://dx.doi.org/10.1609/aaai.v38i5.28311

Chen H; Kong X; Zhang X; Zhao X; Huang K, 2024, 'DDAE: Towards Deep Dynamic Vision BERT Pretraining', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1037 - 1045, http://dx.doi.org/10.1609/aaai.v38i2.27864

Jiang X; Li S; Liu Y; Wang S; Jia F; Wang T; Han L; Zhang X, 2024, 'Far3D: Expanding the Horizon for Surround-View 3D Object Detection', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2561 - 2569, http://dx.doi.org/10.1609/aaai.v38i3.28033

Meng H; Zhang Q; Zhang X; Sethu V; Ambikairajah E, 2024, 'Binaural Selective Attention Model for Target Speaker Extraction', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4323 - 4327, http://dx.doi.org/10.21437/Interspeech.2024-683

Zhao L; Yu E; Ge Z; Yang J; Wei H; Zhou H; Sun J; Peng Y; Dong R; Han C; Zhang X, 2024, 'ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning', in IJCAI International Joint Conference on Artificial Intelligence, pp. 1743 - 1752

Dong R; Han C; Peng Y; Qi Z; Ge Z; Yang J; Zhao L; Sun J; Zhou H; Wei H; Kong X; Zhang X; Yi L; Ma K, 2024, 'DreamLLM: Synergistic Multimodal Comprehension and Creation', in 12th International Conference on Learning Representations ICLR 2024

Liu H; Garcia LP; Zhang X; Khong AWH; Khudanpur S, 2024, 'Enhancing Code-Switching Speech Recognition with Interactive Language Biases', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, pp. 10886 - 10890, http://dx.doi.org/10.1109/ICASSP48485.2024.10448335

Wen Y; Zhao Y; Liu Y; Jia F; Wang Y; Luo C; Zhang C; Wang T; Sun X; Zhang X, 2024, 'Panacea: Panoramic and Controllable Video Generation for Autonomous Driving', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 6902 - 6912, http://dx.doi.org/10.1109/CVPR52733.2024.00659

Zhang X; Liu D; Liu H; Zhang Q; Meng H; Garcia LP; Chng ES; Yao L, 2024, 'Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model', in EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 159 - 171, http://dx.doi.org/10.18653/v1/2024.emnlp-main.9

Joshi A; Renzella J; Bhattacharyya P; Jha S; Zhang X, 2024, 'Striking a Balance between Classical and Deep Learning Approaches in Natural Language Processing Pedagogy', in TeachNLP 2024 - 6th Workshop on Teaching NLP, Proceedings of the Workshop, pp. 23 - 32

Zhang X; Liu H; Xu K; Zhang Q; Liu D; Ahmed B; Epps J, 2024, 'When LLMs Meet Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection', in EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 146 - 158, http://dx.doi.org/10.18653/v1/2024.emnlp-main.8

ORCID as entered in ROS

https://orcid.org/0009-0000-1839-646X