Select Publications

By Mr Xiangyu Zhang

Conference Papers

Wu D; Han W; Liu Y; Wang T; Xu CZ; Zhang X; Shen J, 2025, 'Language Prompt for Autonomous Driving', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8359 - 8367, http://dx.doi.org/10.1609/aaai.v39i8.32902

Huang B; Wen Y; Zhao Y; Hu Y; Liu Y; Jia F; Mao W; Wang T; Zhang C; Chen CW; Chen Z; Zhang X, 2025, 'SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3617 - 3625, http://dx.doi.org/10.1609/aaai.v39i4.32376

Zhang X; Liu D; Xiao T; Xiao C; Szalay T; Shahin M; Ahmed B; Epps J, 2025, 'Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4263 - 4267, http://dx.doi.org/10.21437/Interspeech.2025-17

Zhang X; Zhou Y; Yang G; Cheng W; Chen T, 2025, 'Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation', in Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp. 6157 - 6172

Li S; Zhou Y; Zhang X; Han T, 2025, 'Defending LLMs Against Jailbreak Prompts Through Key Information Protection and Selective Compression', in IEEE International Conference on Software Quality Reliability and Security QRS, pp. 58 - 67, http://dx.doi.org/10.1109/QRS65678.2025.00017

Xie B; Liu Y; Wang T; Cao J; Zhang X, 2025, 'GLAD: A STREAMING SCENE GENERATOR FOR AUTONOMOUS DRIVING', in 13th International Conference on Learning Representations ICLR 2025, pp. 101163 - 101180

Yu E; Zhao L; Wei Y; Yang J; Wu D; Kong L; Wei H; Wang T; Ge Z; Zhang X; Tao W, 2025, 'Merlin: Empowering Multimodal LLMs with Foresight Minds', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 425 - 443, http://dx.doi.org/10.1007/978-3-031-73235-5_24

Zafar MA; Zhang X; Shahin M; Ahmed B, 2025, 'Multi-Class Dementia Detection Using Acoustic Features - ICASSP-2025 PROCESS Challenge', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49660.2025.10889847

Wang H; Zheng A; Zhao Y; Wang T; Ge Z; Zhang X; Zhang Z, 2025, 'RECONSTRUCTIVE VISUAL INSTRUCTION TUNING', in 13th International Conference on Learning Representations ICLR 2025, pp. 15001 - 15026

Zhang X; Ma J; Shahin M; Ahmed B; Epps J, 2025, 'Rethinking Mamba in Speech Processing by Self-Supervised Models', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49660.2025.10889111

Wang S; Jia F; Mao W; Liu Y; Zhao Y; Chen Z; Wang T; Zhang C; Zhang X; Zhao F, 2025, 'Stream Query Denoising for Vectorized HD-Map Construction', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 203 - 220, http://dx.doi.org/10.1007/978-3-031-72655-2_12

Wei H; Kong L; Chen J; Zhao L; Ge Z; Yang J; Sun J; Han C; Zhang X, 2025, 'Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 408 - 424, http://dx.doi.org/10.1007/978-3-031-73235-5_23

Zhang X; Liu H; Zhang Q; Ahmed B; Epps J, 2025, 'SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information', in Findings of the Association for Computational Linguistics: ACL 2025, Association for Computational Linguistics, pp. 10019 - 10030, presented at Findings of the Association for Computational Linguistics: ACL 2025, http://dx.doi.org/10.18653/v1/2025.findings-acl.521

Chen J; Kong L; Wei H; Liu C; Ge Z; Zhao L; Sun J; Han C; Zhang X, 2024, 'OneChart: Purify the Chart Structural Extraction via One Auxiliary Token', in MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, pp. 147 - 155, http://dx.doi.org/10.1145/3664647.3681167

Zhu K; Zhao L; Ge Z; Zhang X, 2024, 'Self-Supervised Visual Preference Alignment', in MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, pp. 291 - 300, http://dx.doi.org/10.1145/3664647.3680993

Liang R; Zhang X; Li Q; Wei L; Liu H; Kumar A; Leadingham KMK; Punnoose J; Garcia LP; Manbachi A, 2024, 'UNIDIRECTIONAL BRAIN-COMPUTER INTERFACE: ARTIFICIAL NEURAL NETWORK ENCODING NATURAL IMAGES TO fMRI RESPONSE IN THE VISUAL CORTEX', in 2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, IEEE, SOUTH KOREA, Seoul, pp. 1851 - 1855, presented at 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), SOUTH KOREA, Seoul, 14 April 2024 - 19 April 2024, http://dx.doi.org/10.1109/ICASSP48485.2024.10446366

Tan H; Li J; Zhou Y; Wan J; Lei Z; Zhang X, 2024, 'Compound Text-Guided Prompt Tuning via Image-Adaptive Cues', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 5061 - 5069, http://dx.doi.org/10.1609/aaai.v38i5.28311

Chen H; Kong X; Zhang X; Zhao X; Huang K, 2024, 'DDAE: Towards Deep Dynamic Vision BERT Pretraining', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1037 - 1045, http://dx.doi.org/10.1609/aaai.v38i2.27864

Jiang X; Li S; Liu Y; Wang S; Jia F; Wang T; Han L; Zhang X, 2024, 'Far3D: Expanding the Horizon for Surround-View 3D Object Detection', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2561 - 2569, http://dx.doi.org/10.1609/aaai.v38i3.28033

Meng H; Zhang Q; Zhang X; Sethu V; Ambikairajah E, 2024, 'Binaural Selective Attention Model for Target Speaker Extraction', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4323 - 4327, http://dx.doi.org/10.21437/Interspeech.2024-683

Zhao L; Yu E; Ge Z; Yang J; Wei H; Zhou H; Sun J; Peng Y; Dong R; Han C; Zhang X, 2024, 'ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning', in IJCAI International Joint Conference on Artificial Intelligence, pp. 1743 - 1752

Dong R; Han C; Peng Y; Qi Z; Ge Z; Yang J; Zhao L; Sun J; Zhou H; Wei H; Kong X; Zhang X; Yi L; Ma K, 2024, 'DREAMLLM: SYNERGISTIC MULTIMODAL COMPREHENSION AND CREATION', in 12th International Conference on Learning Representations ICLR 2024

Liu H; Garcia LP; Zhang X; Khong AWH; Khudanpur S, 2024, 'ENHANCING CODE-SWITCHING SPEECH RECOGNITION WITH INTERACTIVE LANGUAGE BIASES', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, pp. 10886 - 10890, http://dx.doi.org/10.1109/ICASSP48485.2024.10448335

Wen Y; Zhao Y; Liu Y; Jia F; Wang Y; Luo C; Zhang C; Wang T; Sun X; Zhang X, 2024, 'Panacea: Panoramic and Controllable Video Generation for Autonomous Driving', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 6902 - 6912, http://dx.doi.org/10.1109/CVPR52733.2024.00659

Zhang X; Liu D; Liu H; Zhang Q; Meng H; Garcia LP; Chng ES; Yao L, 2024, 'Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model', in EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 159 - 171, http://dx.doi.org/10.18653/v1/2024.emnlp-main.9

Joshi A; Renzella J; Bhattacharyya P; Jha S; Zhang X, 2024, 'Striking a Balance between Classical and Deep Learning Approaches in Natural Language Processing Pedagogy', in TeachNLP 2024 - 6th Workshop on Teaching NLP, Proceedings of the Workshop, pp. 23 - 32

Zhang X; Liu H; Xu K; Zhang Q; Liu D; Ahmed B; Epps J, 2024, 'When LLMs Meet Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection', in EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 146 - 158, http://dx.doi.org/10.18653/v1/2024.emnlp-main.8

Xuan Y; Zhang X; Li SS; Shen Z; Xie X; Garcia LP; Togneri R, 2023, 'A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters', in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1 - 5, presented at ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04 June 2023 - 10 June 2023, http://dx.doi.org/10.1109/icassp49357.2023.10095885

Qi Z; Dong R; Fan G; Ge Z; Zhang X; Ma K; Yi L, 2023, 'Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining', in Proceedings of Machine Learning Research, pp. 28223 - 28243

Yan J; Liu Y; Sun J; Jia F; Li S; Wang T; Zhang X, 2023, 'Cross Modal Transformer: Towards Fast and Robust 3D Object Detection', in Proceedings of the IEEE International Conference on Computer Vision, pp. 18222 - 18232, http://dx.doi.org/10.1109/ICCV51070.2023.01675

Zhang X; Li Y; Zhang X; Wang Y; Sun J, 2023, 'Differentiable Architecture Search with Random Features', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 16060 - 16069, http://dx.doi.org/10.1109/CVPR52729.2023.01541

Cai Q; Zhang X; Ding H; Tao R, 2023, 'Efficient Information Recognition for Machine-printed Invoices', in 2023 International Conference on Image Processing Computer Vision and Machine Learning ICICML 2023, pp. 913 - 918, http://dx.doi.org/10.1109/ICICML60161.2023.10424949

Wang S; Liu Y; Wang T; Li Y; Zhang X, 2023, 'Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection', in Proceedings of the IEEE International Conference on Computer Vision, pp. 3598 - 3608, http://dx.doi.org/10.1109/ICCV51070.2023.00335

Yu L; Xie T; Zhu Y; Yang T; Zhang X; Zhang C, 2023, 'Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration', in Advances in Neural Information Processing Systems

Chen Y; Liu J; Zhang X; Qi X; Jia J, 2023, 'LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 13488 - 13498, http://dx.doi.org/10.1109/CVPR52729.2023.01296

Zhou H; Ge Z; Li Z; Zhang X, 2023, 'MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception', in Proceedings of the IEEE International Conference on Computer Vision, pp. 8514 - 8523, http://dx.doi.org/10.1109/ICCV51070.2023.00785

Chua VYH; Liu H; Perera LPG; Woon FT; Wong J; Zhang X; Khudanpur S; Khong AWH; Dauwels J; Styles SJ, 2023, 'MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4109 - 4113, http://dx.doi.org/10.21437/Interspeech.2023-1446

Zhang Y; Wang T; Zhang X, 2023, 'MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 22056 - 22065, http://dx.doi.org/10.1109/CVPR52729.2023.02112

Wu D; Wang T; Zhang Y; Zhang X; Shen J, 2023, 'OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation', in Proceedings of the IEEE International Conference on Computer Vision, pp. 2749 - 2758, http://dx.doi.org/10.1109/ICCV51070.2023.00259

Liu Y; Yan J; Jia F; Li S; Gao A; Wang T; Zhang X, 2023, 'PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images', in Proceedings of the IEEE International Conference on Computer Vision, pp. 3239 - 3249, http://dx.doi.org/10.1109/ICCV51070.2023.00302

Li SS; Zhang X; Zhou S; Shu H; Liang R; Liu H; Garcia LP, 2023, 'PQLM - Multilingual Decentralized Portable Quantum Language Model', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49357.2023.10095215

Ding X; Chen H; Zhang X; Huang K; Han J; Ding G, 2023, 'RE-PARAMETERIZING YOUR OPTIMIZERS RATHER THAN ARCHITECTURES', in 11th International Conference on Learning Representations ICLR 2023

Wu D; Han W; Wang T; Dong X; Zhang X; Shen J, 2023, 'Referring Multi-Object Tracking', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 14633 - 14642, http://dx.doi.org/10.1109/CVPR52729.2023.01406

Han Q; Cai Y; Zhang X, 2023, 'RevColV2: Exploring Disentangled Representations in Masked Image Modeling', in Advances in Neural Information Processing Systems

Cai Y; Zhou Y; Han Q; Sun J; Kong X; Li J; Zhang X, 2023, 'REVERSIBLE COLUMN NETWORKS', in 11th International Conference on Learning Representations ICLR 2023

Wang X; Chu X; Han C; Zhang X, 2023, 'SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers', in Proceedings 2023 IEEE/CVF International Conference on Computer Vision Workshops ICCVW 2023, pp. 731 - 741, http://dx.doi.org/10.1109/ICCVW60793.2023.00081

Qi D; Yang T; Zhang X, 2023, 'Slot-guided Volumetric Object Radiance Fields', in Advances in Neural Information Processing Systems

Zhang X; Zhou Y; Yang G; Chen T, 2023, 'Syntax-Aware Retrieval Augmented Code Generation', in Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 1291 - 1302, http://dx.doi.org/10.18653/v1/2023.findings-emnlp.90

Zhang X; Mo S; Wan Z, 2023, 'Traffic sign detection algorithm based on YOLOv5 combined with BIFPN and attention mechanism', in ITOEC 2023 IEEE 7th Information Technology and Mechatronics Engineering Conference, pp. 966 - 970, http://dx.doi.org/10.1109/ITOEC57671.2023.10291927

Zhong Z; Cui J; Yang Y; Wu X; Qi X; Zhang X; Jia J, 2023, 'Understanding Imbalanced Semantic Segmentation Through Neural Collapse', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 19550 - 19559, http://dx.doi.org/10.1109/CVPR52729.2023.01873

ORCID as entered in ROS

https://orcid.org/0009-0000-1839-646X