Select Publications
Conference Papers
, 2025, 'Language Prompt for Autonomous Driving', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8359 - 8367, http://dx.doi.org/10.1609/aaai.v39i8.32902
, 2025, 'SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3617 - 3625, http://dx.doi.org/10.1609/aaai.v39i4.32376
, 2025, 'Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4263 - 4267, http://dx.doi.org/10.21437/Interspeech.2025-17
, 2025, 'Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation', in Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp. 6157 - 6172
, 2025, 'Defending LLMs Against Jailbreak Prompts Through Key Information Protection and Selective Compression', in IEEE International Conference on Software Quality, Reliability and Security, QRS, pp. 58 - 67, http://dx.doi.org/10.1109/QRS65678.2025.00017
, 2025, 'GLAD: A Streaming Scene Generator for Autonomous Driving', in 13th International Conference on Learning Representations, ICLR 2025, pp. 101163 - 101180
, 2025, 'Merlin: Empowering Multimodal LLMs with Foresight Minds', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 425 - 443, http://dx.doi.org/10.1007/978-3-031-73235-5_24
, 2025, 'Multi-Class Dementia Detection Using Acoustic Features - ICASSP-2025 PROCESS Challenge', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49660.2025.10889847
, 2025, 'Reconstructive Visual Instruction Tuning', in 13th International Conference on Learning Representations, ICLR 2025, pp. 15001 - 15026
, 2025, 'Rethinking Mamba in Speech Processing by Self-Supervised Models', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49660.2025.10889111
, 2025, 'Stream Query Denoising for Vectorized HD-Map Construction', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 203 - 220, http://dx.doi.org/10.1007/978-3-031-72655-2_12
, 2025, 'Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model', in Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp. 408 - 424, http://dx.doi.org/10.1007/978-3-031-73235-5_23
, 2024, 'OneChart: Purify the Chart Structural Extraction via One Auxiliary Token', in MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, pp. 147 - 155, http://dx.doi.org/10.1145/3664647.3681167
, 2024, 'Self-Supervised Visual Preference Alignment', in MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, pp. 291 - 300, http://dx.doi.org/10.1145/3664647.3680993
, 2024, 'Unidirectional Brain-Computer Interface: Artificial Neural Network Encoding Natural Images to fMRI Response in the Visual Cortex', in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2024, IEEE, Seoul, South Korea, pp. 1851 - 1855, presented at 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seoul, South Korea, 14 April 2024 - 19 April 2024, http://dx.doi.org/10.1109/ICASSP48485.2024.10446366
, 2024, 'Compound Text-Guided Prompt Tuning via Image-Adaptive Cues', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 5061 - 5069, http://dx.doi.org/10.1609/aaai.v38i5.28311
, 2024, 'DDAE: Towards Deep Dynamic Vision BERT Pretraining', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1037 - 1045, http://dx.doi.org/10.1609/aaai.v38i2.27864
, 2024, 'Far3D: Expanding the Horizon for Surround-View 3D Object Detection', in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2561 - 2569, http://dx.doi.org/10.1609/aaai.v38i3.28033
, 2024, 'Binaural Selective Attention Model for Target Speaker Extraction', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4323 - 4327, http://dx.doi.org/10.21437/Interspeech.2024-683
, 2024, 'ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning', in IJCAI International Joint Conference on Artificial Intelligence, pp. 1743 - 1752
, 2024, 'DreamLLM: Synergistic Multimodal Comprehension and Creation', in 12th International Conference on Learning Representations, ICLR 2024
, 2024, 'Enhancing Code-Switching Speech Recognition with Interactive Language Biases', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, pp. 10886 - 10890, http://dx.doi.org/10.1109/ICASSP48485.2024.10448335
, 2024, 'Panacea: Panoramic and Controllable Video Generation for Autonomous Driving', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 6902 - 6912, http://dx.doi.org/10.1109/CVPR52733.2024.00659
, 2024, 'Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model', in EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 159 - 171, http://dx.doi.org/10.18653/v1/2024.emnlp-main.9
, 2024, 'Striking a Balance between Classical and Deep Learning Approaches in Natural Language Processing Pedagogy', in TeachNLP 2024 - 6th Workshop on Teaching NLP, Proceedings of the Workshop, pp. 23 - 32
, 2024, 'When LLMs Meet Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection', in EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 146 - 158, http://dx.doi.org/10.18653/v1/2024.emnlp-main.8
, 2023, 'A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters', in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1 - 5, presented at ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04 June 2023 - 10 June 2023, http://dx.doi.org/10.1109/icassp49357.2023.10095885
, 2023, 'Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining', in Proceedings of Machine Learning Research, pp. 28223 - 28243
, 2023, 'Cross Modal Transformer: Towards Fast and Robust 3D Object Detection', in Proceedings of the IEEE International Conference on Computer Vision, pp. 18222 - 18232, http://dx.doi.org/10.1109/ICCV51070.2023.01675
, 2023, 'Differentiable Architecture Search with Random Features', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 16060 - 16069, http://dx.doi.org/10.1109/CVPR52729.2023.01541
, 2023, 'Efficient Information Recognition for Machine-printed Invoices', in 2023 International Conference on Image Processing, Computer Vision and Machine Learning, ICICML 2023, pp. 913 - 918, http://dx.doi.org/10.1109/ICICML60161.2023.10424949
, 2023, 'Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection', in Proceedings of the IEEE International Conference on Computer Vision, pp. 3598 - 3608, http://dx.doi.org/10.1109/ICCV51070.2023.00335
, 2023, 'Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration', in Advances in Neural Information Processing Systems
, 2023, 'LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 13488 - 13498, http://dx.doi.org/10.1109/CVPR52729.2023.01296
, 2023, 'MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception', in Proceedings of the IEEE International Conference on Computer Vision, pp. 8514 - 8523, http://dx.doi.org/10.1109/ICCV51070.2023.00785
, 2023, 'MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization', in Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, pp. 4109 - 4113, http://dx.doi.org/10.21437/Interspeech.2023-1446
, 2023, 'MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 22056 - 22065, http://dx.doi.org/10.1109/CVPR52729.2023.02112
, 2023, 'OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation', in Proceedings of the IEEE International Conference on Computer Vision, pp. 2749 - 2758, http://dx.doi.org/10.1109/ICCV51070.2023.00259
, 2023, 'PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images', in Proceedings of the IEEE International Conference on Computer Vision, pp. 3239 - 3249, http://dx.doi.org/10.1109/ICCV51070.2023.00302
, 2023, 'PQLM - Multilingual Decentralized Portable Quantum Language Model', in ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, http://dx.doi.org/10.1109/ICASSP49357.2023.10095215
, 2023, 'Re-parameterizing Your Optimizers rather than Architectures', in 11th International Conference on Learning Representations, ICLR 2023
, 2023, 'Referring Multi-Object Tracking', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 14633 - 14642, http://dx.doi.org/10.1109/CVPR52729.2023.01406
, 2023, 'RevColV2: Exploring Disentangled Representations in Masked Image Modeling', in Advances in Neural Information Processing Systems
, 2023, 'Reversible Column Networks', in 11th International Conference on Learning Representations, ICLR 2023
, 2023, 'SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers', in Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023, pp. 731 - 741, http://dx.doi.org/10.1109/ICCVW60793.2023.00081
, 2023, 'Slot-guided Volumetric Object Radiance Fields', in Advances in Neural Information Processing Systems
, 2023, 'Syntax-Aware Retrieval Augmented Code Generation', in Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 1291 - 1302, http://dx.doi.org/10.18653/v1/2023.findings-emnlp.90
, 2023, 'Traffic sign detection algorithm based on YOLOv5 combined with BIFPN and attention mechanism', in ITOEC 2023 - IEEE 7th Information Technology and Mechatronics Engineering Conference, pp. 966 - 970, http://dx.doi.org/10.1109/ITOEC57671.2023.10291927
, 2023, 'Understanding Imbalanced Semantic Segmentation Through Neural Collapse', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 19550 - 19559, http://dx.doi.org/10.1109/CVPR52729.2023.01873
, 2023, 'Understanding Masked Image Modeling via Learning Occlusion Invariant Feature', in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 6241 - 6251, http://dx.doi.org/10.1109/CVPR52729.2023.00604