Biography
I am Ziyi Guan (管子义), a fourth-year Ph.D. candidate at The University of Hong Kong (HKU), supervised by Dr. Ngai Wong and Prof. Graziano Chesi, and I expect to graduate in September 2025. Before that, I received my Bachelor’s degree from the School of Microelectronics at the Southern University of Science and Technology (SUSTech) in 2021, supervised by Prof. Hao Yu.
My research focuses on optimizing large language models (LLMs), particularly through compression techniques such as quantization, pruning, and distillation. I also develop LLM-based agents, especially APP/GUI agents and Retrieval-Augmented Generation (RAG) frameworks, for improved decision-making and task automation. You can find my publications on my Google Scholar profile.
Currently, I am a research intern at the Huawei Hong Kong Research Center (HKRC) (since November 2024), where I develop GUI test agents and enhance test automation with RAG-based frameworks. This research aims to improve the efficiency of mobile application testing across multiple platforms.
Research Interests:
- LLM Compression & Optimization:
- Weight quantization, pruning, and distillation for efficient model deployment.
- LLM Agents:
- Designing GUI Agents and exploring Retrieval-Augmented Generation (RAG) for task automation.
- Developing post-training techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for enhanced agent functionality and efficiency.
- Hardware-Efficient Neural Networks:
- Developing lightweight neural network architectures for constrained hardware such as RRAM.
I have published research on LLM pruning, quantization, and hardware-efficient model deployment at prominent conferences such as DAC and DATE. I am also co-authoring papers on LLM optimization and application automation submitted to ACL 2025.
I am actively seeking job opportunities starting in Fall 2025 in LLM agents and LLM compression, especially model optimization, pruning, quantization, and hardware-efficient neural network design. Please feel free to contact me about potential positions or collaborations.
You can find my Chinese CV here: Chinese CV
You can contact me by Email, by WeChat (Wx555328778), or by phone: +86 18823347376 (Mainland China) / +852 46827377 (Hong Kong)
Selected Publications (* denotes equal contribution)
First author and co-first author:
Ziyi Guan, et al., “KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation,” submitted to EMNLP 2025 (CCF-B).
Yupeng Su, Ziyi Guan*, et al., “LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models,” in Proceedings of DAC 2025 (poster): 62nd IEEE/ACM Design Automation Conference (CCF-A). PDF
Dingbang Liu, Ziyi Guan*, et al., “A Highly Energy-Efficient Binary BERT Model on Group Vector Systolic CIM Accelerator,” in Proceedings of DAC 2025 (poster): 62nd IEEE/ACM Design Automation Conference (CCF-A).
Ziyi Guan, et al., “APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models,” in Proceedings of DAC 2024: 61st IEEE/ACM Design Automation Conference (oral, CCF-A), San Francisco, CA, June 23–27, 2024. PDF
Ziyi Guan, et al., “An Isotropic Shift-Pointwise Network for Crossbar-Efficient Neural Network Design,” Design, Automation & Test in Europe Conference & Exhibition (DATE 2024, CCF-B), Valencia, Spain, March 2024. PDF
Ziyi Guan, et al., “A Video-based Fall Detection Network by Spatio-temporal Joint-point Model on Edge Devices,” Design, Automation & Test in Europe Conference & Exhibition (DATE 2021, CCF-B). IEEE, 2021, pp. 422–427. PDF
Ziyi Guan, et al., “A Hardware-Aware Neural Architecture Search Pareto Front Exploration for In-Memory Computing,” in 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT). IEEE, 2022, pp. 1–4. PDF
Other authors:
- Shuwei Li, Ziyi Guan, et al., “A Fall Detection Network by 2D/3D Spatio-temporal Joint Models with Tensor Compression on Edge,” ACM Transactions on Embedded Computing Systems (TECS), vol. 21, no. 6, pp. 1–19, 2022. PDF
(Last updated May 2025)