Biography
I’m Ziyi Guan (管子义), an AI Infra Researcher at ByteDance Seed Infra, Heterogeneous Computing Group (since Oct 2025). I will receive my Ph.D. degree from The University of Hong Kong (HKU) in Nov 2025, supervised by Dr. Ngai Wong and Prof. Graziano Chesi. Before that, I received my Bachelor’s degree from the School of Microelectronics at the Southern University of Science and Technology in 2021, supervised by Prof. Hao Yu.
At Seed Infra, I focus on end-to-end acceleration for domestic AI chips, including KV-cache compression for long-context serving, post-training quantization/pruning/sparsity, training-time acceleration, and RL-friendly quantized inference, driving lower latency, higher throughput, and better cost efficiency across heterogeneous systems.
My research spans LLM optimization (quantization, pruning, distillation) and LLM-based agents (GUI/App agents and RAG frameworks). I also explore hardware-efficient neural networks co-designed with emerging accelerators.
You can find my publications on my Google Scholar (DAC, EMNLP, ICCAD, DATE, and ongoing work submitted to TCAD).
Previously, I worked at the Huawei Hong Kong Research Center (Nov 2024 – Sep 2025) on KG-RAG GUI test agents, enhancing multi-platform mobile app testing via retrieval-augmented reasoning; this line of work includes a paper accepted to EMNLP 2025 (Main).
Research Interests:
- LLM Compression & Optimization:
  - Weight quantization, pruning, and distillation for efficient model deployment.
- LLM Agents:
  - Designing GUI Agents and exploring Retrieval-Augmented Generation (RAG) for task automation.
  - Developing post-training techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for enhanced agent functionality and efficiency.
- Hardware-Efficient Neural Networks:
  - Developing lightweight neural network architectures for constrained hardware such as RRAM.
You can find my Chinese CV here: Chinese CV.
You can contact me via Email or on WeChat: Wx555328778.
Selected Publications (* denotes equal contribution)
First author and co-first author:
Ziyi Guan, et al., “APTQ+: Attention-FFN-aware Post Quantization for Layerwise LLM Accelerator on FPGA,” submitted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (CCF-A). (Under Review)
Ziyi Guan, et al., “KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation,” In Proceedings of EMNLP 2025 Main Conference. (EMNLP 2025 Main (CCF-B)) PDF
Yupeng Su, Ziyi Guan*, et al., “LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models,” In Proceedings of the 62nd IEEE/ACM Design Automation Conference (poster). (DAC 2025 (CCF-A)) PDF
Dingbang Liu, Ziyi Guan*, et al., “A Highly Energy-Efficient Binary BERT Model on Group Vector Systolic CIM Accelerator,” In Proceedings of the 62nd IEEE/ACM Design Automation Conference (poster). (DAC 2025 (CCF-A))
Ziyi Guan, et al., “APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models,” In Proceedings of the 61st IEEE/ACM Design Automation Conference, San Francisco, CA, June 23–27, 2024. (DAC 2024 Oral (CCF-A)) PDF
Ziyi Guan, et al., “An Isotropic Shift-Pointwise Network for Crossbar-Efficient Neural Network Design,” Design, Automation & Test in Europe Conference & Exhibition, Valencia, Spain, March 25, 2024. (DATE 2024 (CCF-B)) PDF
Ziyi Guan, et al., “A Video-based Fall Detection Network by Spatio-temporal Joint-point Model on Edge Devices,” Design, Automation & Test in Europe Conference & Exhibition. IEEE, 2021, pp. 422–427. (DATE 2021 (CCF-B)) PDF
Ziyi Guan, et al., “A Hardware-Aware Neural Architecture Search Pareto Front Exploration for In-Memory Computing,” in 2022 IEEE 16th International Conference on Solid-State Integrated Circuit Technology (ICSICT). IEEE, 2022, pp. 1–4. PDF
Co-authored:
Shuwei Li, Ziyi Guan, et al., “A Fall Detection Network by 2D/3D Spatio-temporal Joint Models with Tensor Compression on Edge,” ACM Transactions on Embedded Computing Systems (TECS), vol. 21, no. 6, pp. 1–19, 2022. PDF
(Last updated: Oct 2025)
