Biography

I’m Ziyi Guan (管子义), an AI Infra Researcher at ByteDance (Seed Infra, Heterogeneous Computing Group) since Oct 2025. I will receive my Ph.D. degree from The University of Hong Kong (HKU) in Nov 2025, supervised by Dr. Ngai Wong and Prof. Graziano Chesi. Before that, I received my Bachelor’s degree from the School of Microelectronics at the Southern University of Science and Technology in 2021, supervised by Prof. Hao Yu.

At Seed Infra, I focus on end-to-end acceleration for domestic AI chips, including KV-cache compression for long-context serving, post-training quantization/pruning/sparsity, training-time acceleration, and RL-friendly quantized inference, with the goal of lower latency, higher throughput, and better cost efficiency across heterogeneous systems.

My research spans LLM optimization (quantization, pruning, distillation) and LLM-based agents (GUI/App agents and RAG frameworks). I also explore hardware-efficient neural networks co-designed with emerging accelerators.

You can find my publications on my Google Scholar profile (DAC, EMNLP, ICCAD, DATE, and ongoing work submitted to TCAD).

Previously, I worked at Huawei Hong Kong Research Center (Nov 2024 – Sep 2025) on KG-RAG GUI Test Agents, enhancing multi-platform mobile app testing via retrieval-augmented reasoning. This line of work includes a paper accepted to EMNLP 2025 (Main).

Research Interests:

  • LLM Compression & Optimization:
    • Weight quantization, pruning, and distillation for efficient model deployment.
  • LLM Agents:
    • Designing GUI Agents and exploring Retrieval-Augmented Generation (RAG) for task automation.
    • Developing post-training techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for enhanced agent functionality and efficiency.
  • Hardware-Efficient Neural Networks:
    • Developing lightweight neural network architectures for constrained hardware such as RRAM.

You can find my Chinese CV here.

You can contact me by email at gzygwp@gmail.com.

Selected Publications (* denotes equal contribution)

First author and co-first author:

  • Ziyi Guan, et al., “APTQ+: Attention-FFN-aware Post Quantization for Layerwise LLM Accelerator on FPGA”, submitted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) (CCF-A) (Under Review)

  • Ziyi Guan, et al., “KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation”, In Proceedings of the EMNLP 2025 Main Conference (top NLP conference). (EMNLP 2025 (CCF-B)) PDF

  • Yupeng Su, Ziyi Guan*, et al., “LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models”, In Proceedings of DAC 2025 poster: 62nd IEEE/ACM Design Automation Conference. (DAC 2025 (CCF-A)) PDF

  • Dingbang Liu, Ziyi Guan*, et al., “A Highly Energy-Efficient Binary BERT Model on Group Vector Systolic CIM Accelerator”, In Proceedings of DAC 2025 poster: 62nd IEEE/ACM Design Automation Conference. (DAC 2025 (CCF-A))

  • Ziyi Guan, et al., “APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models”, In Proceedings of DAC 2024: 61st IEEE/ACM Design Automation Conference. (DAC 2024 Oral (CCF-A)), San Francisco, CA, June 23-27, 2024. PDF

  • Ziyi Guan, et al., “An Isotropic Shift-Pointwise Network for Crossbar-Efficient Neural Network Design”, Design, Automation & Test in Europe Conference & Exhibition (DATE 2024 (CCF-B)), Valencia, March 25, 2024. PDF

  • Ziyi Guan, et al., “A Video-based Fall Detection Network by Spatio-temporal Joint-point Model on Edge Devices”, Design, Automation & Test in Europe Conference & Exhibition (DATE 2021 (CCF-B)). IEEE, 2021, pp. 422–427. PDF

  • Ziyi Guan, et al., “A Hardware-Aware Neural Architecture Search Pareto Front Exploration for In-Memory Computing”, in 2022 IEEE 16th International Conference on Solid-State Integrated Circuit Technology (ICSICT). IEEE, 2022, pp. 1–4. PDF

Other authors:

  • Shuwei Li, Ziyi Guan, et al., “A Fall Detection Network by 2D/3D Spatio-temporal Joint Models with Tensor Compression on Edge”, in ACM Transactions on Embedded Computing Systems (TECS), vol. 21, no. 6, pp. 1–19, 2022. PDF

(Last updated in Oct. 2025)