I am a Ph.D. student at Peking University, advised by Prof. Libin Liu. I received my B.S. degree in electronic engineering from Tsinghua University in 2020.
My ultimate goal is to create a production-ready system capable of generating film-quality animation from natural language scripts.
To attain this goal, I joined a startup team from 2019 to 2021 and helped develop a language-assisted animation creation product that enables users (e.g., TikTokers and kids) to effortlessly create simple animations from story scripts written in natural language. You can view some demos here.
Recently, I have been exploring a preliminary path toward this goal from an academic perspective. My current work focuses on a core sub-module of the script-to-animation system: audio- and text-driven human motion synthesis, such as gesture generation.
Collaborators:
Talks:
We present a novel unified framework for physics-based motion control leveraging scalable discrete representations. By harnessing a large dataset containing tens of hours of motion, our method learns a rich motion representation that supports various downstream tasks, such as physics-based pose estimation, interactive motion control, and text-to-motion generation, and, more interestingly, integrates seamlessly with large language models (LLMs).
We introduce Semantic Gesticulator, a novel framework designed to synthesize realistic co-speech gestures with strong semantic correspondence. Semantic Gesticulator fine-tunes an LLM to retrieve suitable semantic gesture candidates from a motion library. Combined with a novel GPT-style generative model, the synthesized gesture motions demonstrate strong rhythmic coherence and semantic appropriateness.
We introduce GestureDiffuCLIP, a CLIP-guided, co-speech gesture synthesis system that creates stylized gestures in harmony with speech semantics and rhythm using arbitrary style prompts. Our highly adaptable system supports style prompts in the form of short texts, motion sequences, or video clips and provides body part-specific style control.