My ultimate goal is to create a production-ready system capable of generating film-quality animation from natural language scripts.
Toward this goal, I worked with a startup team from 2019 to 2021, helping to develop a language-assisted animation creation product that enables users (e.g., TikTokers and kids) to effortlessly create simple animations by writing story scripts in natural language. You can view some demos here.
More recently, I have been exploring a preliminary solution path from an academic perspective. My current work focuses on a core sub-module of the script-to-animation system: audio/text-driven human motion synthesis, such as co-speech gesture generation.
We introduce GestureDiffuCLIP, a CLIP-guided co-speech gesture synthesis system that creates stylized gestures in harmony with speech semantics and rhythm using arbitrary style prompts. Our highly adaptable system supports style prompts in the form of short texts, motion sequences, or video clips and provides body part-specific style control.
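To make the "CLIP-guided" idea concrete, here is a minimal sketch of how a short text style prompt could be embedded into CLIP's joint space and then used as a conditioning signal for a gesture generator. This is an illustration only, not GestureDiffuCLIP's actual implementation; the checkpoint name and the final conditioning step are assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# A standard public CLIP checkpoint, used here as a stand-in; the paper's
# exact encoder may differ.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_style_prompt(prompt: str) -> torch.Tensor:
    """Embed a short text style prompt into CLIP's joint embedding space."""
    inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    # Normalize to the unit sphere, as is conventional with CLIP features.
    return emb / emb.norm(dim=-1, keepdim=True)

style = encode_style_prompt("an excited, energetic speaker")
# `style` (shape [1, 512] for this checkpoint) would then condition the
# gesture generator, e.g., as an extra input to a diffusion denoiser.
```

Because CLIP embeds text, images, and (frame-wise) video into one space, the same conditioning pathway can in principle accept style prompts in any of these modalities, which is what makes CLIP guidance attractive for arbitrary style control.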
We present a co-speech gesture synthesis system that achieves convincing results in both rhythm and semantics.