I am a Ph.D. student at Peking University, advised by Prof. Libin Liu. I received my B.S. degree in electronic engineering from Tsinghua University in 2020.
My ultimate goal is to create a production-ready system capable of generating film-quality animation from natural language scripts.
To attain this goal, I joined a startup team from 2019 to 2021 and helped develop a language-assisted animation creation product that enables users (e.g., TikTokers and kids) to effortlessly create simple animations from story scripts written in natural language. You can view some demos here.
Recently, I have been exploring a preliminary path toward this goal from an academic perspective. My current work focuses on a core sub-module of the script-to-animation system: audio- and text-driven human motion synthesis, such as gesture generation.
Collaborators:
Talks:
We present a novel unified framework for physics-based motion control leveraging scalable discrete representations. By harnessing a large dataset containing tens of hours of motion, our method learns a rich motion representation that supports various downstream tasks, such as physics-based pose estimation, interactive motion control, and text-to-motion generation, and, more interestingly, integrates seamlessly with large language models (LLMs).
We introduce Semantic Gesticulator, a novel framework designed to synthesize realistic co-speech gestures with strong semantic correspondence. Semantic Gesticulator fine-tunes an LLM to retrieve suitable semantic gesture candidates from a motion library. Combined with a novel GPT-style generative model, the synthesized gesture motions demonstrate strong rhythmic coherence and semantic appropriateness.
We introduce GestureDiffuCLIP, a CLIP-guided, co-speech gesture synthesis system that creates stylized gestures in harmony with speech semantics and rhythm using arbitrary style prompts. Our highly adaptable system supports style prompts in the form of short texts, motion sequences, or video clips and provides body part-specific style control.