CCEdit: Creative and Controllable Video Editing via Diffusion Models

CVPR 2024

Ruoyu Feng¹*, Wenming Weng¹, Yanhui Wang¹,
Yuhui Yuan², Jianmin Bao², Chong Luo²†, Zhibo Chen¹, Baining Guo²

¹University of Science and Technology of China, ²Microsoft Research Asia
*This work was done while Ruoyu Feng was an intern at MSRA.
†Project Lead


CCEdit is a trident network that edits videos in a creative and controllable manner. By decoupling video editing into video consistency maintenance, structure preservation, and appearance editing, CCEdit achieves high-fidelity video editing results.


Approach



Illustration of our overall framework. Structure and appearance information of the target video are modulated independently and then seamlessly integrated into the main branch. Structure control is provided by the pre-trained ControlNet, and appearance control is achieved precisely through the edited key frame. Details of the autoencoder and the iterative denoising process are omitted for simplicity. "P", "S", "B", and "L" denote prompt, structure, base model, and LoRA, respectively.
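The caption describes two control branches whose signals are merged into the main branch. The sketch below shows one way such a merge can work, assuming (as in ControlNet) that each control branch emits one additive residual per main-branch block and that a scalar control scale weights each branch. The function name `fuse_branches`, the per-block list layout, and the leading frame axis are hypothetical illustrations, not CCEdit's actual code.

```python
import numpy as np


def fuse_branches(main_feats, structure_res, appearance_res,
                  structure_scale=1.0, appearance_scale=1.0):
    """Merge control-branch residuals into main-branch features.

    Hypothetical sketch: assumes ControlNet-style additive injection, where
    every main-branch block receives one residual from the structure branch
    and one from the appearance branch, each with the same shape.
    """
    if not (len(main_feats) == len(structure_res) == len(appearance_res)):
        raise ValueError("each branch must provide one residual per block")
    fused = []
    for h, s, a in zip(main_feats, structure_res, appearance_res):
        if s.shape != h.shape or a.shape != h.shape:
            raise ValueError("residual shape must match the main-branch feature")
        # A scale of 0.0 disables a branch; 1.0 applies its full residual.
        fused.append(h + structure_scale * s + appearance_scale * a)
    return fused


# Toy example: 8 frames, two blocks at different resolutions (shapes assumed).
main = [np.zeros((8, 4, 16, 16)), np.zeros((8, 8, 8, 8))]
structure = [np.ones_like(x) for x in main]
appearance = [np.full_like(x, 2.0) for x in main]
out = fuse_branches(main, structure, appearance, structure_scale=0.5)
print(out[0][0, 0, 0, 0])  # 0.5 * 1.0 + 1.0 * 2.0 = 2.5
```

Keeping the two scales separate mirrors the independent modulation of structure and appearance described in the caption: either branch can be weakened or switched off without touching the other.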

Except in the Comparison section, where we use SD-v1.5 as the base model and depth as the structure representation for a fair comparison, the results below are typically produced as follows. We first edit the center frame in Stable Diffusion WebUI with personalized T2I models (which takes about 3 minutes), then use the edited frame as the appearance reference, with depth as the structure representation. The same personalized T2I models are used both for editing the center frame in Stable Diffusion WebUI and for the video editing process.
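The preparation steps above can be summarized in a small sketch. `estimate_depth` and `edit_key_frame` are hypothetical callables standing in for a depth estimator and for the Stable Diffusion WebUI edit made with the personalized T2I models; taking `len(frames) // 2` as the center frame of an even-length clip is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")  # a video frame (e.g. an image array)
D = TypeVar("D")  # a structure map (e.g. a depth map)


def center_index(num_frames: int) -> int:
    """Index of the center frame; for even lengths, the later of the two middle frames."""
    if num_frames <= 0:
        raise ValueError("the clip must contain at least one frame")
    return num_frames // 2


def prepare_conditions(
    frames: Sequence[F],
    estimate_depth: Callable[[F], D],
    edit_key_frame: Callable[[F], F],
) -> Tuple[List[D], F]:
    """Return per-frame structure maps and the edited appearance reference."""
    depth_maps = [estimate_depth(f) for f in frames]  # structure input, one map per frame
    reference = edit_key_frame(frames[center_index(len(frames))])  # appearance input
    return depth_maps, reference


# Toy usage with strings in place of images.
frames = ["f0", "f1", "f2", "f3", "f4"]
depth, ref = prepare_conditions(frames, lambda f: "depth_" + f, str.upper)
print(ref)  # F2
```

Only one frame is edited by hand; the structure maps are computed for every frame, so the video inherits motion from the source and appearance from the single edited reference.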

Basic Functions


🌟 Global Transfer 🌟


Global transfer covers style changes, transformations of the target object, attribute transformations, and similar edits.


Input Video (Model: ReV Animated) A young and beautiful girl smiles to the camera, anime style.
Input Video (Model: ToonYou) City night, cyberpunk style.
Input Video (Model: ReV Animated, kMechAnimal) A mechanical bear is running.
Input Video (Model: ReV Animated) A paladin drives a motorcycle, fire on the road.
Input Video (Model: MechaMix) Portrait shot of robot, mechanical, sophisticated ancient Egypt style filigree inlays, cyberpunk, vibrant color, half body, dark vibes, volumetric light, dramatic background.
Input Video (Model: ReV Animated) A magician in hood, blue eye, blue flame.
Input Video (Model: majicMIX realistic) A young and beautiful girl.
Input Video (Model: Counterfeit-V3.0) A cute girl with a hat.

🌟 Foreground Editing 🌟


Foreground editing lets users change or replace the foreground object as they specify.

Input Video (Model: hellofantasytime, fat animal) A cute corgi stick out tongue.
Input Video (Model: hellomecha, Building_block_world) A Lego brick -style car stops on the road. BJ_Lego bricks, no_humans, ground_vehicle, motor_vehicle, science_fiction, vehicle_focus, cinematic lighting, strong contrast, high level of detail.
Input Video (Model: ToonYou) A tiger is walking, anime style.
Input Video (Model: Counterfeit-V3.0) A young girl, anime style.
Input Video (Model: Counterfeit-V3.0) A cute dog ran towards the camera.

🌟 Background Editing 🌟


Background editing lets users modify or replace the background as they specify.

Input Video (Model: ToonYou) A man is running on the beach, sunset.
Input Video (Model: ReV Animated) A person walks in the field. The Milky Way is in the sky, at night.
Input Video (Model: ToonYou) A woman is doing yoga, in winter, snow.
Input Video (Model: ToonYou) A woman is walking on the country road, sunset, back to the camera.
Input Video (Model: SD-v1.5) A woman is drinking wine in a spring field.
Input Video (Model: ReV Animated) A man in a suit walks into a technological city, feeling futuristic and cinematic.
Input Video (Model: ReV Animated) A man with black suit and a black horse walk in the wood.

Features


🌟 Different Styles 🌟


Users can customize the style of the target video through prompts, personalized T2I models, and the reference key frame.


Input Video (Model: ToonYou) City, anime style. (Model: ToonYou) City at night, cyberpunk style.
Input Video (Model: hellomecha, Building_block_world) A LEGO-style aircraft carrier. (Model: ReV Animated) Spaceship flys in the sky.
Input Video (Model: Counterfeit-V3.0) A girl, anime style. (Model: majicMIX realistic) A girl.
Input Video (Model: majicMIX realistic) A girl. (Model: ToonYou) A girl, anime style. (Model: ToonYou) A girl, anime style. (Model: ToonYou) A girl, anime style.

🌟 Different Granularities 🌟


Users can control, at different granularities, how much structure the target video inherits from the source video.


Input Video (Model: GhostMix, 泼墨 ink splash) (w/o reference frame, line drawing) A man holding a sword stands in front of a waterfall, cold face, ink splash style. (Model: GhostMix, 泼墨 ink splash) (w/o reference frame, pidi boundary) A man holding a sword stands in front of a waterfall, cold face, ink splash style.
Input Video (Model: GhostMix, 泼墨 ink splash) (w/o reference frame, line drawing) A man splits the water surface with a sword, beneath a waterfall, ink splash style. (Model: GhostMix, 泼墨 ink splash) (w/o reference frame, pidi boundary) A man splits the water surface with a sword, beneath a waterfall, ink splash style. (Model: GhostMix, 泼墨 ink splash) A man splits the water surface with a sword, beneath a waterfall, ink splash style. (Model: GhostMix, 泼墨 ink splash) A man splits the water surface with a sword, beneath a waterfall, ink splash style.

🌟 Different Content 🌟


Input Video (Model: ToonYou) A tiger is walking, anime style. (Model: ToonYou) A bear is walking, anime style. (Model: ToonYou) A panda is walking, anime style.

🌟 Long Video Editing 🌟


Input Video (Model: ReV Animated) City overlook, at night, in winter.

Input Video (Model: ToonYou) A sailing boat, anime style.

Input Video (Model: ToonYou) A bear is walking.

Input Video (Model: ToonYou) The Joker is talking.

More Results


Input Video (Model: ReV Animated) (w/o reference frame) A beautiful woman sits in grass and smiles, flowers in background.
Input Video (Model: ToonYou) A woman is doing yoga, anime style.
Input Video (Model: ToonYou) City night, cyberpunk style.
Input Video (Model: ToonYou) City, anime style.
Input Video (Model: ReV Animated) A spaceship flying in the space, galaxy background, ultra detailed, Hyperrealistic, sharp focus, UHD, octane render.
Input Video (Model: ReV Animated) A spaceship flying over the city.
Input Video (Model: ReV Animated) A spaceship flying over the city. (Model: ReV Animated) A spaceship flying over the city.
Input Video (Model: ToonYou) A light rail passes by in the city at night, anime style. (Model: ToonYou) A light rail passes by in the city at night, anime style, high contrast. (Model: Aniflatmix - Anime Flat Color Style Mix (平涂り風/平涂风)) A light rail passes by in the city at night, anime style.

Comparison


For a fair comparison, our model uses depth as the structure condition, and the off-the-shelf image editing method PnP is used to edit the center frames automatically. Other methods are run with the default settings provided in their codebases. All methods built on Stable Diffusion use version 1.5.
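Each comparison case below lists its prompts together with annotation fields (Video Type, Camera Motion, Object Motion, Scene Complexity, Editing Type, Fantasy Level). A hypothetical record type for organizing such cases, for example to group results by editing type, might look like this. The class and function names are illustrative only, and because the meaning of the numeric levels is not defined on this page, they are stored as plain integers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class EditCase:
    original_prompt: str
    target_prompt: str
    video_type: str        # e.g. "Human", "Animal", "Landscape", "Object"
    camera_motion: int
    object_motion: int
    scene_complexity: int
    editing_type: str      # e.g. "Style Change", "Compound Change"
    fantasy_level: int


def group_by_editing_type(cases: Iterable[EditCase]) -> Dict[str, List[EditCase]]:
    """Bucket cases by their Editing Type annotation."""
    groups: Dict[str, List[EditCase]] = defaultdict(list)
    for case in cases:
        groups[case.editing_type].append(case)
    return dict(groups)


# Two of the cases listed below, transcribed as records.
cases = [
    EditCase("A race car performing a drift turn on a track.",
             "A race car drifting on a track in a grainy, high-contrast black and white film style.",
             "Object", 3, 3, 2, "Style Change", 1),
    EditCase("A close-up of daisies with vibrant yellow centers and white petals.",
             "A close-up of daisies with vibrant yellow centers and white petals, "
             "vibrant strokes of an impressionist painting.",
             "Landscape", 1, 1, 1, "Style Change", 1),
]
print({k: len(v) for k, v in group_by_editing_type(cases).items()})  # {'Style Change': 2}
```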


Original Prompt: A man with a backpack hikes on a rocky terrain, surrounded by tall, rugged mountains and scattered boulders.
Video Type: Human
Camera Motion: 2
Object Motion: 2
Scene Complexity: 2

Target Prompt: An astronaut with a jetpack floats above a Martian landscape, with red rocky terrains and tall, alien-like mountains in the backdrop.
Editing Type: Compound Change
Fantasy Level: 3

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: A rider on a horse jumping over an obstacle in an equestrian competition with a clear sky and other obstacles in the background.
Video Type: Human
Camera Motion: 2
Object Motion: 3
Scene Complexity: 2

Target Prompt: A rider on a horse jumping over an obstacle in an equestrian competition, rendered in Van Gogh style with swirling skies and vibrant colors.
Editing Type: Style Change
Fantasy Level: 1

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: A BMX rider in full gear maneuvering their bike over a dirt ramp in a BMX track.
Video Type: Human
Camera Motion: 3
Object Motion: 3
Scene Complexity: 3

Target Prompt: A BMX rider in full gear maneuvering his bike over a dirt ramp in a night-time cityscape with skyscrapers in the background.
Editing Type: Background Change
Fantasy Level: 2

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: A butterfly with black and orange wings perches on a plant amidst a field of golden grass.
Video Type: Animal
Camera Motion: 1
Object Motion: 2
Scene Complexity: 1

Target Prompt: A dragonfly with shimmering wings perches on a plant amidst a field of golden grass.
Editing Type: Object Change
Fantasy Level: 1

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: A playful corgi dog with its mouth open and tongue out, looking excitedly at the camera.
Video Type: Animal
Camera Motion: 1
Object Motion: 1
Scene Complexity: 1

Target Prompt: A fantasy dragon with fiery eyes and smoke coming out of its nostrils, perched on top of a rocky cliff, thunderstorm behind.
Editing Type: Compound Change
Fantasy Level: 3

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: Two individuals crossing a street at a railway intersection with buildings in the background.
Video Type: Landscape
Camera Motion: 1
Object Motion: 2
Scene Complexity: 3

Target Prompt: Two animated characters from a classic video game crossing a pixelated street, with a digitalized cityscape in the background.
Editing Type: Compound Change
Fantasy Level: 3

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: A close-up of daisies with vibrant yellow centers and white petals.
Video Type: Landscape
Camera Motion: 1
Object Motion: 1
Scene Complexity: 1

Target Prompt: A close-up of daisies with vibrant yellow centers and white petals, vibrant strokes of an impressionist painting.
Editing Type: Style Change
Fantasy Level: 1

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: A cruise ship sailing through the ocean with a city skyline in the background.
Video Type: Object
Camera Motion: 1
Object Motion: 1
Scene Complexity: 2

Target Prompt: A space cruiser modeled after a cruise ship gliding through the cosmos with a nebula illuminating the background.
Editing Type: Compound Change
Fantasy Level: 3

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Original Prompt: A race car performing a drift turn on a track.
Video Type: Object
Camera Motion: 3
Object Motion: 3
Scene Complexity: 2

Target Prompt: A race car drifting on a track in a grainy, high-contrast black and white film style.
Editing Type: Style Change
Fantasy Level: 1

Input Video | ControlVideo | FateZero | Pix2Video | Rerender A Video
Text2Video-Zero | TokenFlow | Tune-A-Video | vid2vid-zero | CCEdit (Ours)

Ablation Study


🌟 Appearance Branch 🌟


Original Prompt: A black swan swimming in a pond with lush greenery in the background.

Target Prompt: A black swan swimming in a pond with lush greenery in the background, oil painting style.

Input Video w/o appearance branch w/ appearance branch

Original Prompt: A butterfly with black and orange wings perches on a plant amidst a field of golden grass.

Target Prompt: A dragonfly with shimmering wings perches on a plant amidst a field of golden grass.

Input Video w/o appearance branch w/ appearance branch

Original Prompt: A camel walking in an enclosure with a wooden fence and greenery in the background.

Target Prompt: A camel walking in an enclosure with a wooden fence and greenery in the background, Minecraft world style.

Input Video w/o appearance branch w/ appearance branch

Original Prompt: A stern-looking man in a sharp suit and tie.

Target Prompt: A stern-looking man in a sharp suit and tie, in Traditional Sci-Fi Animation style.

Input Video w/o appearance branch w/ appearance branch

🌟 Anchor Prior 🌟


Original Prompt: A person walks in the field.

Target Prompt: A person walks in the field, the Milky Way is in the sky, at night.

Input Video w/o anchor prior w/ anchor prior

🌟 Control Scale of Structure Branch 🌟


Original Prompt: A man with a backpack hikes on a rocky terrain, surrounded by tall, rugged mountains and scattered boulders.

Target Prompt: An astronaut with a jetpack floats above a Martian landscape, with red rocky terrains and tall, alien-like mountains in the backdrop.

Input Video | Depth Maps | Original Center Frame | Edited Center Frame
Scale=0.0 Scale=0.2 Scale=0.4
Scale=0.6 Scale=0.8 Scale=1.0

🌟 Control Scale of Appearance Branch 🌟


Original Prompt: A man with a backpack hikes on a rocky terrain, surrounded by tall, rugged mountains and scattered boulders.

Target Prompt: An astronaut with a jetpack floats above a Martian landscape, with red rocky terrains and tall, alien-like mountains in the backdrop.

Input Video | Depth Maps | Original Center Frame | Edited Center Frame
Scale=0.0 Scale=0.2 Scale=0.4
Scale=0.6 Scale=0.8 Scale=1.0
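Both control-scale ablations vary one branch's scale over six evenly spaced values from 0.0 to 1.0, shown in two rows of three. A small driver for such a sweep could look like the following. `edit_fn` is a hypothetical callable that runs the editing pipeline with the given branch scale while, by assumption, everything else (prompt, depth maps, edited center frame) is held fixed.

```python
from typing import Callable, Dict, List, TypeVar

R = TypeVar("R")


def scale_grid(num_steps: int = 6, per_row: int = 3) -> List[List[float]]:
    """Evenly spaced scales in [0, 1], laid out row by row like the figure."""
    if num_steps < 2 or per_row < 1:
        raise ValueError("need at least two steps and one column")
    scales = [i / (num_steps - 1) for i in range(num_steps)]
    return [scales[i:i + per_row] for i in range(0, num_steps, per_row)]


def sweep(edit_fn: Callable[[str, float], R], branch: str) -> Dict[float, R]:
    """Run edit_fn once per scale for one branch ("structure" or "appearance")."""
    return {s: edit_fn(branch, s) for row in scale_grid() for s in row}


print(scale_grid())  # [[0.0, 0.2, 0.4], [0.6, 0.8, 1.0]]
# Stand-in for a real editing run: just records which setting was used.
results = sweep(lambda branch, scale: f"{branch}@{scale}", "structure")
```

Computing each scale as `i / (num_steps - 1)` rather than by repeatedly adding 0.2 avoids accumulating floating-point error, so the keys match the labels in the figure exactly.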

BibTeX

@article{feng2023ccedit,
      title={CCEdit: Creative and Controllable Video Editing via Diffusion Models},
      author={Feng, Ruoyu and Weng, Wenming and Wang, Yanhui and Yuan, Yuhui and Bao, Jianmin and Luo, Chong and Chen, Zhibo and Guo, Baining},
      journal={arXiv preprint arXiv:2309.16496},
      year={2023}
    }