Seedance 2.0 Prompt Guide

The Seedance 2.0 series models natively support joint generation of audio and video, and have strong semantic understanding and multi-modal interaction capabilities. This article introduces how to write prompts for the Seedance 2.0 series models, along with related techniques, to help you generate high-quality videos that meet your needs more efficiently.

Note

All visual (images, videos) and audio materials presented in this guide are independently generated by the Seedance/Seedream series of visual generation models.

Basic formula

The Seedance 2.0 series models support simultaneous reference to multi-modal materials such as videos, images, and audio, and can accurately lock in character appearance, action effects, visual style, dubbing timbre and other characteristics, greatly reducing the threshold for writing prompts. Based on this advantage, we can guide the model through simple basic formulas, take advantage of the characteristics of multi-modal materials, and quickly generate videos that meet specific needs.

Reference-based video generation can be subdivided into three task categories: multimodal reference, video editing, and video extension. Select the basic prompt formula according to the task type.

Multimodal reference

Extract some elements (such as subject, style, scene, sound effects) from the material to generate a new video.

Applicable scenarios: action migration, subject reuse, atmosphere reference, etc.

Recommended sentence patterns:

Image reference: Refer to <Subject N> in <Image N> to generate...

Video Reference: Refer to <Action/Camera/Style/Sound Effect> in <Video N> to generate...

Audio Reference: Refer to the timbre in <Audio N> to generate...
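The three sentence patterns above share one shape, which can be sketched as a small helper. This is only an illustration: the function name `reference_prompt` and its parameters are hypothetical and are not part of any Seedance API. It simply assembles a prompt string.

```python
def reference_prompt(kind: str, index: int, target: str, description: str) -> str:
    """Build a reference-task sentence from the recommended pattern.

    kind is "image", "video" or "audio"; target is the element to borrow
    (a subject, an action, a camera movement, a timbre, and so on).
    """
    material = {"image": "Image", "video": "Video", "audio": "Audio"}[kind]
    return f"Refer to {target} in {material} {index} to generate {description}"


print(reference_prompt("video", 1, "the camera movement", "a girl walking along a beach."))
```

Keeping the material label ("Video 1") and the borrowed element ("the camera movement") as separate arguments makes it harder to forget either half of the pattern.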

Edit video

Make local or global modifications based on the original video. The parts not mentioned are left unchanged by default.

Applicable scenarios: partial replacement, subject removal, attribute modification, etc.

Recommended sentence patterns:

Add element: clearly describe <element characteristics> + <occurrence time> + <appearance position>

Modify elements: Strictly edit <Video N>, and change <Original Features> to <New Features>

Delete elements: Point out the elements that need to be deleted. For elements that remain unchanged, emphasize them in the prompt for better performance.

Extend video

Continue the original video along the time dimension. The audio-visual style, subject, and narrative must stay consistent with the original.

Applicable scenarios: continuation of plot, extended action, completion of clips, etc.

Recommended sentence patterns:

Extend video: Extend <Video N> forward/backward, generating...

Track completion: <Video 1> + <Transition image description> + continue with <Video 2> + <Transition image description> + continue with <Video 3>

Note

For edit/extend tasks, refer to the video directly as "<Video N>". Do not write "Reference<Video N>", otherwise the task may be misjudged as a reference task.
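A simple pre-submission check can catch this wording before an edit or extend prompt is sent. The sketch below is an assumption-laden illustration: the function name `uses_reference_wording` and the regular expression are hypothetical, and the pattern only covers a few common English phrasings.

```python
import re

# Flags "Refer to Video 1", "Reference<Video 2>", "referring to video 3" and similar.
_REFERENCE_VIDEO = re.compile(
    r"\brefer(?:ence|ring)?(?:\s+to)?\s*<?\s*video\s*\d+", re.IGNORECASE
)


def uses_reference_wording(prompt: str) -> bool:
    """Return True if the prompt addresses a video with reference-style wording."""
    return _REFERENCE_VIDEO.search(prompt) is not None


print(uses_reference_wording("Reference<Video 1>, change the red car to a blue car"))
print(uses_reference_wording("Strictly edit Video 1, and change the red car to a blue car"))
```

The check deliberately ignores phrasing such as "refer to the style of Video 2", since combined tasks legitimately reference one material while editing another.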

Combined tasks

The above three tasks also support combined use.

Applicable Scenarios: Refer to a certain material and edit another material.

Recommended sentence patterns:

Refer to [Reference Dimensions] of <Image/Video N>, strictly edit <Video X>, [Specific editing content]

Advanced formula

Seedance 2.0 is essentially a multi-modal AI director: it reads your text prompts, images, videos, and audio at the same time, and internally splits them along two dimensions, a "spatial layer" (what is in the frame) and a "temporal layer" (how things change over time), to understand the request and generate the video.

Therefore, a good prompt is not a simple "copywriting description", but an "engineering instruction": who, in what scene, what action, how the camera moves, and in what time sequence it occurs, delivered to the spatial layer and the temporal layer respectively. The specific formula is as follows:

Advanced formula for prompts: accurate subject + action details + scene environment + light and shadow tone + camera movement + visual style + image quality + constraints

To put it simply, first lock in "who" is doing "what", then explain "where" and "in what atmosphere", then tell the model "how to shoot", and finally use style, image quality and constraints to tighten the results. Each element is broken down in detail below.
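As a rough sketch, the eight elements of the formula can be kept as separate fields and joined in order. The class name `PromptParts` and its field names are hypothetical and only mirror the formula; nothing here is a Seedance API.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    subject: str            # accurate subject
    action: str             # action details
    scene: str = ""         # scene environment
    lighting: str = ""      # light and shadow tone
    camera: str = ""        # camera movement
    style: str = ""         # visual style
    quality: str = ""       # image quality
    constraints: str = ""   # constraints

    def render(self) -> str:
        """Join the non-empty elements in the order given by the formula."""
        head = f"{self.subject.strip()} {self.action.strip()}"
        rest = [self.scene, self.lighting, self.camera,
                self.style, self.quality, self.constraints]
        sentences = [head] + [part.strip() for part in rest if part.strip()]
        return ". ".join(s.rstrip(".") for s in sentences) + "."


parts = PromptParts(
    subject="A woman in a red dress",
    action="slowly raises her right hand",
    camera="Fixed shot",
    constraints="Do not generate watermark",
)
print(parts.render())
```

Holding each element in its own field makes it easy to see at a glance which parts of the formula a prompt is still missing.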

1 Define the subject

In real reference material, there are often multiple subjects in an image. To reference a specific object in the material accurately, define the subject clearly. The subject can be a character, a prop, a scene, etc.

Recommended sentence pattern: Define [Subject Core Features] in <Image/Video N> as <Subject N>

Core feature requirements: Describe the subject with 2-3 clear, stable, static features (such as clothing, hairstyle, appearance, category) so that it is uniquely identified.

Example:

Define the woman wearing a red dress and straw hat in Image 1 as Subject 1

Define the woman wearing a red dress and straw hat in Image 1 as Zhang Hong

Note

Whenever a subject is mentioned, the reference must be explicit and must not be omitted. Two usages are supported:

For simple scenes where the subject is not defined, <Subject N>@<Image N> must be used every time the subject is mentioned, to emphasize the binding between the subject and the material. For example: Zhang San@Image 1.

For scenarios where the subject has been defined in advance, use the same label every time the subject is mentioned. For example: define the tall man in Video 1 as Police and the short man as Thief. Subsequent descriptions keep referring to the tall man as "Police" and to the short man as "Thief".

When using the asset library (Asset ID), you still need to use <Image/Video N> to refer to the subject. Because the model cannot directly associate Asset ID with reference content, Asset ID cannot be used directly to replace <Image/Video N>.

The description should be as concise as possible, avoid redundancy, and avoid semantic conflicts (such as contradictory characteristics of the same subject).

It is recommended to express spatial relationships through reference images first, to reduce the need for complex text descriptions.
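The two usages described in this section (defining a subject once, or binding it to its material at every mention) can be sketched as two tiny helpers. The function names `define_subject` and `bind` are hypothetical; they only format text in the patterns shown above.

```python
def define_subject(features: str, material: str, label: str) -> str:
    """Format a definition sentence: Define [features] in <material> as <label>."""
    return f"Define {features} in {material} as {label}"


def bind(label: str, material: str) -> str:
    """Format an inline binding for scenes where the subject is not defined in advance."""
    return f"{label}@{material}"


print(define_subject("the woman wearing a red dress and straw hat", "Image 1", "Subject 1"))
print(bind("Zhang San", "Image 1"))
```

Generating every mention through the same helper is one way to keep labels identical throughout a long prompt.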

2 Use storyboard timing

Internally, the model handles space and time as decoupled dimensions. Therefore, the ideal form of a prompt for a complex video is a timeline storyboard: split the video into several shots, and describe each shot dynamically in the order of events: who + where + what + how the camera moves.

Practical Advice

Using shot order, write a simple "shot 1 / shot 2 / shot 3" storyboard for each video, and then merge it into a complete prompt.

Negative case: "The scene of a man running nervously on the street is very cinematic."

Positive Case:

Shot 1: Side shot in a street, the man starts running slowly, with a sense of rapid breathing.

Shot 2: A man knocks over a fruit stand. The camera pans quickly and shows a frightened close-up of the man.

Shot 3: The man climbed over the low wall and disappeared, and the camera slowly zoomed out to freeze on the empty street.

Specific Rules

Use labels such as Shot 1, Shot 2, and Shot 3 to organize the content in the order of events (first, last). There is no mandatory limit on the duration of each segment, and the priority is to allow the model to naturally generate a rhythm according to the plot.

Description

The model's support for precise time (such as 0–3 seconds) is unstable. Forcing a time limit may lead to abnormal generation results.

It is recommended to organize each shot in the following order:

1. Camera movement or shot transition: such as "panorama slowly zooms in", "fixed camera position", "the camera cuts to..." etc.

2. Main actions and expressions: Describe the key actions and expression changes of the core character/object.

3. Position or spatial change: Describe the scene, location or spatial relationship of the subject.

4. Audio information: Describe the sound effects, vocals or background music of the corresponding shot.
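The per-shot order above (camera, action, space, audio) can be sketched as a small data structure that renders a labelled storyboard. The names `Shot` and `storyboard` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    camera: str       # camera movement or shot transition
    action: str       # main actions and expressions
    space: str = ""   # position or spatial change
    audio: str = ""   # sound effects, vocals or background music


def storyboard(shots: List[Shot]) -> str:
    """Render shots as "Shot 1: ...", "Shot 2: ..." lines in event order."""
    lines = []
    for number, shot in enumerate(shots, start=1):
        parts = [shot.camera, shot.action, shot.space, shot.audio]
        body = " ".join(part.strip() for part in parts if part.strip())
        lines.append(f"Shot {number}: {body}")
    return "\n".join(lines)


print(storyboard([
    Shot("Fixed medium shot.", "The man lowers his head slightly."),
    Shot("The camera slowly zooms out.", "He walks away.", audio="<Footsteps echo>"),
]))
```

No durations are attached to the shots, in line with the note that precise time ranges are not handled reliably.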

3 Action description requirements

Limb refinement + degree quantification

Movements should be specific to body parts such as the hands, legs, head, shoulders and back, and should include descriptions of amplitude, speed and intensity.

Examples: Raise your hands slowly, turn your head quickly, push hard on the ground, lower your head slightly.

Prioritize low, slow and continuous small movements

Prioritize the use of slow, gentle, and consistent subtle movements, and try to avoid high-explosive, large-dynamic movements such as running, jumping, and violent rolling.

Examples: Walk slowly, raise your hands slightly, lower your head slightly, and sit down.

Add transitions between actions

Describe the inertia and connection between the preceding and following movements to ensure that the movements in the image are coherent and natural.

Example: Use the inertia of turning to raise your hand, and naturally transition from the pause state to raising your hand.

Express emotions through concrete external details

Use concrete physical details to express emotions instead of abstract words like “very sad” or “very angry.”

See the table below for specific examples:

Abstract emotion	Externalized actions and details
Sad	Head bowed, shoulders trembling slightly, eyes red, fingers unconsciously clutching the hem of the clothes, tears welling up but not falling
Joy	Corners of the mouth rising involuntarily, brows relaxed, steps turning brisk, humming a tune unconsciously, spinning around in delight
Nervous/anxious	Frequently checking the watch, tapping fingers on the table, short and rapid breathing, averted eyes, unconsciously biting nails
Angry	Clenched fists, tense jawline, chest heaving violently, eyes as sharp as knives, words squeezed out through clenched teeth
Relieved	Letting out a long breath, fully relaxing the tense shoulders, a long-absent faint smile on the face, looking up into the distance

4 Camera Movement Wording

The model has a strong understanding of camera movements and can simply use standard camera terminology, such as "medium shot, close-up, panorama, slow pan, smooth traverse, fixed shot". For more information, please see Camera Language.

Note

Try to specify only one camera movement method in a shot. Do not require push, pull and pan at the same time, which will increase the instability of the image.

5 Image quality, style and constraints

Image quality, style and constraints are the key to controlling the video generation effect. They can delimit the creative boundaries for the model, unify the image quality and artistic tonality, and avoid image defects and random deviations. They are necessary configurations to ensure that the final film effect is stable and compliant.

1 Image quality

Define the image clarity, detail texture, and light and shadow texture to improve the basic image quality of the finished film.

Example: High definition, rich details, movie texture, natural colors, soft light and shadow

2 Style

Set the overall art style and visual tone to unify the artistic atmosphere of the image.

Example: Cyberpunk cold blue-purple tone, retro film, Japanese freshness

3 Constraints

Constraint words are very important. They can effectively avoid image defects, deformities, collapse, and unreasonable elements, and constrain the generation boundaries and stability.

Commonly used constraint word templates:

Avoid generating subtitles: "Keep no subtitles", "Avoid generating any text or subtitles"

Avoid generating Logo: "Don't generate Logo"

Avoid generating watermark: "Do not generate watermark"
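The constraint templates above can be appended to any prompt mechanically. The following is a minimal sketch; the dictionary `CONSTRAINT_TEMPLATES` and the function `with_constraints` are hypothetical names, and the wording is copied from the templates in this section.

```python
CONSTRAINT_TEMPLATES = {
    "subtitles": "Avoid generating any text or subtitles",
    "logo": "Don't generate Logo",
    "watermark": "Do not generate watermark",
}


def with_constraints(prompt: str, *kinds: str) -> str:
    """Append the selected constraint sentences to the end of a prompt."""
    clauses = [CONSTRAINT_TEMPLATES[kind] for kind in kinds]
    if not clauses:
        return prompt
    body = prompt.strip().rstrip(".")
    return body + ". " + ". ".join(clauses) + "."


print(with_constraints("A cat sleeps on a sofa.", "logo", "watermark"))
```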

6 Practical cases

Show how to write prompts using the advanced formulas and elements described above.

Material preparation:

@Image 1: Half-length photo of the heroine

@Image 2: Dormitory scene reference image

@Video 1: Reference for indoor dialogue camera movement (medium shot push and pull or slight panning)

@Audio 1: Indoor ambient sound or light music

Prompt:

The girl in @Image 1 is used as the protagonist, @Image 2 is used as a reference for the style of the dormitory scene, and the camera movement method of @Video 1 is referred to.

Shot 1: In the evening, girl @Image 1 walks briskly to the door of dormitory @Image 2. The medium shot of the camera is steady, and warm yellow sunlight shines into the corridor from the window. She pauses at the door, takes a deep breath, and her expression is slightly nervous.

Shot 2: Girl @Image 1 opens the door and walks into the dormitory. The camera cuts to an indoor medium shot. The roommates are arranging their books and look up at her. One of them smiles and asks {How did the exam go? Did you pass?}. The camera slowly switches to a close shot of the group.

Shot 3: Girl @Image 1 first lowers her head with a dejected expression, and the camera gives her a close-up. Then she raises her head, unable to hold back her smile, and says with a loud laugh {I lied to you}. The roommates chase her and playfully scuffle with her. The camera slowly zooms out and freezes on a panoramic view of the dormitory filled with laughter.

The whole image is in the style of a high-definition film documentary, with warm tones and soft light and shadow; the characters' faces are stable and not deformed, and the movements are natural and smooth, with no stuttering or flickering; the ambient sound effects are naturally integrated with @Audio 1.

Other tips

Text generation

Seedance 2.0 series models support the generation of commonly used text. The model can automatically match the appropriate style and color according to the situation, and also supports specifying the color, style, appearance mode, appearance time, and appearance position of the text in the prompt. When writing, please give priority to using commonly used words and avoid rare words and special symbols to ensure the best presentation effect. Currently, it supports slogans, subtitles, bubbles and other scenes. For specific writing methods and cases, please see Text Generation.

Video extension vs segmented splicing

Continuous long shot (video extension): Suitable for dialogue-driven scenes ("literary drama") in a single setting, such as long dialogue, emotional progression, and single-path movement, to achieve an immersive and coherent one-shot effect.

Scene/action transition (segmented splicing): Suitable for plot transitions or complex, fast action scenes ("martial arts"), such as chases, fights, and montages. Clips are generated independently and then edited together to ensure rhythm and visual impact.

In actual production, the two methods are usually combined. For example, first use extension to generate coherent dialogue, and then splice in empty shots or transition clips to balance immersion and changes of rhythm.

Material configuration strategy

Materials are usually divided into four "functional roles":

1. Character anchoring: lock the appearance of the character

2. Scene tone: lock the environment and style

3. Camera movement reference: lock the camera language and action rhythm

4. Rhythm and atmosphere: use audio to control mood and timbre

Recommended configuration (4-5 materials in total): 1-2 character images (face close-up/full body) + 1 scene image + 1 camera movement video + 1 audio clip.

Description

It is not recommended to use the maximum material limit. Too many materials will make it difficult for the model to determine the priority of features, and it is prone to problems such as style conflicts, blurred subject recognition, and generated effects that deviate from expectations.
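The recommended configuration can be turned into a quick sanity check on a planned material set. This is only a sketch under stated assumptions: the role names and the function `check_materials` are hypothetical, and the ranges simply restate the recommendation above (they are guidance, not hard limits of the model).

```python
# Recommended count range per functional role (min, max), taken from the text above.
RECOMMENDED = {
    "character_image": (1, 2),
    "scene_image": (1, 1),
    "camera_video": (1, 1),
    "audio": (1, 1),
}


def check_materials(counts: dict) -> list:
    """Return a warning for every role whose count is outside the recommended range."""
    warnings = []
    for role, (low, high) in RECOMMENDED.items():
        n = counts.get(role, 0)
        if not low <= n <= high:
            expected = str(low) if low == high else f"{low}-{high}"
            warnings.append(f"{role}: got {n}, recommended {expected}")
    return warnings


print(check_materials({"character_image": 3, "scene_image": 1, "camera_video": 1, "audio": 1}))
```

An empty list means the set matches the recommended 4-5 materials; a warning is a hint to reconsider, not an error.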

Notes

Language specification

Lines must use a single language. Avoid mixing Chinese and English (except for proper nouns).

Special character specifications

Reasonable use of symbols in prompts helps the model accurately understand different information types:

Information type	Symbol	Example
Music	()	(fast-paced rock music playing in the background)
Sound effects	<>	<The sound of dogs barking in the distance>
Lines	{}	{Hello, world}. If the lines are in a language other than Chinese or English, the language must be stated, for example: {こんにちは} is said in Japanese.
Subtitles	[]	[Chapter 1: Departure]
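The symbol conventions in the table can be applied with a small lookup, which avoids mixing up bracket types when a prompt contains several kinds of information. The names `WRAPPERS` and `mark` are hypothetical and used only for this sketch.

```python
# Opening and closing symbol for each information type, as listed in the table above.
WRAPPERS = {
    "music": ("(", ")"),
    "sound_effect": ("<", ">"),
    "line": ("{", "}"),
    "subtitle": ("[", "]"),
}


def mark(kind: str, text: str) -> str:
    """Wrap text in the symbols assigned to its information type."""
    opening, closing = WRAPPERS[kind]
    return f"{opening}{text}{closing}"


print(mark("line", "Hello, world"))
print(mark("sound_effect", "The sound of dogs barking in the distance"))
```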

FAQ

Character ID drift

Typical phenomenon

The generated character is inconsistent with the reference image, or a "face change" (ID drift) occurs in the middle of the video, so that the character ends up resembling a celebrity and the video is blocked by content review.

Root cause analysis

Insufficient validity of face reference images

Mixed use of reference images: The face reference is combined with a full-body/half-body pose image, clothing reference, detail image, etc. in a single image and provided to the model.

The proportion of the face is too small: In a mixed reference image, the face area takes up too little of the whole image. The model gives facial features too little weight when extracting them and is easily disturbed by the background or other elements.

Solution

Strengthen the independence and weight of face reference:

1. Prepare a close-up image of the face: In addition to the original full-body photo, prepare an additional close-up that includes only the character's head (keep only the face, preferably with a neutral expression, and minimize distracting elements such as shoulders, neck, and background).

2. Define the subject clearly in the prompt: <Subject 1>'s facial features refer to Image 1 (headshot), and makeup and styling refer to Image 2 (full-body shot).

3. Put important materials first: The more accurately a material needs to be referenced, the earlier it should appear in the prompt.

Note

For character reference, just use a headshot + full body shot. It is not recommended to use multiple views of characters. Multi-view material contains different angles of the same character, and the model easily identifies it as multiple different subjects, which in turn exacerbates the ID drift problem.

Before optimization	After optimization
The character in the video "changes face" midway and ends up resembling a celebrity	The face remains consistent with the reference image throughout

Video contains subtitles

Typical phenomenon

The prompt does not require subtitles to be generated, and the generated video contains subtitles.

Solution

Description

Currently, subtitle generation cannot be avoided with 100% certainty. The following methods can only reduce the probability of subtitles appearing and improve the success rate of each generation attempt.

1. Add clear constraint instructions to the prompt: "Keep no subtitles", "Avoid generating any text or subtitles".

2. If the text contained in the reference image/video is not necessary information, it is recommended to remove the text with a tool (for example, the image/video editing capabilities of the Seedream/Seedance models), and then use the text-free material as input.

3. If the business allows, prefer a landscape aspect ratio when generating videos (the probability of subtitles appearing in landscape is significantly lower than in portrait). The video can later be cropped to portrait in editing software.

There is a logo/watermark in the video

Typical phenomenon

There is no mention of watermark-related content in the prompt, and the generated video contains logos/watermarks of other video platforms.

Solution

Add clear constraint instructions to the prompts: "Do not generate watermarks", "Do not generate Logo".

Style Drift

Typical phenomenon

A 2D or 3D animation style is expected, but the input reference image is relatively realistic and the prompt does not emphasize the video style, so the generated video may drift into a live-action style.

Solution

Add clear style constraints to the prompts, such as "2D Japanese comic style" and "3D Chinese style comic". If you need more precise style control, it is recommended to change the reference image to the target style before producing the video.

Before optimization	After optimization
Xianxia style drifts into real-person style	Maintain the 3D Chinese comic CG fairy style

Jump at the junction of extended videos

Typical phenomenon

After using the video extension function to generate a new video, splicing the new video with the original video may cause screen jumps and rollback problems at the connection point.

Solution

It is currently recommended to repair and align key frames in post-production editing. A fundamental fix will come with future model iterations.

1. Import the videos to be spliced into Jianying (CapCut) or other professional video editing software.

2. At the first connection point, delete 6 frames from the end of the preceding video.

3. At the same point, delete 1 frame from the beginning of the following video.

4. Repeat the above operation for all splicing points.

5. Export and check the smoothness of the spliced video.
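The trimming rule in the steps above can be sketched on plain lists of frames. This illustration assumes each clip is already decoded into a Python list (one item per frame); the function name `trim_for_splice` is hypothetical, and real footage would be trimmed in an editor or with a video library instead.

```python
def trim_for_splice(clips, tail_frames=6, head_frames=1):
    """Concatenate clips, trimming frames on both sides of every junction.

    At each junction, tail_frames are dropped from the end of the earlier clip
    and head_frames from the start of the later clip. The start of the first
    clip and the end of the final clip are left untouched.
    """
    result = []
    final_index = len(clips) - 1
    for i, frames in enumerate(clips):
        start = head_frames if i > 0 else 0
        end = len(frames) - tail_frames if i < final_index else len(frames)
        end = max(start, end)  # guard against clips shorter than the trim
        result.extend(frames[start:end])
    return result


clip_a = list(range(10))          # frames 0..9
clip_b = list(range(100, 110))    # frames 100..109
print(trim_for_splice([clip_a, clip_b]))
# -> [0, 1, 2, 3, 101, 102, 103, 104, 105, 106, 107, 108, 109]
```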

Description

After the frames are aligned, slight jumps may still remain. When generating a continuation, it is recommended to end the current video at the moment of a transition cut and to start the next video with the new scene after that cut.

Before optimization	After optimization
The image jumps momentarily or the content rolls back at the connection points (5s, 20s)	The connection is smoother

Twins problem

Typical phenomenon

In a scenario where there are many characters in the image and three views of the characters are passed in as reference material, two identical characters are likely to appear in the same image of the generated video.

Root cause analysis

The definition of the character subject in the prompt is not clear, and the model cannot accurately distinguish different characters;

When three views/multiple views of characters are used as reference material, it is easy to cause confusion in model character recognition, thus generating duplicate characters of the same style.

Solution

Description

Currently, the twins problem cannot be avoided with 100% certainty. The following methods can only reduce the probability of its occurrence and improve the success rate of each generation attempt.

1. Clarify the relationship between characters and subjects

Clearly define each character in the prompts and clarify the correspondence between the character and the reference image. It is recommended to mark the corresponding reference image after the character name and keep the format consistent.

Example: Zhang San (corresponding to image 1) throws the green passbook to Li Si (corresponding to image 2) who is standing.

2. Add a global constraint instruction

Add a fixed constraint at the end of the prompt: Characters with the same appearance, clothing, and accessories are not allowed to appear throughout the video. It is forbidden to generate clones or twins of the same style. Only a single corresponding character is retained in the same image, and no duplicate characters are allowed.

3. Optimize reference materials

For character reference images, it is preferred to use independent photos of single people. It is not recommended to use three-view or multi-view materials.

4. Simplify the prompt

Do not use a complete script directly as the prompt. Excessively redundant text can easily confuse the model. Remove irrelevant wording and keep the instructions clear and focused.

Before optimization	After optimization
"Twins" appear on the screen at the 8th second

Video quality deteriorates when extended

Typical phenomenon

When a video generated by the model is used as input material for extension, the image quality deteriorates. Repeated extension compounds the deterioration, especially in the character's face area, which is prone to mottled color patches.

Solution

Currently, image quality degradation can be alleviated through the following methods, and fundamental optimization will be carried out based on model iteration in the future:

1. Convert the original video to a white model video through Seedance 2.0, and then use it as input material for the extension.

Reference prompt: Convert the video to a white 3D model, and unify the characters into a pure white 3D model, with no color, no texture, no shadows, a pure white background, stable structure, and smooth movement.

2. Prioritize the use of high-definition images as reference materials.

3. Keep the number of extensions under control to avoid stacking multiple extensions.

Before optimization	After optimization
Directly extend the model output	Convert the model output into a white model video and then extend it

The special effects are not as expected

Typical phenomenon

When specific special effects are described through prompt text alone, the result may not match expectations. For example, if the prompt specifies that the number "2999" appears with a countdown animation, the generated number-scrolling effect jumps out of order and does not follow standard countdown logic.

Solution

It is recommended to define special effects with a reference video: provide a video of the target effect as reference material, so that the model can accurately understand the form and motion logic of the effect and generate results closer to expectations. For example: the number "2999" appears with the effect shown in Video 1.

Before optimization	After optimization
The digital scrolling effect jumps noticeably out of order	After adding a reference video with the correct digital scrolling effect, the result is as expected

Too many reference characters

Typical phenomenon

When the number of reference characters exceeds 4, the stability of the model output decreases, and the generated videos may show the wrong number of characters (too few or too many) or repeated characters.

Solution

Currently, the problem can be alleviated through the following methods. A fundamental fix will come with future model iterations:

1. Generate images step by step: Divide the characters into groups and ensure that the number of characters in each generated image does not exceed 4. For example, 6 people can be divided into 2 groups of 3 people to generate images separately.

2. Image to video: Use the grouped images generated in the first step as reference material to generate the final video.
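The grouping in step 1 can be sketched as follows. The function name `group_characters` is hypothetical; the balancing strategy (using as few groups as possible and spreading characters evenly) is an assumption chosen so that 6 characters yield 2 groups of 3, as in the example above.

```python
import math


def group_characters(characters, max_per_group=4):
    """Split characters into the fewest groups of at most max_per_group, evenly sized."""
    if not characters:
        return []
    n_groups = math.ceil(len(characters) / max_per_group)
    base, extra = divmod(len(characters), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more
        groups.append(characters[start:start + size])
        start += size
    return groups


print(group_characters(["A", "B", "C", "D", "E", "F"]))
# -> [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Each resulting group would then be rendered as one reference image before the image-to-video step.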

Before optimization	After optimization
Input 8 reference characters, the output video has 9 people	After generating 2 images covering the 8 characters, use image-to-video generation

There is noise at the end of the video

Typical phenomenon

When a video contains narration, sudden clicks and truncation noises are likely to occur at the end of the video.

Solution

Regenerate the video, or use an editing tool such as Jianying (CapCut) to fade out the ending audio track with the Volume Envelope function, which eliminates the truncation noise.

Specific steps (using Jianying):

1. Import the generated video into Jianying.

2. Select the video track on the timeline and click Audio in the toolbar.

3. Select the Volume Envelope function (found in the Basic/Adjustment menu in some versions) and select key points. A line representing the volume, with its key points, then appears on the audio track.

4. Near the end of the video, drag the final volume key point down to 0, creating a downward slope.

5. Preview and confirm that the ending audio fades out naturally without sudden cuts or noise.
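What the volume envelope does to the audio can be illustrated with a linear fade applied to raw samples. This is a sketch only: it assumes the audio is available as a list of float samples, the function name `fade_out` is hypothetical, and an editor's actual envelope curve may differ.

```python
def fade_out(samples, fade_length):
    """Scale the last fade_length samples by a gain that falls linearly to 0."""
    n = len(samples)
    fade_length = min(fade_length, n)
    faded = list(samples)
    for k in range(fade_length):
        i = n - fade_length + k
        gain = (fade_length - 1 - k) / fade_length  # the final sample gets gain 0
        faded[i] = samples[i] * gain
    return faded


print(fade_out([1.0] * 6, 4))
# -> [1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```

Because the final sample is driven to exactly 0, the waveform no longer ends on an abrupt non-zero value, which is what typically produces the click.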

Before optimization	After optimization
Noise at the end	No noise at the end

Chinese pronunciation is not accurate

Typical phenomenon

The model is prone to mispronunciation of polyphonic characters, rare characters, and characters with similar shapes.

Solution

You can replace easily mispronounced words with common homophones that have consistent pronunciation to avoid pronunciation deviations and restore the expected audio effect. Note that this solution is only an optimization method and cannot completely avoid all pronunciation problems.

Example: The prompt "Chilong Mountain" can be rewritten as the homophone "Chilong Mountain".

Before optimization	After optimization
The pronunciation of "Chi" in "Chilong Mountain" is incorrect	Correct pronunciation after changing to "Chi Longshan"

The timbre reference is inaccurate

Typical phenomenon

When using reference audio to specify the timbre, the audio timbre of the final generated video deviates significantly from the reference timbre.

Solution

1. Add a detailed description of the timbre characteristics to the prompt. For details, please refer to the Seedance-1.5-pro Prompt Guide.

2. Keep the tone and delivery style of the video lines similar to those of the reference audio, which helps improve the fidelity and stability of the timbre.

Before optimization	After optimization
The timbre does not match the input audio	After a timbre description is added to the prompt, the timbre match improves significantly: "Use the timbre of @Audio 1, a middle-aged male voice that is low, thick, and warm with a fine grain"

Appendix: Prompt Examples

This appendix shows example prompts for the Seedance 2.0 series models in different scenarios, to help you implement functions such as multi-modal reference control and text generation more accurately. For more cases, please refer to the Seedance 2.0 template library in the Console Experience Center.

Text generation

Slogan

Prompt reference template:

"Text content" + "Appearance time" + "Appearance position" + "Appearance method", "Text characteristics (color, style)"

Description

Seedance 2.0 can match the appropriate text style according to the situation. If the requirements for text performance are stricter, you can refer to the Multiple Image Reference>Logo Reference below.

Reference case:

[Result]

[Prompt]

In a hand-drawn cartoon style, three people are sitting around eating fried chicken in Image 1. The atmosphere is friendly and joyful. Then the image gradually blurs and the text "Happiness is in Seedance" is displayed in the middle of the image.

[Reference Material]

▲ Image 1

Subtitles

Prompt reference template:

Subtitles appear at the bottom of the screen, and the subtitle content is "...". The subtitles must be completely synchronized with the audio rhythm.

Reference case:

[Result]

[Reference Material]

[Prompt]

Generate videos with voiceovers. A deep, calm male voice says, "Our world is but a brief moment in the grandeur of the universe. Yet within it, life thrives regardless." The scene should slowly transition from night to dawn, with the stars fading away and the sun rising behind the mountains. Subtitles appear at the bottom of the screen according to the lines.

[Result]

[Reference Material]

[Prompt]

The two people in the image were chatting in the office. The woman spoke first. She said, "Every time you get stuck, don't you enjoy this just-right feeling?" The man responded with a smile: "I have my own rhythm." When the character spoke, the dialogue was casual and natural, and the corresponding lines of subtitles appeared at the bottom of the screen.

Bubble lines

Prompt reference template:

The "character" said: "...", bubbles appeared around the character when he spoke, and the lines were written in the bubbles.

Reference case:

[Result]

[Reference Material]

[Prompt]

In Image 1, the two people were running in the school playground in sportswear. The girl looked at the boy and said with a confident smile: "We can definitely do it!" The camera cut to a close shot of the boy, and he hesitantly replied: "Are you sure?" Bubbles appear around the speaking character, and inside the bubbles are the corresponding lines.

[Result]

[Reference Material]

[Prompt]

Referring to the image of the girl in Image 1 and Image 2, the girl picked a strawberry in a strawberry garden, took a bite, and said with a smile: "This is the real deal!" A bubble appeared around the girl, with her line written inside.

Image reference

The Seedance 2.0 series models support not only multi-view reference of the subject, but also multi-image reference such as scene images and storyboards.

If the order of the images matters, upload them in that order. You can then refer to them precisely in the prompt as Image 1, Image 2, ... Image n.
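Because the Image 1, Image 2, ... labels follow upload order, it can help to derive them from the upload list instead of typing them by hand. The following is a minimal Python sketch. `label_images` and the file names are hypothetical, and uploading itself is outside its scope.

```python
from __future__ import annotations


def label_images(paths: list[str]) -> dict[str, str]:
    """Map each file, in upload order, to the label used in prompts."""
    return {path: f"Image {i}" for i, path in enumerate(paths, start=1)}


# Hypothetical file names, listed in the order they will be uploaded.
uploads = ["girl_front.png", "uniform.png", "boy.png"]
labels = label_images(uploads)

prompt = (
    f"The girl in {labels['girl_front.png']} is wearing "
    f"the costume in {labels['uniform.png']}."
)
print(prompt)  # The girl in Image 1 is wearing the costume in Image 2.
```

If the upload order changes, the labels in the prompt change with it, which avoids a prompt that points at the wrong image.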

Subject multi-angle view reference

Simply state clearly which object to reference. The instructions that the model can respond to include, but are not limited to, the following examples.

Product:

3C Digital

[Result]

[Reference Material]

[Prompt]

Extract the camera shown in Image 1, Image 2, and Image 3, change the background to white, and place the camera on a white table. Frame it in a close-up, then slowly rotate the camera, which is the subject, to clearly show its front, side and back.

Home Items

[Result]

[Reference Material]

[Prompt]

The background is a warm-toned home scene. The thermos cup in the reference image is shown in a medium shot. The camera zooms in smoothly to a close-up view of the thermos cup. A hand from outside the frame naturally enters and gently grasps the cup body to pick up the thermos cup. The camera follows the hand and shows its slight rotation.

Character:

[Result]

[Prompt]

Refer to the image of the woman in Image 1, Image 2, and Image 3 to generate an image of her eating cake in a coffee shop.

[Reference Material]

Multiple image reference

Logo reference

[Result]

[Reference Material]

[Prompt]

The background is a neon-lit futuristic urban aerial corridor, with aircraft and holographic advertisements intertwined. Referring to the girl in Image 2, the girl is first shown in a medium shot flying silver floating lights with holographic projections, and then the camera zooms out to show the floating lights all over the sky. The image gradually blurs, and then the logo of Image 1 appears. The overall style is a 3D cyberpunk science fiction animation style.

Multi-subject Reference

[Result]

[Reference Material]

[Prompt]

Refer to the cat and dog in the image. In a warm apartment, the dog is lying down eating dog food. The cat comes over and touches the dog with its paw. The dog stops eating after seeing the cat, and the cat snuggles next to the dog. The image adopts warm colors.

Multi-element reference

[Result]

[Reference Material]

[Prompt]

The scene is set in the restaurant in Image 4, where people are coming and going. The girl in Image 1 is wearing the costume in Image 2 and is sorting the items on the counter. The boy in Image 3 is a customer. He walked up and wanted to ask the girl for her contact information. The logo in Image 5 is always displayed in the lower right corner of the screen.

Multiple grid storyboard reference

[Result]

[Reference Material]

[Prompt]

Refer to the storyboards in the image to generate intense fighting scenes. Each storyboard composition in the image should appear in order, and then the two people fight fiercely.

Storyboard reference

[Result]

[Reference Material]

[Prompt]

Refer to the storyboard composition in Image 3. The girl is waiting for her father to cook. She said: "아빠, 배고파요! 밥 다 됐어요?". The image of the girl refers to Image 1. Then the camera panned to the right and switched to the image and composition of Image 4. The father's image refers to Image 2. The father answered her: "거의 다 됐어, 조금만 기다려!", and then the camera switched back to a close-up of the daughter's slightly disappointed facial expression. She said: "아직 멀었어요? 맛있는 냄새 나는데..", and then the camera switched to a close-up of the father's face, and he said: "이제 진짜 금방이야. 씻고와!"

Video reference

Seedance 2.0 series models support video reference. You only need to state clearly what to generate and which object to reference.

If the order of the videos matters, upload them in that order. You can then refer to them precisely in the prompt as Video 1, Video 2, ... Video n.
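A prompt that mentions Video 2 when only one video was uploaded points at nothing. The following minimal Python sketch checks the numbered references in a prompt against the number of uploaded materials before submission. `missing_references` is a hypothetical helper, and it assumes references are written exactly as "Image N", "Video N" or "Audio N".

```python
from __future__ import annotations

import re


def missing_references(prompt: str, n_images: int = 0,
                       n_videos: int = 0, n_audios: int = 0) -> list[str]:
    """Return labels used in the prompt that have no uploaded material."""
    counts = {"Image": n_images, "Video": n_videos, "Audio": n_audios}
    missing: list[str] = []
    for kind, number in re.findall(r"\b(Image|Video|Audio) (\d+)\b", prompt):
        label = f"{kind} {int(number)}"
        # Valid labels run from 1 up to the number of uploads of that kind.
        if not 1 <= int(number) <= counts[kind] and label not in missing:
            missing.append(label)
    return missing


prompt = ("Referring to the special effects of Video 1, "
          "let the girl in Image 2 grow the same wings.")
print(missing_references(prompt, n_images=1, n_videos=1))  # ['Image 2']
```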

Action reference

Film and TV

[Result]

[Reference Material]

▲ Video 1

[Prompt]

Referring to the character movements and camera language of Video 1, the fighting scenes of Image 2 and Image 1 are generated. Image 2 is the character on the left, and Image 1 is the character on the right. There is intense background music.

Marketing

[Result]

[Reference Material]

▲ Video 1

[Prompt]

Referring to the running form of the horse in Video 1, a golden horse is generated running on the grassland, and then its gorgeous running posture is frozen and turned into a horse-shaped gold pendant.

Camera movement reference

[Result]

[Prompt]

Referring to the camera movement of Video 1, make a concept video of the science and technology park. Taking the high-rise building in Image 1 as the visual center and swooping down from a first-person perspective, convey the high-tech feel of the park in Image 1.

[Reference Material]

▲ Video 1

▲ Image 1

Special effects reference

Film and TV

[Result]

[Reference Material]

▲ Video 1

▲ Image 1

[Prompt]

Referring to the golden particle special effects in Video 1, let the character in Image 2 play the flute while being surrounded by the same particle effects.

Gameplay Special Effects

[Result]

[Reference Material]

▲ Video 1

▲ Image 1

[Prompt]

Referring to the special effects of Video 1, let the girl in Image 1 grow the same wings, and the wings will be generated in the same trajectory.

Video Editing

The Seedance 2.0 series models support video editing: adding, deleting or modifying elements; extending videos forward or backward; and track completion.

If the order of the videos matters, upload them in that order. You can then refer to them precisely in the prompt as Video 1, Video 2, ... Video n.

Add, delete or modify elements

Add elements

[Result]

[Reference Material]

▲ Video 1

[Prompt]

Add fried chicken, pizza and other snacks to the table in Video 1.

Delete element

[Result]

[Reference Material]

▲ Video 1

[Prompt]

Clear away the other parts and tools on the desktop in Video 1 and keep the desktop neat and clean. The only items left should be the ones in their hands.

Modify element

[Result]

[Reference Material]

▲ Video 1

▲ Image 1

[Prompt]

Replace the perfume in Video 1 with the cream in Image 1, and the movements and camera movements remain unchanged.

Video extension

Note

The model automatically extracts the connecting segment of the input video for synthesis. The input video clip itself is not generated again.

Extend backwards

[Result]

[Reference Material]

▲ Video 1

[Prompt]

After generating the content of Video 1, the two late men ran towards them, and the five people finally met and had a friendly chat.

Extend forward

[Result]

[Reference Material]

▲ Video 1

[Prompt]

Extend Video 1 forward to give the man in white an over-the-shoulder shot. The man in white says: "It's not that bad. You're just stressed. Everyone goes through this, you just need to keep going."

Track completion

Description

Seedance 2.0 series models support up to 3 video inputs, and the total duration must not exceed 15 seconds.

During generation, the connecting segments of the preceding and following videos are extracted automatically, and only the necessary fragments are retained for synthesis.
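The two input limits stated above (at most 3 videos, total duration not exceeding 15 seconds) can be checked before submitting a track completion task. This is a minimal Python sketch. `check_track_inputs` is a hypothetical helper, and it assumes the clip durations in seconds have already been measured by some other means.

```python
from __future__ import annotations

MAX_VIDEOS = 3            # limit stated in this guide
MAX_TOTAL_SECONDS = 15.0  # limit stated in this guide


def check_track_inputs(durations_s: list[float]) -> list[str]:
    """Return a list of problems; an empty list means the inputs fit."""
    problems: list[str] = []
    if len(durations_s) > MAX_VIDEOS:
        problems.append(
            f"{len(durations_s)} videos given, at most {MAX_VIDEOS} allowed"
        )
    total = sum(durations_s)
    if total > MAX_TOTAL_SECONDS:
        problems.append(
            f"total duration {total:.1f}s exceeds {MAX_TOTAL_SECONDS:.1f}s"
        )
    return problems


print(check_track_inputs([5.0, 6.0]))  # []
print(check_track_inputs([8.0, 8.0]))  # ['total duration 16.0s exceeds 15.0s']
```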

Reference case:

[Result]

[Prompt]

Video 1, the moment the leaves fall to the ground, a golden particle effect is triggered, and a gust of wind blows, continue with Video 2.

[Reference Material]

▲ Video 1

▲ Video 2