Experiment Goal and Baseline Prompt
Experiment goal: Identify the two core variables that most affect quality outcomes in "crossover fan art," and determine which combination produces the strongest visual and narrative impact.
The baseline scenario for "crossover fan art": two characters from different IPs or fictional universes meet and interact in an ordinary real-world location. The commercial value of this content type lies in its "viral topic potential" — the crossover itself is the content, requiring no additional narrative explanation.
Baseline prompt:
[CHARACTER 1] and [CHARACTER 2] casually sitting together at a table
in a [LOCATION]. The atmosphere is relaxed and light-hearted, two
characters engaged in an amusing conversation over food and drinks.
Cinematic lighting, photorealistic environment, highly detailed
character designs, unified realistic lighting across both characters,
no visual style clash between them. Candid moment, warm ambient light.
Two core variables:
- Variable A: Character contrast type (determines "where the crossover tension comes from")
- Variable B: Scene casualness level (determines "the intensity of the contrast")
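The bracketed placeholders in the baseline prompt can be filled programmatically so that every test run starts from identical wording. The following is a minimal Python sketch; the function name `fill_baseline` and its parameters are illustrative assumptions rather than part of any image tool's API, and the template text is simply the baseline prompt above.

```python
# Baseline prompt from this experiment, with its three bracketed placeholders.
BASELINE_PROMPT = (
    "[CHARACTER 1] and [CHARACTER 2] casually sitting together at a table "
    "in a [LOCATION]. The atmosphere is relaxed and light-hearted, two "
    "characters engaged in an amusing conversation over food and drinks. "
    "Cinematic lighting, photorealistic environment, highly detailed "
    "character designs, unified realistic lighting across both characters, "
    "no visual style clash between them. Candid moment, warm ambient light."
)


def fill_baseline(character_1: str, character_2: str, location: str) -> str:
    """Return the baseline prompt with all three placeholders substituted."""
    return (
        BASELINE_PROMPT
        .replace("[CHARACTER 1]", character_1)
        .replace("[CHARACTER 2]", character_2)
        .replace("[LOCATION]", location)
    )


if __name__ == "__main__":
    print(fill_baseline("Batman", "Joker", "busy McDonald's restaurant"))
```

Keeping the template in one constant means that only the variable under test changes between runs.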
Variable A Testing: 3 Character Pairing Types
Test question: Which type of character pairing produces the strongest visual and narrative tension?
A1: Opposing Characters Within the Same Universe
Characters come from the same IP but represent opposing values or factions (heroes vs. villains within the same fictional universe).
Examples: Batman and Joker, Goku and Vegeta, Harry Potter and Draco Malfoy
Add to prompt: despite being sworn enemies, sharing a casual meal. Uneasy truce atmosphere, neither is in full combat mode but tension lingers in their body language
Test results:
- Character design fidelity: Highest (Same IP — AI has clear understanding of both characters' appearances)
- Narrative tension: High (The "ceasefire" premise carries strong dramatic irony)
- Viral potential: Medium-high (Requires IP familiarity to appreciate the irony)
A2: Cross-Universe Characters (Different IPs)
Characters come from completely separate fictional universes (different anime, different comic universes, different games).
Examples: Iron Man and Naruto, Gandalf and Yoda, Mario and Sonic
Add to prompt: from completely different universes meeting for the first time, a mixture of curiosity and mutual respect. Each character's clothing and design style remains authentic to their original IP
Test results:
- Character design fidelity: Medium (Different IPs often have different visual styles; AI has to unify both under a single lighting setup, which occasionally leaves a visible style disconnect)
- Narrative tension: Highest ("The meeting that could never happen" is the ultimate hook)
- Viral potential: Highest (Covers two fan bases simultaneously, stacking exposure effect)
When selecting characters for A2, prioritize characters with high visual recognition — characters with signature costumes, hair colors, or accessories. If AI has ambiguous visual understanding of a character, the output may look like "two ordinary people" rather than specific characters. Compensate by describing appearances explicitly in the prompt: Batman in his full dark grey armored suit with cowl and cape generates dramatically better character accuracy than simply Batman.
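One way to apply the explicit-appearance advice consistently is to keep a small lookup of descriptions and fall back to the bare name when none exists. This is a sketch under assumptions: `APPEARANCE`, `describe`, and `undescribed` are hypothetical names, and the only entry is the Batman description quoted above; descriptions for other characters would have to be written by hand.

```python
from typing import Dict, Iterable, List

# Explicit appearance descriptions keyed by character name.
# Only the Batman entry comes from the article; add your own for others.
APPEARANCE: Dict[str, str] = {
    "Batman": "Batman in his full dark grey armored suit with cowl and cape",
}


def describe(character: str) -> str:
    """Return the explicit description if one exists, else the bare name."""
    return APPEARANCE.get(character, character)


def undescribed(characters: Iterable[str]) -> List[str]:
    """List characters that would be sent to the model as a bare name."""
    return [name for name in characters if name not in APPEARANCE]


if __name__ == "__main__":
    print(describe("Batman"))
    print(undescribed(["Batman", "Yoda"]))
```

Checking `undescribed` before a run flags the characters most likely to come out as "two ordinary people".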
A3: Fictional Character × Real Person
A fictional character and a real historical figure or contemporary person in the same frame.
Examples: Einstein and Doraemon, Beethoven and SpongeBob
Add to prompt: the fictional character visiting the real world and meeting [REAL PERSON]. A sense of temporal displacement, the fictional character clearly out of their element in the real world setting
Test results:
- Character design fidelity: Low (How closely AI reproduces a real person's likeness depends on how portrait rights are handled; accuracy varies significantly)
- Narrative tension: Medium (Temporal/dimensional displacement requires more context to land)
- Viral potential: Medium (Depends heavily on audience recognition of the specific real person)
Variable A conclusion: A2 (cross-universe) is best for combined narrative tension and viral potential, but poses the highest character fidelity challenge for AI. A1 (same-universe rivals) is the most stable choice with the most consistent generation quality. Beginners should start with A1 and progress to A2 as they develop prompt experience.
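The three pairing types differ only in the text appended to the baseline prompt, so switching between them can be reduced to a lookup. The sketch below assumes hypothetical names (`PAIRING_ADDONS`, `add_pairing`); the add-on strings are the ones listed under A1, A2, and A3 above.

```python
# Prompt add-ons for each pairing type, copied from the Variable A tests.
PAIRING_ADDONS = {
    "A1": (
        "despite being sworn enemies, sharing a casual meal. Uneasy truce "
        "atmosphere, neither is in full combat mode but tension lingers in "
        "their body language"
    ),
    "A2": (
        "from completely different universes meeting for the first time, a "
        "mixture of curiosity and mutual respect. Each character's clothing "
        "and design style remains authentic to their original IP"
    ),
    "A3": (
        "the fictional character visiting the real world and meeting "
        "[REAL PERSON]. A sense of temporal displacement, the fictional "
        "character clearly out of their element in the real world setting"
    ),
}


def add_pairing(base_prompt: str, pairing: str, real_person: str = "") -> str:
    """Append the add-on for pairing type A1, A2, or A3 to a base prompt."""
    addon = PAIRING_ADDONS[pairing]
    if "[REAL PERSON]" in addon:
        if not real_person:
            raise ValueError("pairing A3 needs a real_person name")
        addon = addon.replace("[REAL PERSON]", real_person)
    return base_prompt.rstrip() + " " + addon


if __name__ == "__main__":
    print(add_pairing("Batman and Joker at a table.", "A1"))
```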
Variable B Testing: 3 Scene Casualness Levels
Test question: How does scene casualness level affect the character contrast effect?
B1: Highly Everyday Scene (McDonald's / Starbucks tier)
The setting is an instantly recognizable everyday fast food restaurant or coffee chain.
Scene description in prompt: a busy McDonald's restaurant, red and yellow decor, plastic trays with burgers and fries, fluorescent overhead lighting, casual customers in background
Test results:
- Environment recognition speed: Highest (Global audiences recognize instantly, zero explanation needed)
- Character contrast intensity: Highest (The more mundane the setting, the more absurd the character identities feel within it)
- Generation stability: High (Fast food restaurant scenes exist abundantly in AI training data; environment details render reliably)
B2: Semi-Formal Scene (Independent café / pub)
The setting has personality but is still an ordinary real-world public space.
Scene description in prompt: a cozy independent coffee shop, wooden furniture, warm ambient lighting, coffee cups and pastries on the table, quiet background atmosphere
Test results:
- Environment recognition speed: Medium (Café atmosphere is recognizable but lacks brand signature)
- Character contrast intensity: Medium (More gentle; better for emphasizing "the quality of their conversation" over situational absurdity)
- Generation stability: High (Coffee shop scenes also have abundant training data)
B3: Extreme Contrast Scene (Character's own iconic setting, but with everyday activities)
The setting is within one character's original universe, but everyday activities have been introduced (e.g., a McDonald's that exists in Gotham City, or a cafeteria at Hogwarts).
Scene description in prompt: in a fast food restaurant that exists within [CHARACTER'S UNIVERSE], with subtle visual references to the original setting mixed with mundane fast food aesthetics
Test results:
- Environment recognition speed: High (fans only) (Requires deep IP familiarity to appreciate)
- Character contrast intensity: Medium-low (Casualness is partially neutralized by the in-universe world-building)
- Generation stability: Medium (AI needs to simultaneously process two semantic layers of the setting; higher failure rate)
Variable B conclusion: B1 (highly everyday) produces the strongest visual contrast effect while also maintaining the best generation stability. For virality-focused content, B1 is the optimal choice.
The "scene casualness level" essentially adjusts one parameter: the cognitive gap between a character's identity and their current environment. The larger the gap, the stronger the humor and virality. Batman eating at McDonald's derives its absurdity precisely from the extreme contrast between a larger-than-life guardian figure and one of the most universally accessible everyday locations. Understanding this parameter helps you select the right casualness level: B1 for maximum viral impact, B2 for artistic quality and narrative depth, B3 for precise resonance within niche fan communities.
Cross-Comparison: Which Combination Is Best
Based on the two-variable test results, the optimal combinations:
| Goal | Best combination | Reasoning |
|---|---|---|
| Maximum viral potential | A2 (cross-universe) × B1 (everyday) | Highest tension × widest audience × fastest recognition |
| Most stable generation quality | A1 (same-universe rivals) × B1 (everyday) | High character fidelity × stable scene generation |
| Highest artistic quality | A2 (cross-universe) × B2 (café) | Narrative tension × warm atmosphere; suits art collections |
| Niche precision fan targeting | A3 (fictional×real) × B3 (in-universe everyday) | High barrier; resonates deeply with hardcore fans |
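The table above can also be kept as a small lookup so that a goal maps directly to its variable codes. This is only a sketch; `BEST_COMBINATION` and `pick_combination` are hypothetical names, and the keys are the goal labels from the table in lowercase.

```python
from typing import Dict, Tuple

# Goal -> (Variable A code, Variable B code), transcribed from the table.
BEST_COMBINATION: Dict[str, Tuple[str, str]] = {
    "maximum viral potential": ("A2", "B1"),
    "most stable generation quality": ("A1", "B1"),
    "highest artistic quality": ("A2", "B2"),
    "niche precision fan targeting": ("A3", "B3"),
}


def pick_combination(goal: str) -> Tuple[str, str]:
    """Look up the recommended (A, B) pair for a goal, ignoring case."""
    key = goal.strip().lower()
    if key not in BEST_COMBINATION:
        raise KeyError(f"unknown goal: {goal!r}")
    return BEST_COMBINATION[key]


if __name__ == "__main__":
    print(pick_combination("Maximum viral potential"))
```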
Recommended everyday configuration (complete prompt for A2 × B1):
[CHARACTER 1 full description] and [CHARACTER 2 full description]
casually sitting together at a McDonald's table. Plastic trays with
burgers and fries between them. Relaxed, light-hearted atmosphere —
two characters from completely different universes sharing a casual meal.
Cinematic lighting with warm overhead fluorescent tint. Both characters
rendered with high fidelity to their original designs. Unified realistic
lighting eliminates visual style clash. Candid photography feeling,
mid-conversation moment captured.
Generate 4-6 variations of the same prompt in nanobanana pro and select the one with the best character accuracy and most natural interaction. Crossover fan art is highly susceptible to generation randomness — the same prompt can produce dramatically different interaction states across runs. Batch generation followed by selection is more efficient than iterative prompt micro-adjustments.
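The batch-then-select workflow can be sketched as a small helper. No real image API is assumed here: `generate` and `score` are hypothetical callables that you would supply yourself (for example a wrapper around your image tool, and a rating you assign by eye for character accuracy and interaction).

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def best_of_batch(
    prompt: str,
    generate: Callable[[str], T],   # hypothetical: one generation run
    score: Callable[[T], float],    # hypothetical: higher means better
    runs: int = 4,
) -> T:
    """Run the same prompt `runs` times and return the top-scoring result."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    candidates: List[T] = [generate(prompt) for _ in range(runs)]
    return max(candidates, key=score)


if __name__ == "__main__":
    # Stand-in generator and scorer, only to show the call shape.
    fake_outputs = iter(["draft-a", "draft-bbb", "draft-cc", "draft-d"])
    print(best_of_batch("same prompt", lambda p: next(fake_outputs), len))
```

The prompt is never modified between runs, which matches the advice to select from a batch instead of micro-adjusting wording.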
Quick Reference Table
| Parameter | Recommended value | Effect | Avoid |
|---|---|---|---|
| Scene specificity | Known fast food chain (McDonald's/KFC) | Instant recognition, zero explanation cost | Obscure or unfamiliar locations |
| Interaction verb | engaged in conversation / sharing food | Natural narrative feel | standing side by side (no interaction) |
| Atmosphere words | relaxed, light-hearted, candid moment | Removes characters from combat/tension mode | dramatic, intense (returns to battle state) |
| Lighting unity | unified realistic lighting across both characters | Eliminates style disconnect | Omitting → characters look composited separately |
| Camera quality | cinematic lighting, candid photography feeling | Adds filmic realism | portrait mode (becomes ordinary portrait) |
| Food props | detailed burgers, fries and drinks | Anchors scene reality; provides interaction props | No food → empty table; missing life quality |
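The quick reference can double as a rough pre-flight check on a prompt. The sketch below is an assumption-heavy simplification: it does naive lowercase substring matching, the keyword lists are shortened forms of the table's phrases, the scene-specificity row is not checked, and `lint_prompt` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Shortened keywords per table row; a prompt needs at least one per row.
RECOMMENDED: Dict[str, Tuple[str, ...]] = {
    "Interaction verb": ("engaged in", "sharing"),
    "Atmosphere words": ("relaxed", "light-hearted", "candid"),
    "Lighting unity": ("unified realistic lighting",),
    "Camera quality": ("cinematic lighting", "candid photography"),
    "Food props": ("burgers", "fries", "drinks"),
}

# Phrases from the "Avoid" column that can be detected as plain text.
AVOID: Tuple[str, ...] = (
    "standing side by side",
    "dramatic",
    "intense",
    "portrait mode",
)


def lint_prompt(prompt: str) -> List[str]:
    """Return warnings for missing recommended terms and avoided phrases."""
    text = prompt.lower()
    warnings: List[str] = []
    for parameter, keywords in RECOMMENDED.items():
        if not any(keyword in text for keyword in keywords):
            warnings.append(f"missing: {parameter}")
    for phrase in AVOID:
        if phrase in text:
            warnings.append(f"avoid: {phrase}")
    return warnings


if __name__ == "__main__":
    for line in lint_prompt("Batman and Joker standing side by side"):
        print(line)
```

An empty list does not guarantee a good image; it only confirms that the wording covers the rows of the table.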
Unexpected Findings
Unexpected finding 1: Micro-interaction details have outsized narrative impact
Testing found that adding one specific small interaction detail to the prompt dramatically improves image narrative quality — more effectively than adding extensive character description vocabulary.
Comparison:
- No detail: Batman and Joker sitting together → two characters sitting separately, no visual exchange
- With detail: Batman pushing his fries toward Joker without making eye contact → AI generated a subtle action with internal tension; this single detail gives the whole image a sense of story
Conclusion: Narrative micro-action words (pushing a drink toward the other, scrolling phone while the other talks, pointing at the menu together) are the single most effective parameter for increasing narrative density in crossover fan art. These work because they simultaneously communicate two things: what the character is currently doing (action), and the relational state between the two characters (direction and emotional tone of interaction). AI's understanding of these social micro-actions is surprisingly accurate — they appear abundantly in real social photography training data.
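Since a micro-action carries two pieces of information (the action itself and the relational cue), it can help to write them as separate fields and join them only when building the prompt. A minimal sketch, assuming a hypothetical `MicroAction` class; the example values come from the Batman and Joker comparison above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroAction:
    actor: str     # who performs the action
    action: str    # what the character is currently doing
    relation: str  # cue for the relational state between the characters

    def render(self) -> str:
        """Join the parts into a single prompt phrase."""
        return f"{self.actor} {self.action} {self.relation}"


if __name__ == "__main__":
    detail = MicroAction(
        actor="Batman",
        action="pushing his fries toward Joker",
        relation="without making eye contact",
    )
    print(detail.render())
```

Separating the fields makes it easy to hold the action constant and vary only the relational cue between test runs.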
Unexpected finding 2: Background blur level affects character "belonging" quality
When the background (restaurant environment) renders sharply, characters paradoxically appear to be "inserted" as foreign objects into the real world. When the background blurs appropriately, character-environment integration actually improves.
Optimal background phrase: background slightly blurred with bokeh, keeping focus on the characters at the foreground table — identical logic to portrait photography's shallow depth of field, making characters naturally the visual focal point. Blurred backgrounds also mask AI's inconsistencies when generating complex interior environment details.
Unexpected finding 3: Food props have a narrative anchoring function
Food isn't just set decoration — it provides immediate narrative explanation for "what the characters are doing," making the strange premise of "two unusual characters sitting together" feel natural. Without food on the table, viewers may wonder "why are they here?" With food, everything resolves naturally: they're having a meal.
Push further: food type can function as character personality extension. One character with a massive bucket of fried chicken, the other with a single cup of black coffee — these two prop choices silently communicate the contrast between the characters. Prompt format: Character A with a large bucket of fried chicken, Character B with a single cup of black coffee, their food choices reflecting their contrasting personalities — letting props perform character expression without additional description.
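The prop-contrast format above can be wrapped in a one-line helper so that both characters always receive a food assignment. `food_contrast` is a hypothetical name, and the closing phrase is the one from the prompt format above.

```python
def food_contrast(char_a: str, food_a: str, char_b: str, food_b: str) -> str:
    """Build the prop-contrast phrase for two characters and their food."""
    return (
        f"{char_a} with {food_a}, {char_b} with {food_b}, "
        "their food choices reflecting their contrasting personalities"
    )


if __name__ == "__main__":
    print(
        food_contrast(
            "Character A", "a large bucket of fried chicken",
            "Character B", "a single cup of black coffee",
        )
    )
```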
FAQ
Why do the two characters often look like a Photoshop composite rather than sharing the same space?
The cause: AI applies a different lighting model to each character, so they appear to be rendered separately and pasted together. The core fix phrase: unified single-source lighting illuminating both characters from the same angle, no separate lighting setup for each character. This tells AI that both characters share one light source, which removes the independent per-character lighting.
Can I generate a large gathering scene with 3 or more characters?
Yes, but quality control difficulty increases significantly. For 3-character scenes, explicitly specify seating arrangement in the prompt (three characters at a round table, Character A on the left, B in center, C on the right), assigning different focal positions to each character to prevent AI from randomly clustering characters with chaotic composition. Approximate success rates: 2-character scenes achieve satisfying results ~80% of the time; 3 characters ~50%; 4+ characters drops sharply. Multi-character scenes also struggle with gaze direction — AI handles eye contact between 2 characters well but in groups of 3+, one character often ends up staring into empty space.
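The approximate success rates above suggest how large a batch to generate. As a rough sketch, assuming each run succeeds independently with the stated probability (an assumption, since the rates are themselves approximate), the chance of at least one satisfying image in n runs is 1 - (1 - p)^n. The function names are illustrative.

```python
import math


def p_at_least_one(success_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs succeeds."""
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return 1.0 - (1.0 - success_rate) ** runs


def runs_needed(success_rate: float, target: float) -> int:
    """Smallest batch size whose chance of at least one success >= target."""
    if not 0.0 < success_rate < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("success_rate and target must be between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - success_rate))


if __name__ == "__main__":
    # Rates taken from the FAQ answer: about 0.8 for 2 characters, 0.5 for 3.
    for label, rate in (("2 characters", 0.8), ("3 characters", 0.5)):
        print(label, round(p_at_least_one(rate, 4), 4), runs_needed(rate, 0.95))
```

Under these assumptions, a batch of 4 is very likely to contain a usable 2-character image, while 3-character scenes benefit from a slightly larger batch.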
What copyright considerations apply to crossover fan art?
Fan art made for personal use, fan community sharing, and other non-commercial purposes is widely tolerated by rights holders, but it is not automatically protected by law. In the United States, "Fair Use" is a case-by-case defense rather than a blanket exemption, and many other jurisdictions have narrower exceptions. Commercial use (selling, sponsored brand content) requires more caution: avoid registered trademark graphic elements (like the McDonald's golden arches logo), and don't directly replicate official character designs, which are typically covered by copyright as well as trademark. Using descriptive text instead of registered trademark names reduces this risk but does not remove it. For personal creative sharing, crossover fan art has been an established practice in fan communities worldwide for decades.