You waited in line for an hour, finally reached the perfect viewpoint—and the moment you pressed the shutter, three tourists with selfie sticks walked into your frame. Back at the hotel, every single photo has strangers' heads scattered across the background.
Every travel photographer knows this pain. The good news: a well-crafted AI prompt can "evaporate" those people in 30 seconds, filling the background so naturally it looks like they were never there.
This guide doesn't just give you a prompt to copy—it explains why every phrase matters and how to adapt it when things go wrong.
The Result: Before vs After AI Tourist Removal

The image above shows the processed result: a plaza that was originally packed with tourists is now clean and empty. Notice the continuity of the floor tile joints—that's the key detail that separates good AI fills from obvious edits.
Why "Remove the People" Always Fails
Most people start with something like this:
Remove the people from this photo.
The result? Either the people become ghostly semi-transparent shapes, or the entire background looks like someone took an eraser to it, leaving blurry patches of color.

The image above shows a classic "problem photo": a plaza packed with selfie-taking tourists. This level of crowd density is exactly what makes manual editing impractical.
What's missing?
This prompt lacks 4 critical layers of information:
| Missing Layer | Consequence | Example |
|---|---|---|
| Who's a "tourist" | AI may remove your main subject too | Your friend disappears along with strangers |
| What to fill with | AI uses blurry color patches | Obvious smear marks in the background |
| Lighting constraints | Fill area has wrong light direction | Ground shadows suddenly break or change angle |
| Perspective constraints | Depth proportions get distorted | Floor tile perspective lines suddenly bend |
This is why we need a structured prompt—each phrase acts as one layer of constraint.
Full Prompt Breakdown: 8 Phrases, Each with a Technical Purpose
Here's the battle-tested prompt template:
Intelligently detect and remove tourists from the background of this photo. Keep the main subject intact while seamlessly replacing removed people with natural background continuation. Maintain consistent lighting and perspective.
Three sentences, but they contain 8 functional phrases. Let's break them down:
"Intelligently detect" — Why You Can't Skip "Intelligently"
"detect" tells the AI to identify first, then act, rather than blindly processing the entire image. Adding "intelligently" prompts the AI to use semantic understanding (not just color difference) to distinguish people from background.
Substitution experiment:
- Remove "intelligently" → AI may misidentify statues or people in posters as tourists
- Replace with "automatically" → Similar results, but weaker at complex scenes (e.g., museum with both real visitors and painted figures)
- Replace with "carefully" → More conservative processing, may miss people in corners
"from the background" — Why Not "from this image"
"from the background" defines the processing zone: only people in the background are removed, and the foreground subject is left untouched. Writing "from this image" instead invites the AI to remove all people.
"tourists" is more precise than "people". The word "people" is generic and may cause the AI to erase street vendors or construction workers (scene elements you might want to keep). "tourists" implies "people who don't belong to this scene."
"Keep the main subject intact" — The Protection Boundary
This is the most critical constraint in the entire prompt. It tells the AI: there's a "protagonist" in this image, and no matter what happens to the background, every pixel of the protagonist stays untouched.
Without this phrase? When removing tourists standing close to your subject, the AI often "eats" the subject's edges too—like removing a chunk of your friend's shoulder.
"seamlessly replacing" — Transition Method
"seamlessly" asks the AI to blend gradually at processing boundaries rather than leave a hard cutout. This single word strongly influences how natural the edges look after removal.
"natural background continuation" — The Fill Strategy
This is arguably the most technically important phrase. "continuation" doesn't mean "fill with a similar color"; it means the AI must extend the background's texture and structure. If the background is a brick wall, the mortar lines must continue logically. If it's grass, the density and direction must stay consistent.
Substitution experiment:
- Replace with "fill with similar colors" → Fill area becomes smooth color patches
- Replace with "reconstruct the background" → Decent results, but sometimes the AI "imagines" architectural elements that don't exist in the original
- Keep "natural background continuation" → Most stable, because "continuation" forces the AI to reference surrounding pixels
"consistent lighting and perspective" — The Realism Safety Net
The final sentence is quality insurance, and it carries two phrases. "consistent lighting" prevents the fill area from contradicting the original's brightness (e.g., original is warm sunset light, but the fill becomes harsh midday white). "perspective" prevents floor perspective lines from breaking after removal.
Quick Reference: 8 Phrases and Their Functions
| Phrase | Function | What Breaks Without It |
|---|---|---|
| Intelligently detect | Detection strategy | Misidentifies non-tourist objects |
| remove tourists | Operation command | — |
| from the background | Scope limiter | Foreground subject gets processed |
| Keep the main subject intact | Protection constraint | Subject edges get eroded |
| seamlessly replacing | Transition method | Hard edges at boundaries |
| natural background continuation | Fill strategy | Color patches or artifacts |
| consistent lighting | Light constraint | Brightness mismatch in fills |
| perspective | Spatial constraint | Floor perspective breaks |
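If you want to run the substitution experiments described above repeatedly, it can help to keep the prompt in code. The following is a minimal sketch, not part of any tool's API: `build_prompt` and `swap_phrase` are hypothetical helper names, and the only fixed content is the three-sentence base prompt from this guide.

```python
# The three sentences of the base prompt, stored separately so a single
# phrase can be swapped without retyping the rest.
BASE_SENTENCES = [
    "Intelligently detect and remove tourists from the background of this photo.",
    "Keep the main subject intact while seamlessly replacing removed people "
    "with natural background continuation.",
    "Maintain consistent lighting and perspective.",
]


def build_prompt(sentences=BASE_SENTENCES, extras=()):
    """Join the base sentences and any appended adjustments into one string."""
    return " ".join(list(sentences) + list(extras))


def swap_phrase(prompt, old, new):
    """Return a variant prompt with the first occurrence of `old` replaced.

    Hypothetical helper for substitution experiments; raises if the phrase
    is not present so a typo does not silently produce an unchanged prompt.
    """
    if old not in prompt:
        raise ValueError(f"phrase not found: {old!r}")
    return prompt.replace(old, new, 1)


base = build_prompt()
variant = swap_phrase(base, "Intelligently detect", "Automatically detect")
```

Running `base` and `variant` against the same photo gives a like-for-like comparison of a single phrase change.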
3 Scene-Specific Prompt Adjustments
The base prompt doesn't handle all situations. Here are customized strategies for three common travel photography scenarios:
Scene 1: Dense Crowds (Covering 30%+ of the Frame)
Typical: The Great Wall, under the Eiffel Tower, Forbidden City entrance.
When tourists cover a large percentage of the frame, the AI needs to "rebuild" extensive background areas—the base prompt alone tends to produce large fill artifacts.
Adjustment: Append to the base prompt:
The crowd covers a large area. Prioritize reconstructing architectural
details and ground textures. Use surrounding visible areas as reference
for the reconstruction.
Scene 2: Sparse Passersby (1-5 People Scattered in Background)
Typical: distant beach, park path, museum gallery.
Easiest scenario, but also where "over-processing" happens—the AI might remove ground shadows that belong to the environment, creating a floating effect.
Adjustment: Append precision constraints:
Only 2-3 people need to be removed. Preserve all ground shadows and
reflections that belong to the environment, not to the removed people.
Scene 3: Partial Occlusion (Hardest Case)
Typical: someone walking between you and the camera, partially blocking your subject.
Here the AI must not only remove the passerby but also reconstruct the occluded parts of your subject.
Adjustment: Append:
One person partially occludes the main subject. Remove that person and
inpaint the occluded areas of the main subject based on visible body
proportions and clothing patterns.
Success rate for this scenario is roughly 70%. If occlusion exceeds 40% of the subject, consider using a different photo—AI can't reliably reconstruct large missing body areas yet. If you're interested in AI edge handling for portraits, the techniques in our frosted silhouette photography guide are also worth exploring.
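The three scenarios can be sketched as a small decision helper. This is an illustration under stated assumptions: `pick_addendum` is a hypothetical function, the two fractions are estimates you supply yourself, and the priority order (occlusion checked before crowd density) is a choice made for this sketch rather than something the scenarios prescribe. The 30% and 40% thresholds come from the text above.

```python
DENSE_ADDENDUM = (
    "The crowd covers a large area. Prioritize reconstructing architectural "
    "details and ground textures. Use surrounding visible areas as reference "
    "for the reconstruction."
)
SPARSE_ADDENDUM = (
    "Only 2-3 people need to be removed. Preserve all ground shadows and "
    "reflections that belong to the environment, not to the removed people."
)
OCCLUSION_ADDENDUM = (
    "One person partially occludes the main subject. Remove that person and "
    "inpaint the occluded areas of the main subject based on visible body "
    "proportions and clothing patterns."
)


def pick_addendum(crowd_fraction, subject_occluded_fraction=0.0):
    """Choose the scene-specific text to append to the base prompt.

    crowd_fraction: estimated share of the frame covered by tourists (0..1).
    subject_occluded_fraction: estimated share of the subject that is hidden.
    Returns None when occlusion exceeds 40%, where the guide suggests
    using a different photo instead.
    """
    if subject_occluded_fraction > 0.40:
        return None
    if subject_occluded_fraction > 0:
        return OCCLUSION_ADDENDUM  # assumption: occlusion takes priority
    if crowd_fraction >= 0.30:
        return DENSE_ADDENDUM
    return SPARSE_ADDENDUM
```

For example, `pick_addendum(0.45)` selects the dense-crowd text, while `pick_addendum(0.05, 0.2)` selects the occlusion text.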

The image above shows the same Roman arch before and after processing. On the left you can see crowds of tourists, street vendors, and tour buses; on the right, the plaza is clear with the stone pavement and architectural details fully preserved. This is the result of using the "dense crowd" scenario prompt with the architectural reconstruction additions.
From "Passable" to "Undetectable" — 5 Fine-Tuning Tricks
The base prompt gets you a result that scores roughly 60 out of 100. These 5 tweaks push it toward 90:
1. Specify Light Source Direction
The main light source comes from the upper left.
This ensures fill-area shadows match the original photo's direction exactly.
2. Describe Background Materials
The background contains marble flooring with gray veins and a sandstone wall.
The more specific your material description, the more accurate the fill. "Ground" is vague; "gray-veined marble floor" is precise.
3. Use "photorealistic" Not "realistic"
"photorealistic" triggers photo-grade realism. "realistic" might be interpreted as "realistic painting style." One word, potentially one quality tier of difference. For more on achieving photorealistic quality, see our commercial product photography prompt guide.
4. Specify Output Resolution
Output at the same resolution as the input image, preserving fine details.
Without this, most tools output default dimensions that may be smaller than your original.
5. Process Complex Scenes in Zones
If one-shot processing fails, split the work: first remove people from the left half, save the result, then process the right half with the same approach. This beats asking the AI to handle an entire dense crowd at once.
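If you script the zone-by-zone approach, the first step is working out where each zone sits. Here is a minimal sketch, assuming vertical strips; `split_into_zones` is a hypothetical helper and the optional `overlap` margin is an addition for illustration, not something the trick requires. Boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
def split_into_zones(width, height, columns=2, overlap=0):
    """Return (left, upper, right, lower) boxes covering the frame in strips.

    columns=2 gives the left half and right half. `overlap` widens each strip
    by that many pixels on its inner edges (clamped to the frame), which may
    help hide a seam between separately processed zones.
    """
    if columns < 1:
        raise ValueError("columns must be at least 1")
    boxes = []
    for i in range(columns):
        # Integer arithmetic keeps strip edges aligned with no gaps.
        left = max(0, i * width // columns - overlap)
        right = min(width, (i + 1) * width // columns + overlap)
        boxes.append((left, 0, right, height))
    return boxes


zones = split_into_zones(4000, 3000)
# zones[0] is the left half, zones[1] the right half
```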
Not sure about the results? Try running both the base prompt and the enhanced version in nanobanana pro side by side to see the difference yourself.
When AI Fails: 5 Common Issues and How to Fix Them
Even with a well-built prompt, these issues can still occur. Knowing them in advance lets you fix them quickly:
| Failure | Root Cause | Fix (append to prompt) |
|---|---|---|
| Ghost outlines | Prompt says "remove" but not "replace" | completely remove and fill the area with background content |
| Repetitive texture patterns | Fill area too large, AI loops same texture | avoid repetitive pattern artifacts (or reduce the processing zone) |
| Subject edge erosion | Tourist was too close, AI misjudged boundary | create a 5-pixel safety margin around the main subject |
| Color mismatch in fills | AI's global lighting understanding insufficient | match the exact color temperature of the surrounding area |
| Missing ground shadows | AI deleted environmental shadows along with tourist shadows | preserve environmental shadows, only remove shadows cast by tourists |
Each fix is an appendable phrase—no need to rewrite the entire prompt.
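The table can be kept as a lookup so fixes are appended consistently. This is a minimal sketch: the dictionary keys and the `apply_fixes` function are hypothetical names chosen for illustration, and the phrases are copied from the table (for repetitive textures, only the appendable phrase is used, not the alternative of shrinking the processing zone).

```python
# Failure name (hypothetical key) -> phrase to append, taken from the table.
FIXES = {
    "ghost_outlines": "completely remove and fill the area with background content",
    "repetitive_texture": "avoid repetitive pattern artifacts",
    "subject_edge_erosion": "create a 5-pixel safety margin around the main subject",
    "color_mismatch": "match the exact color temperature of the surrounding area",
    "missing_ground_shadows": (
        "preserve environmental shadows, only remove shadows cast by tourists"
    ),
}


def apply_fixes(prompt, failures):
    """Append one sentence per observed failure, skipping repeats."""
    phrases = []
    for name in failures:
        phrase = FIXES[name]  # KeyError signals an unknown failure name
        if phrase not in phrases:
            phrases.append(phrase)
    # Capitalize each phrase and end it with a period so it reads as a sentence.
    sentences = [p[0].upper() + p[1:] + "." for p in phrases]
    return " ".join([prompt.rstrip()] + sentences)
```

Calling `apply_fixes(prompt, ["ghost_outlines", "color_mismatch"])` returns the original prompt followed by the two corresponding sentences.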
FAQ
Can AI remove tourists from video?
Current AI image tools handle single frames. Video removal requires frame-by-frame processing with inter-frame consistency—a significantly more complex task. Start by extracting key frames from your video and processing them individually.
What's the maximum number of people AI can remove?
No hard limit, but the rule of thumb: when people to be removed cover more than 50% of the frame area, quality drops significantly. The AI lacks enough reference information to reconstruct that much background. Consider cropping to a less crowded area first.
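To check a photo against the 50% rule of thumb, you can estimate how much of the frame the people occupy. The sketch below assumes you already have rough bounding boxes for the people to remove (drawn by hand or from any person detector, which is outside this example); `covered_fraction` is a hypothetical helper that computes the area of the union of those boxes, so overlapping people are not counted twice. Bounding boxes overestimate the true silhouette area, so treat the result as an upper bound.

```python
def covered_fraction(boxes, width, height):
    """Fraction of the frame covered by the union of the given boxes.

    boxes: iterable of (left, top, right, bottom) in pixels.
    Uses coordinate compression: the frame is cut along every box edge,
    and each resulting cell is counted once if any box contains it.
    """
    def clamp(value, upper):
        return min(max(value, 0), upper)

    clipped = [
        (clamp(left, width), clamp(top, height),
         clamp(right, width), clamp(bottom, height))
        for left, top, right, bottom in boxes
    ]
    xs = sorted({0, width} | {x for b in clipped for x in (b[0], b[2])})
    ys = sorted({0, height} | {y for b in clipped for y in (b[1], b[3])})

    covered = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(b[0] <= x0 and x1 <= b[2] and b[1] <= y0 and y1 <= b[3]
                   for b in clipped):
                covered += (x1 - x0) * (y1 - y0)
    return covered / (width * height)


def too_crowded(boxes, width, height, limit=0.50):
    """True when coverage exceeds the guide's 50% rule of thumb."""
    return covered_fraction(boxes, width, height) > limit
```

If `too_crowded` returns True, cropping to a less crowded area first is likely the better route.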
Does removal reduce image resolution?
Without specifying resolution in the prompt, most tools output default-size images that may be smaller than the original. Adding "maintain original resolution" at the end helps, but final output depends on the tool's maximum capability.
How does this differ from Photoshop's Content-Aware Fill?
Photoshop's Content-Aware Fill uses statistical pixel matching—it copies nearby textures to patch gaps. AI prompt-based removal uses semantic understanding—it knows "this is a brick wall" and generates logically correct mortar lines, rather than just copying adjacent pixels. For complex scenes (perspective-varying floors, reflective water), the AI approach typically produces more natural results.
Can processed photos be used commercially?
This depends on your AI tool's license terms. Most tools grant users rights to their output. However, if the original photo involves specific landmark photography rights (e.g., certain museum interiors), verify the original scene's copyright status before commercial use.