Camera-Controlled AI Image Editing with Qwen Image Edit (FastAPI + Web UI)
Text-only prompts are often insufficient for precise image editing when camera perspective matters. To solve this, I built a camera-controlled AI image editing system using Qwen Image Edit 2511 with a Multiple-Angles LoRA adapter, backed by a FastAPI inference server and a lightweight browser UI.
This system allows users to upload a reference image, select camera angle, lighting, and shot type, and generate diffusion-optimized prompts automatically — all running locally.
Why Camera Control Matters in AI Image Editing
Text prompts alone are ambiguous for viewpoint changes
Camera angle consistency preserves subject identity
LoRA-based camera control improves edit accuracy
Local inference ensures privacy and predictability
System Architecture
The project is split into two independent repositories:
Frontend UI – Camera selection + prompt generation
Backend API – Qwen Image Edit inference server
Browser UI (HTML/JS)
↓
Prompt Generator
↓
FastAPI Backend
↓
Qwen Image Edit 2511 + LoRA
↓
Edited Image Output
Frontend: Camera Prompt Generator (Browser-Based)
The frontend is intentionally simple — no framework, no cloud dependencies. It generates diffusion-safe camera prompts.
Camera Prompt Builder (JavaScript)
function generatePrompt() {
  const angle = document.getElementById('cameraAngle').value;
  const height = document.getElementById('cameraHeight').value;
  const shot = document.getElementById('shotType').value;
  const lighting = document.getElementById('lighting').value;
  // Combine the selections into a single diffusion-friendly prompt string
  return `${angle}, ${height}, ${shot}, ${lighting}, realistic perspective, same subject, consistent identity`;
}
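For instance, with hypothetical selections of "low angle", "knee level", "full body shot", and "soft lighting", the function returns:
low angle, knee level, full body shot, soft lighting, realistic perspective, same subject, consistent identity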
Sending Image + Prompt to Backend
const response = await fetch('http://localhost:8000/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: generatePrompt(),
    reference_image: uploadedImageBase64,
    guidance_scale: 1.0,
    num_inference_steps: 4,
    height: 768,
    width: 768
  })
});
const data = await response.json();
displayOutputImage(data.image);
Backend: FastAPI + Qwen Image Edit
The backend is a FastAPI server optimized for 8GB VRAM GPUs. It loads Qwen Image Edit once and reuses the pipeline across requests.
FastAPI Entry Point
from fastapi import FastAPI
from app.schemas import GenerateRequest
from app.inference import generate_image

app = FastAPI()

@app.post("/generate")
async def generate(req: GenerateRequest):
    image, seed = generate_image(req)
    return {
        "success": True,
        "image": image,
        "seed": seed
    }
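The GenerateRequest schema in app/schemas.py is not shown above; a plausible Pydantic definition, inferred from the fields the frontend sends (the defaults here are my assumptions), would look like this:
from pydantic import BaseModel

class GenerateRequest(BaseModel):
    prompt: str
    reference_image: str  # base64-encoded input image
    guidance_scale: float = 1.0
    num_inference_steps: int = 4
    height: int = 768
    width: int = 768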
Inference Pipeline (Qwen Image Edit)
import torch
from diffusers import QwenImageEditPipeline

# Load the model once at startup; the pipeline is reused across requests
pipe = QwenImageEditPipeline.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16
).to("cuda")
# The Multiple-Angles LoRA adapter can be attached here,
# e.g. with pipe.load_lora_weights(...)

# Reduce peak VRAM usage for 8GB GPUs
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

result = pipe(
    prompt=prompt,
    image=reference_image,
    num_inference_steps=steps,
    guidance_scale=guidance
)
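The generate_image wrapper that the /generate route calls is not shown in the original; here is a minimal sketch, reusing the pipe object above and the request fields from the schema (the seeding and base64 handling are my assumptions):
import base64
import io
import random

import torch
from PIL import Image

def generate_image(req):
    # Decode the base64 reference image into a PIL image
    reference_image = Image.open(
        io.BytesIO(base64.b64decode(req.reference_image))
    ).convert("RGB")

    # Draw a seed for this request; it is returned to the client with the image
    seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator(device="cuda").manual_seed(seed)

    result = pipe(
        prompt=req.prompt,
        image=reference_image,
        num_inference_steps=req.num_inference_steps,
        guidance_scale=req.guidance_scale,
        generator=generator
    )

    # Encode the edited image as base64 for the JSON response
    buffer = io.BytesIO()
    result.images[0].save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8"), seed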
Health Check Endpoint
@app.get("/health")
def health():
return {
"status": "ok",
"gpu_available": torch.cuda.is_available(),
"model_loaded": pipe is not None
}
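For a quick smoke test without the browser UI, the /generate endpoint can be exercised from Python (the requests library, file names, and prompt text here are illustrative, not part of the project):
import base64
import requests

# Encode a local reference image as base64 (file name is illustrative)
with open("reference.jpg", "rb") as f:
    reference = base64.b64encode(f.read()).decode("utf-8")

response = requests.post("http://localhost:8000/generate", json={
    "prompt": "low angle, knee level, full body shot, soft lighting, realistic perspective, same subject, consistent identity",
    "reference_image": reference,
    "guidance_scale": 1.0,
    "num_inference_steps": 4,
    "height": 768,
    "width": 768
})

data = response.json()
# Decode and save the edited image returned by the server
with open("edited.png", "wb") as f:
    f.write(base64.b64decode(data["image"]))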
Memory & Performance Optimization
FP16 inference
Attention slicing
VAE slicing
768×768 resolution limit
Stateless request handling
These optimizations allow stable inference on GPUs such as the RTX 3060 or an RTX 4070 Laptop GPU. The 768×768 cap, for example, can be enforced with a small helper like the sketch below.
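A minimal sketch of enforcing the resolution limit server-side (the helper name and the multiple-of-8 rounding are my assumptions; latent diffusion models generally want dimensions divisible by 8):
def clamp_resolution(width: int, height: int, limit: int = 768):
    # Cap each dimension at the limit, then round down to a multiple of 8,
    # which latent-space VAEs typically require
    width = min(width, limit) // 8 * 8
    height = min(height, limit) // 8 * 8
    return width, height

print(clamp_resolution(1024, 900))  # -> (768, 768)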
Privacy-First by Design
Unlike cloud-based AI tools:
No image uploads to third-party servers
No prompt logging
No telemetry
Full local execution
This makes the system suitable for internal tools, R&D, and sensitive workflows.
Open Source Repositories
Both the frontend prompt generator and the FastAPI inference backend are published as separate open-source repositories.
Final Thoughts
Camera-aware prompt engineering is the next step in controllable AI image editing. By separating UI, prompt logic, and inference, this architecture remains scalable, privacy-friendly, and production-ready.
If you’re exploring advanced diffusion workflows, Qwen Image Edit with camera control offers an excellent balance of power and efficiency.