Camera-Controlled AI Image Editing with Qwen Image Edit (FastAPI + Web UI)
Text-only prompts are often insufficient for precise image editing when camera perspective matters. To solve this, I built a camera-controlled AI image editing system using Qwen Image Edit 2511 with a Multiple-Angles LoRA adapter, backed by a FastAPI inference server and a lightweight browser UI.
This system allows users to upload a reference image, select camera angle, lighting, and shot type, and generate diffusion-optimized prompts automatically — all running locally.
Why Camera Control Matters in AI Image Editing
Text prompts alone are ambiguous for viewpoint changes
Camera angle consistency preserves subject identity
LoRA-based camera control improves edit accuracy
Local inference ensures privacy and predictability
System Architecture
The project is split into two independent repositories:
Frontend UI – Camera selection + prompt generation
Backend API – Qwen Image Edit inference server
Browser UI (HTML/JS)
↓
Prompt Generator
↓
FastAPI Backend
↓
Qwen Image Edit 2511 + LoRA
↓
Edited Image Output
Frontend: Camera Prompt Generator (Browser-Based)
The frontend is intentionally simple — no framework, no cloud dependencies. It generates diffusion-safe camera prompts.
Camera Prompt Builder (JavaScript)
function generatePrompt() {
  const angle = document.getElementById('cameraAngle').value;
  const height = document.getElementById('cameraHeight').value;
  const shot = document.getElementById('shotType').value;
  const lighting = document.getElementById('lighting').value;
  // Combine the selections into a single diffusion-friendly prompt string
  return `${angle}, ${height}, ${shot}, ${lighting}, realistic perspective, same subject, consistent identity`;
}
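For instance, with hypothetical selections of "low angle", "knee level", "full body shot", and "soft lighting", the function returns:
low angle, knee level, full body shot, soft lighting, realistic perspective, same subject, consistent identity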
Sending Image + Prompt to Backend
const response = await fetch('http://localhost:8000/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: generatePrompt(),
    reference_image: uploadedImageBase64,
    guidance_scale: 1.0,
    num_inference_steps: 4,
    height: 768,
    width: 768
  })
});
const data = await response.json();
displayOutputImage(data.image);
Backend: FastAPI + Qwen Image Edit
The backend is a FastAPI server optimized for 8GB VRAM GPUs. It loads Qwen Image Edit once and reuses the pipeline across requests.
FastAPI Entry Point
from fastapi import FastAPI
from app.schemas import GenerateRequest
from app.inference import generate_image

app = FastAPI()

@app.post("/generate")
async def generate(req: GenerateRequest):
    image, seed = generate_image(req)
    return {
        "success": True,
        "image": image,
        "seed": seed
    }
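The GenerateRequest schema in app/schemas.py is not shown above; a plausible Pydantic definition, inferred from the fields the frontend sends (the defaults here are my assumptions), would look like this:
from pydantic import BaseModel

class GenerateRequest(BaseModel):
    prompt: str
    reference_image: str  # base64-encoded input image
    guidance_scale: float = 1.0
    num_inference_steps: int = 4
    height: int = 768
    width: int = 768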
Inference Pipeline (Qwen Image Edit)
import torch
from diffusers import QwenImageEditPipeline

# Load the model once at startup; the pipeline is reused across requests
pipe = QwenImageEditPipeline.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16
).to("cuda")
# The Multiple-Angles LoRA adapter can be attached here,
# e.g. with pipe.load_lora_weights(...)

# Reduce peak VRAM usage for 8GB GPUs
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

result = pipe(
    prompt=prompt,
    image=reference_image,
    num_inference_steps=steps,
    guidance_scale=guidance
)
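The generate_image wrapper that the /generate route calls is not shown in the original; here is a minimal sketch, reusing the pipe object above and the request fields from the schema (the seeding and base64 handling are my assumptions):
import base64
import io
import random

import torch
from PIL import Image

def generate_image(req):
    # Decode the base64 reference image into a PIL image
    reference_image = Image.open(
        io.BytesIO(base64.b64decode(req.reference_image))
    ).convert("RGB")

    # Draw a seed for this request; it is returned to the client with the image
    seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator(device="cuda").manual_seed(seed)

    result = pipe(
        prompt=req.prompt,
        image=reference_image,
        num_inference_steps=req.num_inference_steps,
        guidance_scale=req.guidance_scale,
        generator=generator
    )

    # Encode the edited image as base64 for the JSON response
    buffer = io.BytesIO()
    result.images[0].save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8"), seed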
Health Check Endpoint
@app.get("/health")
def health():
return {
"status": "ok",
"gpu_available": torch.cuda.is_available(),
"model_loaded": pipe is not None
}
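For a quick smoke test without the browser UI, the /generate endpoint can be exercised from Python (the requests library, file names, and prompt text here are illustrative, not part of the project):
import base64
import requests

# Encode a local reference image as base64 (file name is illustrative)
with open("reference.jpg", "rb") as f:
    reference = base64.b64encode(f.read()).decode("utf-8")

response = requests.post("http://localhost:8000/generate", json={
    "prompt": "low angle, knee level, full body shot, soft lighting, realistic perspective, same subject, consistent identity",
    "reference_image": reference,
    "guidance_scale": 1.0,
    "num_inference_steps": 4,
    "height": 768,
    "width": 768
})

data = response.json()
# Decode and save the edited image returned by the server
with open("edited.png", "wb") as f:
    f.write(base64.b64decode(data["image"]))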
Memory & Performance Optimization
FP16 inference
Attention slicing
VAE slicing
768×768 resolution limit
Stateless request handling
These optimizations allow stable inference on GPUs such as the RTX 3060 or an RTX 4070 Laptop GPU. The 768×768 cap, for example, can be enforced with a small helper like the sketch below.
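A minimal sketch of enforcing the resolution limit server-side (the helper name and the multiple-of-8 rounding are my assumptions; latent diffusion models generally want dimensions divisible by 8):
def clamp_resolution(width: int, height: int, limit: int = 768):
    # Cap each dimension at the limit, then round down to a multiple of 8,
    # which latent-space VAEs typically require
    width = min(width, limit) // 8 * 8
    height = min(height, limit) // 8 * 8
    return width, height

print(clamp_resolution(1024, 900))  # -> (768, 768)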
Privacy-First by Design
Unlike cloud-based AI tools:
No image uploads to third-party servers
No prompt logging
No telemetry
Full local execution
This makes the system suitable for internal tools, R&D, and sensitive workflows.
Open Source Repositories
Both the frontend prompt generator and the FastAPI inference backend are published as separate open-source repositories.
Final Thoughts
Camera-aware prompt engineering is the next step in controllable AI image editing. By separating UI, prompt logic, and inference, this architecture remains scalable, privacy-friendly, and production-ready.
If you’re exploring advanced diffusion workflows, Qwen Image Edit with camera control offers an excellent balance of power and efficiency.