How We Built a Distributed Image Processing Pipeline That Handles 10,000 Requests/Minute for a Photo Startup (With a Vietnamese Team)

You’re running a photo startup. Users upload 10,000 images per minute during peak hours. Your single-server ImageMagick setup eats memory, queues pile up, and processing takes 45 minutes. Sound familiar?

That’s exactly the problem a US-based photo-sharing app brought to us in late 2025. They needed a pipeline that could handle bursts of 10,000 images per minute—resizing, watermarking, and compressing—with sub-30-second turnaround. And they needed it in five weeks.

Here’s how we did it with a team of six Vietnamese engineers operating from Ho Chi Minh City and Can Tho, orchestrated through the ECOA AI Platform ACP.

Why a Single Server Wasn’t Enough

The client’s original setup ran on a single EC2 instance with eight cores. They’d push images to S3; a cron job would pull them every 15 minutes, run ImageMagick on them sequentially, and write the results back. Simple, but at 10,000 images per minute that design breaks.

The math is brutal:

Average processing time per image: 1.2 seconds (resize + watermark + compress)
Best case with all 8 cores busy in parallel: 8 ÷ 1.2 ≈ 6.7 images per second
Time to process 10,000 images at that rate: ~1,500 seconds, or about 25 minutes (assuming no queue)
Real-world, with queue buildup and memory thrashing: 45 minutes to 1 hour
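To sanity-check these figures, treat the server as a fixed pool of workers. The helper below is a minimal sketch. It assumes perfect parallelism (one image per core, no I/O stalls, no queue), so it gives the ceiling the old setup could never exceed. At 1.2 seconds per image, 8 cores top out at 8 ÷ 1.2 ≈ 6.7 images per second.

```typescript
// Best-case throughput for a fixed worker pool: every core is always busy
// and each image takes exactly `secondsPerImage` (an idealized assumption).
function poolThroughput(cores: number, secondsPerImage: number): number {
  return cores / secondsPerImage;
}

// Seconds needed to drain `images` at the pool's best-case throughput.
function drainSeconds(images: number, cores: number, secondsPerImage: number): number {
  return images / poolThroughput(cores, secondsPerImage);
}

const perSecond = poolThroughput(8, 1.2);   // ≈ 6.67 images/s
const seconds = drainSeconds(10000, 8, 1.2); // ≈ 1,500 s
console.log(`${perSecond.toFixed(2)} images/s, ${(seconds / 60).toFixed(1)} min`);
// prints "6.67 images/s, 25.0 min"
```

Anything that breaks the perfect-parallelism assumption pushes the real figure further out. Memory thrashing does this, and so does a 15-minute cron interval that lets work pile up.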

Their users were refreshing the app, seeing “processing” spinners for almost an hour. Retention was tanking.

The Architecture: SQS + Lambda + ECS (No AI Agents—Just Good Engineering)

We proposed a three-tier distributed system. No over-engineering, no unnecessary AI agents. Just solid event-driven architecture with a serverless core.

[mermaid]
graph LR
A[User Upload] --> B[S3 Bucket]
B --> C[S3 Event Notification]
C --> D[SQS Queue]
D --> E[Lambda Pre-processor]
E --> F[SQS Batch Queue]
F --> G[ECS Fargate Workers]
G --> H[Processed S3 Bucket]
[/mermaid]

Layer 1: S3 Event → SQS Queue

Every uploaded image triggers an S3 PutObject event notification that is delivered directly to an SQS Standard queue. We set the visibility timeout to 5 minutes and configured a dead-letter queue that receives a message after 3 failed attempts.
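This layer comes down to two configuration documents: an S3 bucket notification that targets the queue, and queue attributes that set the visibility timeout and the dead-letter redrive policy. The sketch below only builds those objects, and the ARNs are hypothetical placeholders.

A few assumptions are worth flagging:

- We use `s3:ObjectCreated:*`, which covers PutObject as well as multipart uploads. `s3:ObjectCreated:Put` would match PutObject only.
- Expressing "3 retries" as `maxReceiveCount: 3` is our reading. SQS counts receives, so this means three delivery attempts before a message lands in the DLQ.

```typescript
// Shapes mirror the S3 bucket notification configuration and the SQS
// SetQueueAttributes attribute map; only the fields used here are typed.
interface QueueNotification {
  QueueArn: string;
  Events: string[];
}

interface IngestConfig {
  bucketNotification: { QueueConfigurations: QueueNotification[] };
  queueAttributes: Record<string, string>; // SQS attribute values are strings
}

function buildIngestConfig(queueArn: string, dlqArn: string): IngestConfig {
  return {
    bucketNotification: {
      QueueConfigurations: [
        // Fire on every new object, whichever upload API created it.
        { QueueArn: queueArn, Events: ["s3:ObjectCreated:*"] },
      ],
    },
    queueAttributes: {
      VisibilityTimeout: String(5 * 60), // 5 minutes, expressed in seconds
      // After 3 receives without a delete, SQS moves the message to the DLQ.
      RedrivePolicy: JSON.stringify({ deadLetterTargetArn: dlqArn, maxReceiveCount: 3 }),
    },
  };
}

// Hypothetical ARNs, for illustration only.
const cfg = buildIngestConfig(
  "arn:aws:sqs:us-east-1:123456789012:image-ingest",
  "arn:aws:sqs:us-east-1:123456789012:image-ingest-dlq",
);
console.log(cfg.queueAttributes.VisibilityTimeout); // prints "300"
```

S3 can only deliver to the queue if the queue's access policy lets the `s3.amazonaws.com` service principal call `sqs:SendMessage`. That policy is omitted here.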

Layer 2: Lambda Pre-processor

A small Node.js Lambda (128 MB memory, 15-second timeout) reads the SQS message, validates the image dimensions, extracts EXIF data, and pushes metadata into a batch queue. It doesn’t process images—it’s a fast validator.
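The pre-processor's job is mostly parsing and rejecting. The sketch below assumes the standard S3 event JSON arrives as the SQS message body.

Several pieces are hypothetical:

- `readMeta` is a hook. In the real Lambda it would be an async call such as Sharp's `metadata()` on the object, and EXIF extraction is left out.
- The size limit, the accepted formats, and the `BatchMessage` shape are not from the post.

Messages that throw are returned in the `batchItemFailures` shape Lambda uses for partial batch responses, which requires `ReportBatchItemFailures` on the event source mapping.

```typescript
// Minimal local types for the parts of the SQS and S3 event payloads we read.
interface SqsRecord { messageId: string; body: string; }
interface S3EventBody {
  Records?: { s3: { bucket: { name: string }; object: { key: string; size: number } } }[];
}
interface ImageMeta { width: number; height: number; format: string; }
interface BatchMessage extends ImageMeta { bucket: string; key: string; }
interface PreprocessResult {
  accepted: BatchMessage[];
  batchItemFailures: { itemIdentifier: string }[]; // Lambda partial-batch response shape
}

// Hypothetical limits: the real thresholds aren't stated in the post.
const MAX_SIDE_PX = 12000;
const ACCEPTED_FORMATS = new Set(["jpeg", "png", "webp", "heif"]);

function isValid(meta: ImageMeta): boolean {
  return ACCEPTED_FORMATS.has(meta.format) &&
    meta.width > 0 && meta.height > 0 &&
    meta.width <= MAX_SIDE_PX && meta.height <= MAX_SIDE_PX;
}

// S3 URL-encodes object keys in event notifications (a space arrives as "+").
function decodeS3Key(raw: string): string {
  return decodeURIComponent(raw.replace(/\+/g, " "));
}

function preprocess(
  records: SqsRecord[],
  readMeta: (bucket: string, key: string) => ImageMeta, // hypothetical stand-in for a Sharp metadata read
): PreprocessResult {
  const result: PreprocessResult = { accepted: [], batchItemFailures: [] };
  for (const record of records) {
    try {
      const event = JSON.parse(record.body) as S3EventBody;
      // S3's one-off s3:TestEvent has no Records, so it falls through harmlessly.
      for (const r of event.Records ?? []) {
        const bucket = r.s3.bucket.name;
        const key = decodeS3Key(r.s3.object.key);
        const meta = readMeta(bucket, key);
        // Images that fail validation are dropped, not retried: a retry can't fix them.
        if (isValid(meta)) result.accepted.push({ bucket, key, ...meta });
      }
    } catch {
      // Bad JSON or a metadata read error: report the message so SQS redelivers it.
      result.batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return result;
}
```

Splitting "invalid image" (drop) from "unexpected error" (redeliver) keeps the DLQ for genuinely transient or unknown failures.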

Layer 3: ECS Fargate Workers

The heavy lifting runs on ECS Fargate Spot tasks. Worker containers pull messages from the batch queue 10 at a time. Each worker:

Pulls images from S3 into a tmpfs volume (5 GB shared memory)
Runs Sharp (libvips) for resizing and watermarking
Compresses to WebP at quality 80
Uploads results to a separate S3 bucket
Deletes or moves the original file
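The key property of a worker loop like this is that a message is deleted only after its output is safely written. Anything that throws stays on the queue, reappears after the visibility timeout, and eventually reaches the DLQ.

The sketch below hides SQS and Sharp behind a hypothetical `QueueClient` interface and a `transform` callback, so the control flow can run on its own. In a real service these would wrap `ReceiveMessage`/`DeleteMessageBatch` and a Sharp pipeline, and they would be async. The `.webp` output-key convention is also an assumption.

```typescript
interface QueueMessage { receiptHandle: string; body: string; }

// Hypothetical abstraction over SQS ReceiveMessage / DeleteMessageBatch.
interface QueueClient {
  receive(max: number): QueueMessage[];
  deleteBatch(receiptHandles: string[]): void;
}

interface Job { bucket: string; key: string; }

// Output key: same path with the extension swapped for .webp (illustrative).
function outputKey(key: string): string {
  return key.replace(/\.[^./]+$/, "") + ".webp";
}

// Processes one batch (SQS returns at most 10 messages per receive call).
// `transform` stands in for: download -> resize/watermark -> WebP -> upload.
function runBatch(
  queue: QueueClient,
  transform: (job: Job, outKey: string) => void,
): { done: number; failed: number } {
  const messages = queue.receive(10);
  const toDelete: string[] = [];
  let failed = 0;
  for (const msg of messages) {
    try {
      const job = JSON.parse(msg.body) as Job;
      transform(job, outputKey(job.key));
      toDelete.push(msg.receiptHandle); // delete only after success
    } catch {
      failed++; // left in flight; SQS redelivers it after the visibility timeout
    }
  }
  if (toDelete.length > 0) queue.deleteBatch(toDelete);
  return { done: toDelete.length, failed };
}
```

Deleting in one batch call at the end keeps SQS requests low. The trade-off is that if the container dies mid-batch, already-processed images are redelivered, so uploads should be idempotent (same input, same output key).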

We chose Sharp over ImageMagick because it is substantially faster for common operations such as resizing. In our benchmarks, Sharp resized a 4000×3000 JPEG to 1200 px wide in 350 ms versus ImageMagick’s 1.1 seconds, roughly a 3x speedup.

Performance Results

Metric	Before	After	Improvement
Time to process 10k images	45 min	28 seconds	~99% faster
Infrastructure cost per month	$1,200	$830	~31% lower
Max concurrent throughput	~8 images/sec	167 images/sec	~20x increase
Error rate under load	12%	0.4%	~97% fewer errors
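The improvement column can be recomputed directly from the before and after values (45 minutes is 2,700 seconds). This small helper treats lower-is-better metrics as percentage reductions and throughput as a ratio:

```typescript
// Percentage reduction from `before` to `after` for lower-is-better metrics.
function reductionPct(before: number, after: number): number {
  return (1 - after / before) * 100;
}

// [metric, before, after], taken from the table.
const rows: [string, number, number][] = [
  ["processing time (s)", 45 * 60, 28],
  ["monthly cost ($)", 1200, 830],
  ["error rate (%)", 12, 0.4],
];

for (const [name, before, after] of rows) {
  console.log(`${name}: ${reductionPct(before, after).toFixed(1)}% lower`);
}
// processing time (s): 99.0% lower
// monthly cost ($): 30.8% lower
// error rate (%): 96.7% lower

const throughputGain = 167 / 8; // higher is better, so a ratio rather than a percentage
console.log(`throughput: ${throughputGain.toFixed(1)}x`); // throughput: 20.9x
```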

Honestly, we didn’t expect the cost to decrease. We assumed serverless costs would spike. But by running on Fargate Spot with auto-scaling, we paid for compute mainly during processing bursts, and idle periods cost very little.

The Team: Six Engineers in Vietnam

We staffed the project with:

1 senior platform engineer (Dung, Ho Chi Minh City) – designed the Terraform infrastructure and ECS configuration
2 mid-level backend engineers (Lin and An) – wrote the Lambda handlers and worker logic
1 junior DevOps engineer (Minh, Can Tho) – managed CI/CD pipelines and monitoring with Datadog
2 mid-level QA engineers (both in HCMC) – built a simulated load-test harness using Artillery

They worked schedules that overlapped with the client’s US-based team. Standups were at 9 AM Eastern Time. That is 8 PM in Ho Chi Minh City while the US observes daylight saving time, and 9 PM after it ends. This gave us a 4-hour overlap window, which we used for code reviews and architecture decisions.

But here’s the kicker: the entire team had never worked with Sharp or Fargate Spot before. They learned on the fly. We gave them three days to prototype, and by day five they had a working MVP running in a test account.

The ECOA AI Platform ACP Role

We didn’t use AI for the pipeline code itself; the client required us to avoid “black box” AI-generated code. But the team used ECOA AI Platform ACP for:

Agent-assisted code review – Every PR was automatically analyzed by an AI agent checking for security misconfigurations (e.g., open S3 buckets, missing IAM roles)
Automated test generation – The QA agents generated Artillery load test scripts from our OpenAPI spec, cutting manual test writing by 60%
Context-aware documentation – When a new developer joined mid-project, ACP surfaced the relevant architecture decisions and config files automatically, slashing onboarding time from 1 week to 2 days

Why is this relevant? Because we could have built the pipeline faster with AI agent orchestration, but the client wanted full code ownership. So the AI augmented our *process*, not the output. That’s the smarter play for most production systems.

Key Lessons Learned

1. Don’t underestimate cold start on image processing

Lambda cold starts added 3–5 seconds for Sharp initialization. We kept 10 execution environments warm during business hours with provisioned concurrency, which solved it.
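One standard way to keep a fixed number of Lambda execution environments warm only during business hours is provisioned concurrency on a function alias. Application Auto Scaling scheduled actions then switch it on and off.

The sketch below assumes that mechanism and only builds the two `PutScheduledAction` request bodies. Some caveats:

- The function name, alias, and UTC cron hours are hypothetical.
- The alias must first be registered as a scalable target with `RegisterScalableTarget`.

```typescript
// Fields of an Application Auto Scaling PutScheduledAction request for
// Lambda provisioned concurrency; only the fields used here are typed.
interface ScheduledAction {
  ServiceNamespace: "lambda";
  ScheduledActionName: string;
  ResourceId: string;
  ScalableDimension: "lambda:function:ProvisionedConcurrency";
  Schedule: string;
  ScalableTargetAction: { MinCapacity: number; MaxCapacity: number };
}

function warmSchedule(
  functionName: string,
  alias: string,
  warm: number,
  startCron: string,
  stopCron: string,
): ScheduledAction[] {
  const base = {
    ServiceNamespace: "lambda" as const,
    // Provisioned concurrency attaches to a published version or an alias.
    ResourceId: `function:${functionName}:${alias}`,
    ScalableDimension: "lambda:function:ProvisionedConcurrency" as const,
  };
  return [
    // Morning: pin provisioned concurrency at `warm` environments.
    { ...base, ScheduledActionName: `${functionName}-warm-up`, Schedule: startCron,
      ScalableTargetAction: { MinCapacity: warm, MaxCapacity: warm } },
    // Evening: release it so idle hours don't pay for warm environments.
    { ...base, ScheduledActionName: `${functionName}-cool-down`, Schedule: stopCron,
      ScalableTargetAction: { MinCapacity: 0, MaxCapacity: 0 } },
  ];
}

// Hypothetical names; cron expressions are evaluated in UTC unless a time zone is set.
const actions = warmSchedule("image-preprocessor", "live", 10,
  "cron(0 13 ? * MON-FRI *)", "cron(0 1 ? * TUE-SAT *)");
console.log(actions.map((a) => a.ScheduledActionName));
```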

2. SQS batch size matters

We started with a batch size of 1, and performance was terrible. Switching to a batch size of 10 with a 5-second long-polling wait made throughput jump 8x. The client’s QA team noticed the difference immediately.
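Part of why batching helps is simple arithmetic. Each ReceiveMessage round trip carries fixed overhead, so pulling 10 messages per call cuts the number of calls tenfold. Long polling (a nonzero `WaitTimeSeconds`) also stops idle workers from spinning on empty responses.

The sketch below builds the request parameters within SQS's documented limits (1 to 10 messages, 0 to 20 seconds of wait) and counts calls. The queue URL is hypothetical, and the sketch does not model per-message processing costs, which also shape real throughput.

```typescript
// Parameters for SQS ReceiveMessage. MaxNumberOfMessages is capped at 10;
// WaitTimeSeconds > 0 enables long polling (maximum 20 seconds).
interface ReceiveParams {
  QueueUrl: string;
  MaxNumberOfMessages: number;
  WaitTimeSeconds: number;
}

function receiveParams(queueUrl: string, batchSize: number, waitSeconds: number): ReceiveParams {
  if (batchSize < 1 || batchSize > 10) throw new RangeError("SQS batch size must be 1-10");
  if (waitSeconds < 0 || waitSeconds > 20) throw new RangeError("wait time must be 0-20 s");
  return { QueueUrl: queueUrl, MaxNumberOfMessages: batchSize, WaitTimeSeconds: waitSeconds };
}

// Receive calls needed to drain a backlog when every call returns a full batch.
function receiveCalls(backlog: number, batchSize: number): number {
  return Math.ceil(backlog / batchSize);
}

console.log(receiveCalls(10000, 1), receiveCalls(10000, 10)); // prints "10000 1000"
```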

3. Vietnamese developers are not just cheaper—they’re faster to onboard

The team’s ability to pick up Sharp and Fargate in days, not weeks, saved the project timeline. We’ve seen this again and again. It’s not about cost (at $2,000/month for mid-level developers, they’re half the US rate); it’s about raw adaptability.

4. Always test with burst traffic

We simulated a burst of 15,000 images. The SQS queue spiked to 12,000 messages, Lambda Pre-processor scaled to 150 concurrent invocations, then ECS workers scaled from 2 to 20 tasks in 4 minutes. Everything held. Without that test, we would have hit an SQS retention limit during a real event.
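A common way to drive queue-based worker scaling is a backlog-per-task calculation. Divide the queue depth by how many messages one task can clear within an acceptable drain time, then clamp the result to the service's minimum and maximum task counts.

This sketch is hypothetical, since the post doesn't describe the team's actual scaling policy. The per-task rate in the example is made up for illustration; only the 2 and 20 task bounds come from the burst test.

```typescript
// Tasks needed so the current backlog drains within `targetSeconds`,
// clamped to the ECS service's configured task range.
function desiredTasks(
  backlog: number,
  perTaskImagesPerSec: number,
  targetSeconds: number,
  minTasks: number,
  maxTasks: number,
): number {
  const perTaskCapacity = perTaskImagesPerSec * targetSeconds; // images one task clears in the window
  const needed = Math.ceil(backlog / perTaskCapacity);
  return Math.min(maxTasks, Math.max(minTasks, needed));
}

// Illustrative: 12,000 queued messages, a made-up 10 images/s per task, 2-minute target.
console.log(desiredTasks(12000, 10, 120, 2, 20)); // prints 10
console.log(desiredTasks(0, 10, 120, 2, 20));     // prints 2 (never below the floor)
```

Clamping to a floor keeps a couple of warm workers ready for the next burst. The ceiling bounds cost and protects downstream services such as S3 request rates.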

Frequently Asked Questions

Q: Why didn’t you use a single Lambda function for processing? Why split into Lambda + ECS?

A: Lambda has a 15-minute timeout and limited memory (10 GB max). For large images or complex transformations, a single Lambda could time out or run out of memory. ECS Fargate has no maximum task duration and supports up to 120 GB of memory per task. The Lambda pre-processor only filters and validates, so it stays lightweight and cheap.

Q: How did you handle retries for failed images?

A: We used SQS dead-letter queues with 3 retries and a 5-minute visibility timeout. Failed images go to a DLQ, where a separate Lambda moves them to a “failed” S3 prefix. Once a week, a scheduled job gathers those files for an engineer to review and reprocess by hand. The error rate is below 0.5%, so a manual fallback is fine.
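A DLQ handler like this mainly has to turn each dead-lettered job into a destination under the `failed` prefix. The sketch below computes those destinations and collects malformed bodies, so nothing is silently dropped.

Some parts are assumptions:

- The date-partitioned `failed/YYYY-MM-DD/` layout and the `Job` message shape are hypothetical.
- The actual copy would be an S3 `CopyObject` call, which is left out.

```typescript
interface Job { bucket: string; key: string; }

// Destination for a dead-lettered image: failed/<UTC date>/<original key>.
function failedKey(job: Job, failedAt: Date): string {
  const day = failedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `failed/${day}/${job.key}`;
}

interface MovePlan {
  moves: { from: Job; toKey: string }[];
  malformed: string[]; // bodies we couldn't parse, kept for the weekly review
}

// Turns a batch of DLQ message bodies into planned copies under the failed/ prefix.
function planMoves(bodies: string[], now: Date): MovePlan {
  const plan: MovePlan = { moves: [], malformed: [] };
  for (const body of bodies) {
    try {
      const job = JSON.parse(body) as Job;
      if (typeof job.bucket !== "string" || typeof job.key !== "string") {
        throw new Error("missing bucket or key");
      }
      plan.moves.push({ from: job, toKey: failedKey(job, now) });
    } catch {
      plan.malformed.push(body);
    }
  }
  return plan;
}
```

Grouping by day gives the weekly review a predictable set of prefixes to list, and the original key stays intact at the end of the path for reprocessing.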

Q: Can I hire your Vietnamese team for a similar project?

A: Yes. ECOAAI provides vetted, English-speaking Vietnamese developers at $1,000–$3,000/month depending on seniority. We also offer the ECOA AI Platform ACP to boost their efficiency by up to 5x. Contact us for a free consultation.

Q: Did you consider using a machine learning model for watermark detection?

A: We discussed it. The client wanted simple text-based watermarks (no ML), so we used Sharp’s overlay compositing. If they later need AI-based detection of removed or tampered watermarks, we can add an ML inference layer using SageMaker or ECS with GPU.
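Sharp's `composite()` accepts an SVG buffer as an overlay, so a simple text watermark can be rendered as a small SVG string. The helper below builds one. The font, opacity, size ratio, and bottom-right placement are placeholder choices.

Escaping the text matters, because watermark strings such as user names or handles may contain characters like `&` or `<` that would otherwise break the SVG.

```typescript
// Escape the five XML special characters so arbitrary text is safe inside SVG.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Full-frame SVG overlay with the text anchored bottom-right (illustrative styling).
function watermarkSvg(text: string, width: number, height: number): string {
  const fontSize = Math.max(12, Math.round(width * 0.03)); // ~3% of image width
  const margin = Math.round(fontSize * 0.8);
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`,
    `<text x="${width - margin}" y="${height - margin}" text-anchor="end"`,
    ` font-family="sans-serif" font-size="${fontSize}" fill="white" fill-opacity="0.6">`,
    escapeXml(text),
    `</text></svg>`,
  ].join("");
}

console.log(watermarkSvg("Photos & Friends", 1200, 800));
```

With Sharp installed, the overlay could then be applied along the lines of `sharp(input).resize({ width: 1200 }).composite([{ input: Buffer.from(svg) }]).webp({ quality: 80 })`. Sharp rejects an overlay larger than the base image, so the SVG should be generated at the resized output dimensions.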