From Legacy PHP to Event-Driven: How an EdTech Platform Migrated 50,000 Users in 8 Weeks

An EdTech platform was drowning in a 12-year-old PHP monolith. We migrated their entire user base and data pipeline to an event-driven architecture in 8 weeks, with a team in Ho Chi Minh City. Here's exactly how we pulled it off.

The client came to us with a problem I’ve seen a hundred times. A 12-year-old EdTech platform built on PHP 5.6. A single MySQL database handling everything—user profiles, course enrollments, payment history, real-time quiz results, and video progress tracking. 50,000 active users. And it was breaking.

Every Friday at 8 PM (peak usage across US time zones), the site would slow to a crawl. Queue times for quiz scoring hit 45 seconds. Payment confirmations took up to 3 minutes. The client had already tried adding more MySQL read replicas. It didn’t matter. The bottleneck wasn’t the database; it was the monolith’s architecture.

We had 8 weeks before the next academic semester started. Here’s how we moved the highest-pain parts of the platform onto an event-driven core, migrated 50,000 users with zero data loss, and cut quiz scoring time from 45 seconds to 1.2 seconds.

The Real Problem Wasn’t PHP

Let’s be brutally honest: PHP 5.6 wasn’t the core issue. Plenty of platforms run PHP 8.x in production just fine. The real problem was the data model.

The legacy system stored everything in one gigantic `activities` table. User actions, quiz answers, video watch times—all crammed into a single table with 47 columns and 12 million rows. Queries like `SELECT * FROM activities WHERE user_id = ? AND course_id = ?` were taking 6–8 seconds. Indexes helped, but not enough.

“We’ve added indexes twice. It doesn’t help anymore.” — Their CTO, week 1

He was right. The data model had outgrown its single-table design, and no additional index was going to fix that.

Our Strategy: Strangler Fig + Event-Driven Core

We didn’t do a big-bang rewrite. That’s a recipe for disaster. Instead, we applied the Strangler Fig pattern. We identified three bounded contexts:

  1. User Identity & Enrollment (stays in the legacy MySQL database for now)
  2. Quiz & Assessment Engine (moves first—highest pain)
  3. Video Progress Tracking (moves second)

The quiz engine was the priority. It accounted for 70% of the slow queries.
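
To make the Strangler Fig idea concrete, here is a minimal sketch of the routing decision that sits in front of the monolith. The backend addresses and URL prefixes are hypothetical (the project's real routes are not shown here); the point is only that migrated paths go to the new service and everything else falls through to the legacy app.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical backend addresses, used only for illustration.
const (
	legacyBackend = "http://legacy-php.internal"
	quizBackend   = "http://quiz-service.internal"
)

// migrated lists URL prefixes already served by a new service.
// The prefixes are assumptions; a slice keeps the match order deterministic.
var migrated = []struct{ prefix, backend string }{
	{"/quiz/", quizBackend},
}

// backendFor returns the backend that should handle the given path.
// Anything not explicitly migrated stays on the monolith.
func backendFor(path string) string {
	for _, m := range migrated {
		if strings.HasPrefix(path, m.prefix) {
			return m.backend
		}
	}
	return legacyBackend
}

func main() {
	for _, p := range []string{"/quiz/789/submit", "/video/42/progress", "/profile"} {
		fmt.Printf("%-22s -> %s\n", p, backendFor(p))
	}
}
```

Moving the next bounded context (video progress) would then be one more entry in `migrated`, and rolling back is deleting it.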

Step 1: Build the Event Bus

We spun up a Kafka cluster (3 brokers, `t3.medium` instances) and defined our core events:

json
{
  "event_type": "quiz.submitted",
  "user_id": "u_abc123",
  "course_id": "c_xyz456",
  "quiz_id": "q_789",
  "answers": [
    {"question_id": "q1", "selected_option": "b", "timestamp_ms": 1710000000123},
    {"question_id": "q2", "selected_option": "a", "timestamp_ms": 1710000000345}
  ],
  "submitted_at": "2025-03-09T14:00:00.123Z"
}
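
On the consuming side, a payload like the one above maps naturally onto typed structs. The sketch below is one way to decode and sanity-check it in Go; the type and function names (`QuizSubmitted`, `decodeQuizSubmitted`) are illustrative assumptions, and only the JSON field names come from the event definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Answer mirrors one element of the "answers" array.
type Answer struct {
	QuestionID     string `json:"question_id"`
	SelectedOption string `json:"selected_option"`
	TimestampMs    int64  `json:"timestamp_ms"`
}

// QuizSubmitted mirrors the quiz.submitted payload.
type QuizSubmitted struct {
	EventType   string    `json:"event_type"`
	UserID      string    `json:"user_id"`
	CourseID    string    `json:"course_id"`
	QuizID      string    `json:"quiz_id"`
	Answers     []Answer  `json:"answers"`
	SubmittedAt time.Time `json:"submitted_at"` // parsed as RFC 3339
}

// decodeQuizSubmitted parses a payload and rejects other event types.
func decodeQuizSubmitted(raw []byte) (QuizSubmitted, error) {
	var ev QuizSubmitted
	if err := json.Unmarshal(raw, &ev); err != nil {
		return ev, fmt.Errorf("decode quiz.submitted: %w", err)
	}
	if ev.EventType != "quiz.submitted" {
		return ev, fmt.Errorf("unexpected event_type %q", ev.EventType)
	}
	return ev, nil
}

func main() {
	raw := []byte(`{
	  "event_type": "quiz.submitted",
	  "user_id": "u_abc123",
	  "course_id": "c_xyz456",
	  "quiz_id": "q_789",
	  "answers": [{"question_id": "q1", "selected_option": "b", "timestamp_ms": 1710000000123}],
	  "submitted_at": "2025-03-09T14:00:00.123Z"
	}`)
	ev, err := decodeQuizSubmitted(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ev.UserID, ev.QuizID, len(ev.Answers), ev.SubmittedAt.UTC())
}
```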

We used Kafka Connect with Debezium to stream changes from the legacy MySQL `activities` table into Kafka topics. That gave us a real-time change data capture (CDC) pipeline. Zero downtime. Zero data loss.

Step 2: Rewrite the Scoring Engine in Go

Why Go? Because the PHP monolith had a synchronous, blocking scoring loop. Each quiz submission triggered a `for` loop over every question, making individual HTTP calls to a third-party plagiarism checker. That’s insanity. One slow third-party API call blocked the entire queue.

We rewrote the scoring engine as a stateless Go microservice with channel-based concurrency. Each question was scored in a separate goroutine. Timeouts were set to 500ms per external call. If the plagiarism checker hung, we logged the failure and moved on.

go
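import (
    "context"
    "time"
)

// ScoreQuiz scores every answer concurrently. QuizSubmission, Answer,
// QuestionResult, and QuizResult are defined elsewhere in the service.
func ScoreQuiz(ctx context.Context, submission QuizSubmission) (QuizResult, error) {
    // Overall deadline for the whole submission.
    ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
    defer cancel()

    // Buffered so late goroutines never block on send after we return.
    results := make(chan QuestionResult, len(submission.Answers))

    for _, answer := range submission.Answers {
        go func(a Answer) {
            // 500ms budget per question, covering the external plagiarism call.
            qctx, qcancel := context.WithTimeout(ctx, 500*time.Millisecond)
            defer qcancel()
            // Assumes evaluateAnswer honors qctx and reports a timeout
            // in QuestionResult.Error instead of blocking.
            results <- evaluateAnswer(qctx, a)
        }(answer)
    }

    // Collect one result per answer, or stop at the overall deadline.
    var quiz QuizResult
    for range submission.Answers {
        select {
        case r := <-results:
            quiz.Questions = append(quiz.Questions, r) // Questions: assumed field name
        case <-ctx.Done():
            return quiz, ctx.Err()
        }
    }
    return quiz, nil
}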

Result? Quiz scoring time dropped from 45 seconds to 1.2 seconds. That's a 97.3% reduction.

Step 3: Data Migration—Dual Writes

We couldn't afford a full cutover. So we implemented dual writes: every new quiz submission was written to both the legacy MySQL table and the new PostgreSQL event store. We ran both systems in parallel for 3 weeks.
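
A dual write can be sketched in a few lines. The version below is a simplified illustration, not our production code: `Store`, `memStore`, and `dualWrite` are hypothetical names, the in-memory maps stand in for MySQL and PostgreSQL, and treating the legacy write as the source of truth (with new-store failures only reported, not fatal) is an assumption about ordering.

```go
package main

import (
	"errors"
	"fmt"
)

// Submission is a trimmed-down stand-in for a quiz submission.
type Submission struct {
	EventID string
	UserID  string
	QuizID  string
}

// Store is a hypothetical interface over either database.
type Store interface {
	Save(s Submission) error
}

// memStore is an in-memory stand-in for MySQL or PostgreSQL.
type memStore struct {
	rows map[string]Submission
	fail bool // when true, Save simulates an outage
}

func newMemStore() *memStore { return &memStore{rows: map[string]Submission{}} }

func (m *memStore) Save(s Submission) error {
	if m.fail {
		return errors.New("store unavailable")
	}
	m.rows[s.EventID] = s
	return nil
}

// dualWrite writes to the legacy store first. It returns an error only if
// that write fails; a failure in the new store is passed to onShadowErr so
// it can be logged and repaired later without affecting the user.
func dualWrite(legacy, next Store, s Submission, onShadowErr func(error)) error {
	if err := legacy.Save(s); err != nil {
		return fmt.Errorf("legacy write: %w", err)
	}
	if err := next.Save(s); err != nil {
		onShadowErr(err)
	}
	return nil
}

func main() {
	legacy, next := newMemStore(), newMemStore()
	s := Submission{EventID: "evt-1", UserID: "u_abc123", QuizID: "q_789"}
	err := dualWrite(legacy, next, s, func(e error) { fmt.Println("shadow write failed:", e) })
	fmt.Println("err:", err, "legacy rows:", len(legacy.rows), "new rows:", len(next.rows))
}
```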

The migration script ran as a set of idempotent Kafka consumers. Each consumer read from the CDC stream, transformed the old schema to the new schema, and wrote to PostgreSQL. If it failed, it retried with exponential backoff (max 5 retries).
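
The retry behavior described above is simple enough to sketch. This is an illustrative helper, not the actual consumer code: `retry`, `noSleep`, and `errTransient` are made-up names, and the base delay is an assumption. Only the cap of 5 retries comes from the project.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const maxRetries = 5 // retry cap used in the migration

// errTransient is a placeholder error for the demo.
var errTransient = errors.New("transient write failure")

// noSleep lets callers skip real waiting, for example in tests.
var noSleep = func(time.Duration) {}

// retry runs op once, then up to maxRetries more times, doubling the
// delay between attempts. The sleep function is injected so the backoff
// can be observed or skipped.
func retry(base time.Duration, sleep func(time.Duration), op func() error) error {
	var err error
	delay := base
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt == maxRetries {
			break
		}
		sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	calls := 0
	err := retry(10*time.Millisecond, time.Sleep, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Because the consumers are idempotent, a retried write that actually succeeded the first time is harmless.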

We migrated 312,000 historical quiz records over one weekend. The rollback plan? Just stop the consumers. The legacy system was untouched.

The AI Augmentation That Made It Possible

Here's where ECOA AI Platform ACP came in. We can't pretend we did this with human effort alone.

Our team in Ho Chi Minh City (5 senior engineers and 2 mid-level engineers) used ACP to automate the schema mapping. The legacy `activities` table had 47 columns. Manually mapping each column to the new event schema would've taken weeks. ACP's AI agent analyzed both schemas, identified 41 column mappings with 94% confidence, and generated the Kafka Connect configuration automatically.

We reviewed its output, corrected 3 mappings, and deployed. That saved roughly 120 engineering hours.

"I was skeptical about AI-generated config. But the mapping it produced was cleaner than what I'd written manually." — Our lead engineer, after the first review

ACP also handled error classification in production. When the scoring engine encountered a timeout, the AI agent grouped the error by third-party service and suggested a circuit breaker threshold. We set it to 5 failures in 30 seconds. That cut downstream error rates by 80%.
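
For readers who haven't built one, a breaker with that threshold can be sketched as below. This is a minimal, single-goroutine illustration: the `Breaker` type and its methods are hypothetical, and the 30-second cooldown is an assumption (only the "5 failures in 30 seconds" threshold comes from our setup). A production version would also need a mutex and a half-open state.

```go
package main

import (
	"fmt"
	"time"
)

// Breaker opens once `threshold` failures land inside `window`.
// Not safe for concurrent use; this is a sketch.
type Breaker struct {
	threshold int
	window    time.Duration
	cooldown  time.Duration // assumed: how long the breaker stays open
	failures  []time.Time
	openUntil time.Time
}

// newArticleBreaker uses the 5-failures-in-30-seconds threshold.
func newArticleBreaker() *Breaker {
	return &Breaker{threshold: 5, window: 30 * time.Second, cooldown: 30 * time.Second}
}

// Allow reports whether a call may proceed at time now.
func (b *Breaker) Allow(now time.Time) bool {
	return !now.Before(b.openUntil)
}

// RecordFailure notes a failure and opens the breaker if the threshold
// is reached within the sliding window.
func (b *Breaker) RecordFailure(now time.Time) {
	cutoff := now.Add(-b.window)
	kept := b.failures[:0]
	for _, t := range b.failures {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	b.failures = append(kept, now)
	if len(b.failures) >= b.threshold {
		b.openUntil = now.Add(b.cooldown)
		b.failures = b.failures[:0]
	}
}

// t0 and at are demo helpers that build timestamps relative to a fixed start.
var t0 = time.Date(2025, 3, 9, 14, 0, 0, 0, time.UTC)

func at(sec int) time.Time { return t0.Add(time.Duration(sec) * time.Second) }

func main() {
	b := newArticleBreaker()
	for i := 0; i < 5; i++ {
		b.RecordFailure(at(i))
	}
	fmt.Println("allowed right after 5th failure:", b.Allow(at(5)))
	fmt.Println("allowed after cooldown:", b.Allow(at(40)))
}
```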

The Numbers (8 Weeks Later)

| Metric | Before | After | Improvement |
|---|---|---|---|
| Quiz scoring time (p95) | 45s | 1.2s | 97.3% |
| Payment confirmation latency | 3 min | 8s | 95.6% |
| Failed requests per day | 450 | 12 | 97.3% |
| Deployment frequency | 1x/week | 5x/day | 35x |
| Infrastructure cost/month | $14,200 | $8,400 | 40.8% savings |

We migrated 50,247 users with zero data loss. One user reported a missing quiz result on day 2—turned out to be a timezone bug in the legacy data. We fixed it in 20 minutes using the CDC replay.

What I'd Do Differently

Honestly? We should have started with the payment system instead of the quiz engine. Payments are simpler to model as events (order placed, payment received, receipt sent). The quiz engine had more edge cases—partial scores, late submissions, retake logic. We spent an extra week on edge case handling that the payment system wouldn't have needed.

But the client insisted on fixing the quiz latency first. Can't blame them. That's where the user pain was loudest.

Lessons for Anyone Migrating a Legacy System

  • Don't rewrite everything. Identify the top 3 pain points and migrate those bounded contexts. Leave the rest running.
  • Dual writes are your safety net. Always write to both systems until you're confident the new one works.
  • Use CDC for historical migration. Don't write custom ETL scripts that query the old database directly. Stream it.
  • AI orchestration isn't a gimmick. Schema mapping, error classification, config generation—these are real time-savers. We cut 120 hours of boring work.

Frequently Asked Questions

How did you ensure data integrity during the dual-write phase?

We used Kafka with exactly-once semantics and idempotent consumers. Each event carried a unique ID. If the consumer failed mid-processing, it replayed the event but the PostgreSQL upsert logic checked the event ID. Duplicates were silently ignored. We also ran hourly reconciliation scripts that compared row counts and checksums between MySQL and PostgreSQL.
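
Here is a rough sketch of both ideas, the event-ID dedup and the reconciliation check, using an in-memory map in place of the PostgreSQL table. The names (`eventStore`, `Apply`, `Checksum`) are illustrative assumptions. In PostgreSQL the same dedup effect could come from `INSERT ... ON CONFLICT (event_id) DO NOTHING` against a unique `event_id` column; the column name is assumed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Event is a minimal stand-in for a migrated record.
type Event struct {
	ID      string
	Payload string
}

// eventStore stands in for one database table keyed by event ID.
type eventStore struct{ rows map[string]string }

func newEventStore() *eventStore { return &eventStore{rows: map[string]string{}} }

// Apply stores the event unless its ID was already seen.
// It returns true only when a new row was written.
func (s *eventStore) Apply(e Event) bool {
	if _, seen := s.rows[e.ID]; seen {
		return false // replayed event: ignore silently
	}
	s.rows[e.ID] = e.Payload
	return true
}

// Checksum returns the row count and a digest over all rows in sorted
// ID order, so two stores with the same content always match.
func (s *eventStore) Checksum() (int, string) {
	ids := make([]string, 0, len(s.rows))
	for id := range s.rows {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	h := sha256.New()
	for _, id := range ids {
		fmt.Fprintf(h, "%s=%s\n", id, s.rows[id])
	}
	return len(ids), hex.EncodeToString(h.Sum(nil))
}

func main() {
	legacy, next := newEventStore(), newEventStore()
	for _, e := range []Event{{ID: "e1", Payload: "a"}, {ID: "e2", Payload: "b"}} {
		legacy.Apply(e)
		next.Apply(e)
	}
	fmt.Println("replay written:", next.Apply(Event{ID: "e1", Payload: "a"}))
	lc, lh := legacy.Checksum()
	nc, nh := next.Checksum()
	fmt.Println("in sync:", lc == nc && lh == nh)
}
```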

What was the hardest technical challenge?

Handling the legacy `activities` table's polymorphic associations. One row could represent a quiz answer, a video watch session, or a payment receipt—different columns were used depending on the `activity_type`. Mapping that to clean, typed event schemas required manual review. The AI agent got the easy mappings right, but we had to write custom transform functions for 6 edge cases.
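
To give a feel for what those transform functions look like, here is a simplified sketch. Only `activity_type` is a real column name from the legacy table; the other fields, the type values ("quiz", "video", "payment"), and the event types are hypothetical stand-ins. Nullable columns are modeled as pointers, since each activity type used a different subset of them.

```go
package main

import "fmt"

// LegacyRow models a handful of the 47 columns (names assumed).
type LegacyRow struct {
	ActivityType string
	UserID       string
	QuizID       *string
	VideoID      *string
	WatchSeconds *int
	AmountCents  *int
}

// Typed events, one per activity type.
type QuizAnswered struct{ UserID, QuizID string }
type VideoWatched struct {
	UserID, VideoID string
	Seconds         int
}
type PaymentReceived struct {
	UserID      string
	AmountCents int
}

// ptr is a small helper for building rows with nullable columns.
func ptr[T any](v T) *T { return &v }

// toEvent maps one polymorphic row to a typed event, rejecting rows whose
// required columns are NULL for their activity type.
func toEvent(r LegacyRow) (any, error) {
	switch r.ActivityType {
	case "quiz":
		if r.QuizID == nil {
			return nil, fmt.Errorf("quiz row for %s is missing quiz_id", r.UserID)
		}
		return QuizAnswered{UserID: r.UserID, QuizID: *r.QuizID}, nil
	case "video":
		if r.VideoID == nil || r.WatchSeconds == nil {
			return nil, fmt.Errorf("video row for %s is missing video columns", r.UserID)
		}
		return VideoWatched{UserID: r.UserID, VideoID: *r.VideoID, Seconds: *r.WatchSeconds}, nil
	case "payment":
		if r.AmountCents == nil {
			return nil, fmt.Errorf("payment row for %s is missing amount", r.UserID)
		}
		return PaymentReceived{UserID: r.UserID, AmountCents: *r.AmountCents}, nil
	default:
		return nil, fmt.Errorf("unknown activity_type %q", r.ActivityType)
	}
}

func main() {
	row := LegacyRow{ActivityType: "video", UserID: "u_abc123", VideoID: ptr("v_42"), WatchSeconds: ptr(120)}
	ev, err := toEvent(row)
	fmt.Printf("%+v %v\n", ev, err)
}
```

Rows that fail validation are the ones worth routing to manual review rather than guessing at.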

Did the AI agent actually save time, or was it more overhead?

It saved time overall. The schema mapping alone would've been 3–4 days of tedious work. The agent did it in 2 hours. We spent another 4 hours reviewing and fixing. That's still a net gain. The error classification was the real surprise—it caught a pattern we hadn't noticed (a specific third-party API timing out only during US business hours).

How does the cost compare to hiring a local US team?

Our Vietnamese team cost roughly $3,000/month per senior engineer. A comparable senior engineer in the US would be $12,000–$18,000/month. We ran a team of 7 for 8 weeks (about 2 months). Total engineering cost: ~$42,000. A US-based team would've been $168,000–$252,000. The AI platform cost an additional $4,000, bringing our total to ~$46,000. Even with that, we delivered at roughly 73–82% less cost than the local alternative, and we finished ahead of schedule.
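
If you want to check the arithmetic or plug in your own rates, the calculation is short. The sketch below treats 8 weeks as 2 months and applies one monthly rate to all 7 engineers, which is how the totals above work out; `teamCost` and `savingsPct` are illustrative helper names. With these inputs the savings come out between about 73% and 82%, depending on which US rate you compare against.

```go
package main

import "fmt"

// teamCost returns engineers * monthly rate * months.
func teamCost(engineers int, monthlyRate, months float64) float64 {
	return float64(engineers) * monthlyRate * months
}

// savingsPct returns how much cheaper actual is than baseline, in percent.
func savingsPct(actual, baseline float64) float64 {
	return (1 - actual/baseline) * 100
}

func main() {
	const (
		engineers  = 7
		months     = 2.0    // 8 weeks treated as 2 months
		aiPlatform = 4000.0 // AI platform fee
	)
	vietnam := teamCost(engineers, 3000, months) + aiPlatform
	usLow := teamCost(engineers, 12000, months)
	usHigh := teamCost(engineers, 18000, months)

	fmt.Printf("Vietnam team + AI platform: $%.0f\n", vietnam)
	fmt.Printf("US team: $%.0f to $%.0f\n", usLow, usHigh)
	fmt.Printf("Savings: %.1f%% to %.1f%%\n", savingsPct(vietnam, usLow), savingsPct(vietnam, usHigh))
}
```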
