Computer Vision, WebGL, React, MediaPipe

Building CaricatureCam: Real-Time Face Warping in the Browser

Emmanuel Nwanochie

2026-06-21

I recently built CaricatureCam, a browser-based app that applies real-time facial caricature effects to your live webcam feed — no installs, no servers, no uploads. Everything runs locally in the browser at 30+ frames per second. In this article I'll walk through how it works, the architecture decisions behind it, and the interesting engineering problems I had to solve along the way.

👉 Try it live: caricaturecam.nwanochie.dev (works best in Chrome or Edge — it needs camera access).

What CaricatureCam does

CaricatureCam tracks your face in real time and warps it on the fly: enlarge your nose, give yourself bug eyes, stretch your forehead, or distort your whole face like a funhouse mirror. You can stack multiple effects, tune the intensity of each one with a slider, record the result as a WebM video, snap a PNG photo, and even pipe the processed feed into Zoom or Google Meet through a virtual camera.

10 combinable caricature effects, each with its own intensity slider
30+ FPS facial landmark detection on a 468-point face mesh
Recording with pause/resume and one-click download
Photo capture straight from the canvas
Virtual camera output for video calls via OBS
Multi-camera support and a mirrored selfie view by default

The tech stack

CaricatureCam is built with React 19, TypeScript, Vite, and Tailwind CSS. The magic behind the face tracking is Google's MediaPipe Face Landmarker (@mediapipe/tasks-vision), which gives me a 468-point 3D mesh of the face directly in the browser using WebGL acceleration. The entire pipeline — capture, detect, warp, render — happens client-side, which means the user's video never leaves their machine. That privacy guarantee was a non-negotiable design goal from day one.

The rendering pipeline

Every frame goes through the same loop, driven by requestAnimationFrame:

1. Grab the current frame from the <video> element.
2. Run MediaPipe Face Landmarker to get 468 normalized landmark coordinates.
3. For each active effect, warp the relevant region of the image on a <canvas>.
4. Composite and draw the result, then hand it off to the recorder or virtual camera if active.
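
As a rough sketch of that loop (not the app's actual code), the four steps can be written as one function with the stages injected. The names FrameStages, runFrame, and startLoop are hypothetical, and the stage signatures are assumptions; in the browser the scheduler would be requestAnimationFrame and the detector would wrap MediaPipe Face Landmarker.

```typescript
// One point of the face mesh, normalized to [0, 1] relative to the frame.
interface Landmark { x: number; y: number; z: number }

// Hypothetical stage signatures. A real app would wire these to the <video>
// element, the landmark detector, and a canvas context.
interface FrameStages<Frame> {
  grabFrame(): Frame;
  detect(frame: Frame, timestampMs: number): Landmark[] | null;
  effects: Array<(frame: Frame, landmarks: Landmark[]) => Frame>;
  present(frame: Frame): void;
}

// Runs the four steps once: grab, detect, warp, present.
function runFrame<Frame>(stages: FrameStages<Frame>, timestampMs: number): void {
  let frame = stages.grabFrame();
  const landmarks = stages.detect(frame, timestampMs);
  if (landmarks !== null) {
    // Effects are applied in order, each one seeing the previous result.
    for (const effect of stages.effects) {
      frame = effect(frame, landmarks);
    }
  }
  // With no face detected, the unmodified frame is shown.
  stages.present(frame);
}

// Re-schedules itself after every frame until shouldContinue() returns false.
function startLoop<Frame>(
  stages: FrameStages<Frame>,
  schedule: (callback: (timestampMs: number) => void) => unknown,
  shouldContinue: () => boolean,
): void {
  const tick = (timestampMs: number): void => {
    if (!shouldContinue()) return;
    runFrame(stages, timestampMs);
    schedule(tick);
  };
  schedule(tick);
}
```

In a browser this would be started with something like startLoop(stages, (cb) => requestAnimationFrame(cb), () => running). Injecting the scheduler also makes the loop easy to exercise without a camera.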

The hard part is doing all of this fast enough that it feels live. Landmark detection is the most expensive step, so the effects themselves have to be cheap. Each warp is implemented as a localized image transform — I sample a region around a set of landmarks and remap pixels outward or inward to exaggerate that feature.
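
To make "remap pixels outward or inward" concrete, here is a minimal sketch of one common way to do a localized radial warp: for each destination pixel inside a circle, compute where in the source image to sample from. The function name, the quadratic falloff, and the strength convention are all assumptions for illustration; the app's own transform may differ.

```typescript
interface Point { x: number; y: number }

// Inverse mapping for a radial warp. Given a destination pixel, returns the
// source position to sample. Assumed convention: strength in (0, 1] magnifies
// the region (a bulge), strength in [-1, 0) shrinks it (a pinch).
function warpSourcePoint(dest: Point, center: Point, radius: number, strength: number): Point {
  const dx = dest.x - center.x;
  const dy = dest.y - center.y;
  const dist = Math.sqrt(dx * dx + dy * dy);

  // Pixels outside the circle are left untouched.
  if (dist >= radius) return { x: dest.x, y: dest.y };

  // Falloff is 1 at the center and 0 at the edge, so the warp blends
  // smoothly into the unwarped image at the circle boundary.
  const t = 1 - dist / radius;
  const falloff = t * t;

  // scale < 1 samples closer to the center, which magnifies the feature.
  const scale = 1 - strength * falloff;
  return { x: center.x + dx * scale, y: center.y + dy * scale };
}
```

A full warp would loop over the bounding box of the circle, call this for each destination pixel, and copy (or bilinearly interpolate) the source pixel it points to. The per-pixel cost is a handful of arithmetic operations, which is what keeps this kind of effect cheap next to landmark detection.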

A pluggable effects architecture

I didn't want adding a new effect to mean touching the rendering loop, the UI, and the state management. So effects are self-describing modules that conform to a single EffectDescriptor interface:

export const myEffect: EffectDescriptor = {
  id: 'myEffect',
  name: 'My Effect',
  icon: '🎭',
  defaultIntensity: 0.5,
  apply(ctx, landmarks, width, height, intensity, src) {
    // warp the canvas using utility helpers
  },
}

Each effect declares its id, display name, icon, default intensity, and an apply function. A central registry imports them all, and the UI renders controls automatically by iterating over that registry. Adding a new effect takes two steps: write the module, then add it to the registry. It shows up in the interface with its own slider and no extra wiring. This registry-driven design keeps the codebase open for extension but closed for modification, so adding an effect doesn't require touching, or risk breaking, the existing ones.
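
A minimal sketch of what such a registry could look like follows. The EffectDescriptor fields mirror the ones listed above, but the parameter types of apply, the two stub effects, and the helpers buildControls and activeEffects are assumptions made for illustration.

```typescript
// Fields as described in the article; the apply parameter types are assumed.
interface EffectDescriptor {
  id: string;
  name: string;
  icon: string;
  defaultIntensity: number;
  apply(ctx: unknown, landmarks: unknown, width: number, height: number, intensity: number, src: unknown): void;
}

// Two stub effects. Real ones would warp the canvas inside apply().
const bigNose: EffectDescriptor = {
  id: 'bigNose', name: 'Big Nose', icon: '👃', defaultIntensity: 0.5,
  apply() { /* warp omitted in this sketch */ },
};
const bugEyes: EffectDescriptor = {
  id: 'bugEyes', name: 'Bug Eyes', icon: '👀', defaultIntensity: 0.4,
  apply() { /* warp omitted in this sketch */ },
};

// The single place a new effect has to be added.
const effectRegistry: readonly EffectDescriptor[] = [bigNose, bugEyes];

// UI state for one effect: a toggle plus a slider value.
interface EffectControl { id: string; label: string; enabled: boolean; intensity: number }

// Derives the initial UI state purely from the registry.
function buildControls(registry: readonly EffectDescriptor[]): EffectControl[] {
  return registry.map((effect) => ({
    id: effect.id,
    label: `${effect.icon} ${effect.name}`,
    enabled: false,
    intensity: effect.defaultIntensity,
  }));
}

// Pairs each enabled control with its descriptor, in registry order.
function activeEffects(
  registry: readonly EffectDescriptor[],
  controls: readonly EffectControl[],
): Array<{ effect: EffectDescriptor; intensity: number }> {
  const active: Array<{ effect: EffectDescriptor; intensity: number }> = [];
  for (const effect of registry) {
    for (const control of controls) {
      if (control.id === effect.id && control.enabled) {
        active.push({ effect, intensity: control.intensity });
      }
    }
  }
  return active;
}
```

Because both the controls and the render list are derived from effectRegistry, neither the UI code nor the frame loop needs to know which effects exist.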

Sharing the warp math

Most effects boil down to a few primitive operations: warp a circular region around a landmark, stretch a horizontal band of the face, or scale the overall face width. I factored these into reusable helpers (warpRegion, stretchBand, warpFaceWidth) that operate on the mirrored, normalized landmark coordinates. "Big Nose" and "Bug Eyes" are both just warpRegion calls with different landmark sets and signs. That shared foundation is why ten distinct effects amount to surprisingly little code.
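
To illustrate how two effects can share one primitive, the sketch below turns a set of landmarks into the parameters of a circular warp: a center, a radius, and a signed strength. The function planWarp, the WarpCall shape, the padding factor, and the index sets are all hypothetical; the indices point into a toy array, not into the real face mesh.

```typescript
// Normalized landmark, x and y in [0, 1] relative to the frame.
interface Landmark { x: number; y: number }

// Parameters for one circular warp, in pixel coordinates.
interface WarpCall { centerX: number; centerY: number; radius: number; strength: number }

// Builds warp parameters from a landmark set. x is flipped so the result
// matches a mirrored selfie view. sign = 1 enlarges, sign = -1 shrinks.
function planWarp(
  landmarks: Landmark[],
  indices: number[],
  sign: 1 | -1,
  intensity: number,
  width: number,
  height: number,
  padding = 1.5, // assumed: grow the region slightly beyond the landmarks
): WarpCall {
  const points = indices.map((i) => ({
    x: (1 - landmarks[i].x) * width,
    y: landmarks[i].y * height,
  }));

  // Center of the region is the mean of the selected landmarks.
  let centerX = 0;
  let centerY = 0;
  for (const p of points) {
    centerX += p.x / points.length;
    centerY += p.y / points.length;
  }

  // Radius is the distance to the farthest landmark, times the padding.
  let maxDist = 0;
  for (const p of points) {
    const dx = p.x - centerX;
    const dy = p.y - centerY;
    maxDist = Math.max(maxDist, Math.sqrt(dx * dx + dy * dy));
  }

  return { centerX, centerY, radius: maxDist * padding, strength: sign * intensity };
}

// Placeholder index sets into a toy landmark array (NOT real mesh indices).
const NOSE_INDICES = [0, 1];
const CHIN_INDICES = [2, 3];
```

With this shape, an enlarging effect could be planWarp(landmarks, NOSE_INDICES, 1, intensity, width, height) and a shrinking one planWarp(landmarks, CHIN_INDICES, -1, intensity, width, height); only the landmark set and the sign change.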

Recording and the virtual camera

Recording uses the MediaRecorder API pointed at the canvas's captureStream(), writing WebM chunks that get stitched together and offered as a download when the user stops. The virtual camera path leans on OBS: you add the app as a browser source, start OBS's virtual camera, and select it inside Zoom or Meet. It's a pragmatic solution — browsers don't expose a native virtual-camera API yet — but it turns a fun toy into something you can actually show up to a meeting with.
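
The chunk-collecting part of that recording flow can be sketched as follows. ChunkRecorder is a hypothetical interface that mirrors only the subset of MediaRecorder used here (start, stop, ondataavailable, onstop), and recordChunks is an illustrative helper, not the app's code. In the browser, the recorder would be created from the canvas stream, for example new MediaRecorder(canvas.captureStream(30), { mimeType: 'video/webm' }), and the collected chunks would be joined with new Blob(chunks, { type: 'video/webm' }).

```typescript
// Hypothetical interface covering the subset of MediaRecorder used below.
interface ChunkRecorder<Chunk> {
  ondataavailable: ((event: { data: Chunk }) => void) | null;
  onstop: (() => void) | null;
  start(timesliceMs?: number): void;
  stop(): void;
}

// Starts recording and collects chunks as they arrive. Returns a function
// that stops the recorder; onComplete receives all chunks once it has stopped.
function recordChunks<Chunk>(
  recorder: ChunkRecorder<Chunk>,
  onComplete: (chunks: Chunk[]) => void,
  timesliceMs = 1000,
): () => void {
  const chunks: Chunk[] = [];

  // Each event delivers one slice of encoded video.
  recorder.ondataavailable = (event) => {
    chunks.push(event.data);
  };

  // Fires after the final chunk has been delivered.
  recorder.onstop = () => onComplete(chunks);

  // Ask for a chunk roughly every timesliceMs milliseconds.
  recorder.start(timesliceMs);
  return () => recorder.stop();
}
```

Pausing maps onto MediaRecorder's own pause() and resume() methods, so the chunk list simply stops growing while the recorder is paused.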

Lessons learned

Building CaricatureCam reinforced a few things I keep coming back to as an engineer. First, doing the heavy lifting on the client can be both a performance win and a privacy feature — there's no round trip and no data leaves the device. Second, a good plugin interface pays for itself fast; once the effect contract was stable, experimenting with new distortions became genuinely fun instead of tedious. And third, perceived performance is a feature — keeping the per-frame work lean is what makes the difference between a gimmick and something that feels magical.

CaricatureCam is part of my ongoing exploration of real-time computer vision and creative tooling on the web. If you're curious about on-device ML, MediaPipe, or building extensible front-end architectures, I'd love to chat — try the live demo or reach out through the contact section of this site.
