Engineering · Mar 5, 2026 · 5 min read

How I Built Check Up's OCR Pipeline

Breaking down the computer vision stack behind automatic stat extraction, from raw screenshots to structured JSON with 3,400+ passing tests.


David Olverson

Founder, ModernGrindTech

NBA 2K doesn't have a stats API. No endpoints, no webhooks, no export button. If you want to track your MyCareer averages, your rec center stats, or your Pro-Am box scores, you do it the old-fashioned way: screenshot the screen and type the numbers into a spreadsheet. In 2026. For a game that sells 10 million copies a year.

I built the OCR pipeline that fixes this. Upload a screenshot, get structured JSON back in under 2 seconds. Here's exactly how it works and what made it hard.

The Pipeline

The architecture is five stages, each one feeding the next:

Screenshot upload. The user drops an image into the 2K-Hub web interface. A Next.js API route accepts it, validates the file type and size, and stores the raw image in temporary storage.
Image preprocessing. Before sending anything to the vision model, I normalize the image. Resize to a consistent resolution, adjust contrast to handle HDR screenshots, and crop to the stat-relevant regions of the screen. This step alone improved extraction accuracy by about 30%.
Claude Vision extraction. The preprocessed image goes to Claude's vision API with a structured prompt that tells it exactly what stat fields to look for (points, rebounds, assists, steals, blocks, turnovers, field goal percentage, and 15+ other fields depending on the game mode). The response comes back as raw text with the extracted values.
Validation layer. This is where most of the engineering lives. The raw extraction gets run through a validation pipeline: type checking (points should be an integer, FG% should be a decimal), range checking (nobody scores 900 points in a game), cross-field validation (FGM can't be higher than FGA), and format normalization (converting "12-23" shooting splits into separate made/attempted fields).
Database storage. Validated stats get written to PostgreSQL through Prisma. Each stat line is tied to a user, a game mode, a date, and the original screenshot URL for audit purposes.
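
To make the validation stage concrete, here is a minimal sketch of the four checks named above (type, range, cross-field, format normalization). It is an illustration under assumptions, not the production code: the `RawStats` shape, the `validateStatLine` and `parseSplit` names, the `fg` key, and the 200-point ceiling are all hypothetical.

```typescript
// Hypothetical input shape: field name -> raw string as read from the screenshot.
type RawStats = Record<string, string | undefined>;

interface StatLine {
  points: number;
  fgMade: number;
  fgAttempted: number;
}

type ValidationResult =
  | { ok: true; stats: StatLine }
  | { ok: false; errors: string[] };

// Format normalization: turn a "12-23" shooting split into made/attempted.
function parseSplit(value: string): { made: number; attempted: number } | null {
  const match = /^(\d+)\s*-\s*(\d+)$/.exec(value.trim());
  if (!match) return null;
  return { made: Number(match[1]), attempted: Number(match[2]) };
}

// Type check: accept only non-negative whole numbers written as plain digits.
function parseInteger(value: string | undefined): number | null {
  if (value === undefined || !/^\d+$/.test(value.trim())) return null;
  return Number(value.trim());
}

function validateStatLine(raw: RawStats): ValidationResult {
  const errors: string[] = [];

  const points = parseInteger(raw["points"]);
  if (points === null) {
    errors.push("points must be an integer");
  } else if (points > 200) {
    // Range check. The ceiling of 200 is an assumed value for this sketch.
    errors.push("points out of range");
  }

  const split = parseSplit(raw["fg"] ?? "");
  if (split === null) {
    errors.push("fg must look like made-attempted, e.g. 12-23");
  } else if (split.made > split.attempted) {
    // Cross-field check: FGM can't be higher than FGA.
    errors.push("FGM cannot exceed FGA");
  }

  if (errors.length > 0 || points === null || split === null) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    stats: { points, fgMade: split.made, fgAttempted: split.attempted },
  };
}
```

Collecting every error instead of stopping at the first one means a user correcting a bad extraction sees all the problems at once.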

What Made It Hard

If you've never tried OCR on video game screenshots, you might think this is a solved problem. It is not.

Resolution chaos. Players screenshot on PS5 at 4K, Xbox Series S at 1080p, PC at ultrawide 3440x1440, and Nintendo Switch at 720p. The stat overlay renders at different sizes, different positions, and different font scales on every platform. A pipeline that works perfectly on PS5 screenshots will miss half the fields on a Switch capture.
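
One common way to cope with mixed resolutions is to store crop regions as fractions of the frame and scale them to each screenshot. The sketch below assumes a per-aspect-ratio layout table. The `PANEL_FRACTIONS` values are placeholders, not measured from the game, and `statRegion` is a hypothetical helper name.

```typescript
// Pixel rectangle using left/top/width/height, the field names Sharp's extract() takes.
interface Region {
  left: number;
  top: number;
  width: number;
  height: number;
}

type Aspect = "16:9" | "21:9";

// Stat-panel position as fractions of the frame. Placeholder values only.
const PANEL_FRACTIONS: Record<Aspect, Region> = {
  "16:9": { left: 0.1, top: 0.25, width: 0.8, height: 0.5 },
  "21:9": { left: 0.2, top: 0.25, width: 0.6, height: 0.5 },
};

// Anything wider than 2:1 is treated as ultrawide in this sketch.
function classifyAspect(width: number, height: number): Aspect {
  return width / height > 2 ? "21:9" : "16:9";
}

// Scale the fractional layout to the actual screenshot size.
function statRegion(width: number, height: number): Region {
  const fractions = PANEL_FRACTIONS[classifyAspect(width, height)];
  return {
    left: Math.round(fractions.left * width),
    top: Math.round(fractions.top * height),
    width: Math.round(fractions.width * width),
    height: Math.round(fractions.height * height),
  };
}
```

A 4K PS5 capture and a 720p Switch capture then map to the same relative crop. Platforms that render the overlay in a different position would need their own layout entries, which this sketch does not attempt.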

HDR and color issues. HDR screenshots have blown-out highlights that make white text on light backgrounds nearly invisible. Some players use colorblind modes that change the entire UI palette. Others have brightness cranked to max or min. The preprocessing step has to handle all of these without being told which variant it's looking at.
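
The post does not spell out the custom contrast normalization, so the following is only a generic stand-in: a percentile-based contrast stretch over 8-bit grayscale pixels. The `stretchContrast` name and the 1% default clip are assumptions. Sharp also ships a built-in `normalise()` operation that performs a similar stretch.

```typescript
// Stretch grayscale values so the darkest and brightest pixels (after clipping
// a small fraction of outliers at each end) span the full 0-255 range.
function stretchContrast(pixels: Uint8Array, clipFraction = 0.01): Uint8Array {
  // Count how many pixels have each brightness value.
  const histogram = new Uint32Array(256);
  for (let i = 0; i < pixels.length; i++) {
    histogram[pixels[i]] += 1;
  }

  const clipCount = Math.floor(pixels.length * clipFraction);

  // Lowest value whose cumulative count exceeds the clip budget.
  let low = 0;
  let seenLow = histogram[0];
  while (low < 255 && seenLow <= clipCount) {
    low += 1;
    seenLow += histogram[low];
  }

  // Highest value, found the same way from the bright end.
  let high = 255;
  let seenHigh = histogram[255];
  while (high > 0 && seenHigh <= clipCount) {
    high -= 1;
    seenHigh += histogram[high];
  }

  const out = new Uint8Array(pixels.length);
  if (high <= low) {
    // Flat or empty image: nothing to stretch, return an unchanged copy.
    out.set(pixels);
    return out;
  }

  const scale = 255 / (high - low);
  for (let i = 0; i < pixels.length; i++) {
    const stretched = Math.round((pixels[i] - low) * scale);
    out[i] = Math.min(255, Math.max(0, stretched));
  }
  return out;
}
```

Because the cut points come from the image's own histogram, the function needs no hint about whether the capture was HDR, dimmed, or over-brightened.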

Overlapping UI elements. 2K loves to layer UI on top of UI. Achievement popups cover stat columns. Squad invites overlay the box score. The timeout indicator sits right on top of the assist count in certain game modes. The pipeline has to either work around these occlusions or flag the extraction as low-confidence so the user can verify.
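
Flagging an occluded extraction can be as simple as counting which required fields came back empty. This sketch assumes the vision step returns `null` for any field it cannot read. That convention, the `reviewExtraction` name, and the coverage threshold are all hypothetical.

```typescript
// Field name -> extracted number, or null/undefined when the value was unreadable.
type ExtractedFields = Record<string, number | null | undefined>;

interface ExtractionReview {
  missing: string[];    // required fields that could not be read
  coverage: number;     // fraction of required fields present, 0..1
  needsReview: boolean; // true when the user should verify the result
}

function reviewExtraction(
  fields: ExtractedFields,
  required: string[],
  minCoverage = 1,
): ExtractionReview {
  // `== null` matches both null and undefined (a key that is absent entirely).
  const missing = required.filter((name) => fields[name] == null);
  const coverage =
    required.length === 0
      ? 1
      : (required.length - missing.length) / required.length;
  return { missing, coverage, needsReview: coverage < minCoverage };
}
```

With the default threshold of 1, a single hidden stat (say, assists covered by a popup) is enough to route the result to the user for confirmation.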

Inconsistent formatting. Different game modes show stats differently. MyCareer shows per-game averages with one decimal. Rec center shows totals as integers. Pro-Am shows both but in different column orders. The extraction prompt has to adapt to whichever format it detects, and the validation layer has to know which rules apply to which format.
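
The per-mode rules described above can be modeled as a lookup table from game mode to accepted value formats. This is a simplified sketch: the mode identifiers and helper names are assumptions, and it covers only the number format, not column order.

```typescript
type GameMode = "mycareer" | "rec" | "proam";

const ALL_MODES: GameMode[] = ["mycareer", "rec", "proam"];

const ONE_DECIMAL = /^\d+\.\d$/; // per-game average, e.g. "24.3"
const INTEGER = /^\d+$/;         // single-game total, e.g. "24"

// Which number formats each mode is expected to display.
const VALUE_FORMATS: Record<GameMode, RegExp[]> = {
  mycareer: [ONE_DECIMAL],
  rec: [INTEGER],
  proam: [INTEGER, ONE_DECIMAL],
};

// True when a raw value matches one of the formats allowed for the mode.
function isValidForMode(mode: GameMode, value: string): boolean {
  const trimmed = value.trim();
  return VALUE_FORMATS[mode].some((pattern) => pattern.test(trimmed));
}

// Modes whose format rules accept every value in the extraction.
function candidateModes(values: string[]): GameMode[] {
  return ALL_MODES.filter((mode) =>
    values.every((value) => isValidForMode(mode, value)),
  );
}
```

Number format alone cannot always pin down the mode (all-integer values fit both Rec and Pro-Am in this sketch), so a real detector would need additional signals from the screenshot.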

3,400+ Tests

I don't ship OCR without heavy testing. The test suite covers:

Unit tests for every validation rule (range checks, type coercion, cross-field logic)
Integration tests with real screenshots from every platform and game mode
Edge case tests for corrupted images, partial screenshots, and non-2K images
Regression tests for every extraction bug that's ever been reported and fixed
Performance tests ensuring the full pipeline completes in under 2 seconds

3,400+ tests across all of those categories. Every PR runs the full suite. If extraction accuracy drops on any screenshot in the test corpus, the build fails.
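
A corpus gate like this can be sketched as a comparison of every extracted field against a stored expected value, throwing when anything differs. The types and function names here are hypothetical, and the extractor is passed in as a plain synchronous function so the sketch stays self-contained. A real pipeline would call the vision step asynchronously.

```typescript
type StatLine = Record<string, number>;

interface CorpusCase {
  screenshot: string; // path or identifier of the test image
  expected: StatLine; // hand-verified stats for that image
}

type Extractor = (screenshot: string) => StatLine;

// Returns one message per field that no longer matches its expected value.
function findRegressions(corpus: CorpusCase[], extract: Extractor): string[] {
  const failures: string[] = [];
  for (const testCase of corpus) {
    const actual = extract(testCase.screenshot);
    for (const field of Object.keys(testCase.expected)) {
      const expected = testCase.expected[field];
      if (actual[field] !== expected) {
        failures.push(
          `${testCase.screenshot}: ${field} expected ${expected}, got ${actual[field]}`,
        );
      }
    }
  }
  return failures;
}

// Throwing makes the test runner, and therefore the build, fail.
function assertNoRegressions(corpus: CorpusCase[], extract: Extractor): void {
  const failures = findRegressions(corpus, extract);
  if (failures.length > 0) {
    throw new Error(`Extraction regressions:\n${failures.join("\n")}`);
  }
}
```

Reporting every mismatch, rather than stopping at the first, makes it easier to see whether a change broke one screenshot or a whole platform.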

Pipeline Components

For anyone who wants the technical details:

Frontend: Next.js with drag-and-drop upload, real-time extraction status, and inline stat editing for corrections
Vision: Claude Vision API with game-mode-specific prompt templates
Database: PostgreSQL via Prisma with full stat history, audit trails, and per-user analytics
Preprocessing: Sharp for image manipulation, custom contrast normalization, region-of-interest cropping
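
As a rough illustration of game-mode-specific prompt templates and of turning the model's raw text reply into structured data, consider the sketch below. The prompt wording, the mode identifiers, and both function names are assumptions, and the call to the vision API itself is deliberately left out.

```typescript
type GameMode = "mycareer" | "rec" | "proam";

// One hint per mode, based on the formatting differences described in the post.
const MODE_HINTS: Record<GameMode, string> = {
  mycareer: "Values are per-game averages with one decimal place.",
  rec: "Values are single-game totals shown as integers.",
  proam: "Both totals and averages appear; column order differs from other modes.",
};

// Build the text prompt that would accompany the screenshot.
function buildExtractionPrompt(mode: GameMode, fields: string[]): string {
  return [
    "Extract the following stats from the attached NBA 2K screenshot.",
    MODE_HINTS[mode],
    `Fields: ${fields.join(", ")}.`,
    "Reply with a single JSON object keyed by field name.",
    "Use null for any field that is hidden or unreadable.",
  ].join("\n");
}

// Pull the JSON object out of a raw text reply, tolerating surrounding prose.
function parseModelReply(reply: string): Record<string, unknown> | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    // Malformed JSON: treat it as a failed extraction instead of crashing.
    return null;
  }
}
```

Returning `null` on any parse failure keeps malformed replies out of the validation layer, which can then treat the upload as a failed extraction.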

What Users Get

Upload a screenshot, get your stats in structured JSON in under 2 seconds. No typing, no spreadsheets, no manual tracking. Extraction accuracy across all supported platforms and game modes sits above 95%, and the validation layer catches most of the remaining errors before they ever hit the database.

This is the kind of problem I love solving: a real gap where no official solution exists, a technical challenge that's harder than it looks on the surface, and a result that saves users real time every single session. The full platform is deployed at 2khub.io. Read the full 2K-Hub case study for architecture details, or check out the AI automation services page if you have a similar data extraction problem that needs solving.
