Portfolio

Projects

A selection of professional and personal work spanning data engineering, BI, machine learning, and creative software.

Personal · Full-Stack Web App
Live
May 2026 – Present

RareDex — Pokémon TCG Card Scanner

A mobile-first web app that identifies Pokémon TCG cards via camera scan, retrieves live market prices, and exports collections via shareable link or CSV download. Built for collectors and card shop trade-ins.

  • OCR pipeline using Google Cloud Vision + OpenCV to extract collector numbers from card images
  • Live video viewfinder mode with motion detection and debounce confirmation — scans while you sweep cards
  • Batch upload mode processes full card albums sequentially with a progress indicator
  • Google OAuth + magic-link auth (two-track, keyed on email); configurable trade-in percentage calculator; shareable export links and CSV download
  • Play-stamp detection for Prize Pack Series variants with TCGPlayer price lookup
  • Era-spanning regression suite: 25 cards across Black & White → Scarlet & Violet covering every card type (Pokémon, Supporter, Item, Stadium, Special Energy), all passing against the live API
Python · Flask · PostgreSQL · Google Cloud Vision · OpenCV · Vanilla JS
Worth noting: The OCR targets the collector number rather than the card name — numbers are small, isolated, and uniform, while card names have art interference and stylized fonts. That one decision made the pipeline dramatically more reliable, and means the app works on any card regardless of language.
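To illustrate why the collector number is such a forgiving OCR target, here is a minimal sketch of pulling one out of raw OCR text. The regex, the function name `extract_collector_number`, and the sample string are assumptions for illustration, not the app's actual code. The sketch only handles the plain numeric "NNN/NNN" form.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "123/198"-style collector numbers, tolerating the
# stray whitespace OCR often inserts around the slash.
COLLECTOR_RE = re.compile(r"\b(\d{1,3})\s*/\s*(\d{1,3})\b")


def extract_collector_number(ocr_text: str) -> Optional[Tuple[int, int]]:
    """Return (card_number, set_size) from OCR text, or None if absent."""
    match = COLLECTOR_RE.search(ocr_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Because the pattern ignores everything except digits and a slash, the surrounding text can be in any language without affecting the match.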
Open App →
Personal · Data Eng
Live
May 2026 – Present

Indigo Circuit — Competitive PTCG Intelligence Platform

A competitive Pokémon TCG analytics platform that ingests major tournament results, runs them through a dbt + Snowflake pipeline, and surfaces live rankings, meta breakdowns, and player profiles via a Flask dashboard.

  • End-to-end data pipeline: raw scrape from Limitless TCG → dbt staging → mart layer computing ATP scores, seasonal rankings, archetype stats, and top-finisher history
  • Champion / Elite Four / Gym Leader tier system — segments the top 100 players and top 8 archetypes into named tiers; the framing is thematic but the segmentation does real analytical work
  • Meta breakdown across 30+ archetypes: appearance rate, Day 2 conversion rate, Top 8 rate, and win rate — format-aware, adjusts automatically to the active card set rotation
  • Player profiles with Plotly placement-history charts, archetype breakdowns, and full tournament records going back across seasons
  • TTL-cached Snowflake queries with a pre-warm thread on deploy — every endpoint is warm before the first real request; cold query latency is never on the critical path
  • Dynamic OG embed image (Pillow, 1200×630) generated server-side with live champion data; all pages load in under 300 ms
Python · dbt · Snowflake · Flask · Plotly · Railway
Worth noting: Snowflake queries take 1–2 seconds cold, so running them live on every request would make the site unusable. The solution was a TTL cache with a background pre-warm thread that fires 3 seconds after deploy and hits every heavy endpoint, so the first real user always gets a cached response. It's a small thing, but it's the difference between a demo that works in a screencast and a product someone actually uses.
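The pattern can be sketched in a few lines of standard-library Python. This is a simplified illustration under assumptions, not the deployed code: `TTLCache`, `prewarm`, and the loader callables are hypothetical names, and a plain function stands in for the Snowflake query.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: values expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._lock = threading.Lock()

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = time.monotonic()
        with self._lock:
            hit = self._store.get(key)
            if hit is not None and now - hit[0] < self.ttl:
                return hit[1]  # fresh entry: skip the slow query
        value = compute()  # slow path, e.g. a cold warehouse query
        with self._lock:
            self._store[key] = (time.monotonic(), value)
        return value


def prewarm(cache: TTLCache, loaders: Dict[str, Callable[[], Any]],
            delay: float = 3.0) -> threading.Thread:
    """Fill the cache from a background thread shortly after startup."""

    def run() -> None:
        time.sleep(delay)
        for key, loader in loaders.items():
            cache.get_or_compute(key, loader)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

The slow call runs outside the lock, so one cold query never blocks readers of other keys. The trade-off is that two concurrent misses on the same key may both compute it.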
Open App →
Personal · Creative / WebGL
Live
May 2026 – Present

Voidpulse — GPU Particle Visualizer

A GPU particle visualizer modeled on the classic iTunes Magnetosphere plugin, running entirely in the browser. 60,000 particles react in real time to live audio via the Web Audio API — mic input, system audio capture, or file upload.

  • Custom GLSL vertex shader drives per-particle deformation: bass controls radial breathing and point size, mid displaces along Y, treble adds high-frequency jitter
  • 2-octave analytic curl-noise flow field keeps the cloud in constant divergence-free motion without any physics sim overhead
  • 6 morphable particle shapes (sphere, heart, torus, galaxy spiral, cube, helix) with smooth collapse-and-bloom transitions routed through the sphere
  • Beat detection uses a two-envelope onset detector with an 8-frame refractory window — punchy on hits, no strobing between beats
  • Cinematic mode auto-cuts between 6 named camera angles every 12–20s and pairs each cut with a random palette swap
  • EffectComposer post-processing stack: UnrealBloomPass + OutputPass (sRGB + tonemap), tunable from the UI panel
  • Audio-reactive gravity wells orbit the cloud and pull particles into visible clusters; count and pull strength are live-tunable
three.js r160 · WebGL / GLSL · Web Audio API · Flask · Railway
Worth noting: All the signal processing happens in the browser: there is no backend audio pipeline, just the Web Audio API's AnalyserNode feeding FFT data into shader uniforms 60 times a second. The GPU handles 60,000 particles per frame; the CPU handles the band math and beat detection. The detail that made the whole thing interesting: tab audio capture via getDisplayMedia lets the visualizer react to whatever's already playing in another tab, with no system audio driver required.
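The two-envelope beat detector is easy to show in isolation. The real visualizer runs in the browser, so this Python version is only a sketch of the logic. The smoothing factors and the trigger ratio are assumed values, and only the 8-frame refractory window comes from the description above.

```python
from typing import Iterable, List


def detect_beats(energies: Iterable[float], fast_alpha: float = 0.6,
                 slow_alpha: float = 0.05, ratio: float = 1.5,
                 refractory: int = 8) -> List[int]:
    """Return frame indices where the fast envelope jumps above the slow one."""
    fast = slow = None
    cooldown = 0
    beats: List[int] = []
    for i, energy in enumerate(energies):
        if fast is None:
            # Seed both envelopes with the first frame to avoid a false
            # trigger while they ramp up from zero.
            fast = slow = energy
            continue
        fast += fast_alpha * (energy - fast)  # tracks transients
        slow += slow_alpha * (energy - slow)  # tracks the running average
        if cooldown > 0:
            cooldown -= 1  # still inside the refractory window
        elif slow > 0 and fast > ratio * slow:
            beats.append(i)
            cooldown = refractory
    return beats
```

A sustained hit keeps the fast envelope above the slow one for several frames. The refractory window is what collapses that run into a single beat instead of a strobe.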
Open App → View on GitHub →
Personal · Health Equity Analysis
Phase 1 · Complete
May 2026 – Present

Social Vulnerability vs. Health Outcomes in America

A phased analysis joining CDC's Social Vulnerability Index with PLACES health data across ~73,000 US census tracts to quantify how community vulnerability predicts health outcomes.

  • Diabetes prevalence is 1.64x as high in highly vulnerable tracts (15.6% vs 9.5%)
  • Socioeconomic vulnerability is the dominant predictor across most outcomes
  • Identified DC, Connecticut, and Louisiana as having the sharpest within-state disparities
Python · pandas · scikit-learn · matplotlib · geopandas
Worth noting: SVI and PLACES are both well-maintained public datasets, but they weren't designed to be joined. Getting the geographic grain right across 73,000 census tracts took longer than the analysis itself. Once that was clean, the 1.64x diabetes finding came out clearly — which is either obvious or alarming depending on who you ask.
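One common pitfall when joining tract-level files is the identifier itself. A census tract GEOID is 11 digits (2 state, 3 county, 6 tract), and leading zeros vanish if a loader parses it as a number. The sketch below shows a normalize-then-join step in plain Python. The column names `geoid`, `svi`, and `diabetes` are hypothetical, and this is not necessarily the cleanup the notebook performs.

```python
from typing import Any, Dict, List


def normalize_tract_geoid(raw: Any) -> str:
    """Coerce a tract identifier to an 11-character, zero-padded string."""
    digits = str(raw).strip().split(".")[0]  # drop a ".0" left by float parsing
    if not digits.isdigit() or len(digits) > 11:
        raise ValueError(f"not a tract GEOID: {raw!r}")
    return digits.zfill(11)


def join_on_tract(svi_rows: List[Dict[str, Any]],
                  places_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Inner-join two lists of dicts on the normalized 'geoid' key."""
    places_by_id = {normalize_tract_geoid(r["geoid"]): r for r in places_rows}
    joined = []
    for row in svi_rows:
        key = normalize_tract_geoid(row["geoid"])
        if key in places_by_id:
            joined.append({**row, **places_by_id[key], "geoid": key})
    return joined
```

On the headline numbers, 15.6 / 9.5 ≈ 1.64, which is where the diabetes ratio comes from.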
View Notebook → View on GitHub →
AI & Automation · Personal
Active
May 2026 – Present

Automated Image Generation Pipeline (FLUX.1-dev / ComfyUI)

A local automated pipeline using FLUX.1-dev and ComfyUI that generates, batch-processes, and organizes AI images from structured prompt inputs — runs unattended, no per-image supervision required.

  • Parameterized prompt templates feeding directly into ComfyUI workflows via API
  • Automated batch queuing, output naming, and file organization
  • Local GPU inference (no API costs) with reproducible seed control
  • Workflow JSON is version-controlled — swap models or prompt structures without rewriting the pipeline
Python · ComfyUI · FLUX.1-dev · REST API · diffusers
Worth noting: The interesting part isn't the generation — it's that once the prompts are parameterized and the queue is set up, there's nothing left to babysit. Runs on local GPU, files organize themselves, no per-image API costs accumulating in the background.
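A rough sketch of the parameterization step: expand a prompt template over a parameter grid and a list of seeds, producing one queue-ready payload per combination. The node IDs ("6" and "3"), the `text` and `seed` input names, and the function name `build_jobs` are assumptions about a particular exported workflow, not fixed ComfyUI identifiers. A real FLUX workflow may place these values on different nodes.

```python
import copy
import itertools
from string import Template
from typing import Any, Dict, List


def build_jobs(workflow: Dict[str, Any], template: str,
               params: Dict[str, List[str]], seeds: List[int],
               prompt_node: str = "6",
               sampler_node: str = "3") -> List[Dict[str, Any]]:
    """Return one {"prompt": workflow} payload per parameter combo and seed."""
    keys = sorted(params)
    jobs = []
    for values in itertools.product(*(params[k] for k in keys)):
        text = Template(template).substitute(dict(zip(keys, values)))
        for seed in seeds:
            wf = copy.deepcopy(workflow)  # never mutate the versioned JSON
            wf[prompt_node]["inputs"]["text"] = text
            wf[sampler_node]["inputs"]["seed"] = seed
            jobs.append({"prompt": wf})
    return jobs
```

Each payload would then be sent as JSON in a POST to the local ComfyUI server's `/prompt` endpoint. Keeping the seed explicit in every payload is what makes a run reproducible.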
Data Engineering · Microsoft
Jul 2021 – Jul 2024

SSAS Multidimensional → Tabular Migration

Led the migration of enterprise SSAS Multidimensional Cubes to a Tabular Model, improving the analytics infrastructure used by finance and accounting teams.

  • 200x faster query response when filtering data
  • 300% improvement in processing speed
  • Enabled independent scaling of cube components
  • Automated incident generation post-migration
SQL · MDAX · SSAS · Azure
Worth noting: The 200x stat is real. What made it clean was scoping narrowly — this was a migration, not a redesign. The main risks were DAX parity with existing MDX logic and partition strategy. Those are where most of these go wrong.
ETL & Pipelines · Microsoft
Jul 2021 – Jul 2024

Petabyte-Scale ETL Pipelines

Designed and maintained scheduled ETL pipelines processing hundreds of petabytes of raw data to deliver clean datasets for financial reporting.

  • SQL and MDAX queries against raw, unfiltered petabyte-scale sources
  • Delivered clean financial datasets powering accounting team reporting
  • Built and maintained live Tableau dashboards for finance stakeholders
SQL · MDAX · Azure Data Factory · Azure Data Lake
Worth noting: At this scale, the query logic isn't the hard part. What makes a pipeline actually reliable is how it fails — whether it surfaces errors loudly and specifically when something upstream changes. Silent corruption in financial data is worse than downtime.
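As a small illustration of failing loudly and specifically, the sketch below checks a batch against an expected schema and raises with the exact row, column, and value that broke. `SchemaDriftError`, `validate_batch`, and the sample columns are hypothetical names, not part of the production pipelines.

```python
from typing import Any, Dict, Iterable


class SchemaDriftError(Exception):
    """Raised when an upstream batch does not match the expected shape."""


def validate_batch(rows: Iterable[Dict[str, Any]],
                   expected: Dict[str, type]) -> None:
    """Check each row against {column: type}; raise on the first mismatch."""
    for i, row in enumerate(rows):
        missing = expected.keys() - row.keys()
        if missing:
            raise SchemaDriftError(f"row {i}: missing columns {sorted(missing)}")
        for col, typ in expected.items():
            if not isinstance(row[col], typ):
                raise SchemaDriftError(
                    f"row {i}: column {col!r} expected {typ.__name__}, "
                    f"got {type(row[col]).__name__} ({row[col]!r})"
                )
```

Stopping the run on the first mismatch trades availability for correctness, which is the right trade when the alternative is a silently wrong financial report.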
Stakeholder Analytics · UVA
May 2025 – Mar 2026

End-to-End Stakeholder Dashboards

Led individual analytics projects from intake through sign-off — meeting with stakeholders across the university to gather requirements, source raw data, and deliver tailored Tableau dashboards.

  • Owned the full project lifecycle: requirements, data, build, delivery
  • Translated non-technical needs into clear visualizations
  • Helped stakeholders onboard to the dashboards and the Tableau platform
Tableau · Requirements Gathering · Stakeholder Communication
Worth noting: The requirements conversation is most of the job. People know the data they want to see; they're less sure what question they're actually trying to answer. Getting that right upfront is the difference between a dashboard that gets used and one that gets opened once.
BI Performance · UVA
May 2025 – Mar 2026

Tableau Site Optimization

Led performance optimization of UVA's Tableau environment through telemetry-driven diagnosis and targeted cleanup across stakeholder workbooks.

  • 20% faster workbook load times organization-wide
  • Telemetry monitoring to identify bottlenecks
  • Audited and resolved performance issues across multiple stakeholder workbooks
Tableau · Telemetry
Worth noting: Performance problems in Tableau usually concentrate in a few specific places — extract size, custom SQL that reruns on every page load, LODs that don't need to be LODs. Telemetry makes it obvious which. The 20% improvement came from fixing those, not a blanket cleanup.
App Modernization · Microsoft
Jul 2021 – Jul 2024

Desktop-to-Web Configuration Tool Migration

Migrated a legacy configuration tool from desktop to a web-based interface, improving engineer productivity and release velocity.

  • 15% increase in engineer productivity
  • Faster and more frequent update cycle
  • Built with ASP.NET and a modern web stack
ASP.NET · .NET Framework · HTML · CSS
Worth noting: The main issue wasn't the software — it was that updates required remembering to ship them, so they rarely happened. Moving to web made deployment automatic and removed the per-machine installation problem. The 15% productivity gain is mostly that friction gone.