Jessica Kuo
Data & ML Engineer
Available for new roles

Data & Machine Learning Engineer

Building scalable, reliable data and ML systems — end-to-end, from ingestion to production. Focused on healthcare IT and research applications where correctness matters.

About

End-to-end ML,
built to last.

I'm a Data & Machine Learning Engineer with a focus on building systems that are correct, maintainable, and production-ready. My work spans the full ML lifecycle — data ingestion and modeling, training pipelines, deployment, and analytics.

I've designed end-to-end ML pipelines including computer vision systems and multi-modal data platforms, supporting both rapid experimentation and large-scale production workloads. I care deeply about reproducibility, observability, and shipping things that actually work.

I'm especially drawn to problems at the intersection of large-scale data systems and healthcare IT — where correctness and reliability have real-world impact.

🐍 Languages: Python · SQL
🗄️ Data Stack: Snowflake · dbt · Fivetran
📦 Infrastructure: Docker · CI/CD · Cloud
🔬 ML Focus: Computer Vision · NLP · Multi-modal · Healthcare AI
🏥 Domain Interest: Healthcare IT · Research · Clinical Data

Skills

The full stack,
top to bottom.

Data Engineering
  • Data Ingestion & Pipelines
  • Data Modeling (dbt)
  • Snowflake / Data Warehousing
  • Fivetran / ELT Orchestration
  • SQL Optimization
Machine Learning
  • End-to-End ML Pipelines
  • Computer Vision Systems
  • Multi-modal Data Platforms
  • Model Training & Evaluation
  • Experiment Tracking
MLOps & Deployment
  • Model Deployment & Serving
  • Containerization (Docker)
  • CI/CD Pipelines
  • Reproducible Systems
  • Production Monitoring
  • MLflow (currently learning)
Analytics
  • Analytics Engineering
  • Data Quality & Testing
  • Metrics Design
  • Python (pandas, numpy)
  • Visualization
Projects

Selected work.

Computer Vision · Animal Behaviour · Research

Video Annotation & Auto-Annotation for Animal Behaviour Studies

Built an end-to-end Python pipeline for AgriGates to automate calf behaviour monitoring from video footage, replacing manual observation at scale. The system detects individual calves, tracks their movements using BoT-SORT, and classifies behaviour (standing vs. lying) using a custom-trained ResNet-18 classifier. The final behaviour classification model exceeded 90% on both accuracy and F1 score in testing.

Python · PyTorch · YOLOv8 · ResNet-18 · BoT-SORT

Machine Learning · Healthcare · Classification

Predicting Diabetes in Pima Indian Women Using Logistic Regression

Built a logistic regression classifier with hyperparameter optimization to predict diabetes onset from clinical features including glucose levels, BMI, and pregnancies. The model achieved 75% accuracy on the test set, outperforming the baseline by ~8%. Glucose was the strongest predictor, followed by BMI and pregnancies. Analysis highlighted the clinical importance of reducing false negatives to avoid delayed diagnoses. View the full report here.

Python · scikit-learn · pandas · numpy · Altair · Matplotlib · Quarto · pytest

Open Source · Software Engineering · Data Structures

datastructpy — A Python Package

Co-developed and shipped an open-source Python library providing clean, practical implementations of essential data structures — including Binary Search Trees — designed for interview prep, coding challenges, and education. Taken from concept to a fully published PyPI package in under a month, with automated testing, comprehensive documentation, and a robust CI/CD pipeline.

Python · Poetry · pytest · PyPI · Tox · GitHub Actions · Codecov · Sphinx

Mental Health · Topic Mining · NLP · Research · Unsupervised ML

Mental Health App Review Topic Mining

Explored NLP and unsupervised machine learning to automate analysis of mental health app user reviews at scale. Scraped 63,474 reviews across 157 apps from the Google Play Store, then applied TF-IDF vectorization and k-means clustering to surface high-level topics from user opinions. Compared ten models with varying pre-processing rules; the best model identified 6 distinct topics from all reviews, 6 from negative reviews, and 9 from positive reviews. Results suggest the approach is a promising complement to traditional qualitative analysis for rapid, large-scale review mining.

Python · NLP · scikit-learn · TF-IDF · K-Means Clustering · Text Mining · Web Scraping · Unsupervised ML

Data Visualisation · Dashboard · Interactive

World Happiness Dashboard

Co-authored an interactive dashboard visualising the World Happiness Dataset (2020–2024), helping users explore economic and political factors that influence happiness across countries and continents. Users can filter by GDP, perception of corruption, and other criteria to compare nations — with a practical focus on identifying potential immigration destinations based on personal priorities.

Python · Dash · Altair · pandas · GeoPandas · scikit-learn · Matplotlib · Render

✦ More projects coming soon — some work is under NDA.

Contact

Let's build
something great.

I'm open to roles centred on large-scale data systems and production ML — particularly in healthcare IT and research where the work has meaningful real-world impact.

Whether you have a role, a project, or just want to chat about data systems — I'd love to hear from you.