Domen Perko

MBajk – Bike Stand Availability Prediction

Machine learning web application for predicting available bike stands at MBajk stations using GRU neural networks

Jun 2024
React · FastAPI · TensorFlow · MLflow · DVC

Overview

A web application for predicting the number of available bike stands at MBajk bike stations. The prediction service is based on a recurrent neural network model (GRU - Gated Recurrent Unit). The project implements machine learning pipelines for continuous data fetching and model training, so that the prediction models keep improving as new data arrives.

Key Features

  • Real-time bike stand availability predictions
  • Historical data analysis and pattern recognition
  • Continuous model training and improvement
  • Optimized inference with ONNX Runtime
  • Model quantization for efficient deployment

Tech Stack

Frontend:

  • React
  • Next.js
  • TypeScript

Backend & ML:

  • FastAPI (REST API)
  • TensorFlow (Model development)
  • ONNX Runtime (Production inference)
  • MLflow (Experiment tracking)
  • DVC (Data version control)
  • Docker (Containerization)

Background

As a cyclist in Maribor, I frequently used the MBajk bike-sharing system. However, I often encountered a frustrating problem: arriving at a station only to find no available bikes, or reaching my destination with no empty stands to return the bike.

This real-world inconvenience sparked the idea for this project. I wanted to build a system that could predict bike stand availability in advance, allowing users to plan their trips more effectively and reduce the uncertainty of bike-sharing usage.

The challenge was not just building a prediction model, but creating an entire ML system that could:

  • Continuously collect real-time data from MBajk stations
  • Train and retrain models as patterns change over time
  • Serve predictions quickly and reliably to users
  • Deploy efficiently with minimal resource usage

Implementation

Data Pipeline

  1. Data Collection: Built automated pipelines to continuously fetch real-time data from the MBajk API
  2. Data Storage: Stored historical data with version control using DVC
  3. Feature Engineering: Extracted temporal features (time of day, day of week) and weather features
  4. Data Preprocessing: Normalized and transformed data for neural network training
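
The feature engineering and preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (time_features, min_max_scale, make_windows), the sine/cosine encoding of time, and the choice of column 0 as the prediction target are all assumptions made for the example, and weather features are left out.

```python
from __future__ import annotations

import math
from datetime import datetime

import numpy as np


def time_features(ts: datetime) -> list[float]:
    """Encode hour of day and day of week as sine/cosine pairs.

    Cyclic encoding keeps 23:00 and 00:00 close together, which a raw
    hour number would not. (Hypothetical helper, assumed for illustration.)
    """
    hour = ts.hour + ts.minute / 60.0
    hour_angle = 2 * math.pi * hour / 24.0
    day_angle = 2 * math.pi * ts.weekday() / 7.0
    return [
        math.sin(hour_angle),
        math.cos(hour_angle),
        math.sin(day_angle),
        math.cos(day_angle),
    ]


def min_max_scale(values: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Scale values to [0, 1] and return the bounds needed to invert it."""
    lo, hi = float(values.min()), float(values.max())
    span = hi - lo if hi > lo else 1.0  # avoid division by zero
    return (values - lo) / span, lo, hi


def make_windows(series: np.ndarray, window: int) -> tuple[np.ndarray, np.ndarray]:
    """Turn a (T, F) array into (N, window, F) inputs and (N,) targets.

    The target is column 0 of the step right after each window
    (assumed here to hold the available-stands count).
    """
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window, 0])
    return np.array(inputs), np.array(targets)
```

The (N, window, F) shape produced by make_windows matches what a recurrent layer typically expects: samples, time steps, features.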

Model Development

The core prediction engine uses a GRU (Gated Recurrent Unit) neural network, which is particularly effective for time-series forecasting:

  • Architecture: Multi-layer GRU with dropout for regularization
  • Training: Supervised learning on historical availability patterns
  • Validation: Time-series cross-validation to prevent data leakage
  • Tracking: MLflow for experiment tracking and model versioning
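
To illustrate how time-series cross-validation avoids leakage, here is a small sketch of an expanding-window split. The function name and the fold layout are assumptions for the example, not necessarily the scheme used in this project. The key property is that every validation index comes strictly after every training index, so the model is never evaluated on data older than what it was trained on.

```python
from __future__ import annotations

from typing import Iterator


def expanding_window_splits(
    n_samples: int, n_folds: int, val_size: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, val_indices) pairs in chronological order.

    Each fold trains on everything before its validation block, and the
    validation blocks tile the end of the series without overlapping.
    (Hypothetical helper, assumed for illustration.)
    """
    first_val_start = n_samples - n_folds * val_size
    if first_val_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        val_start = first_val_start + fold * val_size
        train_idx = list(range(0, val_start))
        val_idx = list(range(val_start, val_start + val_size))
        yield train_idx, val_idx
```

A random shuffle split would mix future observations into the training set, which tends to inflate validation scores for forecasting tasks.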

Optimization & Deployment

ONNX Runtime Integration: Converted TensorFlow models to ONNX format for faster inference:

  • 3x faster prediction times compared to native TensorFlow
  • Cross-platform compatibility
  • Reduced memory footprint

Model Quantization: Applied quantization techniques to reduce model size:

  • Converted weights from float32 to int8
  • 75% reduction in model size
  • Minimal accuracy loss (less than 2%)
  • Significantly faster inference on CPU
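
The 75% size reduction follows directly from storage width: a float32 weight takes 4 bytes and an int8 weight takes 1. The sketch below shows symmetric per-tensor quantization in plain NumPy to make that arithmetic concrete. It is an illustration of the general idea only, not the quantization tooling used in this project, and the function names are assumptions.

```python
import numpy as np


def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale.

    (Hypothetical helper, assumed for illustration.)
    """
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale
```

With this scheme the round-trip error per weight is at most half a quantization step (scale / 2), which is one reason the accuracy loss can stay small when the weights are not dominated by a few outliers.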

Continuous Training: Implemented automated retraining pipelines:

  • Daily data collection from MBajk API
  • Weekly model retraining on new data
  • Automatic model deployment if performance improves
  • A/B testing for model comparison
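
The "deploy only if performance improves" rule can be expressed as a small promotion gate. This is a sketch under assumptions: the ModelEval type, the use of mean absolute error as the comparison metric, and the min_improvement margin are all hypothetical and not taken from the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelEval:
    """Evaluation result for one model version (hypothetical type)."""

    name: str
    mae: float  # mean absolute error on a held-out set; lower is better


def should_promote(
    candidate: ModelEval,
    production: ModelEval | None,
    min_improvement: float = 0.0,
) -> bool:
    """Return True if the candidate should replace the production model.

    With no production model yet, the candidate is promoted. Otherwise it
    must beat the production error by more than min_improvement.
    """
    if production is None:
        return True
    return candidate.mae < production.mae - min_improvement
```

A non-zero min_improvement guards against replacing the production model over differences that are within evaluation noise.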

API & Frontend

FastAPI Backend:

  • RESTful endpoints for predictions
  • Real-time station status
  • Historical trend analysis
  • Model performance metrics

Next.js Frontend:

  • Interactive station map
  • Real-time availability display
  • Prediction visualization
  • Historical charts and trends

Results

The system successfully predicts bike stand availability with:

  • Prediction Accuracy: 85% accuracy for 30-minute forecasts
  • Response Time: Less than 50ms average API response time
  • Model Size: 2.5MB (after quantization, down from 10MB)
  • Deployment: Fully containerized with Docker for easy scaling

Figure: MBajk app interface showing bike stand availability predictions on a map

GitHub Repository: mbajk-ml-web-service

Lessons Learned

ML Pipeline Automation: Building robust, automated pipelines is as important as the model itself. The ability to continuously retrain and deploy models ensures long-term system reliability.

Production Optimization: ONNX Runtime and model quantization proved crucial for production deployment. These optimizations made the difference between a slow, resource-heavy service and a fast, efficient one.

Data Quality Matters: Garbage in, garbage out. Investing time in proper data collection, cleaning, and feature engineering significantly improved model performance.

Version Control for ML: Using DVC for data versioning and MLflow for experiment tracking transformed the development workflow, making it easy to reproduce results and compare experiments.

Real-World Testing: Theoretical accuracy metrics don't always translate to user satisfaction. Testing with real usage patterns revealed edge cases that required additional model refinement.

Continuous Improvement: ML systems need ongoing maintenance. Usage patterns change over time (seasonal variations, new stations, policy changes), so regular model updates are needed to maintain accuracy.