Python
FastAPI
scikit-learn
Celery
RabbitMQ
React
Vite
TailwindCSS
Docker
Vercel
Chrome Extension

Kraven The Hunter

A full-stack cybersecurity platform that uses machine learning to detect phishing, malware, and malicious URLs in real time — combining a trained ML model, a production-grade API, a React web dashboard, and a Chrome extension.

Project Overview

I built Kraven because the problem was real and the existing solutions were either too slow, too expensive, or too dumb. So I built my own. End to end.

The ML pipeline was the most interesting part. I engineered 14 features from raw URLs, including Shannon entropy, digit ratio, special character density, IP-based hostname detection, subdomain count, path depth, URL shortener detection, @ symbol presence, double-slash redirects, domain length, and HTTPS usage. I trained a scikit-learn RandomForest classifier on a merged dataset of ~500K malicious and benign URLs. The model outputs probability-based confidence scores via predict_proba rather than just binary labels, which gives users a graded threat assessment instead of a blunt yes/no.
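To make the feature list concrete, here is a minimal sketch of how those URL features could be computed with only the Python standard library. It is an illustration, not the actual Kraven code: the function names, the `SHORTENERS` set, and the naive dot-counting for subdomains are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical shortener list, for illustration only.
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(url: str) -> dict:
    """Turn a raw URL string into a flat dict of structural features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    length = len(url) or 1  # avoid division by zero on empty input
    is_ip = bool(IPV4_RE.fullmatch(host))
    path_segments = [seg for seg in parsed.path.split("/") if seg]
    return {
        "entropy": shannon_entropy(url),
        "digit_ratio": sum(c.isdigit() for c in url) / length,
        "special_char_density": sum(not c.isalnum() for c in url) / length,
        "has_ip_host": is_ip,
        # Naive assumption: every dot beyond the first marks a subdomain.
        "subdomain_count": 0 if is_ip else max(host.count(".") - 1, 0),
        "path_depth": len(path_segments),
        "is_shortener": host in SHORTENERS,
        "has_at_symbol": "@" in url,
        # A "//" anywhere after the scheme separator hints at a redirect.
        "has_double_slash_redirect": "//" in url.split("://", 1)[-1],
        "domain_length": len(host),
        "uses_https": parsed.scheme == "https",
    }
```

A dict like this would be flattened into a numeric row before reaching the classifier. Dot-counting overstates subdomains for multi-part suffixes such as `.co.uk`, so a real pipeline would probably want a public-suffix-aware parser.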

The FastAPI backend exposes endpoints for URL prediction, community threat reporting, and health checks. It supports model hot-reloading: when the .pkl file changes on disk, the API detects the change and swaps in the new model without downtime. There's also a community report lookup that overrides ML predictions with crowd-sourced intelligence.
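One way the hot-reloading could work is to compare the model file's stat signature on each request and reload only when it changes. The sketch below assumes that approach. `HotReloadingModel` and `save_model_atomically` are hypothetical names, and the real implementation may detect changes differently.

```python
import os
import pickle
import tempfile
import threading


class HotReloadingModel:
    """Caches an unpickled model and reloads it when the file on disk changes."""

    def __init__(self, path: str):
        self._path = path
        self._lock = threading.Lock()
        self._signature = None
        self._model = None

    def get(self):
        st = os.stat(self._path)
        # mtime alone can collide on coarse-grained filesystems, so the
        # size and inode are included in the change signature as well.
        signature = (st.st_mtime_ns, st.st_size, st.st_ino)
        with self._lock:
            if signature != self._signature:
                with open(self._path, "rb") as fh:
                    self._model = pickle.load(fh)  # only load trusted files
                self._signature = signature
            return self._model


def save_model_atomically(model, path: str) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    os.replace is atomic on the same filesystem, so a reader sees either
    the old file or the new one, never a half-written pickle.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(model, fh)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In this sketch, a request handler would call `get()` before each prediction, and the retraining side would call `save_model_atomically`. The atomic rename matters most when the writer and reader are separate processes sharing a volume.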

When community report counts cross a configurable threshold, the API dispatches an async Celery task to a dedicated worker process. The worker merges community-reported URLs with the original training data and retrains the model — all without blocking the API. RabbitMQ serves as the message broker between the API and the worker.
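The threshold trigger can be sketched independently of Celery by injecting the dispatch function. Everything here is an assumption for illustration: `ReportTracker` is a hypothetical name, and counting total pending reports (rather than per-URL counts) is just one plausible reading of "report counts cross a configurable threshold".

```python
from collections import Counter
from typing import Callable, List


class ReportTracker:
    """Counts community reports and triggers retraining at a threshold."""

    def __init__(self, threshold: int, dispatch_retrain: Callable[[List[str]], None]):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self._threshold = threshold
        self._dispatch = dispatch_retrain
        self._pending = Counter()  # url -> reports since the last retrain

    def report(self, url: str) -> bool:
        """Record one report; return True if a retrain was dispatched."""
        self._pending[url] += 1
        if sum(self._pending.values()) < self._threshold:
            return False
        batch = sorted(self._pending)  # unique reported URLs
        self._pending.clear()
        # The dispatcher should only enqueue work and return immediately,
        # so the request that crossed the threshold is not blocked.
        self._dispatch(batch)
        return True
```

With Celery, the dispatcher could be something like `lambda urls: retrain_model.delay(urls)`, where `retrain_model` is a hypothetical task registered with `@app.task`. `.delay()` publishes a message to the broker and returns without waiting for the worker.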

The Chrome extension (Manifest V3) has two layers of protection. A content script automatically scans every page the user visits and redirects to a warning page if the confidence score exceeds 85%. A popup interface lets users manually scan the current page and see the full threat report inline, without ever leaving the browser.

The whole backend is containerised with Docker Compose running three services: the FastAPI engine, the Celery worker, and RabbitMQ. The engine and worker share a Docker volume for the model file. The React frontend is deployed to Vercel as a static SPA.

Key Features

  • Real-time malicious URL detection with ML confidence scoring
  • 14-feature URL engineering pipeline (entropy, digit ratio, path depth, etc.)
  • RandomForest classifier trained on ~500K URLs with predict_proba output
  • Community threat reporting that overrides ML predictions
  • Async model retraining via Celery + RabbitMQ without API downtime
  • Model hot-reloading — swaps updated .pkl file without restart
  • Chrome Extension (Manifest V3) with auto-scan and manual popup
  • React web dashboard with detailed threat reports and one-click reporting
  • Fully containerised backend with Docker Compose (API + worker + broker)

Challenges & Solutions

Engineering meaningful features from raw URL strings without leaking label information

Chrome Manifest V3 restrictions on content script redirects — required creative workarounds

Wiring async Celery retraining so it never blocks the live API

Keeping the model hot-reloadable on disk without service restarts

Balancing ML prediction confidence thresholds to minimise false positives
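For the threshold-balancing challenge, one simple approach is to sweep candidate thresholds over a labelled validation set and keep the lowest one whose false-positive rate stays within a budget. The sketch below assumes that approach. The function names and the 0.50 to 0.99 candidate grid are hypothetical, and this is not necessarily how the 85% cutoff was chosen.

```python
def false_positive_rate(scores, labels, threshold):
    """Fraction of benign samples (label 0) scored strictly above threshold."""
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not benign:
        return 0.0
    return sum(s > threshold for s in benign) / len(benign)


def pick_threshold(scores, labels, max_fpr, candidates=None):
    """Lowest candidate threshold whose false-positive rate is within max_fpr.

    Returns None if no candidate satisfies the budget.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(50, 100)]
    for t in sorted(candidates):
        if false_positive_rate(scores, labels, t) <= max_fpr:
            return t
    return None
```

Picking the lowest qualifying threshold keeps as many true detections as possible while respecting the false-positive budget. In practice the scores would come from `predict_proba` on held-out URLs.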

Key Learnings

How to build a production Chrome Extension with Manifest V3 and its many constraints
Feature engineering for URL-based ML — entropy and structural signals are surprisingly powerful
Async task queues with Celery and RabbitMQ for background model retraining
Model hot-reloading patterns in production APIs
How crowd-sourced intelligence can meaningfully improve ML accuracy over time

Project Info

Timeline

3 months

Team

Solo project

Status

Completed

Tech Stack

ML / Backend

Python
scikit-learn
FastAPI
SQLAlchemy
SQLite

Async / Messaging

Celery
RabbitMQ

Frontend

React 18
Vite
TailwindCSS
Framer Motion
Flowbite

Extension

Chrome Extension (Manifest V3)
JavaScript

Deployment

Docker Compose
Vercel

Tools

Git
Postman
VS Code