Project Deep Dive

Byte-Brain 🧠

Offline Static PE Malware Scanner with Explainable ML

Python, Scikit-learn, pefile, Kali Linux, VirtualBox

About

A local, privacy-first malware analysis tool for Windows PE files. It uses a Random Forest classifier trained on EMBER-2018 samples to produce probability-based risk assessments.

Key Features

  • Zero-Cloud Dependency
  • Static PE Analysis
  • Explainable Predictions (LOW / MEDIUM / HIGH)
  • Batch Intelligence
  • Operational Security
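
To illustrate how a probability-based assessment could be turned into the LOW / MEDIUM / HIGH labels listed above, here is a minimal sketch. The function name `risk_band` and both cut-off values are assumptions made for illustration; the thresholds Byte-Brain actually uses are not stated here.

```python
def risk_band(p_malicious: float, medium_at: float = 0.4, high_at: float = 0.7) -> str:
    """Map a malware probability in [0, 1] to LOW / MEDIUM / HIGH.

    The default cut-offs are illustrative assumptions, not Byte-Brain's real values.
    """
    if not 0.0 <= p_malicious <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_malicious >= high_at:
        return "HIGH"
    if p_malicious >= medium_at:
        return "MEDIUM"
    return "LOW"


# Three example probabilities, one per band.
for p in (0.05, 0.55, 0.93):
    print(f"{p:.2f} -> {risk_band(p)}")
```

Keeping the cut-offs as parameters makes it easy to trade false positives against missed detections without retraining the model.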

How It Works

> Feature Engineering

  • Structural (machine type, sections)
  • Entropy (detecting packing)
  • Import signals (suspicious DLLs like ws2_32.dll)
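
The entropy and import signals above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's `feature_extractor.py`: the helper names are hypothetical, and the watch list contains only the one DLL named above. In practice the section bytes and import names would come from a PE parser such as pefile.

```python
import math
from collections import Counter

# Hypothetical watch list; ws2_32.dll is the only DLL the text names.
SUSPICIOUS_DLLS = {"ws2_32.dll"}


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def import_hits(dll_names):
    """Return imported DLL names found on the watch list, compared case-insensitively."""
    return sorted({name.lower() for name in dll_names} & SUSPICIOUS_DLLS)


print(shannon_entropy(b"\x00" * 4096))          # constant padding -> 0.0
print(shannon_entropy(bytes(range(256)) * 16))  # uniform byte values -> 8.0
print(import_hits(["KERNEL32.dll", "WS2_32.dll"]))
```

Entropy close to 8 bits per byte means the data looks random, which is typical of compressed or encrypted content. That is why high section entropy is a common packing indicator.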

> Machine Learning Pipeline

The model is trained on a balanced corpus of 10k EMBER-2018 samples. The resulting Random Forest classifier achieves ~97% accuracy.
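
One way to build a balanced corpus is to draw the same number of benign and malicious records from the labelled data. The sketch below assumes EMBER's label convention (0 = benign, 1 = malicious, -1 = unlabelled) and a hypothetical helper, `balanced_sample`. The sampling procedure the project actually used is not described, so treat this as an illustration only.

```python
import random


def balanced_sample(records, per_class, seed=0):
    """Draw `per_class` benign (label 0) and `per_class` malicious (label 1) records.

    `records` is an iterable of (features, label) pairs; any other label,
    such as -1 for unlabelled samples, is skipped.
    """
    records = list(records)
    benign = [r for r in records if r[1] == 0]
    malicious = [r for r in records if r[1] == 1]
    if len(benign) < per_class or len(malicious) < per_class:
        raise ValueError("not enough labelled samples for a balanced draw")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    subset = rng.sample(benign, per_class) + rng.sample(malicious, per_class)
    rng.shuffle(subset)  # avoid feeding the classes to the trainer in blocks
    return subset


# Toy data: 20 benign, 20 malicious, and 20 unlabelled records.
toy = [((i,), label) for i, label in enumerate([0, 1, -1] * 20)]
subset = balanced_sample(toy, per_class=5)
print(len(subset), sum(label for _, label in subset))  # -> 10 5
```

With `per_class=5000`, the same call would yield a 10k corpus split evenly between the two classes.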

> Future Improvements

Planned: feature importance visualization, JSON/CSV export, YARA hints, and ensemble models.

Installation

git clone https://github.com/Shrey42-dot/Byte-Brain.git
cd Byte-Brain
python3 -m venv bb-env
source bb-env/bin/activate   # Windows: bb-env\Scripts\activate
pip install -r requirements.txt

Project Structure

byte-brain/
├── byte_brain/
│   └── __main__.py        # CLI Entry Point
├── extractor/
│   └── feature_extractor.py # Custom PE feature extraction
├── model/
│   ├── byte_brain_rf.joblib    # Serialized Random Forest model
│   └── infer.py                # Inference engine
└── samples/                    # Safe PE samples

Example Output

[Screenshot: representative terminal output from a benign sample scan.]

Live Demo Simulation

Interactive terminal — run the bundled benign demo or simulate an upload (web-safe fallback).

> SYSTEM NOTICE: This interface is currently a simulation demonstrating the expected output of the analysis pipeline. To execute the live machine learning model, please download the working tool from GitHub. Full browser-based execution is coming soon.
