File Scanner 1.0.0
A high-performance C++ malicious file scanner.
A high-performance, multithreaded command-line utility for scanning files against a database of malicious MD5 hashes.
This project is a demonstration of modern C++ software engineering principles, designed to be robust, scalable, and maintainable. It is built with a clean, decoupled architecture to ensure high cohesion, low coupling, and excellent testability.
Made for the Kaspersky SafeBoard internship.
`std::filesystem`, `std::thread`, atomics, and move semantics.

The scanner is designed for high throughput on modern hardware. Benchmarks were conducted on a machine equipped with an Apple M1 Pro (8 performance cores).
The benchmark consists of scanning a directory containing 100,000 files (1KB to 128KB each) with a total size of approximately 6.2 GB.
Metric | Result |
---|---|
Total Execution Time | **~19.5 seconds** |
Throughput | **~5,100 files/second** |
During the scan, the utility successfully utilizes all available CPU cores, demonstrating the efficiency of the multithreaded architecture.
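
As a rough illustration of how work can be spread across all cores, here is a minimal sketch using `std::thread` and an atomic work index. It is not the project's `ThreadPool`: the function name `count_detections` and the `is_malicious` callback are hypothetical, and the real scanner may distribute work differently.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper, not the project's ThreadPool: runs `is_malicious` over
// every path on `workers` threads and returns the number of detections.
std::size_t count_detections(const std::vector<std::string>& paths,
                             const std::function<bool(const std::string&)>& is_malicious,
                             unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;  // hardware_concurrency() may return 0

    std::atomic<std::size_t> next{0};        // index of the next unclaimed file
    std::atomic<std::size_t> detections{0};  // shared result counter

    auto worker = [&] {
        for (;;) {
            // Each thread claims one index at a time, so no file is scanned twice.
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= paths.size()) return;
            if (is_malicious(paths[i])) {
                detections.fetch_add(1, std::memory_order_relaxed);
            }
        }
    };

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();  // join() makes all worker updates visible here
    return detections.load();
}
```

Claiming indices through a shared atomic counter keeps the threads busy even when file sizes vary widely, which matters for a mix of 1KB and 128KB files.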
To run the benchmark on your own machine, use the `benchmark` target:
The project uses a standard CMake workflow. All C++ dependencies (Google Test, md5-lib) are fetched automatically.
The utility is run from the command line with three required arguments.
Launch Command:
Arguments:
- `--path <directory>`: The absolute or relative path to the root directory to be scanned.
- `--base <file.csv>`: The path to the CSV file containing malicious signatures.
- `--log <file.log>`: The path to the file where detection reports will be written.

The signature database is a simple text file with one entry per line. Each line contains an MD5 hash and a verdict, separated by a semicolon.
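
The `hash;verdict` format described above could be parsed roughly as follows. This is a sketch under assumptions, not the project's `csv_hash_database` implementation: the function names are hypothetical, and the choices to skip malformed lines and to store hashes in lowercase are illustrative.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

// Parses one "hash;verdict" line. Returns std::nullopt for lines that do not
// look like a 32-digit hexadecimal MD5 followed by a non-empty verdict.
std::optional<std::pair<std::string, std::string>> parse_signature_line(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
    const auto sep = line.find(';');
    if (sep == std::string::npos) return std::nullopt;

    std::string hash = line.substr(0, sep);
    std::string verdict = line.substr(sep + 1);
    const bool is_hex = std::all_of(hash.begin(), hash.end(),
        [](unsigned char c) { return std::isxdigit(c) != 0; });
    if (hash.size() != 32 || !is_hex || verdict.empty()) return std::nullopt;

    // Assumption: hashes are compared case-insensitively, so store lowercase.
    std::transform(hash.begin(), hash.end(), hash.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return std::make_pair(std::move(hash), std::move(verdict));
}

// Loads every valid line of `in` into a hash -> verdict map, skipping the rest.
std::unordered_map<std::string, std::string> load_signatures(std::istream& in) {
    std::unordered_map<std::string, std::string> db;
    std::string line;
    while (std::getline(in, line)) {
        if (auto entry = parse_signature_line(line)) {
            db.insert(std::move(*entry));
        }
    }
    return db;
}
```

A hash map gives constant-time average lookups, so the per-file cost is dominated by reading and hashing the file rather than by the database query. `load_signatures` accepts any `std::istream`, so it can be tried with a `std::istringstream` before pointing it at a real file.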
Detections are logged in the JSON Lines (JSONL) format, which is structured and machine-readable.
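
To illustrate what writing one JSONL record involves, here is a small sketch. The field names `path`, `hash`, and `verdict` are assumptions chosen for the example and may not match the keys the utility actually emits. The escaping follows the JSON string rules for quotes, backslashes, and control characters.

```cpp
#include <cstdio>
#include <string>

// Escapes a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // remaining control characters become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Builds one JSONL record terminated by a newline.
// The field names are assumptions made for illustration.
std::string to_jsonl(const std::string& path, const std::string& hash,
                     const std::string& verdict) {
    return "{\"path\":\"" + json_escape(path) + "\",\"hash\":\"" + json_escape(hash) +
           "\",\"verdict\":\"" + json_escape(verdict) + "\"}\n";
}
```

Because every record is a complete JSON object on its own line, a log consumer can process the file line by line without loading it whole.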
After the scan is complete, a summary is printed to the console.
The project follows the principles of Clean Architecture to ensure a separation of concerns.
- **`src/scanner_lib` (Core Library - DLL):**
  - **`domain.h`:** Contains the core data structures (`ScanResult`). It has no dependencies.
  - **`scanner.h`, `thread_pool.h`:** Contains the application-specific logic and orchestration (`IScanner`, `ThreadPool`). Depends only on the Domain.
  - **`csv_hash_database.h`, etc.:** Contains the concrete implementations of external-facing components. It implements interfaces defined in the Application layer.
- **`src/scanner_cli` (Presentation Layer - EXE):**
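
The dependency rule behind the layering in this section can be sketched in a few lines. Only the name `ScanResult` comes from the project. Its fields, the interface `IHashDatabase`, the function `classify`, and the class `InMemoryHashDatabase` are hypothetical stand-ins meant to show the direction of dependencies, not the actual headers.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Domain: plain data with no dependencies. The fields are assumptions.
struct ScanResult {
    std::string path;
    std::string hash;
    std::optional<std::string> verdict;  // empty when the file is clean
};

// Application: declares the interface it needs (hypothetical name and signature).
class IHashDatabase {
public:
    virtual ~IHashDatabase() = default;
    virtual std::optional<std::string> lookup(const std::string& md5) const = 0;
};

// Application logic depends only on the Domain type and the interface.
ScanResult classify(const IHashDatabase& db, std::string path, std::string md5) {
    auto verdict = db.lookup(md5);
    return ScanResult{std::move(path), std::move(md5), std::move(verdict)};
}

// Outer layer: a concrete implementation, here backed by an in-memory map.
class InMemoryHashDatabase final : public IHashDatabase {
public:
    explicit InMemoryHashDatabase(std::unordered_map<std::string, std::string> entries)
        : entries_(std::move(entries)) {}

    std::optional<std::string> lookup(const std::string& md5) const override {
        const auto it = entries_.find(md5);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> entries_;
};
```

Because `classify` only sees the interface, a test can pass an in-memory database while the command-line program passes a file-backed one, which is the testability benefit this kind of layering aims for.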