CyberIntel Agent: Automated CVE Analysis System

09.2024 - Present

Overview

CyberIntel Agent is an automated CVE analysis system that scans NPM and Python library vulnerabilities, generating AI-powered fix recommendations using semantic similarity and few-shot learning techniques.

Development teams face a constant challenge keeping dependencies secure. When a new CVE drops affecting a common library, the questions are immediate: Does this affect us? What versions are vulnerable? What should we update to? CyberIntel addresses this by automatically analyzing vulnerabilities in the context of NPM and Python ecosystems, providing actionable fix recommendations rather than generic threat descriptions.

Demo

CyberIntel Agent Workflow

CyberIntel Agent processing CVE feeds: automated ingestion via AWS Lambda, LLM-powered analysis with fix recommendations, and Next.js dashboard for vulnerability tracking.

Technologies Used

AWS Lambda, S3, EventBridge, EC2, Docker, Next.js, AWS Cognito, RDS PostgreSQL, CloudWatch, LoRA, vLLM, 8-bit Quantization, Semantic Similarity, Few-shot Learning, NVD, CISA, MITRE ATT&CK

Serverless ETL Pipeline

Goal

Build an automated pipeline that continuously monitors CVE sources (NVD, CISA, MITRE ATT&CK) and processes new vulnerability disclosures with minimal infrastructure overhead.

Approach

We engineered a serverless ETL pipeline using AWS Lambda functions orchestrated by EventBridge schedules. New CVE data is ingested, normalized, and stored in S3 for processing. The Lambda functions handle data transformation and deduplication, ensuring consistent schema across different threat feed formats.

EventBridge triggers ensure the pipeline runs on schedule without manual intervention, while S3 provides durable storage for both raw feeds and processed results.
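
To make the normalization and deduplication step concrete, here is a minimal sketch of how records from two feeds might be mapped onto one schema and deduplicated by CVE ID. The `CveRecord` schema, the simplified input shapes, and the event layout passed to `handler` are all assumptions for illustration, not the project's actual code; real feed records carry many more fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CveRecord:
    """Common schema (hypothetical) shared by all feeds after normalization."""
    cve_id: str
    source: str
    description: str


def normalize_nvd(item: dict) -> CveRecord:
    # Assumed, simplified shape of an NVD-style record.
    return CveRecord(
        cve_id=item["id"].strip().upper(),
        source="nvd",
        description=item.get("description", "").strip(),
    )


def normalize_cisa(item: dict) -> CveRecord:
    # Assumed, simplified shape of a CISA-style record.
    return CveRecord(
        cve_id=item["cveID"].strip().upper(),
        source="cisa",
        description=item.get("shortDescription", "").strip(),
    )


def deduplicate(records: Iterable[CveRecord]) -> list[CveRecord]:
    """Keep the first record seen for each CVE ID, preserving input order."""
    seen: dict[str, CveRecord] = {}
    for record in records:
        seen.setdefault(record.cve_id, record)
    return list(seen.values())


def handler(event: dict, context: object) -> dict:
    """Lambda-style entry point; the event layout here is an assumption."""
    records = [normalize_nvd(i) for i in event.get("nvd", [])]
    records += [normalize_cisa(i) for i in event.get("cisa", [])]
    unique = deduplicate(records)
    return {"count": len(unique), "cve_ids": [r.cve_id for r in unique]}
```

Keying deduplication on the normalized CVE ID means the same vulnerability reported by two feeds collapses to one record. A fuller version would likely merge fields from both sources rather than keep only the first.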

Considerations

Traditional server-based pipelines require constant maintenance and scaling decisions. A serverless architecture with Lambda and EventBridge eliminates infrastructure management overhead while automatically scaling with ingestion volume. This lets us focus on the analysis logic rather than server operations.

Results & Impact

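The system continuously ingests CVE updates from multiple sources with no servers to patch or provision. The serverless architecture scales automatically during high-volume disclosure periods, and because Lambda bills only for invocations and execution time, the pipeline incurs no compute charges when idle (stored S3 data is still billed).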

LLM-Powered Vulnerability Analysis

Goal

Build an analysis system that reads CVE descriptions and generates fix recommendations specific to NPM and Python library vulnerabilities, including safe version ranges and migration guidance.

Approach

We fine-tuned an open-source LLM using LoRA (Low-Rank Adaptation) on security-specific data with emphasis on package ecosystem context. The model uses semantic similarity to match CVEs against known vulnerability patterns and few-shot learning to generalize fix recommendations to novel vulnerabilities.

For memory efficiency, we applied 8-bit quantization, achieving a 50% reduction in weight memory compared to an FP16 baseline, since each weight is stored in 8 bits instead of 16. This enables deployment on cost-effective EC2 instances while maintaining quality. The model is served via vLLM with dynamic batching for high throughput.
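
The 50% figure follows directly from the storage width of each weight. The short calculation below illustrates it for a hypothetical 7-billion-parameter model; the source does not state the model size, and the estimate covers weights only, not activations or the KV cache.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Hypothetical parameter count, chosen only to make the arithmetic concrete.
params = 7_000_000_000

fp16 = weight_memory_gib(params, 16)  # 2 bytes per weight
int8 = weight_memory_gib(params, 8)   # 1 byte per weight
reduction = 1 - int8 / fp16

print(f"FP16: {fp16:.1f} GiB, INT8: {int8:.1f} GiB, reduction: {reduction:.0%}")
```

Because the ratio depends only on bits per weight, the reduction in weight memory is the same for any model size.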

Considerations

Generic CVE descriptions don't tell developers what to actually do. A vulnerability in lodash < 4.17.21 needs specific guidance: "Update to 4.17.21 or later, test for breaking changes in _.template()." Semantic similarity helps the model understand that similar vulnerabilities have similar fixes, while few-shot learning enables accurate recommendations even for newly discovered vulnerability patterns.
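
One way to combine the two ideas is to retrieve the most similar known vulnerabilities and use them as few-shot examples in the prompt. The sketch below assumes that design. The `embed` function is a bag-of-words stand-in for a real embedding model, and the `KNOWN_FIXES` entries are hypothetical examples rather than project data.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term count.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical solved examples as (description, fix) pairs.
KNOWN_FIXES = [
    ("prototype pollution in object merge function",
     "Upgrade to the patched release and avoid merging untrusted input."),
    ("command injection in template function",
     "Upgrade to the patched release and review template usage."),
    ("regular expression denial of service in string parsing",
     "Upgrade to the patched release and limit input length."),
]


def top_k_examples(description: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k known examples most similar to the new description."""
    query = embed(description)
    ranked = sorted(
        KNOWN_FIXES,
        key=lambda ex: cosine(query, embed(ex[0])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(description: str, k: int = 2) -> str:
    """Assemble a few-shot prompt that ends where the model should answer."""
    shots = "\n\n".join(
        f"Vulnerability: {desc}\nFix: {fix}"
        for desc, fix in top_k_examples(description, k)
    )
    return f"{shots}\n\nVulnerability: {description}\nFix:"
```

Calling `build_prompt("command injection via template function in lodash")` places the template-injection example first, since it shares the most terms with the query. A production system would swap `embed` for dense sentence embeddings.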

Results & Impact

The system achieves 90% precision on novel CVE analysis with sub-second inference latency. Fix recommendations are specific to the affected package ecosystems, giving developers actionable guidance rather than generic threat descriptions.

Production Infrastructure

Goal

Build a production-ready system with proper authentication, data persistence, and monitoring that can serve as a reliable tool for development teams.

Approach

The system runs on AWS EC2 with Docker containers for consistent deployment. The frontend is built with Next.js, providing a responsive interface for vulnerability search and analysis. AWS Cognito handles user authentication, ensuring only authorized users access the system.

Vulnerability data and analysis results persist in RDS PostgreSQL, enabling historical queries and trend analysis. CloudWatch provides comprehensive monitoring with alerting for system health issues.

Considerations

A security tool must itself be secure and reliable. AWS Cognito provides enterprise-grade authentication without building custom auth. RDS PostgreSQL ensures data durability with automated backups. CloudWatch monitoring and alerting help catch issues before they affect users.

Results & Impact

The production system maintains a 99.5% uptime SLA. Users can reliably access vulnerability analysis when they need it, with proper access controls and audit trails.
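
To put the 99.5% figure in perspective, the calculation below converts an availability target into a downtime budget. The 30-day and 365-day periods are assumptions chosen for illustration; the source does not say how the SLA window is measured.

```python
def allowed_downtime_hours(availability: float, period_days: float) -> float:
    """Hours of downtime permitted by an availability target over a period."""
    return (1 - availability) * period_days * 24


monthly = allowed_downtime_hours(0.995, 30)   # assumed 30-day month
yearly = allowed_downtime_hours(0.995, 365)   # assumed 365-day year

print(f"Per month: {monthly:.1f} h, per year: {yearly:.1f} h")
```

Under these assumptions the budget works out to about 3.6 hours per month, or about 43.8 hours per year.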

Monitoring & Dashboard

Goal

Build a Next.js dashboard that displays vulnerability analytics, search functionality, and fix recommendations in a developer-friendly interface.

Approach

The dashboard connects to the RDS PostgreSQL database and presents vulnerability analytics, search, and fix recommendations.

The Next.js frontend provides responsive performance with server-side rendering for initial load speed.

Considerations

Developers need to quickly answer "does this CVE affect my dependencies?" The dashboard optimizes for this query pattern, with search as the primary interaction. Fix recommendations are prominently displayed alongside vulnerability details.
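
The core of that query is a version comparison between what a project has installed and the first fixed release. Here is a minimal sketch, assuming plain dotted versions with no pre-release tags. The `dependencies` and `advisories` tables are hypothetical, and a real implementation would use a full semver or PEP 440 parser.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '4.17.21' (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def is_affected(installed: str, first_fixed: str) -> bool:
    """True when the installed version is older than the first fixed release."""
    return parse_version(installed) < parse_version(first_fixed)


# Hypothetical dependency snapshot and advisory table for illustration.
dependencies = {"lodash": "4.17.15", "express": "4.18.2"}
advisories = {"lodash": "4.17.21"}  # package -> first fixed version

# Map each affected package to the version it should be upgraded to.
affected = {
    name: fixed
    for name, fixed in advisories.items()
    if name in dependencies and is_affected(dependencies[name], fixed)
}

print(affected)
```

Comparing integer tuples avoids the string-comparison trap in which "4.17.9" sorts after "4.17.21".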

Results & Impact

Development teams can search for vulnerabilities and get actionable fix recommendations in seconds. The dashboard serves as a single source of truth for dependency security decisions.

How It All Comes Together

CyberIntel Agent demonstrates how combining serverless data pipelines with fine-tuned LLMs creates tools that directly help development teams secure their dependencies.

The architecture has three layers: ingestion (AWS Lambda ETL pipeline), analysis (LoRA fine-tuned model with semantic similarity and few-shot learning), and presentation (Next.js dashboard with Cognito auth).

Key insights from building this:

Package ecosystem context matters. Generic CVE analysis doesn't help developers. By focusing on NPM and Python libraries specifically, the system generates fix recommendations that are immediately actionable: specific version numbers, migration notes, and compatibility warnings.

Serverless for data pipelines, containers for ML. Lambda handles bursty ETL workloads efficiently, while EC2 with Docker provides the GPU resources and consistent environment needed for model inference. Each component uses the right compute model for its workload.

Production reliability enables adoption. A security tool that's down when you need it is useless. The 99.5% uptime SLA, proper authentication, and comprehensive monitoring mean teams can rely on CyberIntel as part of their security workflow.

CyberIntel shows that ML-powered security tools are most effective when they're specific to a problem domain (package vulnerabilities), integrated with the right infrastructure (AWS services), and reliable enough for production use.