Automated Vulnerability Report Analysis

What this is

§01

Every quarter, the enterprise security team received a stack of third-party vulnerability reports — PDFs from penetration testers, CSV exports from vendor portals, screenshots from auditors. Each had to be opened, read, cross-referenced against the company's internal control baselines, then triaged into a remediation plan.

A senior analyst would spend the best part of a day on each report. Three senior analysts. Once a quarter. And the prioritisation logic was different every time, depending on who had the pen.

The brief was tight: replace that bottleneck with something machine-driven, repeatable, and defensible. Crucially, the output had to be the same shape the team already trusted (a ranked remediation table with cited evidence), not a chatbot or a dashboard.

The pipeline

§02

The agent runs end-to-end as a Dify flow with a Python ingestion pre-step and an n8n workflow handling the upstream/downstream plumbing. The shape is intentionally boring:

1 · Ingest (n8n): An n8n workflow watches the secure inbox / SFTP drop where third parties land their PDFs and CSVs. Each new file triggers a Python pre-processor that extracts structured findings — CVSS scores, affected assets, vendor recommendations — and strips PII before anything touches the LLM.
2 · Retrieve (Dify): The Dify agent embeds each finding and looks it up against the vectorised internal control library (~600 controls across ISO 27001 + bespoke). Top-5 matches by cosine similarity feed the reasoning prompt as cited context.
3 · Reason (Dify + OpenAI): Dify's structured-output mode composes a remediation entry per finding — owner team, severity, suggested fix, evidence citations back to the source PDF and to the control library. Strict JSON schema, no free-form HTML.
4 · Deliver (n8n → Archer): An n8n step pushes the JSON manifest into RSA Archer via its REST API, creating risk records with the LLM-generated remediation plan pre-filled. A parallel branch emails the markdown summary to the analyst who would have done the work by hand.
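The "strict JSON schema" in step 3 is easiest to picture as a validation gate between the model and everything downstream. What follows is a minimal sketch, not the production schema: the field names, the four-level severity scale, and the `parse_entry` helper are all assumptions made for illustration. The idea is simply that anything off-schema is rejected before it can reach the delivery step.

```python
import json
from dataclasses import dataclass, fields
from typing import Tuple

# Assumed severity scale; the real one lives in the team's schema.
ALLOWED_SEVERITIES = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class RemediationEntry:
    """One row of the ranked remediation table (hypothetical field names)."""
    finding_id: str
    owner_team: str
    severity: str
    suggested_fix: str
    source_citations: Tuple[str, ...]   # pointers back into the source PDF
    control_citations: Tuple[str, ...]  # IDs of control-library records


def parse_entry(raw_json: str) -> RemediationEntry:
    """Parse one model response and reject anything that is off-schema."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("entry must be a JSON object")

    expected = {f.name for f in fields(RemediationEntry)}
    if set(data) != expected:
        # Symmetric difference lists both missing and unexpected keys.
        raise ValueError(f"field mismatch: {sorted(set(data) ^ expected)}")

    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {data['severity']!r}")

    # Every entry must cite at least one source and one control.
    for key in ("source_citations", "control_citations"):
        if not isinstance(data[key], list) or not data[key]:
            raise ValueError(f"{key} must be a non-empty list")
        data[key] = tuple(data[key])

    return RemediationEntry(**data)
```

Making the citation lists mandatory at parse time is one way to back a "100% of findings cite a source control" guarantee: an entry without a citation never becomes a record.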

Why RAG (and not fine-tuning)

§03

The control library moves. New controls land every quarter, existing ones get re-worded after audits, deprecated ones get flagged for removal. Fine-tuning would have locked the model into a snapshot — every change would require a re-train.

Retrieval keeps the source-of-truth in the vector store, not in the weights — updates are an embeddings refresh, not a re-train.
Citations come for free: every recommendation links back to the specific control record that justified it. Auditable by design.
The team can A/B different control phrasings without touching the model.
Costs are predictable: a fixed embeddings spend plus per-report inference, with no training run to budget.
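To make "an embeddings refresh, not a re-train" concrete, here is a rough sketch of an incremental refresh. It assumes the control library can be read as a mapping of control ID to text and that the index stores a content hash next to each vector. The `refresh_index` name, the in-memory dict standing in for the vector store, and the injected `embed` callable are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# control_id -> (content hash, embedding); stands in for the vector store.
Index = Dict[str, Tuple[str, Vector]]


def content_hash(text: str) -> str:
    """Stable fingerprint of a control's wording."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_index(
    controls: Dict[str, str],
    index: Index,
    embed: Callable[[str], Vector],
) -> Dict[str, List[str]]:
    """Sync `index` with `controls`, re-embedding only new or re-worded ones."""
    changes: Dict[str, List[str]] = {"added": [], "updated": [], "removed": []}

    for control_id, text in controls.items():
        digest = content_hash(text)
        if control_id not in index:
            changes["added"].append(control_id)
        elif index[control_id][0] != digest:
            changes["updated"].append(control_id)
        else:
            continue  # unchanged wording: keep the existing vector
        index[control_id] = (digest, embed(text))

    # Deprecated controls disappear from the library and from the index.
    for control_id in list(index):
        if control_id not in controls:
            del index[control_id]
            changes["removed"].append(control_id)

    return changes
```

Under these assumptions a quarterly update only pays for the controls that actually changed, which is where the predictable embeddings spend comes from.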

What it ships, quantified

§04

90%+

Manual time saved per report

100%

Of findings cite a source control

Senior analysts freed for higher-value work

The headline number understates the second-order effect. Standardised prioritisation meant downstream teams could trust the queue: the remediation engineers stopped relitigating severity for every ticket and started actually fixing things.

Stack

§05

Orchestration

Dify — visual flow, prompt management, structured-output mode
n8n — file-drop trigger, Python invocation, delivery to Archer
Python pre-processor for PDF/CSV → structured findings
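For a feel of what the pre-processor does on the CSV side, here is a simplified sketch. The column names (`id`, `cvss`, `asset`, `recommendation`) are assumed rather than taken from any real vendor export, and the redaction covers only email addresses; a real PII pass would need to handle considerably more.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import List

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class Finding:
    finding_id: str
    cvss: float
    asset: str
    recommendation: str


def redact(text: str) -> str:
    """Mask email addresses before the text goes anywhere near the LLM."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def parse_findings(csv_text: str) -> List[Finding]:
    """Turn a vendor CSV export into structured, redacted findings."""
    findings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cvss = float(row["cvss"])
        # CVSS base scores run from 0.0 to 10.0; anything else is bad data.
        if not 0.0 <= cvss <= 10.0:
            raise ValueError(f"CVSS out of range for {row['id']}: {cvss}")
        findings.append(
            Finding(
                finding_id=row["id"].strip(),
                cvss=cvss,
                asset=row["asset"].strip(),
                recommendation=redact(row["recommendation"].strip()),
            )
        )
    return findings
```

Failing loudly on an out-of-range score is a design choice in this sketch: a malformed row stops the run instead of flowing silently into a remediation plan.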

Models

OpenAI GPT-4 (reasoning over retrieved context)
OpenAI text-embedding-3-large (vectorisation of controls + findings)

Retrieval

Vector store keyed to the internal control library
Cosine similarity top-K with score thresholding
Citations back to the source PDF + control record
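The retrieval step itself is small. Below is a minimal pure-Python sketch of cosine top-K with a score threshold, using a plain dict in place of the vector store. The `min_score` default is a placeholder chosen for the example, not the production value.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 5,
    min_score: float = 0.3,  # assumed threshold, tune against real data
) -> List[Tuple[str, float]]:
    """Return up to k (control_id, score) pairs scoring at least min_score."""
    scored = [(control_id, cosine(query, vec)) for control_id, vec in index.items()]
    kept = [pair for pair in scored if pair[1] >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]
```

The threshold matters as much as K: without it, a finding that matches nothing in the library would still be handed K weak controls to cite.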

Delivery

RSA Archer (risk records via REST API)
Markdown report → analyst inbox via n8n SMTP node
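The markdown summary that lands in the analyst's inbox can be rendered straight from the JSON manifest. This sketch assumes each entry is a dict with `finding_id`, `severity`, `owner_team`, and `control_citations` keys, and it ranks by severity alone. Those names and that ordering rule are illustrative, not the production logic.

```python
from typing import Dict, List

# Assumed ranking: lower number sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def render_summary(entries: List[Dict]) -> str:
    """Render manifest entries as a ranked markdown remediation table."""
    # sorted() is stable, so entries of equal severity keep manifest order.
    ranked = sorted(entries, key=lambda e: SEVERITY_RANK[e["severity"]])
    lines = [
        "| # | Finding | Severity | Owner | Controls cited |",
        "|---|---|---|---|---|",
    ]
    for position, entry in enumerate(ranked, start=1):
        controls = ", ".join(entry["control_citations"])
        lines.append(
            f"| {position} | {entry['finding_id']} | {entry['severity']} "
            f"| {entry['owner_team']} | {controls} |"
        )
    return "\n".join(lines)
```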

What I'd do differently

§06

Stand up the evaluation harness from day one. We bolted it on after the first wave of feedback — should have been baked in.
Treat the control library as a typed schema, not free-text. A few migrations of structured fields would have cut prompt size by 40%.
Add a human-in-the-loop sign-off for high-severity findings before the JSON hits the GRC API. The team trusts the agent now, but they didn't in week one, and that is when a small gate would have built trust faster.
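The sign-off gate in the last point could be as small as a routing function sitting in front of the delivery step. This is a sketch of the idea, not something that shipped: the `route` helper, the `severity` key, and the choice of which severities count as gated are all assumptions.

```python
from typing import Dict, List, Tuple

# Assumed policy: these severities wait for a human before delivery.
GATED_SEVERITIES = frozenset({"critical", "high"})


def route(entries: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split entries into (auto_deliver, needs_review) by severity."""
    auto_deliver: List[Dict] = []
    needs_review: List[Dict] = []
    for entry in entries:
        if entry["severity"] in GATED_SEVERITIES:
            needs_review.append(entry)
        else:
            auto_deliver.append(entry)
    return auto_deliver, needs_review
```

Only the `auto_deliver` list would go straight to the GRC API. The `needs_review` list would wait in a queue until an analyst approves it.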