How it works
aur-scan does one thing and refuses to do another: it reads a package and
everything it pulls in, and it never runs any of it. Every stage below holds
to that line — the scan cannot become the thing that executes the payload.
The pipeline
Fetch → Parse → Analyze → Resolve the tree → Report (and optional SBOM).
1. Fetch — hardened, read-only
Package metadata comes from the AUR RPC over a locked-down HTTP client: a 30s timeout, redirects refused, HTTPS-only, and the JSON body size-capped (16 MiB) so a hostile or MITM’d response can’t stream you out of memory. RPC URLs are built from validated, percent-encoded path segments — a package name can’t inject into the request.
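The "validated, percent-encoded path segment" idea can be sketched in a few lines. This is an illustration only, not aur-scan's code: the base URL, the name pattern, and the function name `build_info_url` are assumptions made for the example.

```python
import re
from urllib.parse import quote

# Assumed endpoint and name rule for illustration; not taken from aur-scan.
RPC_BASE = "https://aur.archlinux.org/rpc/v5/info"
NAME_RE = re.compile(r"[a-z0-9@_+][a-z0-9@._+-]*")


def build_info_url(name: str) -> str:
    """Validate the package name, then encode it as a single path segment."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid package name: {name!r}")
    # safe="" also encodes "/", so the name can never add a path segment
    # or smuggle in a query string.
    return f"{RPC_BASE}/{quote(name, safe='')}"
```

Validation and encoding are deliberately layered: even if the name pattern were loosened later, the encoded segment still could not alter the shape of the request.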
The PKGBUILD itself is retrieved with a deliberately defanged git clone:
git -c core.hooksPath=/dev/null \
    -c protocol.file.allow=never \
    -c protocol.ext.allow=never \
    -c core.symlinks=false \
    clone --depth=1 --no-tags --no-recurse-submodules -- <url> .
No hooks can fire, the file:// and ext:: protocols are blocked, symlinks are written
as plain files (no directory escape), submodules and tags are never fetched, and
-- means the URL can’t be read as a flag. The package name is validated before
it’s ever used as a path. Nothing is built, sourced, or evaluated.
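As a rough sketch of how a caller might assemble the clone shown above without ever going through a shell, the command can be built as an argument list. The function name `clone_argv` and the name pattern are hypothetical; only the git flags come from the text.

```python
import re

# Hypothetical name rule, an assumption for illustration only.
NAME_RE = re.compile(r"[a-z0-9@_+][a-z0-9@._+-]*")


def clone_argv(name: str) -> list[str]:
    """Build the argv for the hardened clone; no shell is involved."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid package name: {name!r}")
    url = f"https://aur.archlinux.org/{name}.git"
    return [
        "git",
        "-c", "core.hooksPath=/dev/null",    # no hooks can fire
        "-c", "protocol.file.allow=never",   # block file://
        "-c", "protocol.ext.allow=never",    # block ext::
        "-c", "core.symlinks=false",         # symlinks become plain files
        "clone", "--depth=1", "--no-tags", "--no-recurse-submodules",
        "--", url, ".",                      # "--" ends option parsing
    ]
```

The list would then be passed to something like `subprocess.run(clone_argv("yay"), cwd=empty_dir, check=True)`, where `empty_dir` is a fresh directory chosen by the caller. Because the arguments are never joined into a string, there is nothing for a shell to reinterpret.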
2. Parse — text, not execution
A static parser reads the PKGBUILD and any .install script as text — pure
pattern/AST analysis, no bash evaluation. It extracts the fields, arrays
(depends, source, the checksum arrays), the source+=() appends, and the
function bodies. The brace scanner is quote- and comment-aware, so echo "}" or
# } can’t truncate a function early, and backslash-newline continuations are
spliced back together so curl evil \⏎| sh can’t slip past a single-line rule.
3. Analyze — the catalog
The parsed package is run against the authoritative detection catalog — 118 codes across 13 categories (see Detection Codes). That’s pattern rules plus structural analyzers: privilege escalation, source and transport integrity, checksum laundering, a remote-exec analyzer, a multi-line decode-and-execute pass, and an IOC match against known campaigns. You can add your own in Custom Rules.
One analyzer is opt-in and networked: with
threat intelligence enabled and your own key, declared
sha256sums are checked against VirusTotal and source= URLs against
abuse.ch/URLhaus. It is off by default — a default scan stays fully offline
and static — and even when on, every lookup fails open and only data already
public in the PKGBUILD ever leaves the machine.
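"Fails open" can be read as: a failed lookup produces no verdict rather than an error. A minimal sketch, assuming a hypothetical `lookup` callable that returns True when an indicator is known-bad; it does not model any real VirusTotal or URLhaus client.

```python
from typing import Callable, Optional


def lookup_fail_open(lookup: Callable[[str], bool], indicator: str) -> Optional[bool]:
    """Run one threat-intel lookup; any failure yields None (no verdict)."""
    try:
        return lookup(indicator)
    except Exception:
        # Fail open: a timeout, HTTP error, or malformed response adds no
        # finding and never aborts the static scan.
        return None
```

Under this reading, a network outage degrades the scan to its default offline behavior instead of blocking it.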
4. Resolve the dependency tree
The package you name is rarely the tampered one — it’s usually something a few
levels down. aur-scan resolves the full transitive AUR dependency tree by
breadth-first walk, and a critical rule governs it:
Resolution follows only static, declared metadata (depends/makedepends/…). It never fetches a source= artifact, follows a URL found in a PKGBUILD, or executes anything.
Every AUR package in the closure is scanned; official-repo and virtual packages
are trusted leaves. Depth and node caps stop runaway expansion. When a package
fetches and runs external code, that’s an opaque boundary — it’s flagged
(EXEC-REMOTE, with the URL) and the tree stops there. The scanner won’t chase
the link, because chasing it is exactly how it would run the payload.
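The walk described above can be sketched as a capped breadth-first traversal over an in-memory metadata map. Everything here is illustrative: `resolve_tree`, the `aur_deps` map, and the cap values are assumptions, and the real resolver would obtain the metadata from the AUR RPC.

```python
from collections import deque


def resolve_tree(root, aur_deps, max_depth=10, max_nodes=500):
    """Breadth-first walk over declared dependencies only.

    aur_deps maps an AUR package name to its declared dependency names.
    Any name missing from the map is treated as an official-repo or
    virtual package: a trusted leaf that is neither scanned nor expanded.
    """
    to_scan = []                    # AUR packages, in visit order
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        if name not in aur_deps:
            continue                # trusted leaf
        to_scan.append(name)
        if len(to_scan) >= max_nodes:
            break                   # node cap reached
        if depth >= max_depth:
            continue                # depth cap: scan it, do not expand it
        for dep in aur_deps[name]:
            if dep not in seen:     # also breaks dependency cycles
                seen.add(dep)
                queue.append((dep, depth + 1))
    return to_scan
```

Note what the function never touches: it reads names from declared metadata and nothing else, so no `source=` entry or embedded URL can steer the traversal.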
The static-only invariant
The whole design rests on one provable property: the scan can’t compromise the machine doing the scanning. Across the entire core, the only subprocesses ever spawned are:
- the hardened git clone above, to fetch a PKGBUILD;
- pacman -Si: is this name in the official repos?
- pacman -Qm: which foreign (AUR) packages are installed?
It never invokes makepkg, never bash -c, never eval, never sources a
PKGBUILD. The PKGBUILD is read with a plain capped file read and treated as data.
That’s the line that makes pointing it at a malicious package safe — see
Security for the full threat model.
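For reference, a "plain capped file read" of the kind mentioned above might look like the following. The function name and the 1 MiB default are assumptions for the sketch; the text does not state the cap aur-scan applies to PKGBUILDs.

```python
def read_capped(path: str, limit: int = 1024 * 1024) -> str:
    """Read a file as inert text, refusing anything larger than `limit` bytes."""
    with open(path, "rb") as fh:
        data = fh.read(limit + 1)   # one extra byte reveals an oversized file
    if len(data) > limit:
        raise ValueError(f"{path}: larger than {limit} bytes")
    # Decode leniently: the content is data to be analyzed, never executed.
    return data.decode("utf-8", errors="replace")
```

Reading one byte past the limit avoids trusting the size reported by the filesystem, and the result is only ever handed to the parser as a string.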