Security

Containers don’t isolate themselves; you isolate them. Perry exposes the standard OCI security knobs on both ContainerSpec (single-container) and ComposeService (orchestrated stacks), plus first-party support for Sigstore / cosign image verification and a workload-graph policy tier API for declarative isolation levels.

Per-container security knobs

The same set of fields works on run(), create(), and any service in a compose up():

| Field | Type | Effect | Cross-backend |
| --- | --- | --- | --- |
| read_only | boolean | Mount the root filesystem as read-only. Forces all writable state to be in declared volumes. | All backends |
| privileged | boolean | Run privileged: grants ALL Linux capabilities + access to host devices. Avoid unless absolutely necessary. | Docker / Podman / Lima only; apple/container has no concept and drops the field with a warning |
| user | string | UID, username, or "UID:GID". Runs the container’s processes as that identity. The image’s CMD ignores this if it does its own user-switching, but most properly-built images respect it. | All backends |
| workdir | string | Working directory inside the container. | All backends |
| cap_add | string[] | Linux capabilities to add. Specific (e.g. ["NET_BIND_SERVICE"]), not blanket. | All backends |
| cap_drop | string[] | Capabilities to drop. ["ALL"] is the canonical “drop everything” starting point. | All backends |
| seccomp | string | Seccomp profile path or "default" (uses the runtime’s default profile). | Docker / Podman / Lima only; apple/container has no equivalent and drops the field with a warning |

⚠️ Cross-backend security caveat. privileged, seccomp, --security-opt no-new-privileges, IPC/PID namespace sharing, and SELinux mount labels are not honored on apple/container — its Apple-VM model means those concepts don’t translate. Perry’s normalization pass drops the fields and emits a tracing::warn! rather than silently downgrading the security policy. For production deployments that demand cross-backend parity, set EnforcementMode::Strict on the engine — any unsupported security field becomes a hard up() failure rather than a silent drop. Full matrix at Cross-Backend Determinism.
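The drop-versus-fail behaviour described above can be modelled in a few lines. This is a minimal sketch, not Perry’s implementation: the Backend and EnforcementMode types, the UNSUPPORTED table, and the normalize function are hypothetical names, and only the two spec fields the table marks as unsupported on apple/container (privileged, seccomp) are modelled.

```typescript
// Hypothetical model of the normalization pass. None of these names are Perry's API.
type Backend = "docker" | "podman" | "lima" | "apple/container";
type EnforcementMode = "lenient" | "strict";

interface SecurityFields {
  read_only?: boolean;
  privileged?: boolean;
  seccomp?: string;
  cap_drop?: string[];
}

// Spec fields each backend cannot honor, per the field table above.
const UNSUPPORTED: Record<Backend, (keyof SecurityFields)[]> = {
  "docker": [],
  "podman": [],
  "lima": [],
  "apple/container": ["privileged", "seccomp"],
};

function normalize(
  spec: SecurityFields,
  backend: Backend,
  mode: EnforcementMode,
): { spec: SecurityFields; warnings: string[] } {
  const out: SecurityFields = { ...spec };
  const warnings: string[] = [];
  for (const field of UNSUPPORTED[backend]) {
    if (out[field] === undefined) continue;
    if (mode === "strict") {
      // Strict mode: an unsupported security field is a hard failure.
      throw new Error(`${field} is not supported on ${backend}`);
    }
    // Lenient mode: drop the field, but record a warning instead of staying silent.
    delete out[field];
    warnings.push(`dropped ${field}: not supported on ${backend}`);
  }
  return { spec: out, warnings };
}
```

In this model the lenient path always returns the warnings alongside the normalized spec, so a caller can log them or surface them in CI.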

Start with maximum isolation and add back only what the workload needs:

import { run as runSecure } from "perry/container";

// Maximum-isolation single-container run for an untrusted workload:
//   - read-only root filesystem
//   - no Linux capabilities at all
//   - non-root user
//   - working directory pinned
//   - default seccomp profile
async function runUntrustedWorkload(): Promise<void> {
    await runSecure({
        image: "alpine:3.19",
        cmd: ["sh", "-c", "echo isolated && exit 0"],
        read_only: true,
        cap_drop: ["ALL"],
        user: "nobody",
        workdir: "/tmp",
        seccomp: "default",
    });
}

Field-by-field rationale:

  • read_only: true — even an exploit that lands code execution can’t persist to the image’s filesystem. Anything mutable goes into a declared volume.
  • cap_drop: ["ALL"] — removes Linux capabilities the workload didn’t explicitly ask for. Most apps need none.
  • user: "nobody" — non-root inside the container. If the image doesn’t have a nobody user, replace with "65534:65534" (the numeric UID/GID of nobody on most distros).
  • workdir: "/tmp" — the only writable location under read_only: true is /tmp (which is tmpfs-backed by default).
  • seccomp: "default" applies the runtime’s default seccomp profile. Docker’s default profile, for example, blocks roughly 44 of the 300+ available syscalls.

Capability addition patterns

cap_drop: ["ALL"] plus targeted cap_add:

| Workload | Capabilities |
| --- | --- |
| Web server binding to port 80/443 | cap_add: ["NET_BIND_SERVICE"] |
| Network namespace manipulation | cap_add: ["NET_ADMIN"] |
| Kernel time setting | cap_add: ["SYS_TIME"] |
| chown to other users (rare) | cap_add: ["CHOWN"] |
| Bind-mount filesystems inside | cap_add: ["SYS_ADMIN"] (still avoid if possible) |

The full capability list is in the capabilities(7) man page (man 7 capabilities). Always start with cap_drop: ["ALL"] and add back only the capabilities whose removal breaks the workload; most applications need none.
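One way to keep the drop-all-then-add-back pattern consistent across services is a small helper that always emits cap_drop: ["ALL"] and refuses a blanket cap_add. The sketch below is illustrative only: minimalCapabilities and the CapabilityFields type are hypothetical, not part of perry/container, and stripping a CAP_ prefix is a normalization choice made here rather than documented Perry behaviour.

```typescript
// Local stand-in for the capability fields of a container spec (hypothetical type).
interface CapabilityFields {
  cap_drop: string[];
  cap_add: string[];
}

// Drop everything, then add back a normalized, de-duplicated list of capabilities.
function minimalCapabilities(needed: string[]): CapabilityFields {
  const normalized = needed.map((c) => c.trim().toUpperCase().replace(/^CAP_/, ""));
  if (normalized.indexOf("ALL") !== -1) {
    throw new Error('cap_add: ["ALL"] defeats the pattern; list specific capabilities');
  }
  const unique = normalized.filter((c, i) => normalized.indexOf(c) === i).sort();
  return { cap_drop: ["ALL"], cap_add: unique };
}

// Example: a web server that only needs to bind ports below 1024.
const webServerSpec = {
  image: "nginx:alpine", // illustrative image name
  ...minimalCapabilities(["NET_BIND_SERVICE"]),
};
```

The resulting webServerSpec carries exactly the two capability fields from the table above and can be merged with the other hardening fields.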

Image verification

Set PERRY_CONTAINER_VERIFY_IMAGES=1 to enable cosign keyless verification on every run(), create(), and pullImage() call:

export PERRY_CONTAINER_VERIFY_IMAGES=1
./my-app

Perry’s verifier:

  1. Resolves the image tag to its digest via inspect_image.
  2. Looks up the digest in an in-memory VERIFICATION_CACHE — subsequent runs against the same digest are free.
  3. Runs cosign verify --certificate-identity ${CHAINGUARD_IDENTITY} --certificate-oidc-issuer ${CHAINGUARD_ISSUER} <ref>@<digest> and caches pass/fail.
  4. On fail, the FFI rejects with a verification failed error (the container is never created).
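The four steps above can be sketched as a cache-then-verify function. This is a hedged model, not Perry’s code: VerifierDeps, cosignArgs, and verifyImage are hypothetical names, digest resolution and the cosign invocation are injected so the flow can be exercised without a container runtime, and stripping the tag before appending the digest is an assumption about how <ref>@<digest> is formed. The two constants hold the default values this page documents.

```typescript
type Verdict = "pass" | "fail";

// Injected side effects (hypothetical interface, so the flow is testable offline).
interface VerifierDeps {
  resolveDigest(ref: string): Promise<string>;  // stands in for inspect_image
  runCosign(argv: string[]): Promise<boolean>;  // true when cosign exits 0
}

const CHAINGUARD_IDENTITY =
  "https://github.com/chainguard-images/images/.github/workflows/sign.yaml@refs/heads/main";
const CHAINGUARD_ISSUER = "https://token.actions.githubusercontent.com";

// In-memory cache keyed by digest, mirroring VERIFICATION_CACHE.
const verificationCache = new Map<string, Verdict>();

// Build the argument list for `cosign verify` (assumption: the tag is stripped).
function cosignArgs(ref: string, digest: string): string[] {
  const at = ref.indexOf("@");
  const named = at === -1 ? ref : ref.slice(0, at);
  const repo = named.replace(/:[^/:]+$/, "");
  return [
    "verify",
    "--certificate-identity", CHAINGUARD_IDENTITY,
    "--certificate-oidc-issuer", CHAINGUARD_ISSUER,
    `${repo}@${digest}`,
  ];
}

async function verifyImage(ref: string, deps: VerifierDeps): Promise<void> {
  // Step 1: resolve the tag to a digest.
  const digest = await deps.resolveDigest(ref);
  // Step 2: reuse a cached verdict for this digest if there is one.
  let verdict = verificationCache.get(digest);
  if (verdict === undefined) {
    // Step 3: run cosign and cache pass/fail.
    const ok = await deps.runCosign(cosignArgs(ref, digest));
    verdict = ok ? "pass" : "fail";
    verificationCache.set(digest, verdict);
  }
  // Step 4: reject before any container is created.
  if (verdict === "fail") {
    throw new Error(`verification failed for ${ref}@${digest}`);
  }
}
```

Because the cache stores failures as well as passes, a digest that failed once is rejected again without re-running cosign, which matches the "caches pass/fail" wording in step 3.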

Default identity / issuer point at Chainguard’s keyless signing flow:

| Const | Value |
| --- | --- |
| CHAINGUARD_IDENTITY | https://github.com/chainguard-images/images/.github/workflows/sign.yaml@refs/heads/main |
| CHAINGUARD_ISSUER | https://token.actions.githubusercontent.com |

For your own org’s images, override these via the (planned) per-call verification options. For now, using Chainguard-signed base images is the path of least resistance — cgr.dev/chainguard/<tool> is signed.

Cosign required. Set PERRY_CONTAINER_VERIFY_IMAGES=1 only when cosign is installed and on PATH. The verification is OFF by default so the bare-metal ./my-app execution doesn’t depend on a separate cosign install.

Capability sandbox helper

For one-off command execution against an untrusted image (CI helper, build tool, code-evaluation sandbox), use the run_capability pattern, which wraps run() with the maximum-isolation defaults:

  • read_only: true
  • cap_drop: ["ALL"]
  • No network attached
  • user: "nobody"
  • Image verified via cosign before pull

This is the same path the internal perry-stdlib::container::capability module uses for shell-command sandboxing in plugin systems.
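To make the defaults concrete, here is a sketch of a spec builder that pins them so a caller can only choose the image and command. It does not reproduce the run_capability signature, which this page does not show: sandboxSpec and SandboxSpec are hypothetical, and the network and verify_image field names are assumptions standing in for "no network attached" and "verified via cosign before pull".

```typescript
// Hypothetical shape of a maximum-isolation spec; literal types pin the defaults.
interface SandboxSpec {
  readonly image: string;
  readonly cmd: readonly string[];
  readonly read_only: true;
  readonly cap_drop: readonly string[];
  readonly user: "nobody";
  readonly network: "none";     // assumed field name for "no network attached"
  readonly verify_image: true;  // assumed field name for cosign verification
}

// Callers supply only the image and command; every isolation default is fixed.
function sandboxSpec(image: string, cmd: string[]): SandboxSpec {
  if (cmd.length === 0) {
    throw new Error("cmd must not be empty");
  }
  const spec: SandboxSpec = {
    image,
    cmd: cmd.slice(), // copy so later mutation of the caller's array has no effect
    read_only: true,
    cap_drop: ["ALL"],
    user: "nobody",
    network: "none",
    verify_image: true,
  };
  return spec;
}
```

Encoding the defaults as literal types means an attempt to build a SandboxSpec with read_only: false fails at compile time rather than at run time.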

Workload-graph policy tiers (perry/workloads)

For multi-node deployments where different workloads have different trust levels, the workload-graph engine accepts a per-node policy:

import { graph, runGraph, runtime, policy } from "perry/workloads";

const g = graph("my-app", {
  trusted_db:    { image: "postgres:16-alpine",
                   runtime: runtime.oci(),
                   policy:  policy.default() },        // no extra hardening

  isolated_api:  { image: "myapp/api",
                   runtime: runtime.oci(),
                   policy:  policy.isolated() },       // no_network=true

  hardened_proxy: { image: "myapp/proxy",
                    runtime: runtime.oci(),
                    policy:  policy.hardened() },      // read_only_root + seccomp

  untrusted_eval: { image: "myapp/sandbox",
                    runtime: runtime.microvm(),         // ← required by tier
                    policy:  policy.untrusted() },     // microVM-only, all hardening on
});

await runGraph(g);

The four PolicyTier levels and what they enforce:

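| Tier | no_network | read_only_root | seccomp | microvm |
| --- | --- | --- | --- | --- |
| default() | no | no | no | no |
| isolated() | yes | | | |
| hardened() | | yes | yes | |
| untrusted() | yes | yes | yes | required |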

untrusted requires kernel-level isolation (i.e. a microVM, not a shared-kernel container). When the active backend doesn’t expose a microVM runtime (such as apple/container’s VM mode, Lima, or Firecracker), the engine returns BackendNotAvailable rather than silently dropping the isolation guarantee. Setting PERRY_ALLOW_UNTRUSTED_SHARED_KERNEL=1 opts out of this check; this is not recommended for code that is actually untrusted.

Explicit per-flag overrides on top of a tier are honored: setting policy.tier = "default" together with no_network: true produces a default-tier node with networking disabled.
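Tier defaults combined with per-flag overrides amount to a layered merge. The sketch below models that under stated assumptions: resolvePolicy, PolicyFlags, and TIER_DEFAULTS are hypothetical names, the flag values come from the comments in the graph example above, and any flag those comments do not mention is assumed to be off.

```typescript
type Tier = "default" | "isolated" | "hardened" | "untrusted";

interface PolicyFlags {
  no_network: boolean;
  read_only_root: boolean;
  seccomp: boolean;
  microvm_required: boolean;
}

// Assumed defaults: only the flags named in the example's comments are set to true.
const TIER_DEFAULTS: Record<Tier, PolicyFlags> = {
  default:   { no_network: false, read_only_root: false, seccomp: false, microvm_required: false },
  isolated:  { no_network: true,  read_only_root: false, seccomp: false, microvm_required: false },
  hardened:  { no_network: false, read_only_root: true,  seccomp: true,  microvm_required: false },
  untrusted: { no_network: true,  read_only_root: true,  seccomp: true,  microvm_required: true },
};

// Explicit per-flag overrides win over the tier's defaults.
function resolvePolicy(tier: Tier, overrides: Partial<PolicyFlags> = {}): PolicyFlags {
  return { ...TIER_DEFAULTS[tier], ...overrides };
}

// The case from the text: default tier plus no_network: true.
const isolatedDefault = resolvePolicy("default", { no_network: true });
```

Here isolatedDefault has no_network set to true while the other three flags keep their default-tier values.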

Defense in depth

Stacking patterns for production:

  1. Verify images (PERRY_CONTAINER_VERIFY_IMAGES=1).
  2. Run as non-root (user: "nobody" or numeric UID).
  3. Drop all capabilities, add specific ones back (cap_drop: ["ALL"] + minimal cap_add).
  4. Read-only root filesystem (read_only: true).
  5. Internal networks for the database side (internal: true on the db’s network — see Networking).
  6. No published ports for private services (omit ports: on internal-only services).
  7. Resource limits (planned: mem_limit, cpu_limit on Service).
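Several of these items can be checked mechanically before a stack is brought up. The following is a minimal sketch of such a check, covering only items 2, 3, 4, and 6. auditService and the ServiceFields type are hypothetical, and the private flag is an illustrative marker for internal-only services rather than a Perry field.

```typescript
// Subset of service fields relevant to the checklist (hypothetical local type).
interface ServiceFields {
  image: string;
  user?: string;
  read_only?: boolean;
  privileged?: boolean;
  cap_drop?: string[];
  cap_add?: string[];
  ports?: string[];
  private?: boolean; // illustrative marker: service should not publish ports
}

// Return one finding per checklist item the service does not satisfy.
function auditService(svc: ServiceFields): string[] {
  const findings: string[] = [];

  // Item 2: non-root user. An unset user falls back to the image default.
  const user = svc.user ?? "";
  const colon = user.indexOf(":");
  const uid = colon === -1 ? user : user.slice(0, colon);
  if (uid === "" || uid === "root" || uid === "0") {
    findings.push("user: runs as root or relies on the image default");
  }

  // Item 3: drop all capabilities, add back only specific ones.
  if ((svc.cap_drop ?? []).indexOf("ALL") === -1) {
    findings.push('cap_drop: does not include "ALL"');
  }
  if ((svc.cap_add ?? []).indexOf("ALL") !== -1) {
    findings.push('cap_add: blanket "ALL" re-adds every capability');
  }
  if (svc.privileged === true) {
    findings.push("privileged: grants all capabilities and host device access");
  }

  // Item 4: read-only root filesystem.
  if (svc.read_only !== true) {
    findings.push("read_only: root filesystem is writable");
  }

  // Item 6: private services should not publish ports.
  if (svc.private === true && (svc.ports ?? []).length > 0) {
    findings.push("ports: private service publishes ports");
  }
  return findings;
}
```

A service that sets user, cap_drop: ["ALL"], and read_only: true returns an empty list; anything else returns one finding per gap, which makes the function easy to run over every service in a stack.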

See also