Security & Compliance
=====================

This page describes the security posture of the Scaleout Helm chart (``charts/scaleout``) and
its reference GKE deployment: the controls it implements and enforces, their secure defaults,
how to verify them, and the residual risks.

The chart is **secure by default**: workload hardening, secret management, resource governance
and input validation are enabled out of the box, and every workload meets the Kubernetes Pod
Security Standards *restricted* profile. Network segmentation, edge TLS and authentication are
opt-in and are enabled through chart values.
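
To have the API server enforce that profile as well, the release namespace can carry the standard
Pod Security Admission label. This is a minimal sketch that assumes a POSIX shell and ``kubectl``
access to the namespace. ``pss_restrict`` is a hypothetical helper and is not part of the chart.
Run it with ``--check`` first. The server-side dry run changes nothing, and it prints a warning for
every existing pod that would violate the profile.

.. code-block:: sh

   #!/bin/sh
   # Hypothetical helper: label a namespace so Pod Security Admission enforces the
   # "restricted" profile. With --check, only a server-side dry run is performed.
   pss_restrict() {
     ns="$1"
     [ -n "$ns" ] || { echo "usage: pss_restrict <namespace> [--check]" >&2; return 2; }
     if [ "${2:-}" = "--check" ]; then
       # Nothing is persisted; the API server warns about each existing pod
       # that would not be admitted under "restricted".
       kubectl label --dry-run=server --overwrite namespace "$ns" \
         pod-security.kubernetes.io/enforce=restricted
     else
       # From now on, new pods that violate "restricted" are rejected at admission.
       # Pods that are already running are not evicted.
       kubectl label --overwrite namespace "$ns" \
         pod-security.kubernetes.io/enforce=restricted
     fi
   }

Usage: run ``pss_restrict <ns> --check`` first, then ``pss_restrict <ns>`` once the check reports no
warnings.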

Architecture & trust boundaries
-------------------------------

.. code-block:: text

   (TLS, cert-manager)         ┌──────── Kubernetes namespace ─────────┐
   Edge clients ─────────────► │  Ingress (nginx)                      │
   Browsers / CLI / FL clients │   /api  /  /kratos /hydra (public)    │
                               │     │                                 │
   Edge FL clients ── gRPC ──► │  combiner ◄─► controller ◄─► hooks    │
   (JWT-authenticated)         │     api-server                        │
                               │     │  (NetworkPolicy: default-deny)  │
                               │     ▼                                 │
                               │  postgres / mongo / minio (data tier) │
                               └───────────────────────────────────────┘

* **Cluster edge:** all external HTTP traffic enters through the Ingress, over TLS when edge TLS
  is enabled (``global.protocol=https``). The Kratos/Hydra *public* APIs are exposed, and their
  *admin* APIs are reachable only in-cluster.
* **Authentication:** when enabled, Ory Kratos (identities and sessions) and Hydra (OAuth2/OIDC)
  enforce browser/API auth, and gRPC endpoints enforce JWT.
* **External FL clients → combiner:** gRPC traffic enters from outside the cluster and is
  authenticated by JWT when auth is enabled. The EdgeClient connects with TLS by default, so the
  combiner ``secureMode`` setting must match.
* **Intra-cluster / data tier:** NetworkPolicies (default-deny ingress) segment this traffic when
  ``networkPolicy.enabled=true``.
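
You can spot-check the edge boundary by listing every Ingress path with its backend Service and
flagging any backend that looks like an admin API. This minimal sketch assumes
``networking.k8s.io/v1`` Ingress objects. It also assumes that the Kratos/Hydra admin Services have
``admin`` in their names, which is a naming assumption and not a chart guarantee.
``ingress_backends`` and ``no_admin_backends`` are hypothetical helpers.

.. code-block:: sh

   #!/bin/sh
   # Hypothetical helper: print "<path> -> <service>" for every Ingress path in a
   # namespace (networking.k8s.io/v1 field layout).
   ingress_backends() {
     kubectl -n "$1" get ingress -o jsonpath='{range .items[*].spec.rules[*].http.paths[*]}{.path}{" -> "}{.backend.service.name}{"\n"}{end}'
   }

   # Read that listing on stdin; print offending lines and exit 1 if any backend
   # Service name contains "admin" (assumed naming convention for admin APIs).
   no_admin_backends() {
     awk -F' -> ' '
       $2 ~ /admin/ { print "exposed: " $0; bad = 1 }
       END { exit (bad ? 1 : 0) }
     '
   }

Usage: ``ingress_backends <ns> | no_admin_backends``.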

Controls
--------

Framework references: Pod Security Standards (PSS), the CIS Kubernetes Benchmark (CIS K8s),
SOC 2 Trust Services Criteria, and ISO/IEC 27001:2022 Annex A.

.. list-table::
   :header-rows: 1
   :widths: 34 10 56

   * - Control (as implemented)
     - Default
     - Framework mapping
   * - Run as non-root (``runAsNonRoot``, ``runAsUser`` ≠ 0) — all workloads
     - on
     - PSS Restricted; CIS K8s 5.2.6; SOC 2 CC6.1/CC6.3; ISO A.8.2/A.8.3
   * - No privilege escalation (``allowPrivilegeEscalation: false``)
     - on
     - PSS Restricted; CIS K8s 5.2.5; SOC 2 CC6.1; ISO A.8.2
   * - Drop ALL Linux capabilities
     - on
     - PSS Restricted; CIS K8s 5.2.8/5.2.9; SOC 2 CC6.1; ISO A.8.2
   * - Seccomp ``RuntimeDefault`` — all pods
     - on
     - PSS Restricted; SOC 2 CC6.1/CC6.8; ISO A.8.2/A.8.31
   * - Read-only root filesystem — app containers (writable scratch via emptyDir)
     - on [1]_
     - Beyond PSS Restricted (not required by it); SOC 2 CC6.1/CC6.8; ISO A.8.2
   * - No service-account token automount
     - on
     - CIS K8s 5.1.5/5.1.6; SOC 2 CC6.1/CC6.3; ISO A.8.2/A.8.3
   * - Network segmentation — default-deny ingress + least-privilege allows
     - opt-in [2]_
     - CIS K8s 5.3.2; SOC 2 CC6.1/CC6.6; ISO A.8.20/A.8.22
   * - TLS in transit at the edge — cert-manager-issued Ingress certificates
     - opt-in
     - SOC 2 CC6.7; ISO A.8.24
   * - Authentication & authorization — Ory Kratos + Hydra (OIDC/OAuth2), gRPC JWT
     - opt-in
     - SOC 2 CC6.1/CC6.2/CC6.3; ISO A.5.15/A.5.17/A.8.5
   * - Secrets management — ``secretKeyRef``; generated+retained or ``existingSecret``;
       per-install Kratos JWKS; admin password never in plaintext env;
       external-secrets / sealed-secrets compatible
     - on
     - CIS K8s 5.4.2 (5.4.1 partially: secrets are env vars, not files); SOC 2 CC6.1/CC6.3;
       ISO A.8.24/A.5.10
   * - Resource requests/limits — every container
     - on
     - SOC 2 A1.1 (Availability); ISO A.8.6
   * - Supply-chain integrity — optional image digest pinning; CI SAST (CodeQL),
       image scanning (Trivy), SBOM (Syft) + vulnerability report (Grype)
     - partial [3]_
     - SOC 2 CC7.1/CC8.1; ISO A.8.8/A.8.28/A.8.30
   * - Input validation — ``values.schema.json`` validates types/enums at install
     - on
     - SOC 2 CC8.1; ISO A.8.25
   * - Least-exposure ingress — only public auth APIs routed; admin APIs in-cluster only
     - on
     - SOC 2 CC6.6; ISO A.8.20/A.8.21

.. [1] The core read-only root filesystem is smoke-tested but not exercised by a full
   federated-training round; it can be disabled per workload if other writable paths are needed.
.. [2] Requires a NetworkPolicy-enforcing CNI (the reference GKE cluster uses Dataplane V2);
   enable with ``networkPolicy.enabled=true``.
.. [3] Digest pinning is opt-in (``image.coreDigest`` / ``image.frontendDigest``);
   scanning/SBOM/SAST run in the project's CI pipelines, not the chart.

Secure defaults & configuration
--------------------------------

.. list-table::
   :header-rows: 1
   :widths: 30 28 42

   * - Concern
     - Default
     - How to harden / enable
   * - Pod hardening
     - enforced
     - n/a (on for all workloads)
   * - NetworkPolicies
     - off
     - ``networkPolicy.enabled=true`` (needs a policy CNI)
   * - Edge TLS
     - off (``http``)
     - ``global.protocol=https`` + ``ingress.certManager.clusterIssuer``
   * - Authentication
     - off (opt-in)
     - Kratos/Hydra; set ``auth.admin.*`` to bootstrap an admin
   * - Secrets
     - auto-generated + retained
     - ``secrets.existingSecret`` with external-secrets / sealed-secrets
   * - Image pinning
     - tags
     - ``image.coreDigest`` / ``image.frontendDigest``
   * - Bundled data backends
     - in-cluster (dev)
     - point at managed/hardened services (``*.deploy=false``)
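
You can collect the opt-in settings above in a values overlay and validate it before installing.
This minimal sketch assumes that the dotted keys in the table map to nested YAML in the usual Helm
way. The ClusterIssuer name ``letsencrypt-prod``, the Secret name ``scaleout-secrets`` and the helper
``write_hardened_values`` are hypothetical placeholders. The Secret must already exist and contain
the keys the chart expects. ``auth.admin.*``, digest pinning and external data backends are left
out because their values depend on the deployment.

.. code-block:: sh

   #!/bin/sh
   # Hypothetical helper: write a values overlay that turns on the opt-in controls
   # from the table above. Placeholder names must be replaced before use.
   write_hardened_values() {
     out="${1:-values-hardened.yaml}"
     printf '%s\n' \
       'global:' \
       '  protocol: https                    # edge TLS (default: http)' \
       'ingress:' \
       '  certManager:' \
       '    clusterIssuer: letsencrypt-prod  # placeholder: an existing ClusterIssuer' \
       'networkPolicy:' \
       '  enabled: true                      # needs a NetworkPolicy-enforcing CNI' \
       'secrets:' \
       '  existingSecret: scaleout-secrets   # placeholder: an externally managed Secret' \
       > "$out"
   }

Helm checks supplied values against ``values.schema.json`` on ``lint``, ``template``, ``install``
and ``upgrade``. A typical sequence is ``helm lint charts/scaleout -f values-hardened.yaml``
followed by ``helm upgrade --install <release> charts/scaleout -n <ns> -f values-hardened.yaml``.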

Data protection
---------------

* **In transit (edge):** when edge TLS is enabled, TLS terminates at the Ingress with
  cert-manager-issued certificates. The Kratos/Hydra admin APIs are never exposed externally.
* **In transit (intra-cluster):** currently plaintext within the cluster; transparent
  pod-to-pod and database mTLS is on the roadmap. NetworkPolicies restrict reachability in the
  meantime.
* **At rest:** persistent data (Postgres/Mongo/MinIO) lives on PersistentVolumes, so the
  cluster/cloud storage class provides encryption at rest (e.g. GKE encrypts persistent disks by
  default). For Secrets, enable encryption at rest in etcd at the cluster level (on GKE, this is
  application-layer secrets encryption with a Cloud KMS key, i.e. CMEK). For production, prefer
  managed data backends and an external secret store over the bundled ones.
* **Secrets exposure:** credentials are referenced via ``secretKeyRef`` / ``envFrom``, never
  baked into manifests or images.
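
You can spot-check the secrets-exposure claim on the rendered manifests. This is a rough
heuristic sketch. It assumes ``helm template`` output in the usual ``- name:`` / ``value:`` env
layout. ``audit_literal_secrets`` is a hypothetical helper. The name pattern (``PASSWORD``,
``SECRET``, ``TOKEN``) is also an assumption, and it will miss credentials that are named
differently.

.. code-block:: sh

   #!/bin/sh
   # Hypothetical heuristic: read rendered manifests on stdin and print the names of
   # env vars that look like credentials but carry a literal "value:" instead of
   # "valueFrom:" (secretKeyRef). Exit 1 if any are found.
   audit_literal_secrets() {
     awk '
       /^[ \t]*- name:/    { name = $3; next }   # list item that starts with a name
       /^[ \t]*- /         { name = ""; next }   # any other list item
       /^[ \t]*valueFrom:/ { name = ""; next }   # value comes from a reference
       /^[ \t]*value:/ {
         if (toupper(name) ~ /PASSWORD|SECRET|TOKEN/) { print name; found = 1 }
         name = ""
       }
       END { exit (found ? 1 : 0) }
     '
   }

Usage: ``helm template <release> charts/scaleout -f values.yaml | audit_literal_secrets``.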

Verification
------------

.. code-block:: sh

   # Pod-level securityContext.runAsNonRoot on every pod (expect true on each line)
   kubectl -n <ns> get pods -o jsonpath='{range .items[*]}{.metadata.name}{": runAsNonRoot="}{.spec.securityContext.runAsNonRoot}{"\n"}{end}'

   # No service-account token mounted
   kubectl -n <ns> get pod <pod> -o jsonpath='{.spec.automountServiceAccountToken}'   # false

   # NetworkPolicy enforcement (networkPolicy.enabled=true): a pod in another namespace
   # cannot reach the data tier. Namespace "other" must already exist.
   kubectl -n other run probe --rm -i --restart=Never --image=busybox --command -- \
     sh -c 'nc -w5 -z <release>-postgres.<ns> 5432 && echo OPEN || echo BLOCKED'      # BLOCKED

   # TLS certificate issued for the host
   kubectl -n <ns> get certificate

   # Automated end-to-end (auth off + on, helm test, cross-namespace block)
   charts/scaleout/test/e2e.sh --mode both

The chart's ``helm test`` hook and ``test/e2e.sh`` run during release validation and can be
rerun on any cluster. The NetworkPolicy checks need a policy-enforcing CNI (validated on GKE
with Dataplane V2).
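
In CI, you can turn the manual checks into pass/fail gates. In this minimal sketch, ``expect_all``
is a hypothetical helper. It reads ``name: field=value`` lines, such as those printed by the
``runAsNonRoot`` query in the verification block. It fails if any line does not end in the
expected value, so an empty value (field unset at that level) counts as a failure. It also fails
on empty input.

.. code-block:: sh

   #!/bin/sh
   # Hypothetical helper: read "name: field=value" lines on stdin and exit 1 unless
   # every line's value equals $1. Offending lines are printed with a FAIL prefix.
   expect_all() {
     awk -v want="$1" '
       NF == 0 { next }                      # ignore blank lines
       {
         checked++
         n = split($0, kv, "=")              # value is the text after the last "="
         if (kv[n] != want) { print "FAIL: " $0; bad = 1 }
       }
       END {
         if (!checked) { print "FAIL: no input"; bad = 1 }
         exit (bad ? 1 : 0)
       }
     '
   }

Usage: pipe the ``runAsNonRoot`` query into ``expect_all true``.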

Residual risks & roadmap
------------------------

* **Intra-cluster traffic (including database connections) is not encrypted.**
  NetworkPolicies limit reachability; transparent mTLS via a service mesh is planned.
* **External FL client → combiner gRPC** is authenticated by JWT only when auth is enabled. The
  EdgeClient connects with TLS by default, so the combiner ``secureMode`` must match.
* **Core read-only root filesystem** is smoke-tested but not validated under a full
  federated-training round; a toggle is available (``securityContext.readOnlyRootFilesystem``).
* **Secret rotation** is manual. Generated secrets are intentionally retained across upgrades.
  To rotate, either update the value in the chart-managed Secret directly and restart the
  workloads that consume it (env vars are read only at container start), or use
  ``secrets.existingSecret`` with an external secret manager (external-secrets / sealed-secrets).
* **Bundled data backends are development-grade** — use managed/hardened backends in production.
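
The first rotation option (updating the chart-managed Secret directly) can be sketched as
follows. This sketch assumes that the value is consumed via ``secretKeyRef``. ``<secret>``,
``<key>`` and ``<workload>`` are placeholders, and ``rotate_secret_key`` is a hypothetical helper.
The patch uses ``stringData``, which carries the plain value; the API server stores it
base64-encoded under ``data``. The sketch only updates the Secret. If the credential is also held
inside a backend (for example, a database user's password), you must change it there as well.

.. code-block:: sh

   #!/bin/sh
   # Generate 32 random bytes, base64-encoded (44 characters, no trailing newline).
   new_secret_value() {
     head -c 32 /dev/urandom | base64 | tr -d '\n'
   }

   # Hypothetical helper: overwrite one key of an existing Secret, then restart the
   # workload so env vars sourced via secretKeyRef pick up the new value.
   rotate_secret_key() {
     ns="$1" secret="$2" key="$3" workload="$4"
     value="$(new_secret_value)"
     kubectl -n "$ns" patch secret "$secret" \
       -p "{\"stringData\":{\"$key\":\"$value\"}}" || return 1
     kubectl -n "$ns" rollout restart "$workload"
   }

Usage: ``rotate_secret_key <ns> <secret> <key> deployment/<workload>``.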

Supply chain & vulnerability management
---------------------------------------

* Images can be pinned by digest (``image.coreDigest`` / ``image.frontendDigest``) for
  immutability.
* CI pipelines run CodeQL (SAST) and Trivy (container image scanning). They also generate a Syft
  SBOM and a Grype vulnerability report, which are published with each release.
* The chart consumes images from a private registry. Authenticate pulls with
  ``imagePullSecrets``.
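
Digest pinning only helps if the values actually hold full digests. The following minimal sketch
rejects tags and truncated digests before they reach ``helm``. It assumes that
``image.coreDigest`` / ``image.frontendDigest`` take the ``sha256:<64 hex>`` form, which should be
confirmed against ``values.schema.json``. ``is_sha256_digest`` and ``pin_args`` are hypothetical
helpers.

.. code-block:: sh

   #!/bin/sh
   # True only for "sha256:" followed by exactly 64 lowercase hex characters.
   is_sha256_digest() {
     hex="${1#sha256:}"
     [ "$hex" != "$1" ] || return 1                          # prefix is required
     [ "${#hex}" -eq 64 ] || return 1                        # full length only
     case "$hex" in *[!0123456789abcdef]*) return 1 ;; esac  # lowercase hex only
   }

   # Hypothetical helper: print the --set flags for digest pinning, or fail if
   # either digest is malformed.
   pin_args() {
     core="$1" frontend="$2"
     for d in "$core" "$frontend"; do
       is_sha256_digest "$d" || { echo "not a sha256 digest: $d" >&2; return 1; }
     done
     printf '%s\n' "--set image.coreDigest=$core --set image.frontendDigest=$frontend"
   }

Usage (with ``CORE_DIGEST`` / ``FRONTEND_DIGEST`` supplied by the release process):
``helm upgrade --install <release> charts/scaleout -n <ns> $(pin_args "$CORE_DIGEST" "$FRONTEND_DIGEST")``.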

The same content is maintained in the chart repository as ``charts/scaleout/SECURITY.md`` for
engineers working in the codebase.