.. _architecture-label:

Architecture overview
=====================

Scaleout Edge is built on a resilient, three-tier architecture designed to separate the **Control Plane** (Orchestration & Governance) from the **Data Plane** (Edge execution). This design supports horizontal scalability and fault tolerance in unstable network environments.

The architecture consists of three logical tiers:

- **Tier 1:** Edge Nodes (The Data Plane)
- **Tier 2:** Combiners (The Aggregation Layer and sub-local orchestrators)
- **Tier 3:** Controller (The Control Plane)

.. image:: img/Architecture_Overview.png
   :width: 80%
   :align: center
   :class: bordered

|

Tier 1: Edge Nodes (The Data Plane)
-----------------------------------

The bottom tier consists of the **Edge Nodes**, the distributed nodes where your data resides. These can be anything from powerful on-premise servers to constrained IoT devices (e.g., NVIDIA Jetson, Raspberry Pi).

**Key characteristics:**

- **Sovereignty:** This is the only tier that touches raw data. Code travels down to this tier; data never travels up.
- **Local Execution:** Nodes can receive a Compute Package (runtime environment + ML logic) from the Control Plane and execute it locally, or use a package already present on the node.
- **Telemetry & Logging:** Nodes report model metrics, logs, and artifacts back to the Control Plane for monitoring and auditing.
- **Secure Communication:** Nodes communicate securely with the Control Plane using egress-only connections, so no ingress ports need to be opened on the node.
- **Framework Agnostic:** Edge nodes can run models built with any ML framework (TensorFlow, PyTorch, TFLite, ONNX, etc.). Python, C++ and Kotlin client implementations are provided out of the box.

Tier 2: Combiners (The Aggregation Layer)
-----------------------------------------

The **Combiner** is the scalability engine of the platform. It acts as an intelligent gateway and aggregator that sits between the Control Plane and the Edge Nodes.

**Key responsibilities:**

- **Horizontal Scalability:** You can deploy multiple Combiners to handle thousands of edge nodes. Each Combiner manages a subset of clients, which distributes the load across the Combiners.
- **Local Orchestration:** Combiners can execute local orchestration logic, managing client participation in training rounds, handling retries, and enforcing policies defined by the Controller.
- **Model Aggregation:** Combiners are responsible for aggregating model updates from their clients. This includes:

  - Executing the orchestration plan defined in the global **session plan** provided by the Controller.
  - Reducing client model updates into a single **combiner-level model**.

- **Fault Tolerance:** Combiners can handle client dropouts and network issues, keeping training robust even in unstable environments.
- **Secure Communication:** Combiners maintain secure connections with both the Controller and their assigned Edge Nodes.

Tier 3: Controller (The Control Plane)
--------------------------------------

The top tier is the **Control Plane**, responsible for global orchestration, governance, and state management. It is the "brain" of the infrastructure.

**Core Services:**

1. **Global orchestration:** Defines the training strategy, selects clients for rounds, and pushes session configurations to the **Combiners**.
2. **Reducer:** Aggregates combiner-level models into the final global model.
3. **Model Registry:** Maintains the immutable Model Trail, storing every version of the global model and its associated metadata.
4. **Governance & Security:** Manages user access (RBAC), issues authentication tokens (JWT), and logs system events, such as client contributions to models, to the Audit Trail.
5. **Telemetry & Monitoring:** Collects and analyzes metrics from the entire system, providing insights into performance, resource utilization, and potential issues.
6. **API Gateway:** Exposes RESTful APIs for managing the system, allowing users to interact with it programmatically.
7. **User Interface:** A web-based dashboard for visualizing system status, managing sessions and clients, and monitoring the state of the network.

Notes on aggregation algorithms
-------------------------------

Scaleout Edge includes several **built-in aggregators** for common federated learning workflows (see :ref:`agg-label`). For advanced scenarios, users may override the Combiner-level behavior using **server functions** (:ref:`server-functions`), allowing custom orchestration or aggregation logic.

Aggregation happens in two stages:

1. Each Combiner reduces client updates into a *combiner-level model*.
2. The Controller (Reducer) combines these into the final global model.

.. meta::
   :description lang=en: Architecture overview - An overview of the Scaleout Edge federated learning platform architecture.
   :keywords: Federated Learning, Architecture, Federated Learning Framework, Federated Learning Platform, FEDn, Scaleout Systems, Scaleout Edge

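
The two aggregation stages can be illustrated with a minimal sketch, assuming an example-weighted average (FedAvg-style) as the reduction at both stages. This is an illustration only: the function ``weighted_average``, the flat-list model representation, and the sample numbers are hypothetical and are not part of the Scaleout Edge API, and the built-in aggregators may work differently.

.. code-block:: python

   from typing import List, Sequence, Tuple

   # Hypothetical simplification: a model is a flat list of floats.
   Model = List[float]
   # An update pairs a model with the number of training examples behind it.
   Update = Tuple[Model, int]


   def weighted_average(updates: Sequence[Update]) -> Update:
       """Reduce (model, n_examples) pairs into one example-weighted mean."""
       if not updates:
           raise ValueError("at least one update is required")
       total = sum(n for _, n in updates)
       if total <= 0:
           raise ValueError("total example count must be positive")
       size = len(updates[0][0])
       if any(len(model) != size for model, _ in updates):
           raise ValueError("all models must have the same length")
       merged = [
           sum(model[i] * n for model, n in updates) / total
           for i in range(size)
       ]
       # Carry the total forward so the next stage can weight this result.
       return merged, total


   # Stage 1: each Combiner reduces the updates from its own clients.
   combiner_a = weighted_average([([1.0, 2.0], 10), ([3.0, 4.0], 30)])
   combiner_b = weighted_average([([5.0, 6.0], 60)])

   # Stage 2: the Controller (Reducer) combines the combiner-level models.
   global_model, n_total = weighted_average([combiner_a, combiner_b])
   print(global_model, n_total)

Because each stage carries its example count forward, the two-stage result in this sketch equals a single weighted average taken over all three client updates directly.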