Architecture overview
Scaleout Edge is built on a resilient, three-tier architecture designed to separate the Control Plane (Orchestration & Governance) from the Data Plane (Edge execution). This separation lets the platform scale horizontally and remain fault tolerant in unstable network environments.
The architecture consists of three logical tiers:
Tier 1: Edge Nodes (The Data Plane)
Tier 2: Combiners (The Aggregation Layer and local orchestrators)
Tier 3: Controller (The Control Plane)
Tier 1: Edge Nodes (The Data Plane)
The bottom tier consists of the Edge Nodes: the distributed nodes where your data resides. These can be anything from powerful on-premises servers to constrained IoT devices (e.g., NVIDIA Jetson, Raspberry Pi).
Key characteristics:
Sovereignty: This is the only tier that touches raw data. Code travels down to this tier; data never travels up.
Local Execution: Nodes can receive a Compute Package (runtime environment + ML logic) from the Control Plane and execute it locally, or use a local package already present on the node.
Telemetry & Logging: Nodes report model metrics, logs, and artifacts back to the Control Plane for monitoring and auditing.
Secure Communication: Nodes communicate securely with the Control Plane using egress-only connections, ensuring no ingress ports are required.
Framework Agnostic: Edge nodes can run models built with any ML framework (TensorFlow, PyTorch, TFLite, ONNX, etc.).
Python, C++ and Kotlin client implementations are provided out-of-the-box.
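The "code travels down, data never travels up" property can be illustrated with a minimal sketch. This is not the Scaleout Edge client API: the names `ModelUpdate` and `local_round`, the toy linear model, and the learning rate are all hypothetical, chosen only to show that a node returns parameters and summary metrics while the raw samples stay local.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ModelUpdate:
    """Hypothetical payload sent upstream: parameters and metrics, no raw samples."""
    weights: List[float]
    num_examples: int
    metrics: Dict[str, float]


def local_round(global_weights: List[float],
                local_data: List[Tuple[float, float]],
                lr: float = 0.1) -> ModelUpdate:
    """Run one pass of SGD on a toy model y = w*x + b using only local data."""
    w, b = global_weights
    for x, y in local_data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    # Only an aggregate metric is computed for reporting; samples are not copied out.
    mse = sum(((w * x + b) - y) ** 2 for x, y in local_data) / len(local_data)
    return ModelUpdate(weights=[w, b], num_examples=len(local_data), metrics={"mse": mse})


# The private data lives only on the node; the update is what would be sent upstream.
private_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
update = local_round([0.0, 0.0], private_data)
```

The example count is included in the update because a downstream aggregator typically needs it to weight contributions; whether the platform's own clients report it this way is an assumption here.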
Tier 2: Combiners (The Aggregation Layer)
The Combiner is the scalability engine of the platform. It acts as an intelligent gateway and aggregator that sits between the Control Plane and the Edge Nodes.
Key responsibilities:
Horizontal Scalability: You can deploy multiple Combiners to handle thousands of edge nodes. Each Combiner manages a subset of clients, distributing the load and ensuring efficient resource utilization.
Local Orchestration: Combiners can execute local orchestration logic, managing client participation in training rounds, handling retries, and enforcing policies defined by the Controller.
Model Aggregation: Combiners are responsible for aggregating model updates from their clients. This includes:
Executing the orchestration steps defined in the global session plan provided by the Controller.
Reducing client model updates into a single combiner-level model.
Fault Tolerance: Combiners can handle client dropouts and network issues, ensuring robust training even in unstable environments.
Secure Communication: Combiners maintain secure connections with both the Controller and their assigned Edge Nodes.
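As a rough illustration of how several Combiners can each manage a subset of clients, the sketch below assigns every connecting client to the Combiner with the fewest clients so far. The source does not describe the actual assignment policy, so the least-loaded rule, the function name `assign_clients`, and the identifiers used are assumptions made for the example.

```python
from typing import Dict, List


def assign_clients(client_ids: List[str], combiner_ids: List[str]) -> Dict[str, List[str]]:
    """Assign each client to the combiner that currently holds the fewest clients.

    Hypothetical policy for illustration; ties go to the combiner listed first.
    """
    load: Dict[str, List[str]] = {c: [] for c in combiner_ids}
    for client in client_ids:
        target = min(combiner_ids, key=lambda c: len(load[c]))
        load[target].append(client)
    return load


clients = [f"client-{i}" for i in range(7)]
assignment = assign_clients(clients, ["combiner-a", "combiner-b", "combiner-c"])
```

With this rule, adding a Combiner lowers the number of clients each one serves, which is the sense in which the tier scales horizontally.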
Tier 3: Controller (The Control Plane)
The top tier is the Control Plane, responsible for global orchestration, governance, and state management. It is the “brain” of the infrastructure.
Core Services:
Global Orchestration: Defines the training strategy, selects clients for rounds, and pushes session configurations to the Combiners.
Reducer: Aggregates combiner-level models into the final global model.
Model Registry: Maintains the immutable Model Trail, storing every version of the global model and its associated metadata.
Governance & Security: Manages user access (RBAC), issues authentication tokens (JWT), and logs system events, such as client contributions to models, to the Audit Trail.
Telemetry & Monitoring: Collects and analyzes metrics from the entire system, providing insights into performance, resource utilization, and potential issues.
API Gateway: Exposes RESTful APIs for managing the system, allowing users to interact with it programmatically.
User Interface: Provides a web-based dashboard for visualizing system status, managing sessions and clients, and monitoring the state of the network.
Notes on aggregation algorithms
Scaleout Edge includes several built-in aggregators for common federated learning workflows (see Aggregators). For advanced scenarios, users may override the Combiner-level behavior using server functions (see Server Functions & Aggregators), which allow custom orchestration or aggregation logic.
Aggregation happens in two stages:
Each Combiner reduces its clients' updates into a combiner-level model.
The Controller (Reducer) combines these combiner-level models into the final global model.
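The two stages can be sketched with a simple example-weighted average, in the style of federated averaging. This is an assumption for illustration: the built-in aggregators may use different rules, and `weighted_average` is a hypothetical helper, not a platform function. The sketch shows that when each Combiner passes along its total example count, reducing in two stages gives the same result as averaging all client updates at once.

```python
from typing import List, Tuple

Update = Tuple[List[float], int]  # (weights, number of examples behind them)


def weighted_average(updates: List[Update]) -> Update:
    """Average weight vectors, weighting each by its example count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return avg, total


# Stage 1: each Combiner reduces the updates from its own clients.
combiner_a = weighted_average([([1.0, 2.0], 10), ([3.0, 4.0], 30)])
combiner_b = weighted_average([([5.0, 6.0], 60)])

# Stage 2: the Reducer combines the combiner-level models into the global model.
global_model, total_examples = weighted_average([combiner_a, combiner_b])

# For comparison: a single-stage average over all client updates.
flat_model, _ = weighted_average([([1.0, 2.0], 10), ([3.0, 4.0], 30), ([5.0, 6.0], 60)])
```

Carrying the example count through stage 1 is what keeps the result independent of how clients happen to be split across Combiners; an unweighted mean of combiner-level models would not have that property.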