Flow Description
graph TB
A["Steady State (regular traffic flow)"] --> B["Scale to 0: No Traffic"]
B --> C["Scale up from 0: New Incoming Traffic"]
C --> A
When we enable KubeElasti on a service, the service operates in 3 modes:
- Steady State: The service is receiving traffic and doesn't need to be scaled down to 0.
- Scale Down to 0: The service hasn't received any traffic for the configured duration and can be scaled down to 0.
- Scale up from 0: The service receives traffic again and can be scaled up to the configured minTargetReplicas.
1. Steady State: Flow of requests to service
In this mode, all requests are handled directly by the service pods; the KubeElasti resolver is not involved. The KubeElasti controller continually polls Prometheus with the configured query and checks the result against the threshold value to decide whether the service can be scaled down.
---
title: No incoming requests for the configured time period
displayMode: compact
config:
layout: elk
look: classic
theme: dark
---
graph LR
A[User Request] --> B[Ingress]
B --> C[Service]
C -->|Active| D[Pods]
subgraph Elasti Components
E[Elasti Controller]
F[Elasti Resolver]
end
C -.->|Inactive| F
E -->|Poll configured metric every 30 seconds to check if the service can be scaled to 0| S[Prometheus]
2. Scale Down to 0: when there are no requests
If the query from prometheus returns a value less than the threshold, KubeElasti will scale down the service to 0. Before it scales to 0, it redirects all requests to the KubeElasti resolver, then sets the rollout/deployment replicas to 0. It also pauses KEDA (if in use) to prevent it from scaling the service up, because KEDA is configured with minReplicas: 1
.
---
title: No incoming requests for the configured time period
displayMode: compact
config:
layout: elk
look: classic
theme: dark
---
graph LR
A[User Request] --> B[Ingress]
B --> C[Service]
subgraph Elasti Components
E[Elasti Controller]
F[Elasti Resolver]
end
C -->|Active| F
E -->|Scale replicas to 0| D[Pods]
C -.->|Inactive| D
How it works?
1. Switching to Proxy Mode
This is how we decide to switch to proxy mode.
sequenceDiagram
loop Background Tasks
Operator-->>ElastiCRD: Watch CRD for changes in ScaleTargetRef.
Note right of Operator: Watch ScaleTargetRef and Triggers
Operator-->>TargetService: Watch if ScaleTargetRef is scaled to 0 by <Br>any external component
Operator-->>Prometheus: Poll configured metric every 30 seconds<br> to check if the ScaleTargetRef has not received any traffic
Note right of Operator: If not traffic received for the configured <br> time period, Operator will switch to proxy mode.
Operator->>TargetService: Scale replicas to 0
Operator->>ElastiCRD: Switch to proxy mode.
end
2. Redirecting requests to resolver
This is how we redirect requests to resolver.
sequenceDiagram
Note right of Operator: When in Proxy Mode
Operator->>EndpointSlice: Create EndpointSlice for TargetService<br> which we want to point to resolver POD IPs
loop Background Tasks
Operator-->>Resolver: Watch Resolver POD IPs for changes
Operator-->>EndpointSlice: Update EndpointSlice with new POD IPs
end
3. Sync Private Service to Public Service
This is how we send traffic to target pod, even if the public service is pointing to resolver. We create a Private Service, as in Proxy Mode, we redirect the traffic to Resolver,
so we need to point the public service to resolver POD IPs.
sequenceDiagram
Note right of Operator: When in Proxy Mode
Operator->>TargetPrivateService: Create private service
loop Background Tasks
Operator-->>TargetService: Watch changes in label and IPs <br> in public service.
Operator-->>TargetPrivateService: Update label and IPs in <Br> private service to match Public Service.
end
3. Scale up from 0: when the first request arrives
Since the service is scaled down to 0, all requests will hit the KubeElasti resolver. When the first request arrives, KubeElasti will scale up the service to the configured minTargetReplicas. It then resumes Keda to continue autoscaling in case there is a sudden burst of requests. It also changes the service to point to the actual service pods once the pod is up. Requests reaching the KubeElasti resolver are retried for up to five minutes before a response is returned to the client. If the pod takes more than 5 mins to come up, the request is dropped.
---
title: First request to pod arrives
displayMode: compact
config:
layout: elk
look: classic
theme: dark
---
graph LR
A[User Request] --> B[Ingress]
B --> C[Service]
C -.->|Inactive| F[0 Pods]
subgraph Elasti Components
D[Elasti Controller]
E[Elasti Resolver]
end
C -->|Active| E
E -->|Hold request in memory and forward once ready| F
D -->|Scale replicas up from 0| F
---
title: State after the first replica is up
displayMode: compact
config:
layout: elk
look: classic
theme: dark
---
graph LR
A[User] -->|Request| B[Ingress]
B --> C[Service]
subgraph Elasti Components
E[Elasti Controller]
F[Elasti Resolver]
end
C -->|Active| G[Pods]
E -->|Check metric if workload can be scaled to 0| H[Prometheus]
C -.- |Inactive| F
How it works?
1. Bring the pod up
sequenceDiagram
Note right of Operator: When in Proxy Mode
Gateway->>TargetService: 1. External or Internal traffic
TargetService->>Resolver: 2. Forward request
par
Resolver->>Resolver: 3. Queue requests <br>in-memory (Req remains alive)
Resolver->>Operator: 4. Inform about the incoming request
end
par
Operator->>TargetService: 5. Scale up via HPA or KEDA
Operator->>Resolver: 6. Send info about target private service
end
2. Resolving queued requests
sequenceDiagram
loop
Resolver->>Pod: 7: Check if pod is up
end
par
Resolver->>TargetSvcPvt: 8: Send proxy request
TargetSvcPvt->>Pod: 9: Send & receive req
end
Note right of Resolver: Once pod is up, switch to serve mode