Engineering Goals for High Performance

Published on: September 17, 2025

For a system built with FastAPI, Docker, and Kubernetes, high performance is achieved when the application layer, containerization, and orchestration all excel.

Here are the key values and metrics we use to define and measure it.

1. Application Performance: The FastAPI Service

At the core, the performance of the API itself is paramount. FastAPI's asynchronous nature gives us a head start, but we measure its success with these metrics:

Low Latency (Response Time): This is the time it takes for an API to process a request and send a response.
- Good: < 500ms
- Excellent: < 200ms
High Throughput (Requests Per Second - RPS): This measures how many requests the service can handle simultaneously.
- Good: Thousands of RPS
- Excellent: 10,000+ RPS
Minimal Error Rate: The percentage of requests that result in an error. A low rate signifies stability and reliability.
- Good: < 0.5%
- Excellent: < 0.1%
Predictable Tail Latency (p99): This metric focuses on the user experience for the 99th percentile, ensuring the system is fast for almost everyone, not just on average.
- Good: p99 < 1s
- Excellent: p99 < 400ms

2. Container Efficiency: The Docker Environment

Docker ensures our microservices are packaged and deployed consistently. High performance here means efficiency and speed.

Small Image Size: Smaller images transfer faster over the network, leading to quicker deployments and scaling events.
- Target: < 200MB for a production-ready Python service.
Fast Container Startup Time: The time it takes for a container to go from "starting" to "running" and ready to accept traffic. This is critical for rapid scaling.
- Target: < 2 seconds.
Optimized Resource Footprint (CPU/Memory): Services should be profiled to understand their resource needs. Efficient containers use just enough CPU and memory to perform their tasks without waste, allowing for higher density on our nodes.

3. Orchestration & Scalability: The Kubernetes Platform

Kubernetes manages our containerized applications, ensuring they are resilient and can handle fluctuating demand.

High Availability (Uptime): The percentage of time our service is operational and available to users.
- Good: 99.9% ("three nines") - Approx. 8.7 hours of downtime/year.
- Excellent: 99.99% ("four nines") - Approx. 52 minutes of downtime/year.
Rapid & Elastic Scaling: The ability of the system to automatically add or remove service replicas in response to real-time traffic.
- Metric: Time to scale up (e.g., time from CPU spike to new pods being ready).
- Target: < 1 minute to have new pods ready to serve traffic.
Efficient Resource Utilization: Kubernetes should balance workloads evenly across the cluster to prevent "hot spots" (overloaded nodes) and maximize the use of our infrastructure.
- Target: Node CPU/Memory utilization maintained between 60-80% to balance cost-efficiency with headroom for traffic spikes.

The Holistic View: System-Wide Performance

Finally, high performance is measured by how these layers work together to create a resilient and responsive system.

Low End-to-End Latency: The total time a user request takes to travel through our entire system—from the entry point, through multiple microservices, to the final response.
Fast Mean Time to Recovery (MTTR): When a failure inevitably occurs, this is the average time it takes to fully restore service. A low MTTR is a key indicator of a truly robust, high-performance system.

By focusing on these specific metrics, we have a concrete set of engineering goals that guide our development, deployment, and operational strategies.

Share this post

Share on X • Share on LinkedIn • Share via Email