The Case for Small Backend Endpoints
One endpoint per screen is a trap. Endpoints should be shaped by data needs, not page layouts — that distinction has consequences for how you build both sides.
The endpoint-per-page pattern feels rational when you're building the first version. One request per route, predictable, maps cleanly to the navigation tree. By version two, the endpoint is returning 14 joined tables, the frontend is ignoring 60% of the payload, and adding a new card to the dashboard means a backend deployment. The better model: endpoints shaped by data units, not page layouts. Small, focused, independently cacheable.
The kitchen sink endpoint
Here's the anti-pattern made concrete. A dashboard page needs: a summary of open orders, a metric for revenue this week, a list of recent shipments, and the user's notification count. The kitchen sink approach produces one endpoint:
GET /api/dashboard
Response:
{
  "openOrders": { "count": 14, "value": 48200 },
  "weeklyRevenue": { "total": 92400, "change": 0.07 },
  "recentShipments": [ ...20 items with full carrier details... ],
  "notificationCount": 3,
  "userProfile": { "name": "...", "role": "...", "lastLogin": "..." }
}
This works until it doesn't. The problems accumulate:
Latency is determined by the slowest piece. If recentShipments requires a carrier API call that takes 400ms, the entire page waits at least 400ms, even though notificationCount is a single count query that resolves in 2ms.
Caching is all-or-nothing. You can't cache the metric cards for 30 seconds and the notification count for 5 seconds if they live in the same response. Everything expires together.
Adding a field is a deployment event. The designer wants to add an "overdue orders" count. That means a new query, a new field in the response, a backend PR, a frontend PR, and a coordinated deploy. For one number.
Loading states collapse. The entire page is either loading or loaded. You can't show metric cards immediately while the shipments table is still fetching.
The small endpoint version
Split by data unit and ownership:
GET /api/metrics/orders/summary → { count, value }
GET /api/metrics/revenue/weekly → { total, change }
GET /api/shipments/recent?limit=20 → [ ...shipments... ]
GET /api/notifications/count → { count }
Each endpoint owns one thing. The frontend fires all four in parallel. The page renders as data arrives.
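To put numbers on the difference, here is a minimal sketch that compares when each dashboard zone can first render under the two models. The 2ms and 400ms figures come from the example above; the other two latencies and both function names are made up for illustration. It assumes the kitchen sink handler runs its queries in parallel (run sequentially, the wait would be the sum instead) and it ignores per-request network overhead.

```javascript
// Hypothetical per-section latencies in milliseconds.
// recentShipments and notificationCount use the figures from the text;
// the other two are illustrative assumptions.
const sectionLatencyMs = {
  openOrders: 30,
  weeklyRevenue: 45,
  recentShipments: 400,
  notificationCount: 2,
};

// One combined response: nothing renders until the slowest section is ready.
function kitchenSinkRenderTimes(latencies) {
  const slowest = Math.max(...Object.values(latencies));
  return Object.fromEntries(
    Object.keys(latencies).map((name) => [name, slowest])
  );
}

// One request per section, fired in parallel: each zone renders on its own schedule.
function smallEndpointRenderTimes(latencies) {
  return { ...latencies };
}

console.log(kitchenSinkRenderTimes(sectionLatencyMs));
console.log(smallEndpointRenderTimes(sectionLatencyMs));
```

Under these assumptions every zone in the combined model waits for the 400ms shipments call, while the notification badge in the split model is ready after 2ms.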
Now you can cache /api/metrics/revenue/weekly for 60 seconds with a Cache-Control header, because a weekly total can tolerate a minute of staleness. /api/notifications/count gets 10 seconds. /api/shipments/recent gets 15. Independent TTLs, independent staleness decisions. An HTTP caching layer (or even a CDN for public data) handles this for free if you structure it right.
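A minimal, framework-agnostic sketch of those per-endpoint TTLs: a lookup from path to max-age and a helper that builds the Cache-Control value. The `cacheTtlSeconds` table and `cacheControlFor` helper are hypothetical names. Defaulting to `private` is an assumption that these responses are per-user and should stay out of shared caches.

```javascript
// TTLs from the text, in seconds, keyed by endpoint path (no query string).
const cacheTtlSeconds = {
  "/api/metrics/revenue/weekly": 60,
  "/api/notifications/count": 10,
  "/api/shipments/recent": 15,
};

// Build a Cache-Control header value for a path.
// Paths without a configured TTL get "no-store" so nothing is cached by accident.
function cacheControlFor(path, { shared = false } = {}) {
  const ttl = cacheTtlSeconds[path];
  if (ttl === undefined) return "no-store";
  // "private" keeps per-user data out of shared caches such as CDNs.
  const scope = shared ? "public" : "private";
  return `${scope}, max-age=${ttl}`;
}

console.log(cacheControlFor("/api/metrics/revenue/weekly")); // private, max-age=60
```

In an Express-style handler this might be applied with something like `res.set("Cache-Control", cacheControlFor(req.path))`. The text gives no TTL for the orders summary, so in this sketch it falls through to `no-store` until someone makes that decision.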
What this enables on the frontend
Four parallel requests with independent loading states map directly to four skeleton zones:
// Assumes TanStack Query, whose useQuery matches the signature used here.
// The fetch* helpers and the layout components are defined elsewhere.
import { useQuery } from "@tanstack/react-query";

function Dashboard() {
  // Four independent queries, fired in parallel on mount.
  const orders = useQuery({ queryKey: ["orders", "summary"], queryFn: fetchOrderSummary });
  const revenue = useQuery({ queryKey: ["revenue", "weekly"], queryFn: fetchWeeklyRevenue });
  const shipments = useQuery({ queryKey: ["shipments", "recent"], queryFn: fetchRecentShipments });
  const notifications = useQuery({ queryKey: ["notifications", "count"], queryFn: fetchNotificationCount });

  return (
    <DashboardLayout>
      <MetricStrip>
        <MetricCard query={orders} label="Open Orders" />
        <MetricCard query={revenue} label="Revenue This Week" />
        <MetricCard query={notifications} label="Unread" />
      </MetricStrip>
      <ShipmentsTable query={shipments} />
    </DashboardLayout>
  );
}
Each MetricCard shows its own skeleton while its query is pending. The table has its own skeleton. The user might see the metric cards appear within 80ms and the table fill in at 300ms. The old endpoint would have held all of that until the slowest piece resolved.
This also means error states are scoped. If the revenue metric fails, the rest of the page still works. The card shows an error state; everything else renders normally. With a kitchen sink endpoint, one failing join takes the whole page down.
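One way a card could choose between skeleton, error, and value is a small pure function over the query result. This is a sketch, not the MetricCard used above: `cardState` is a hypothetical helper, and the `isPending` / `isError` / `data` / `error` fields assume a TanStack Query v5 style result object (earlier versions name the first flag `isLoading`).

```javascript
// Map a query result to what the card should display.
function cardState(query) {
  if (query.isPending) return { kind: "skeleton" };
  if (query.isError) {
    return { kind: "error", message: String(query.error?.message ?? "Failed to load") };
  }
  return { kind: "value", data: query.data };
}

// Each zone resolves independently: one failure does not affect the others.
const zones = {
  orders: cardState({ isPending: false, isError: false, data: { count: 14, value: 48200 } }),
  revenue: cardState({ isPending: false, isError: true, error: new Error("timeout") }),
  shipments: cardState({ isPending: true, isError: false }),
};

console.log(zones);
```

Here the orders card renders its value, the revenue card renders an error, and the shipments table is still a skeleton, all on the same page at the same moment.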
The cost of small endpoints
There is a real cost: request count. Four requests instead of one means four sets of headers, four request/response exchanges, and, over HTTP/1.1, possibly four connections. HTTP/2 multiplexes them over a single connection, which reduces the overhead but does not eliminate it. On a slow mobile network or a high-latency VPN, this can be perceptible.
The mitigation is not to collapse back to a single endpoint — it's to measure. Most internal dashboards run on fast connections. Most slow-network users are on public-facing pages where you have other strategies (SSR, prefetching, edge caching). The round-trip cost is usually not the bottleneck people assume it is.
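A rough way to measure is to compare the wall-clock time of the parallel batch against its slowest single request. The sketch below works on plain `{ name, startTime, responseEnd }` records, the same fields that the browser's Resource Timing entries expose. `summarizeBatch` is a hypothetical helper and the sample numbers are invented.

```javascript
// Summarize a batch of request timings (all values in milliseconds).
function summarizeBatch(entries) {
  const firstStart = Math.min(...entries.map((e) => e.startTime));
  const lastEnd = Math.max(...entries.map((e) => e.responseEnd));
  const slowest = Math.max(...entries.map((e) => e.responseEnd - e.startTime));
  const wallClock = lastEnd - firstStart;
  // If the requests truly ran in parallel, wallClock is close to slowest.
  // A large gap suggests queuing or sequential (waterfall) requests.
  return { wallClock, slowest, overhead: wallClock - slowest };
}

// Invented sample timings for the four dashboard requests.
const sample = [
  { name: "/api/metrics/orders/summary", startTime: 100, responseEnd: 160 },
  { name: "/api/metrics/revenue/weekly", startTime: 101, responseEnd: 175 },
  { name: "/api/shipments/recent?limit=20", startTime: 102, responseEnd: 410 },
  { name: "/api/notifications/count", startTime: 103, responseEnd: 130 },
];

console.log(summarizeBatch(sample)); // { wallClock: 310, slowest: 308, overhead: 2 }
```

In a browser, the entries could come from `performance.getEntriesByType("resource")` filtered down to the API calls. If the overhead stays small on the networks your users actually have, the extra requests are not the bottleneck.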
When it genuinely is (when you're building for weak connections, or you've profiled and confirmed that round-trip overhead is the issue), that's when you reach for a BFF.
When to aggregate: the BFF pattern
A Backend for Frontend layer is an aggregation service that lives between the API layer and the client. It's not a replacement for small endpoints; it's a composition layer on top of them.
Client → BFF (Node.js) → GET /api/metrics/orders/summary
                       → GET /api/metrics/revenue/weekly
                       → GET /api/shipments/recent
                       → GET /api/notifications/count
The BFF fires the downstream requests in parallel, aggregates, and returns one response to the client. You get the network efficiency of one request and the architectural cleanliness of small, independently cacheable downstream endpoints.
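A minimal sketch of the aggregation step, assuming a `fetchJson(path)` function that returns a promise and is injected so it can be swapped for a stub. `aggregateDashboard`, the response keys, and the per-section `{ ok, data }` / `{ ok, error }` envelope are illustrative choices, not something the pattern prescribes.

```javascript
// Downstream endpoints from the text, keyed by the name used in the combined response.
const downstream = {
  openOrders: "/api/metrics/orders/summary",
  weeklyRevenue: "/api/metrics/revenue/weekly",
  recentShipments: "/api/shipments/recent?limit=20",
  notificationCount: "/api/notifications/count",
};

// Fire all downstream requests in parallel and merge the results.
// Promise.allSettled keeps one failing section from failing the whole response.
async function aggregateDashboard(fetchJson) {
  const names = Object.keys(downstream);
  const results = await Promise.allSettled(
    names.map((name) => fetchJson(downstream[name]))
  );
  return Object.fromEntries(
    results.map((result, i) => [
      names[i],
      result.status === "fulfilled"
        ? { ok: true, data: result.value }
        : { ok: false, error: String(result.reason?.message ?? result.reason) },
    ])
  );
}
```

Reporting each section's success or failure separately is a deliberate choice here: it lets the client keep scoped error states even though it made a single request.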
The BFF pattern makes sense when: you're targeting mobile clients with constrained bandwidth, you have multiple client types (web/mobile/TV) with different data shape needs, or you need to aggregate across microservices that can't be called directly from the browser. It does not make sense as a catch-all for every project. Adding a BFF to a monolith that's serving one web client is almost always premature.
Endpoint size and skeleton screen design
Here's the design implication most teams miss: your skeleton screens should be designed around your endpoint boundaries, not your page layout.
If you have one endpoint, you have one loading state: a full-page skeleton that resolves all at once. If you have four endpoints, you have four loading zones, each with its own skeleton and its own resolution timing. The designer needs to know which data lands first so they can prioritize what appears above the fold.
In practice this means frontend engineers and designers need to agree on endpoint granularity before either party starts building. The designer shouldn't be drawing a skeleton screen for a section whose data is bundled into the same response as the header. The engineer shouldn't be splitting endpoints in ways that create visible jank the designer didn't account for.
That coordination is the real architectural decision. Small endpoints make it explicit. Kitchen sink endpoints hide it until someone notices the skeleton flash is ugly in production.
A useful heuristic
If a field in your endpoint response is never read by the frontend, the endpoint is over-fetching. Audit your API responses against actual frontend usage once a quarter. You'll find fields that were added for a feature that got cut, join data that's only used in one edge case, and nested objects that get immediately destructured into one property.
Small endpoints don't eliminate over-fetching, but they localize it. When /api/metrics/orders/summary returns an unused field, the blast radius is one small response. When /api/dashboard over-fetches, you're paying the serialization and transport cost across the entire page on every load.
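One lightweight way to run that audit in development is to wrap a response in a Proxy that records which fields get read. This is a sketch under stated assumptions: `trackReads` is a hypothetical helper, it only tracks top-level fields, and `legacyFlag` is an invented field standing in for something left over from a cut feature.

```javascript
// Wrap a response object and record which top-level fields are read.
function trackReads(response) {
  const read = new Set();
  const proxy = new Proxy(response, {
    get(target, key, receiver) {
      if (typeof key === "string" && key in target) read.add(key);
      return Reflect.get(target, key, receiver);
    },
  });
  // Fields present in the response that nothing has read so far.
  const unusedFields = () => Object.keys(response).filter((k) => !read.has(k));
  return { proxy, unusedFields };
}

// Example: the frontend reads count and value but never legacyFlag.
const summary = trackReads({ count: 14, value: 48200, legacyFlag: true });
const label = `${summary.proxy.count} open orders worth ${summary.proxy.value}`;

console.log(label);
console.log(summary.unusedFields()); // ["legacyFlag"]
```

A field that never shows up as read across a realistic session is a candidate for removal, though it is worth checking other clients before deleting it.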
Shape endpoints around data units. Let the page assemble them. The loading states will be better and the caching story will actually work.