Cache Layer Architecture

Poorly defined cache boundaries are the root cause of most stale-data bugs and excess network traffic in SaaS frontends: components fetch redundantly, mutations propagate to some views but not others, and memory grows unbounded in long-lived SPAs. This page, a sub-topic within State Architecture & Cache Fundamentals, details how to construct a normalized cache layer that separates server state from transient UI state, configures framework adapters consistently, and defines explicit lifecycle rules that scale as your data graph grows. If you are first evaluating where cache state should live, read Client vs Server State Boundaries before continuing here; for the specific decision between React Query and Redux, see React Query vs Redux for Server State.

Diagnostic checklist

You are in the right place if you observe any of these symptoms:

The same API endpoint is called multiple times per page render with identical parameters.
A mutation updates the database but sibling components continue displaying stale data.
Browser memory climbs steadily over a user session without a visible leak in component code.
Optimistic UI updates flicker or snap back inconsistently after server confirmation.
Invalidating one resource accidentally clears unrelated cached queries.
Hydration mismatches appear during SSR because the client re-fetches data the server already had.

Prerequisites

Before implementing the patterns below you should be comfortable with:

Reference vs Value Storage Models — understanding when the cache holds a pointer to a shared entity vs a deep-copied payload is essential before choosing normalization depth.
Normalization Principles for UI — the difference between a flat entity table and a nested response shape, and why denormalization should happen at render time, not at storage time.
TanStack Query v5 QueryClient API (staleTime, gcTime, structuralSharing, setQueryData, invalidateQueries) or Apollo Client v3 InMemoryCache (cache.modify, cache.identify, readFragment/writeFragment).

Data-flow overview

The diagram below models how a SaaS frontend cache layer sits between the network and the component tree. API responses are normalized into a flat entity store before they reach any component; components read denormalized views reconstructed at render time.

Implementation: Normalized Entity Graph

Array-shaped API responses must be transformed into a keyed entity graph before entering the cache. This keeps each entity stored once per query and means a single setQueryData call on that query updates every component that reads the entity through it.

Steps

Extract unique identifiers. Parse each item in the response for its primary key (id, uuid, or a composite like ${type}:${id}) before writing to the cache.
Build a flat lookup table. Store entities as Record<string, Entity> and maintain a parallel ordered string[] of IDs to preserve list sequencing without re-sorting on every render.
Decouple component reads from payload shape. Components should receive normalized references and reconstruct their denormalized view via a select transform at render time, not at write time.

// queryClient.ts — TanStack Query v5
import { QueryClient } from '@tanstack/react-query';

export interface NormalizedList<T> {
  ids: string[];
  entities: Record<string, T>;
}

export function normalizeList<T extends { id: string }>(
  items: T[]
): NormalizedList<T> {
  const ids: string[] = [];
  const entities: Record<string, T> = {};

  for (const item of items) {
    ids.push(item.id);
    entities[item.id] = item;
  }

  return { ids, entities };
}

export const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      // 5-minute stale window: safe for most SaaS entity lists
      staleTime: 1000 * 60 * 5,
      // 30-minute garbage-collection window for inactive queries
      gcTime: 1000 * 60 * 30,
      // Reuse previous references for unchanged data (the v5 default), so
      // components reading unchanged entities skip re-rendering
      structuralSharing: true,
    },
  },
});

// useProjects.ts: normalize in queryFn, denormalize in select
import { useQuery } from '@tanstack/react-query';
import { normalizeList, NormalizedList } from './queryClient';

interface Project { id: string; name: string; status: 'active' | 'archived'; }

// Module-level, so the reference is stable: TanStack Query re-runs select
// only when the cached data or the select function itself changes.
function toOrderedList(list: NormalizedList<Project>): Project[] {
  return list.ids.map((id) => list.entities[id]);
}

export function useProjects() {
  return useQuery<NormalizedList<Project>, Error, Project[]>({
    queryKey: ['projects'],
    // Normalize before the data enters the cache, so setQueryData patches
    // (as in useUpdateProject) operate on the entity table.
    queryFn: ({ signal }) =>
      fetch('/api/projects', { signal })
        .then((r) => r.json() as Promise<Project[]>)
        .then(normalizeList),
    // Rebuild the ordered, denormalized view at read time.
    select: toOrderedList,
  });
}

Cache Behavior Impact: with structuralSharing: true (the v5 default), TanStack Query compares each newly fetched normalized map against the cached one and keeps the previous reference for every part that is deeply equal. If a refetch returns data identical to the last snapshot, the cached reference does not change, select does not re-run, and components that only read data do not re-render. Because the comparison runs on the normalized map that queryFn stores, a focus-triggered background refetch that returns unchanged entities costs the request and the normalization pass but triggers no downstream re-renders.
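To make the reference-preservation behavior concrete, here is a simplified sketch of structural sharing for JSON-like data. It is not TanStack Query's implementation, and shareStructure and the Json type are hypothetical names used only for illustration. The function returns the previous reference when old and new data are deeply equal, and otherwise builds a new value that reuses every unchanged subtree.

```typescript
// Simplified structural sharing for JSON-like data (illustrative only,
// not TanStack Query's internal code).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

export function shareStructure(prev: Json, next: Json): Json {
  if (prev === next) return prev;

  if (Array.isArray(prev) && Array.isArray(next)) {
    const before: Json[] = prev;
    // Reuse each element that is deeply equal to the element at the same index.
    const merged = next.map((item, i) => (i < before.length ? shareStructure(before[i], item) : item));
    const unchanged = before.length === next.length && merged.every((item, i) => item === before[i]);
    return unchanged ? before : merged;
  }

  if (isPlainObject(prev) && isPlainObject(next)) {
    const before = prev;
    const nextKeys = Object.keys(next);
    const merged: { [key: string]: Json } = {};
    for (const key of nextKeys) {
      merged[key] = hasOwn(before, key) ? shareStructure(before[key], next[key]) : next[key];
    }
    const unchanged =
      Object.keys(before).length === nextKeys.length &&
      nextKeys.every((key) => hasOwn(before, key) && merged[key] === before[key]);
    return unchanged ? before : merged;
  }

  // Different primitives or different shapes: take the new value as-is.
  return next;
}
```

When only one entity changes, the root and the changed entity get new references while the ids array and every untouched entity keep their old ones, which is what lets memoized consumers of unchanged entities skip work.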

Configuration Trade-offs

staleTime vs data volatility. A 5-minute staleTime is safe for static entity lists but wrong for real-time inventory or presence data. Set staleTime: 0 for those queries and use refetchInterval or a WebSocket channel alongside.
gcTime and memory pressure. A 30-minute gcTime means inactive queries remain in memory for 30 minutes. In SPAs with hundreds of unique query keys, this can accumulate to tens of MB. Profile with queryClient.getQueryCache().getAll().length in development to detect key sprawl.
ID extraction consistency. Normalization requires consistent ID extraction. A backend that returns id for some resources and _id for others silently breaks the lookup graph. Enforce a mapping layer at the API boundary.
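structuralSharing cost on large payloads. The structural comparison runs synchronously on the main thread each time new data arrives. For very large payloads (~5 000 objects is a rough starting threshold; profile your own data), disable structuralSharing on those specific queries. Every refetch then hands components brand-new references, so refetch those queries sparingly: give them a long staleTime and refresh them through mutation-driven invalidation.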
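A mapping layer at the API boundary can be as small as one function that accepts either identifier field and always emits id. The sketch below assumes a backend that sends id on some resources and _id on others; RawRecord, Identified, toEntity, and entityKey are hypothetical names. entityKey shows the ${type}:${id} composite form from the steps above, for tables that mix entity types.

```typescript
// Hypothetical raw shape: some endpoints send `id`, others `_id`.
type RawRecord = Record<string, unknown> & { id?: unknown; _id?: unknown };

// Every entity that leaves the mapping layer has a string `id`.
interface Identified {
  id: string;
  [field: string]: unknown;
}

export function toEntity(raw: RawRecord): Identified {
  const id = raw.id ?? raw._id;
  if (typeof id !== 'string' || id === '') {
    // Fail loudly at the boundary instead of writing an unkeyed entity.
    throw new Error(`Record has no usable id: ${JSON.stringify(raw)}`);
  }
  // Drop the backend-specific `_id` so only one identifier field survives.
  const { _id, ...rest } = raw;
  return { ...rest, id };
}

// Composite key for tables that mix entity types, e.g. entityKey('project', '42').
export function entityKey(type: string, id: string): string {
  return `${type}:${id}`;
}
```

Running every response through toEntity before normalizeList means the entity table only ever sees one identifier field, whichever convention a given endpoint uses.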

Implementation: Framework Adapter Configuration

Both TanStack Query and Apollo Client expose low-level configuration that most teams leave at defaults long after those defaults become a liability. Explicit adapter configuration standardizes retry behaviour, deduplication windows, and fetch cancellation across every query in the application.

Steps

Standardize the fetch function signature. Wrap fetch or axios in a single adapter that handles auth headers, request cancellation, and non-2xx error throwing uniformly. Pass the adapter to QueryClient via queryFn defaults or to Apollo via HttpLink.
Configure exponential back-off with jitter. Retry only on network failures and 5xx errors; fail immediately on 4xx. Spread retry delay with jitter to prevent thundering-herd reconnect bursts.
Keep query keys consistent so deduplication works. In TanStack Query, concurrent calls with identical query keys share one in-flight request automatically; deduplication silently stops working when two call sites build different keys for the same resource (for example ['project', id] in one place and ['projects', id] in another).
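One way to keep keys from diverging is a query key factory, which the pitfalls table below also recommends. This is a sketch for a hypothetical projects resource; projectKeys, list, and detail are illustrative names, not a TanStack Query API.

```typescript
// Hypothetical query key factory for the projects resource.
// Every key starts with the same root segment, so prefix filters match them all.
const root = ['projects'] as const;

export const projectKeys = {
  all: root,
  list: (filters: { status?: 'active' | 'archived' } = {}) => [...root, 'list', filters] as const,
  detail: (id: string) => [...root, 'detail', id] as const,
};
```

Because invalidateQueries and cancelQueries match keys by prefix by default, { queryKey: projectKeys.all } still reaches every entry the factory produces. getQueryData and setQueryData look up one exact key, so if you adopt a factory, every call site, including the bare ['projects'] keys in this page's examples, should build its key through it.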

// fetchAdapter.ts — shared fetch wrapper for TanStack Query queryFn
export class ApiError extends Error {
  constructor(public status: number, message: string) {
    super(message);
    this.name = 'ApiError';
  }
}

export async function apiFetch<T>(
  endpoint: string,
  signal?: AbortSignal
): Promise<T> {
  const response = await fetch(`/api${endpoint}`, {
    signal,
    headers: {
      Accept: 'application/json',
      // In production, inject the auth header from a secure store; never hard-code a token.
    },
  });

  if (!response.ok) {
    // Throw on every non-2xx status; the retry policy in queryClient.ts decides
    // which errors are retryable (network failures, 5xx) and which are not (4xx).
    throw new ApiError(response.status, `API error: ${response.status}`);
  }

  return response.json() as Promise<T>;
}

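// queryClient.ts (continued): the same client as above plus a retry policy;
// normalizeList and NormalizedList remain exported from this file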
import { QueryClient } from '@tanstack/react-query';
import { ApiError } from './fetchAdapter';

export const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 1000 * 60 * 5,
      gcTime: 1000 * 60 * 30,
      structuralSharing: true,
      // Never retry client errors (status < 500); retry network failures and 5xx up to 3 times
      retry: (failureCount, error) => {
        if (error instanceof ApiError && error.status < 500) return false;
        return failureCount < 3;
      },
      retryDelay: (attempt) =>
        Math.min(1000 * 2 ** attempt + Math.random() * 200, 30_000),
    },
    mutations: {
      retry: 0, // Never auto-retry mutations: side effects are not idempotent
    },
  },
});

Cache Behavior Impact: TanStack Query deduplicates requests at the query-key level. If two components mount at the same time and both subscribe to ['projects'], only one HTTP request fires: each component’s QueryObserver subscribes to the same cached Query, which holds a single in-flight fetch. Configuring retry as a function gives you per-error-type control. An ApiError with status < 500 short-circuits the retry loop entirely, so 401 and 403 responses surface to the UI immediately instead of after the default three retries with exponential back-off.
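The deduplication described above can be illustrated with a conceptual sketch: a map of in-flight promises keyed by a hashed query key. createDeduper and hashKey are hypothetical helpers, not TanStack Query APIs. The key-sorting replacer reflects the fact that TanStack Query's default key hashing also ignores the order of properties inside object segments.

```typescript
// Conceptual sketch of query-key deduplication; not TanStack Query's code.
// Sort plain-object keys so { page, sort } and { sort, page } hash the same.
function hashKey(key: readonly unknown[]): string {
  return JSON.stringify(key, (_field, value: unknown) => {
    if (value && typeof value === 'object' && !Array.isArray(value)) {
      const entries = Object.entries(value as Record<string, unknown>);
      entries.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
      return Object.fromEntries(entries);
    }
    return value;
  });
}

export function createDeduper() {
  const inFlight = new Map<string, Promise<unknown>>();

  return function dedupe<T>(key: readonly unknown[], fetcher: () => Promise<T>): Promise<T> {
    const hash = hashKey(key);
    const pending = inFlight.get(hash);
    // A second caller with an equal key joins the request already in flight.
    if (pending) return pending as Promise<T>;

    // Forget the entry once it settles so the next call fetches fresh data.
    const request = fetcher().finally(() => inFlight.delete(hash));
    inFlight.set(hash, request);
    return request;
  };
}
```

Deduplication only spans the lifetime of one request: once it settles, an equal key triggers a new fetch, and freshness from then on is governed by staleTime rather than by this mechanism.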

Configuration Trade-offs

Retry on mutations. Setting retry > 0 on mutations risks duplicate side effects — a payment charge, a sent email — if the server processed the request but the response was lost in transit. Keep mutations.retry: 0 as a hard rule.
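AbortSignal propagation. Passing the signal from TanStack Query’s queryFn context to fetch (for example queryFn: ({ signal }) => apiFetch('/projects', signal)) lets TanStack Query abort the request when the query is cancelled or when every component using it unmounts mid-fetch. Without it, the request runs to completion and its response is discarded, wasting bandwidth. Aborting stops only the client side; the server may still finish work it has already started.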
Shared QueryClient vs per-route instances. A single QueryClient mounted at the application root deduplicates correctly. Multiple instances (e.g., one per page in a micro-frontend setup) cannot share cache entries; this is the most common cause of redundant fetches after navigation.
HttpLink vs BatchHttpLink in Apollo. BatchHttpLink reduces round-trips by merging operations, but increases perceived latency for individual queries because each request waits for the batch window to close. Prefer HttpLink with Apollo’s built-in deduplication unless you have measured network overhead that justifies batching.

Implementation: Cache Lifecycle and Invalidation Boundaries

Once the entity graph is normalized and the adapter is configured, you need explicit rules for when data ages out and how mutations trigger targeted refreshes. The two failure modes here are over-invalidation (resetting the entire cache on every mutation) and under-invalidation (relying solely on TTL expiry and missing server-side changes).

Steps

Implement stale-while-revalidate windows. Serve the cached entity immediately while a background refetch runs. Tune staleTime per resource volatility tier: 0 s for real-time, 60 s for frequently-edited entities, 300 s for reference data.
Scope invalidation by resource prefix. After a mutation, call invalidateQueries({ queryKey: ['projects'] }) rather than invalidateQueries() to flush only the affected resource family. For cross-resource dependencies (e.g., a team mutation that affects project membership), invalidate both keys in a single Promise.all.
Apply optimistic updates with rollback. Set the new entity state immediately on mutation dispatch; roll back to the pre-mutation snapshot on failure using the onError context pattern.
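The scoped, cross-resource invalidation from the second step could look like the sketch below. The Invalidator interface is a hypothetical structural subset of QueryClient (a v5 QueryClient should be assignable to it), and the ['teams', teamId] key and the invalidateAfterTeamChange name are assumptions for illustration.

```typescript
// Hypothetical structural subset of QueryClient; a TanStack Query v5
// QueryClient should be assignable to it.
interface Invalidator {
  invalidateQueries(filters: { queryKey: readonly unknown[] }): Promise<void>;
}

// A team mutation can change project membership, so refresh both families
// in parallel and leave every other cache entry alone.
export async function invalidateAfterTeamChange(client: Invalidator, teamId: string): Promise<void> {
  await Promise.all([
    // Prefix match: also covers keys such as ['teams', teamId, 'members'].
    client.invalidateQueries({ queryKey: ['teams', teamId] }),
    // Prefix match: every key that starts with ['projects'].
    client.invalidateQueries({ queryKey: ['projects'] }),
  ]);
}
```

Returning invalidateAfterTeamChange(queryClient, teamId) from a mutation's onSuccess keeps the mutation pending until both refetches finish, so loading indicators stay accurate.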

// useUpdateProject.ts: TanStack Query v5 optimistic mutation
import { useMutation, useQueryClient } from '@tanstack/react-query';
import { ApiError } from './fetchAdapter';
import { NormalizedList } from './queryClient';

interface Project { id: string; name: string; status: 'active' | 'archived'; }

interface UpdateProjectInput { id: string; status: 'active' | 'archived'; }

export function useUpdateProject() {
  const queryClient = useQueryClient();

  return useMutation<Project, Error, UpdateProjectInput, { previous: NormalizedList<Project> | undefined }>({
    // apiFetch only issues GET requests, so the write calls fetch directly.
    // The endpoint and HTTP method are assumptions: match them to your API.
    mutationFn: async ({ id, status }) => {
      const response = await fetch(`/api/projects/${id}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ status }),
      });
      if (!response.ok) throw new ApiError(response.status, `API error: ${response.status}`);
      return response.json() as Promise<Project>;
    },

    // 1. Snapshot the current cache before mutating
    onMutate: async ({ id, status }) => {
      // Cancel any outgoing refetches so they don't overwrite our optimistic update
      await queryClient.cancelQueries({ queryKey: ['projects'] });

      const previous = queryClient.getQueryData<NormalizedList<Project>>(['projects']);

      // 2. Apply the optimistic change to the normalized entity store
      queryClient.setQueryData<NormalizedList<Project>>(['projects'], (old) => {
        // Leave the cache untouched if the list or the entity is not loaded
        if (!old || !old.entities[id]) return old;
        return {
          ...old,
          entities: {
            ...old.entities,
            [id]: { ...old.entities[id], status },
          },
        };
      });

      return { previous };
    },

    // 3. On server confirmation, invalidate to sync any server-side derived fields.
    // Returning the promise keeps the mutation pending until the refetch settles.
    onSuccess: () => queryClient.invalidateQueries({ queryKey: ['projects'] }),

    // 4. On failure, restore the snapshot
    onError: (_err, _vars, context) => {
      if (context?.previous) {
        queryClient.setQueryData(['projects'], context.previous);
      }
    },
  });
}

Cache Behavior Impact: cancelQueries cancels any in-flight ['projects'] fetch: TanStack Query discards its result and, when the queryFn forwarded the AbortSignal to fetch, aborts the request itself. This prevents a race condition where a background refetch resolves after the optimistic write and overwrites it with stale server data. The onMutate → onError rollback pattern is the approach the TanStack Query v5 documentation describes for optimistic updates: the value returned from onMutate becomes the context argument of onError, so the pre-mutation snapshot travels through the mutation lifecycle without any global variable. When onSuccess calls invalidateQueries, TanStack Query marks the ['projects'] query stale and refetches it in the background if it has active observers; components see the optimistic value until the refetch resolves, then receive the authoritative server state.

In Apollo Client, the equivalent writes the confirmed value directly into the normalized InMemoryCache with cache.modify:

// Apollo Client v3 — targeted cache patch after mutation confirmation
import { useMutation, gql } from '@apollo/client';

const UPDATE_PROJECT = gql`
  mutation UpdateProject($id: ID!, $status: String!) {
    updateProject(id: $id, status: $status) {
      id
      status
    }
  }
`;

function useUpdateProjectApollo() {
  return useMutation(UPDATE_PROJECT, {
    // Optimistic response tells Apollo what the mutation result looks like
    optimisticResponse: ({ id, status }) => ({
      updateProject: { __typename: 'Project', id, status },
    }),
    // update patches the normalized InMemoryCache directly
    update: (cache, { data }) => {
      if (!data?.updateProject) return;
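      const entityId = cache.identify(data.updateProject);
      // If identify cannot build a key, cache.modify would fall back to ROOT_QUERY
      if (!entityId) return;
      cache.modify({
        id: entityId,
        fields: {
          status: () => data.updateProject.status,
        },
      });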
    },
  });
}

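Cache Behavior Impact: cache.identify resolves the InMemoryCache’s normalized key for the entity (Project:${id}), so cache.modify targets exactly that node. The optimistic response is written immediately to an optimistic layer of the cache, and Apollo’s cache watches broadcast the change to every active useQuery whose result references Project:${id}, without an additional network round-trip. On server confirmation, Apollo replaces the optimistic entry with the real response; on failure, it discards the optimistic layer and the cache returns to its pre-mutation state. Because the mutation selects id and Apollo adds __typename by default, Apollo would merge the returned status into Project:${id} even without the update callback; an explicit cache.modify is needed for changes Apollo cannot infer from the response, such as adding an entity to or removing it from a cached list.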

For background refetch strategies and tuning revalidation intervals at scale, see the sibling topic on optimizing SWR revalidation intervals.

Configuration Trade-offs

Memory pressure with long gcTime. A gcTime of 30 minutes means every inactive query key stays in the in-memory cache for 30 minutes after its last subscriber unmounts. In SPAs where users navigate across many resource types, estimate total cache size in development with queryClient.getQueryCache().getAll().reduce((n, q) => n + JSON.stringify(q.state.data ?? '').length, 0). This counts serialized characters, a rough proxy for memory rather than a measurement of heap usage.
Race conditions during concurrent mutations. If two mutations target the same entity at once (for example, rapid successive edits to one record, or a bulk-update operation running alongside a single-row edit), their optimistic writes overwrite each other, and one mutation’s rollback can revert the other’s optimistic change, unless they are serialized via a mutation queue or version vector. TanStack Query runs mutations in parallel by default; recent v5 releases accept a scope: { id } option on useMutation, and mutations that share a scope id run serially.
Aggressive invalidation and waterfall refetches. Invalidating broad query key families (e.g., ['projects'] when only one project changed) triggers a refetch of every matching query. In a dashboard with 20 active queries under the projects key family, this creates 20 simultaneous network requests. Prefer surgical setQueryData patches over broad invalidateQueries wherever the mutation response includes the full updated entity.
cancelQueries scope. cancelQueries({ queryKey: ['projects'] }) cancels all in-flight queries whose key starts with ['projects']. If you have nested key families (['projects', teamId]), the cancel will propagate to all of them. This is usually the right behaviour but can create subtle ordering issues if team-scoped queries have independent refetch schedules.
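A mutation queue, one of the remedies named in the race-conditions item above, can be a per-entity promise chain: writes that share an entity ID run one after another, while writes to different entities still run in parallel. createMutationQueue and enqueue are hypothetical helpers, not TanStack Query APIs.

```typescript
// Hypothetical per-entity mutation queue (not a TanStack Query API).
// Tasks that share an entity ID run strictly one after another; tasks for
// different IDs do not wait for each other.
export function createMutationQueue() {
  const tails = new Map<string, Promise<unknown>>();

  return function enqueue<T>(entityId: string, task: () => Promise<T>): Promise<T> {
    const previous = tails.get(entityId) ?? Promise.resolve();
    // Tails never reject, so this starts once the previous task has settled.
    const current = previous.then(() => task());
    // The chain continues past failures; the caller still sees the rejection via `current`.
    const tail = current.catch(() => undefined);
    tails.set(entityId, tail);
    // Drop the entry once this task is the last one queued for the entity.
    void tail.then(() => {
      if (tails.get(entityId) === tail) tails.delete(entityId);
    });
    return current;
  };
}
```

Wrapping the whole mutation call, for example enqueue(project.id, () => updateProject.mutateAsync({ id: project.id, status: 'archived' })) with updateProject returned by useUpdateProject, rather than only the fetch inside mutationFn, means each onMutate snapshot is taken after the previous write for that entity has settled, so a rollback restores a state that already includes the earlier write.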

Common Pitfalls & Resolutions

Observable Issue	Root Cause	Diagnostic Resolution
Identical API requests firing for every component mount	Multiple components sharing the same logical resource are using divergent query keys (e.g., `['project', id]` vs `['projects', id]`)	Audit query keys with `queryClient.getQueryCache().getAll().map(q => q.queryKey)` in development; enforce a query key factory to guarantee key consistency
UI flickers back to stale state after a mutation succeeds	`invalidateQueries` is called in `onMutate` alongside the optimistic write instead of after confirmation; or a background refetch started before the mutation resolves after the optimistic write because `cancelQueries` was not awaited	Start `onMutate` with `await queryClient.cancelQueries(...)` for the affected key; call `invalidateQueries` only from `onSuccess` or `onSettled`, after the server has confirmed the write
Memory footprint grows across user sessions	`gcTime` is set too high (or at `Infinity`) for queries that accumulate unique keys (e.g., per-user-action query keys)	Set `gcTime` to session-appropriate durations and use a query key factory that limits cardinality; monitor with `getQueryCache().getAll().length`
Apollo cache modifications not reflecting in a sibling component	The sibling component’s query result references a different entity key or uses a `fetchPolicy` that bypasses the normalized cache	Confirm `cache.identify` resolves correctly; ensure the sibling query uses `cache-first` or `cache-and-network`; check that `__typename` is present in all responses
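The diagnostics in the first and third rows can be combined into one development-only audit. The sketch assumes only that each entry returned by getQueryCache().getAll() exposes queryKey and state.data, which TanStack Query v5 Query objects do; CachedQuery, CacheReport, and auditQueryCache are hypothetical names. Grouping by the first key segment makes divergent roots such as 'project' and 'projects' stand out.

```typescript
// Structural subset of a TanStack Query v5 Query: the objects returned by
// queryClient.getQueryCache().getAll() expose at least these fields.
interface CachedQuery {
  queryKey: readonly unknown[];
  state: { data: unknown };
}

interface CacheReport {
  totalQueries: number;
  // First key segment -> number of cached queries under it.
  byPrefix: Record<string, number>;
  // Serialized character count: a rough size proxy, not heap usage.
  approxChars: number;
}

// Hypothetical development-only audit helper.
export function auditQueryCache(queries: readonly CachedQuery[]): CacheReport {
  const byPrefix: Record<string, number> = {};
  let approxChars = 0;

  for (const query of queries) {
    const prefix = String(query.queryKey[0]);
    byPrefix[prefix] = (byPrefix[prefix] ?? 0) + 1;

    if (query.state.data === undefined) continue;
    try {
      approxChars += (JSON.stringify(query.state.data) ?? '').length;
    } catch {
      // Circular or BigInt data cannot be serialized; skip it instead of crashing the audit.
    }
  }

  return { totalQueries: queries.length, byPrefix, approxChars };
}
```

In development you might log auditQueryCache(queryClient.getQueryCache().getAll()) after a long navigation session and watch whether totalQueries or any single prefix keeps growing.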

Frequently Asked Questions

How do I handle cache invalidation for deeply nested relational data without flushing the whole cache?

Use entity-level cache modification — cache.modify in Apollo or setQueryData in TanStack Query — to patch specific nodes by their normalized ID. Scope invalidation to the parent resource ID; the normalized lookup table propagates changes to every consuming component without a full cache reset. Only fall back to invalidateQueries when the server response includes derived fields that cannot be reconstructed client-side.

When should I prefer reference storage over value storage in the frontend cache?

Prefer reference storage whenever two or more components consume the same entity. Reference storage ensures a single mutation propagates atomically across the entire UI tree and eliminates duplicate memory consumption from deep-cloned payloads. Value storage is appropriate only for ephemeral, component-local state that never needs to be shared (e.g., a form draft that has not yet been submitted).
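A minimal illustration of the difference, using a hypothetical two-view UI (sidebarView, headerView, and their copies are illustrative names): with reference storage both views resolve the entity from one shared table, so a single write reaches both; with value storage each view holds a deep copy that the write never touches.

```typescript
interface Project { id: string; name: string; status: 'active' | 'archived' }

// Reference storage: one shared entity table; views look entities up by ID.
const entities: Record<string, Project> = {
  p1: { id: 'p1', name: 'Alpha', status: 'active' },
};
const sidebarView = (): Project => entities['p1'];
const headerView = (): Project => entities['p1'];

// Value storage: each view keeps its own deep copy of the payload.
const sidebarCopy: Project = JSON.parse(JSON.stringify(entities['p1']));
const headerCopy: Project = JSON.parse(JSON.stringify(entities['p1']));

// A single immutable write to the shared table.
entities['p1'] = { ...entities['p1'], status: 'archived' };

// Both reference views now resolve the archived entity; both copies still hold 'active'.
```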

What staleTime is appropriate for SaaS dashboards that need near-real-time data?

Set staleTime: 0 for actively-viewed real-time metrics, combined with refetchInterval: 10_000 or a WebSocket push channel for critical updates. For entity detail pages that a user is actively editing, use staleTime: 0 plus refetchOnWindowFocus: false to prevent background refetches from overwriting in-progress edits. Static reference data (countries, plan tiers, feature flags) can tolerate staleTime: 1000 * 60 * 60 (1 hour) without impacting user experience.
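These recommendations can be captured as reusable option presets. The preset names are hypothetical; staleTime, refetchInterval, and refetchOnWindowFocus are standard TanStack Query v5 query options, and the values are the ones suggested above.

```typescript
const MINUTE = 60_000;

// Hypothetical presets; spread one into a query's options, e.g.
// useQuery({ queryKey, queryFn, ...freshnessPresets.realtimeMetrics }).
export const freshnessPresets = {
  // Actively viewed metrics: always stale and polled every 10 s
  // (or keep staleTime: 0 and push updates over a WebSocket instead of polling).
  realtimeMetrics: { staleTime: 0, refetchInterval: 10_000 },
  // Entity detail under active edit: no window-focus refetch over the user's work.
  editingDetail: { staleTime: 0, refetchOnWindowFocus: false },
  // Countries, plan tiers, feature flags: fresh for one hour.
  staticReference: { staleTime: 60 * MINUTE },
} as const;
```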

How does normalization affect SSR hydration performance?

Normalizing on the server before serialization (for TanStack Query, inside the queryFn you pass to prefetchQuery before calling dehydrate; for Apollo, InMemoryCache is already normalized, so cache.extract() serializes the entity graph) can reduce payload size when the same entity appears in several responses, and it moves the graph-construction work to the server. On the client, HydrationBoundary restores the normalized map directly into the QueryClient, so normalization does not run a second time. Because the client starts from exactly the snapshot the server rendered, this avoids React hydration mismatches caused by structural differences between that snapshot and a client-side initial fetch. If you cannot normalize on the server, keep client-side transforms such as select pure and referentially stable (defined at module scope or wrapped in useCallback) so TanStack Query can memoize their results and repeated renders during hydration receive the same reference.

State Architecture & Cache Fundamentals — the parent topic covering the full spectrum of frontend cache design, from client/server state separation through normalization and synchronization.
Client vs Server State Boundaries — how to decide which state belongs in the query cache vs local component state vs a global store, with concrete decision criteria for SaaS applications.
Reference vs Value Storage Models — understanding entity reference graphs vs deep-copied value payloads, which underpins the normalization approach on this page.
Stale-While-Revalidate Implementation — a detailed recipe for configuring background refetch windows in TanStack Query and SWR so users always see an instant response without sacrificing data freshness.
Tag-Based Invalidation Systems — how to group cache entries by resource type and invalidate them surgically after mutations, avoiding the over-invalidation trap covered in the pitfalls table above.
Mutation Sync & Rollback — advanced patterns for handling concurrent mutations, conflict resolution, and rollback in collaborative SaaS UIs.