Data Normalization & Query Key Design

Nested API responses are a common cause of stale-data bugs in production React applications. When the same User record arrives embedded inside a Post, a Comment, and a Notification response, each copy lives under a separate cache key, and a write to one copy leaves the others stale until the next full invalidation. This reference duplication problem, and the query key chaos that usually accompanies it, tends to surface for any team working at scale with React Query, Apollo, SWR, or RTK Query.

This guide is for frontend platform teams and SaaS product engineers who own the cache layer. It covers the architectural decisions that determine whether your cache is a consistent, efficient lookup store or a growing pile of duplicated, diverging snapshots: entity mapping and ID-based normalization, deterministic query key factories, relational reference stitching, and pagination normalization.

Understanding client vs server state boundaries before applying these patterns will help you avoid over-normalizing ephemeral UI state that belongs in local component state, not the query cache.

Architectural Overview

The diagram below models the data flow from raw network response through normalization to the component tree — and the two paths an update can take: a cache write that propagates to all affected refs, or a full invalidation that triggers a re-fetch.

Core Concepts Reference

Concept	Definition	React Query API	Apollo Client API	SWR API	RTK Query API
Query Key	Serializable array uniquely identifying a cache entry	`useQuery({ queryKey })`	`cache.readQuery({ query, variables })`	`useSWR(key)`	Endpoint name + serialized args
Entity normalization	Flattening nested objects into ID-indexed records	Manual transform in `queryFn`	`InMemoryCache` with `keyFields`	Manual in `fetcher`	`createEntityAdapter` in `transformResponse`
Invalidation	Marking cache entries stale to trigger re-fetch	`queryClient.invalidateQueries`	`cache.evict` / `refetchQueries`	`mutate(key)`	`invalidatesTags`
Optimistic update	Writing expected state before server confirms	`onMutate` + `setQueryData`	`optimisticResponse` + `update`	`mutate(key, data, false)`	`onQueryStarted`
Stale time	Window after fetch during which data is considered fresh	`staleTime` (ms)	No direct equivalent (`fetchPolicy` controls cache use)	`dedupingInterval`	`refetchOnMountOrArgChange` (seconds)
GC time	How long unused data stays in memory before eviction	`gcTime` (ms)	Manual: `cache.evict` + `cache.gc()`	N/A	`keepUnusedDataFor` (seconds)
Structural sharing	Reusing object references when data has not changed	`structuralSharing: true` (default)	Automatic by ref	N/A	Enabled by default

Strategy 1 — Entity Mapping & Normalization at the Network Boundary

The normalization transform must run as close to the network boundary as possible — not inside a component, not inside a selector. By the time a response reaches the queryFn return value, it should already be in normalized form. This prevents two components from independently transforming the same payload and diverging on edge cases.

Entity mapping strategies vary by payload complexity. For simple REST responses, a manual reducer that indexes items by id is sufficient. For deeply nested GraphQL responses, normalizr or a schema-based approach scales better — see flattening deeply nested GraphQL responses for a worked example.
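For the simple REST case, the manual reducer can be sketched as follows. This is a minimal illustration, assuming every record carries a string `id`; the names `indexById`, `EntityTable`, and `HasId` are hypothetical and not part of any library.

```typescript
// Hypothetical helper: index an array of records by their `id` field.
interface HasId { id: string }

interface EntityTable<T extends HasId> {
  byId: Record<string, T>;
  ids: string[]; // preserves the order in which IDs first appeared
}

function indexById<T extends HasId>(items: readonly T[]): EntityTable<T> {
  const byId: Record<string, T> = {};
  const ids: string[] = [];
  for (const item of items) {
    // A later duplicate overwrites the earlier record but keeps a single slot in `ids`.
    if (!Object.prototype.hasOwnProperty.call(byId, item.id)) ids.push(item.id);
    byId[item.id] = item;
  }
  return { byId, ids };
}

const table = indexById([
  { id: 'u-1', name: 'Ada' },
  { id: 'u-2', name: 'Grace' },
  { id: 'u-1', name: 'Ada L.' }, // same entity arriving twice in one payload
]);
```

Keeping `ids` separate from `byId` preserves server ordering while still giving constant-time lookup by ID.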

Apollo’s InMemoryCache handles this automatically via keyFields, which is why Apollo users rarely write manual normalizers. TanStack Query users must implement normalization explicitly or accept per-key duplication.

Understanding the normalization principles that govern flat state trees will clarify when denormalization is the better trade-off — particularly for read-heavy, rarely mutated data.

Configuration trade-offs:

Setting gcTime below the 5-minute default (300 000 ms) on entity detail keys lets entries be collected shortly after their last observer unmounts, forcing redundant network requests and loading states during fast back-and-forth navigation.
Disabling structuralSharing in TanStack Query eliminates the referential equality optimization that prevents downstream React.memo components from re-rendering when data is unchanged — leave it enabled unless debugging.
Apollo keyFields must be set at cache construction time; changing them after the cache is hydrated leaves stale entries that do not merge correctly until the cache is reset.
Overly broad normalization of ephemeral response fields (e.g. cursor tokens, transient error messages) inflates the entity store and makes GC pressure worse.

Strategy 2 — Deterministic Query Key Design

A query key is the cache’s primary index. Treating it as an afterthought causes three failure modes in production: phantom cache entries from non-deterministic serialization, over-broad invalidations that flush unrelated data, and under-broad invalidations that leave stale data visible.

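Designing stable query keys for React Query covers the key factory pattern in detail. The core rule: every dynamic parameter must enter the key in one canonical form. TanStack Query hashes keys deterministically and ignores property order inside plain objects, so { page: 1, sort: 'asc' } and { sort: 'asc', page: 1 } already resolve to the same entry. Array element order and value types are significant, though: ['users', 1] and ['users', '1'] are distinct entries. Sorting filter entries in the factory is a harmless extra safeguard, not a requirement.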
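Order-insensitive key serialization can be sketched as below, assuming keys contain only JSON-serializable values. The function name `stableHash` is hypothetical and this is not TanStack Query's own code; it only illustrates why two filter objects with the same entries in a different order can map to one cache entry, while a type mismatch such as 1 versus '1' cannot.

```typescript
// Illustrative serializer: sorts the properties of every plain object before
// stringifying, so property order cannot change the resulting string.
function stableHash(key: readonly unknown[]): string {
  return JSON.stringify(key, (_name, value) => {
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      const sorted: Record<string, unknown> = {};
      for (const prop of Object.keys(value).sort()) {
        sorted[prop] = value[prop];
      }
      return sorted;
    }
    return value; // primitives and arrays pass through unchanged
  });
}

const a = stableHash(['users', 'list', { page: 1, sort: 'asc' }]);
const b = stableHash(['users', 'list', { sort: 'asc', page: 1 }]);
const c = stableHash(['users', 'list', { page: '1', sort: 'asc' }]);

console.log(a === b); // same entries, different property order
console.log(a === c); // number 1 versus string '1'
```

The practical consequence is that a key factory should coerce parameter types in one place, since type drift between call sites is not smoothed over by any serializer.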

Hierarchical key structures enable surgical invalidation. Calling queryClient.invalidateQueries({ queryKey: ['users'] }) marks every key beginning with 'users' as stale: ['users', 'list', …], ['users', 'detail', …], and ['users', 'infinite', …]. Queries with a mounted observer re-fetch immediately; inactive ones re-fetch the next time they are used. Calling queryClient.invalidateQueries({ queryKey: ['users', 'detail', userId] }) marks only that record stale.
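The prefix behaviour described above can be modelled in a few lines. This is a simplified sketch, not the library's matcher: `matchesPrefix` is a hypothetical name, and it compares each key element by its JSON form, whereas TanStack Query also allows partial matching inside object elements.

```typescript
type QueryKey = readonly unknown[];

// Simplified model: a filter key matches a cached key when every element of
// the filter equals the element at the same position in the cached key.
function matchesPrefix(cached: QueryKey, filter: QueryKey, exact = false): boolean {
  if (exact ? cached.length !== filter.length : cached.length < filter.length) {
    return false;
  }
  return filter.every((part, i) => JSON.stringify(part) === JSON.stringify(cached[i]));
}

const cachedKeys: QueryKey[] = [
  ['users', 'list', { page: 1 }],
  ['users', 'detail', 'u-42'],
  ['users', 'detail', 'u-7'],
  ['posts', 'list', { page: 1 }],
];

// Whole namespace: every key that starts with 'users'.
const staleAll = cachedKeys.filter(k => matchesPrefix(k, ['users']));
// Single record: only the detail entry for u-42.
const staleOne = cachedKeys.filter(k => matchesPrefix(k, ['users', 'detail', 'u-42']));
// Exact match on a prefix that no cached key equals in full.
const staleExact = cachedKeys.filter(k => matchesPrefix(k, ['users', 'detail'], true));
```

Here `staleAll` contains the three user keys, `staleOne` contains one, and `staleExact` is empty because no cached key is exactly `['users', 'detail']`.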

Configuration trade-offs:

Deeply nested key hierarchies (more than 4 levels) increase the complexity of partial invalidation and require careful documentation so that mutation handlers target the right prefix.
Serializing large filter objects into the key array increases memory cost per cache entry. Prefer deriving a compact hash for filter objects with more than 6 fields.
Using exact: true in invalidateQueries is necessary when you want to invalidate one key without cascading to longer keys that extend it (for example ['users', 'list'] but not ['users', 'list', filters]). The default (exact: false) is safer for most mutation handlers.
queryClient.removeQueries permanently evicts the entry rather than just marking it stale; use it only when you want a loading state, not a background refresh.
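The compact-hash suggestion for large filter objects could look like the sketch below. It assumes flat, JSON-serializable filters; `fnv1a` and `compactFilterKey` are hypothetical helpers, and the 32-bit FNV-1a-style hash is chosen only for brevity. A 32-bit hash can collide, which would make two different filters share one cache entry, so a wider hash is the safer choice in production.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash.toString(16).padStart(8, '0');
}

// Canonicalize a flat filter object (sorted [key, value] pairs), then hash it.
function compactFilterKey(filters: Record<string, unknown>): string {
  const canonical = JSON.stringify(
    Object.keys(filters).sort().map(name => [name, filters[name]]),
  );
  return fnv1a(canonical);
}

const big = { status: 'active', role: 'admin', team: 't-9', page: 1, sort: 'asc', q: 'ada', plan: 'pro' };
const reordered = { plan: 'pro', q: 'ada', sort: 'asc', page: 1, team: 't-9', role: 'admin', status: 'active' };

const keyA = ['users', 'list', compactFilterKey(big)] as const;
const keyB = ['users', 'list', compactFilterKey(reordered)] as const;
```

The trade-off is debuggability: a hashed key is opaque in devtools, so it helps to log the canonical string next to the hash during development.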

Strategy 3 — Relational Reference Stitching & Cache Integrity

Once entities live in a flat store, components need to reconstruct relational graphs at render time. The relationship stitching in cache approach stores only scalar IDs as foreign keys — post.authorId: 'u-42' rather than post.author: { id: 'u-42', name: '…' } — and resolves the full object in a memoized selector.

This pattern guarantees that updating users['u-42'] in the store propagates to every component that reads post.author through the selector, without any explicit invalidation of the post query.
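A minimal sketch of that propagation, using a plain object as the flat store. The `Store` shape and `selectPostWithAuthor` are hypothetical names, and memoization is omitted to keep the example short.

```typescript
interface User { id: string; name: string }
interface Post { id: string; title: string; authorId: string }
interface Store { users: Record<string, User>; posts: Record<string, Post> }

// Resolve the foreign key at read time; returns undefined if either record is missing.
function selectPostWithAuthor(store: Store, postId: string) {
  const post = store.posts[postId];
  if (!post) return undefined;
  const author = store.users[post.authorId];
  return author ? { ...post, author } : undefined;
}

const ada: User = { id: 'u-42', name: 'Ada' };
const before: Store = {
  users: { 'u-42': ada },
  posts: { 'p-1': { id: 'p-1', title: 'Hello', authorId: 'u-42' } },
};

// Immutable write to the user only; the posts table is reused untouched.
const after: Store = {
  ...before,
  users: { ...before.users, 'u-42': { ...ada, name: 'Ada Lovelace' } },
};

const nameBefore = selectPostWithAuthor(before, 'p-1')?.author.name;
const nameAfter = selectPostWithAuthor(after, 'p-1')?.author.name;
```

The post record is never rewritten, yet the selector returns the new author name because it dereferences `authorId` on every read.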

For how reference vs value storage models affect re-render frequency and garbage collection, the state architecture fundamentals section covers both strategies with concrete benchmarks.

Cascade invalidation, meaning automatically purging dependent entities when a parent mutates, is necessary when deletion is involved. Deleting users['u-42'] without purging the posts that reference it leaves dangling IDs, and any selector that dereferences them without a guard throws a TypeError on undefined.
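A cascade delete over a flat store can be sketched as follows. The store shape and the `deleteUserCascade` name are assumptions for illustration; a real implementation would also have to clean up any ID lists that mention the removed posts.

```typescript
interface User { id: string; name: string }
interface Post { id: string; title: string; authorId: string }
interface Store { users: Record<string, User>; posts: Record<string, Post> }

// Remove a user and every post that references it, without mutating the input store.
function deleteUserCascade(store: Store, userId: string): Store {
  const users = { ...store.users };
  delete users[userId];

  const posts: Store['posts'] = {};
  for (const [id, post] of Object.entries(store.posts)) {
    if (post.authorId !== userId) posts[id] = post; // keep only posts by other authors
  }
  return { users, posts };
}

const store: Store = {
  users: { 'u-42': { id: 'u-42', name: 'Ada' }, 'u-7': { id: 'u-7', name: 'Grace' } },
  posts: {
    'p-1': { id: 'p-1', title: 'Hello', authorId: 'u-42' },
    'p-2': { id: 'p-2', title: 'World', authorId: 'u-7' },
  },
};

const pruned = deleteUserCascade(store, 'u-42');
```

After the call, `pruned` holds only `u-7` and `p-2`, so no selector can encounter an `authorId` that points at a missing user.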

Configuration trade-offs:

Memoized selectors (createSelector from Reselect, or useMemo) are mandatory at graph depth > 2. Unmemoized selectors recompute on every render and negate the referential equality benefits of normalization.
Bi-directional relationships (User → Posts, Post → Author) require matching postIds on the User record. This doubles write complexity at ingestion time but eliminates O(n) scans at read time.
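Apollo cache.writeFragment and cache.modify both broadcast to active queries that read the affected fields, but they differ in scope: writeFragment runs any merge functions and can add new fields, while cache.modify bypasses merge functions and can only change fields that already exist in the cache. Prefer cache.modify for mutations that adjust several existing fields on the same entity.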
RTK Query’s providesTags and invalidatesTags must be symmetric: if a getUser endpoint provides [{ type: 'User', id }], every mutation that touches that user must invalidate the same tag shape.

Strategy 4 — List & Pagination Normalization

Pagination introduces a second level of cache structure: the ordered list of IDs for each page sits alongside the flat entity records, and the two must be managed independently. Pagination normalization patterns covers three distinct strategies: offset-based, cursor-based, and infinite scroll with useInfiniteQuery.

The critical invariant: the list store holds IDs and page metadata; the entity store holds records. Conflating them causes the common bug where adding a new record to the first page pushes an existing record off the visible list even though it still exists in the entity store.
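The invariant can be sketched with two tables kept side by side. The shapes `PagedStore` and `PageMeta`, the `ingestPage` function, and the cursor values are all hypothetical; the point is only that page membership and entity existence are tracked separately.

```typescript
interface Item { id: string; title: string }
interface PageMeta { ids: string[]; nextCursor: string | null }
interface PagedStore {
  entities: Record<string, Item>;  // records, keyed by ID
  pages: Record<number, PageMeta>; // ordered IDs and metadata per page
}

// Write a fetched page: upsert its records, then replace that page's ID list.
function ingestPage(
  store: PagedStore,
  page: number,
  items: Item[],
  nextCursor: string | null,
): PagedStore {
  const entities = { ...store.entities };
  for (const item of items) entities[item.id] = item;
  return {
    entities,
    pages: { ...store.pages, [page]: { ids: items.map(i => i.id), nextCursor } },
  };
}

const empty: PagedStore = { entities: {}, pages: {} };
const first = ingestPage(empty, 1, [{ id: 'a', title: 'A' }, { id: 'b', title: 'B' }], 'cur-1');
// A new record arrives at the top and pushes 'b' off page 1.
const second = ingestPage(first, 1, [{ id: 'n', title: 'New' }, { id: 'a', title: 'A' }], 'cur-1b');
```

After the second ingest, page 1 lists `n` and `a`, while `b` is still present in `entities` and can be rendered by any detail view or later page that references it.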

Merging paginated lists without duplicates and normalizing cursor-based pagination address the two most error-prone cases in production.
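As a small illustration of the duplicate-free merge, the sketch below flattens per-page ID arrays and keeps the first occurrence of each ID. The function name `mergePageIds` is hypothetical, and it assumes pages are stored as arrays of entity IDs rather than embedded objects.

```typescript
// Flatten pages of IDs into one ordered list, keeping the first occurrence of each ID.
function mergePageIds(pages: readonly (readonly string[])[]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const page of pages) {
    for (const id of page) {
      if (!seen.has(id)) {
        seen.add(id);
        merged.push(id);
      }
    }
  }
  return merged;
}

// 'b' and 'c' each appear on two adjacent pages after a new record shifted the window.
const merged = mergePageIds([['n', 'a', 'b'], ['b', 'c'], ['c', 'd']]);
```

Because only IDs are merged, the entity records themselves are never copied, and the merged list stays cheap to recompute whenever a page changes.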

For stale-while-revalidate pagination, the first page is typically served from cache immediately while a background re-fetch confirms or updates the order — this requires careful handling when a mutation changes sort rank.

Configuration trade-offs:

useInfiniteQuery with gcTime: 0 evicts all pages on component unmount, which produces a loading flash on mount. Set gcTime to at least 5 minutes for infinite scroll feeds.
Setting maxPages in useInfiniteQuery caps memory by dropping oldest pages but breaks “jump to page N” navigation patterns unless you also implement a separate page-specific query.
Optimistic list mutations (adding a record to the top of a list) must also normalize the new entity into the detail store; otherwise a detail page opened immediately after the mutation finds no cached record and renders a loading or not-found state.
Offset-based pagination is susceptible to the “page drift” bug: if a record is deleted between page 1 and page 2 fetches, every subsequent page is off by one. Cursor-based pagination is immune to this.
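The page drift case can be reproduced with an in-memory array standing in for the server. This is a toy model: `byOffset` and `byCursor` are hypothetical helpers, and the cursor is simply the last ID seen, whereas real APIs usually encode a sort value in the cursor.

```typescript
const PAGE_SIZE = 2;
let rows = ['a', 'b', 'c', 'd', 'e']; // stand-in for the server-side table

// Offset pagination: position is counted from the start of the current table.
const byOffset = (page: number): string[] =>
  rows.slice(page * PAGE_SIZE, (page + 1) * PAGE_SIZE);

// Cursor pagination: position is anchored to the last record the client saw.
const byCursor = (after: string | null): string[] => {
  const start = after === null ? 0 : rows.indexOf(after) + 1;
  return rows.slice(start, start + PAGE_SIZE);
};

const page1 = byOffset(0);            // client receives 'a' and 'b'
rows = rows.filter(r => r !== 'a');   // 'a' is deleted before page 2 is requested

const offsetPage2 = byOffset(1);                      // counts from the shifted table
const cursorPage2 = byCursor(page1[page1.length - 1]); // continues after 'b'
```

With offsets, page 2 comes back as `d` and `e`, so `c` is never shown; with the cursor, page 2 is `c` and `d`. This toy cursor would still misbehave if the anchor row itself were deleted, which is one reason production cursors tend to encode sort values rather than row identity.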

Production Code: End-to-End Normalized Cache Pipeline

The following snippet wires together all four strategies in a realistic TanStack Query v5 / TypeScript setup: a key factory, a normalization transform, a useQuery hook that populates the normalized store, and a mutation that performs an optimistic update and targeted invalidation.

// cache/queryKeys.ts — deterministic key factory
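// Filter entries are sorted by key before insertion so equivalent filters
// always produce an identical key object.
type Filters = Record<string, unknown>;

export const userKeys = {
  all: ['users'] as const,
  lists: () => [...userKeys.all, 'list'] as const,
  list: (filters: Filters) =>
    [
      ...userKeys.lists(),
      // Compare keys explicitly; a bare sort() would compare stringified [key, value] pairs.
      Object.fromEntries(
        Object.entries(filters).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)),
      ),
    ] as const,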
  details: () => [...userKeys.all, 'detail'] as const,
  detail: (id: string) => [...userKeys.details(), id] as const,
};

// cache/normalize.ts — normalization transform at the network boundary
export interface UserDTO {
  id: string;
  name: string;
  email: string;
  posts: Array<{ id: string; title: string; authorId: string }>;
}

export interface NormalizedUsers {
  users: Record<string, Omit<UserDTO, 'posts'> & { postIds: string[] }>;
  posts: Record<string, { id: string; title: string; authorId: string }>;
  ids: string[];
}

export function normalizeUsers(raw: UserDTO[]): NormalizedUsers {
  const users: NormalizedUsers['users'] = {};
  const posts: NormalizedUsers['posts'] = {};
  const ids: string[] = [];

  for (const user of raw) {
    const postIds: string[] = [];
    for (const post of user.posts) {
      posts[post.id] = post;
      postIds.push(post.id);
    }
    users[user.id] = { id: user.id, name: user.name, email: user.email, postIds };
    ids.push(user.id);
  }

  return { users, posts, ids };
}

// hooks/useUserList.ts: useQuery with normalization applied in queryFn
import { useQuery, useQueryClient, useMutation } from '@tanstack/react-query';
import { userKeys, normalizeUsers, type NormalizedUsers, type UserDTO } from '../cache';

// Derive the filter type from the key factory so the two cannot drift apart.
type Filters = Parameters<typeof userKeys.list>[0];

async function fetchUsers(filters: Filters): Promise<NormalizedUsers> {
  const res = await fetch(`/api/users?${new URLSearchParams(filters as Record<string, string>)}`);
  if (!res.ok) throw new Error(`Failed to fetch users: ${res.status}`);
  const raw: UserDTO[] = await res.json();
  // Normalization happens here, at the boundary, before React Query stores the value.
  return normalizeUsers(raw);
}

export function useUserList(filters: Filters) {
  return useQuery({
    queryKey: userKeys.list(filters),
    queryFn: () => fetchUsers(filters),
    staleTime: 60_000,   // treat data as fresh for 60 s; background re-fetch on focus after that
    gcTime: 300_000,     // keep unused data in memory for 5 min to avoid loading flash on back-nav
    structuralSharing: true, // default: reuses object refs when data is unchanged, preventing re-renders
  });
}

// hooks/useUpdateUser.ts — mutation with optimistic update + targeted invalidation
interface UpdateUserInput { id: string; name: string }

export function useUpdateUser() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async (input: UpdateUserInput) => {
      const res = await fetch(`/api/users/${input.id}`, {
        method: 'PATCH',
        body: JSON.stringify({ name: input.name }),
        headers: { 'Content-Type': 'application/json' },
      });
      // fetch resolves on HTTP errors, so throw explicitly to reach onError.
      if (!res.ok) throw new Error(`Failed to update user: ${res.status}`);
      return res.json();
    },

    onMutate: async (input: UpdateUserInput) => {
      // Cancel in-flight re-fetches for every user list
      // so they don't overwrite our optimistic write.
      await queryClient.cancelQueries({ queryKey: userKeys.lists() });

      // Snapshot every cached list as [queryKey, data] pairs for rollback.
      const previous = queryClient.getQueriesData<NormalizedUsers>({
        queryKey: userKeys.lists(),
      });

      // Apply the optimistic update to each cached list that contains this user.
      // The normalized data lives under the list keys, so that is where it is written.
      queryClient.setQueriesData<NormalizedUsers>({ queryKey: userKeys.lists() }, (old) => {
        if (!old?.users[input.id]) return old;
        return {
          ...old,
          users: {
            ...old.users,
            [input.id]: { ...old.users[input.id], name: input.name },
          },
        };
      });

      return { previous };
    },

    onError: (_err, _input, context) => {
      // Roll back every list to the snapshot captured in onMutate.
      context?.previous.forEach(([key, data]) => {
        queryClient.setQueryData(key, data);
      });
    },

    onSettled: (_data, _err, input) => {
      // Always re-fetch after settle to confirm server state — both the detail
      // and any list that might show this user's name.
      queryClient.invalidateQueries({ queryKey: userKeys.detail(input.id) });
      queryClient.invalidateQueries({ queryKey: userKeys.lists() });
    },
  });
}

// components/UserWithPosts.tsx — selector reconstructs the relational graph
import { useMemo } from 'react';
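import { useUserList } from '../hooks/useUserList';

// Reuse the hook's parameter type instead of redeclaring the filter shape.
type Filters = Parameters<typeof useUserList>[0];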

export function UserWithPosts({ userId, filters }: { userId: string; filters: Filters }) {
  const { data, isLoading } = useUserList(filters);

  const userWithPosts = useMemo(() => {
    if (!data) return null;
    const user = data.users[userId];
    if (!user) return null;
    // Resolve foreign-key references to full post objects at render time.
    // useMemo ensures this only runs when `data` or `userId` changes.
    const posts = user.postIds.map(id => data.posts[id]).filter(Boolean);
    return { ...user, posts };
  }, [data, userId]);

  if (isLoading) return <p>Loading…</p>;
  if (!userWithPosts) return <p>User not found</p>;

  return (
    <div>
      <h2>{userWithPosts.name}</h2>
      <ul>
        {userWithPosts.posts.map(post => (
          <li key={post.id}>{post.title}</li>
        ))}
      </ul>
    </div>
  );
}

The normalizeUsers function runs inside queryFn rather than inside select, which means TanStack Query stores the normalized form in the cache — not the raw nested payload. Every component that reads from userKeys.list(filters) gets the same normalized structure and the same post entity references. The onSettled in useUpdateUser issues two targeted invalidations: the detail key for the mutated user and the lists prefix (which matches all list keys regardless of filter) — this is the minimum invalidation scope that guarantees consistency without flushing unrelated queries.

Common Engineering Pitfalls

Symptom	Root Cause	Resolution
Two components show different values for the same user after a mutation	The same entity is stored under two separate query keys as a nested object, not as a normalized ref	Normalize at the network boundary; ensure both components read from the same key or from a shared entity store
`invalidateQueries` makes the whole page feel like it reloads: every list re-fetches at once	Invalidation targets `['users']` (entire namespace) instead of the minimal prefix	Use `userKeys.detail(id)` for single-record mutations; only escalate to `userKeys.lists()` when list order or membership may change
Infinite scroll appends duplicate items after a mutation that adds a new record	The new record already exists in an earlier page but is re-fetched at the top of page 1	Deduplicate by entity ID at list merge time using a `Set`; store list order as an ID array rather than embedding entity objects
Object reference changes on every render even when data is identical	`structuralSharing` disabled, or normalization creates new object literals every time even for unchanged fields	Re-enable `structuralSharing`; in the normalize function, compare incoming values to existing cache values before constructing new objects
Selector returns `null` for a record that was recently written via optimistic update	`onMutate` wrote to the detail key but the list key still references old data, and the selector reads from the list	Write the optimistic update to both the detail key and the relevant list key, or normalize the list to hold only IDs and resolve entities separately
Apollo query returns cached data after evict + re-query	`cache.evict` removes the normalized ref but `cache.gc()` must be called separately to fully purge unreferenced objects	Call `cache.gc()` immediately after `cache.evict` to trigger garbage collection

Frequently Asked Questions

When should I normalize cache data versus keeping nested API responses intact?

Normalize when the same entity appears in multiple queries or components, when you need to apply an optimistic update to a single record without invalidating every list that contains it, or when your payload exceeds roughly 50 records and duplication would materially inflate memory. Keep nested structures for isolated single-view responses that are never shared across query keys and where referential consistency is not a concern — a detail-only page with no related queries is a good candidate.

Why do my query keys produce duplicate cache entries even when the data is identical?

TanStack Query hashes the key array deterministically and sorts the properties of plain objects first, so { page: 1, sort: 'asc' } and { sort: 'asc', page: 1 } are treated as the same key without any work on your part. Duplicate entries usually come from values that look equal but hash differently: a number at one call site and a string at another (1 versus '1'), a freshly constructed Date, array elements in a different order, or an extra property such as an empty string or null that another call site omits. Route every call site through a key factory that coerces and defaults parameters in one place; sorting entries with Object.fromEntries(Object.entries(filters).sort()) is an optional extra safeguard.

How does Apollo InMemoryCache handle normalization differently from RTK Query?

Apollo normalizes automatically by __typename + id (or a custom keyFields mapping), merging incoming data into a flat store on every network response without any explicit normalization step in your application code. RTK Query does not normalize: it is a document cache keyed by endpoint name and arguments. You can opt into per-response normalization by applying createEntityAdapter inside transformResponse, and you declare manually which tags each endpoint provides and invalidates. Apollo's approach is zero-config for standard GraphQL shapes; RTK Query's tag system gives you explicit, auditable control over invalidation scope, which is easier to reason about in large codebases where multiple teams own different endpoints.

Can a single optimistic update mutate a record that appears in both a list query and a detail query?

Yes, but only if both query keys share normalized entity data. In Apollo, writing to the cache by ref via cache.modify automatically reflects in every query that reads that ref. In TanStack Query you must call queryClient.setQueryData for each affected key individually — unless you implement a shared entity store at a root key and derive list and detail views from it, which requires a custom selector architecture beyond what useQuery provides out of the box.

What gcTime and staleTime values are safe for a normalized entity store?

Entity detail caches should use a staleTime of 30 000–120 000 ms (background revalidation on focus) and gcTime of at least 300 000 ms (5 minutes) so navigating back does not cause a loading flash. List caches that aggregate many entities can use a shorter staleTime (0–10 000 ms) because list data is more likely to diverge from server state. The gcTime timer starts only once the last observer unmounts, so it never evicts data that a mounted component is reading; the risk of a short gcTime is that the entry is already collected by the time the user navigates back. Keep gcTime longer than your users' expected navigation round-trip.

Entity Mapping Strategies — how to define entity schemas, choose primary key conventions, and write the normalization transforms that feed this pipeline.
Nested Data Flattening Techniques — step-by-step techniques for flattening deeply nested REST and GraphQL responses before they reach the cache.
Pagination Normalization Patterns — cursor-based, offset, and infinite-scroll strategies that keep list metadata and entity records cleanly separated.
Relationship Stitching in Cache — how to store and resolve foreign-key relationships without embedding full objects or triggering cascading re-renders.
Cache Invalidation & Server Synchronization — the companion discipline: once your entities are normalized, this covers when and how to mark them stale and re-synchronize with the server.
State Architecture & Cache Fundamentals — foundational concepts — reference vs value storage, client vs server state boundaries, and cache layer architecture — that underpin every pattern on this page.