Entity Mapping Strategies

When raw API responses reach the client as deeply nested JSON trees, the cache stores multiple copies of the same logical entity under different query keys. The result is referential drift: a user record updated via mutation stays stale inside every query that embedded it. This page covers the transformation layer — entity mappers — that converts server payloads into flat, ID-indexed structures before they touch the store.
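To make referential drift concrete, the following minimal sketch contrasts an embedded copy with a foreign-key reference. The `User`, `Post`, and `NormalizedPost` shapes and the sample values are illustrative assumptions, not part of any real API.

```typescript
type User = { id: string; name: string };
type Post = { id: string; title: string; author: User };

// Denormalized: each query result holds its own copy of user 1.
const userQuery: User = { id: '1', name: 'Ada' };
const postQuery: Post = { id: '7', title: 'Hello', author: { id: '1', name: 'Ada' } };

// A mutation patches only the user query result...
const patchedUserQuery: User = { ...userQuery, name: 'Ada L.' };
// ...so the copy embedded in the post query still carries the old name.
const staleAuthorName = postQuery.author.name;

// Normalized: the post stores a foreign key and the user lives in one place.
type NormalizedPost = { id: string; title: string; authorId: string };
const users: Record<string, User> = { '1': { id: '1', name: 'Ada' } };
const posts: Record<string, NormalizedPost> = {
  '7': { id: '7', title: 'Hello', authorId: '1' },
};

// One write updates every reader that resolves the user through its ID.
users['1'] = { ...users['1'], name: 'Ada L.' };
const freshAuthorName = users[posts['7'].authorId].name;
```

In the denormalized half the stale copy survives the update; in the normalized half there is only one record to update, so no reader can fall behind.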

Entity mapping sits inside the broader Data Normalization & Query Key Design system. It is the step that makes relationship stitching in cache possible, and it directly determines whether pagination normalization patterns merge list pages cleanly or duplicate records. For a comparison of the tradeoffs this choice implies at the state-architecture level, see client vs server state boundaries.


Diagnostic checklist

You are in the right place if you observe any of these symptoms:

  • A mutation updates a record on the server, but some components still display the old value until a full page reload.
  • Paginating through a list inserts duplicate items because the same entity appears in two query key buckets.
  • DevTools shows the same object stored under three different keys (['user', 1], ['post', 7, 'author'], ['team', 3, 'members', 0]).
  • Invalidating a single query key causes a cascade of unrelated UI re-renders because child objects are embedded rather than referenced by ID.
  • Memory grows monotonically over a long session as normalized slices accumulate without garbage collection.
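If you want to confirm the duplicate-storage symptom programmatically rather than by eye, a small diagnostic along these lines may help. It is a sketch under assumptions: `findDuplicatedEntityIds` is a hypothetical helper, the snapshot is an array of [queryKey, data] pairs that you have collected from your cache, and every entity carries a string `id` that is unique across entity types.

```typescript
type QueryKey = readonly unknown[];
type Snapshot = Array<[QueryKey, unknown]>;

// Hypothetical diagnostic: reports entity IDs that appear under more than one query key.
function findDuplicatedEntityIds(snapshot: Snapshot): Map<string, string[]> {
  const keysById = new Map<string, Set<string>>();

  // Recursively walk a cached value and record which query key each entity ID was seen under.
  const visit = (value: unknown, key: string): void => {
    if (Array.isArray(value)) {
      for (const item of value) visit(item, key);
      return;
    }
    if (value === null || typeof value !== 'object') return;
    const record = value as Record<string, unknown>;
    if (typeof record.id === 'string') {
      const keys = keysById.get(record.id) ?? new Set<string>();
      keys.add(key);
      keysById.set(record.id, keys);
    }
    for (const field of Object.keys(record)) visit(record[field], key);
  };

  for (const [queryKey, data] of snapshot) visit(data, JSON.stringify(queryKey));

  // Keep only the IDs stored under two or more keys.
  const duplicated = new Map<string, string[]>();
  keysById.forEach((keys, id) => {
    if (keys.size > 1) duplicated.set(id, Array.from(keys));
  });
  return duplicated;
}
```

An empty result suggests each entity is stored once; any entry names the query keys that hold competing copies.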

Prerequisites

Before implementing entity mappers, understand the following concepts (each is linked to its dedicated page):

  • Normalization principles for UI — why flat ID-indexed stores outperform nested object trees for read performance and cache coherence.
  • Reference vs value storage models — how structural sharing works in React Query’s structuralSharing option and Apollo’s normalized InMemoryCache, and why the distinction matters before you choose a mapping depth.
  • Designing stable query keys for React Query — query key structure must be derived from normalized entity IDs, not raw endpoint parameters, or the mapper and the cache will drift apart.
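As a rough illustration of the last prerequisite, a key factory can centralize how keys are built so that they depend only on entity type and normalized ID. The `userKeys` object and its method names are hypothetical; only the ['users', 'normalized'] key is used elsewhere on this page.

```typescript
// Hypothetical key factory: keys are derived from entity type and normalized ID,
// never from raw endpoint parameters such as URL query strings.
const userKeys = {
  all: ['users'] as const,
  normalized: () => [...userKeys.all, 'normalized'] as const,
  detail: (userId: string) => [...userKeys.all, 'detail', userId] as const,
};

const listKey = userKeys.normalized();
const detailKey = userKeys.detail('u1');
```

Because every caller goes through the factory, the mapper and the cache agree on one spelling of each key.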

Architecture overview

The diagram below shows the transformation pipeline from network response to cache store. Each stage is a discrete boundary where a mapping or validation function operates before data moves to the next layer.

Figure: Entity mapping pipeline. The API response arrives as nested JSON, passes through a boundary validator (Zod / io-ts), enters the entity mapper that flattens nested objects into an ID-indexed dictionary, is written to the normalized cache store (entities + ids), and is finally projected to UI components via select or memoized selectors. A payload that fails validation is rejected: the validator throws before any cache write.
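The pipeline can be sketched end to end with plain functions. This is a simplified model under assumptions: the hand-written `validate` guard stands in for a Zod or io-ts schema, a `Map` stands in for the cache store, and all function names are hypothetical.

```typescript
type RawUser = { id: string; name: string };
type UserSlice = { entities: Record<string, RawUser>; ids: string[] };

// Stage 1: boundary validator (hand-written stand-in for a Zod / io-ts schema).
function validate(raw: unknown): RawUser[] {
  if (!Array.isArray(raw)) throw new Error('expected an array');
  return raw.map((item, index) => {
    const candidate = item as Partial<RawUser> | null;
    if (!candidate || typeof candidate.id !== 'string' || typeof candidate.name !== 'string') {
      throw new Error(`invalid user at index ${index}`);
    }
    return { id: candidate.id, name: candidate.name };
  });
}

// Stage 2: entity mapper, flattening the array into an ID-indexed dictionary.
function mapUsers(users: RawUser[]): UserSlice {
  const slice: UserSlice = { entities: {}, ids: [] };
  for (const user of users) {
    if (!(user.id in slice.entities)) slice.ids.push(user.id);
    slice.entities[user.id] = user;
  }
  return slice;
}

// Stage 3: normalized store, written only if stages 1 and 2 did not throw.
const store = new Map<string, UserSlice>();
function writeUsers(raw: unknown): void {
  store.set('users', mapUsers(validate(raw)));
}

// Stage 4: read-time projection for the UI.
function selectUserName(userId: string): string | null {
  return store.get('users')?.entities[userId]?.name ?? null;
}
```

Because validation runs before `store.set`, a rejected payload leaves the previously stored slice untouched.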

Implementation 1 — Deterministic payload transformation with Zod

The goal of the first stage is to convert a validated server payload into a flat { entities, ids } slice ready for direct cache insertion.

Steps:

  1. Define a Zod schema that mirrors the raw API contract exactly, including nested relations.
  2. Parse the incoming array through the schema at the start of your queryFn (not inside the component). Zod throws synchronously on a schema mismatch, which React Query surfaces as an error state — preventing a malformed payload from reaching the cache.
  3. Iterate the validated array and build two structures: an entities record keyed by entity ID, and an ordered ids array that preserves server-side sort.
  4. Replace embedded child objects with arrays of their IDs (foreign keys). Store the child entities in a parallel slice.
// src/mappers/users.ts
import { z } from 'zod';

const PostSchema = z.object({
  id: z.string(),
  title: z.string(),
  publishedAt: z.string().nullable(),
});

const RawUserSchema = z.object({
  id: z.string(),
  name: z.string(),
  email: z.string().email(),
  posts: z.array(PostSchema),
});

type RawUser = z.infer<typeof RawUserSchema>;

export interface NormalizedUsers {
  entities: Record<string, { id: string; name: string; email: string; postIds: string[] }>;
  ids: string[];
}

export interface NormalizedPosts {
  entities: Record<string, z.infer<typeof PostSchema>>;
  ids: string[];
}

export function normalizeUsersPayload(raw: unknown): {
  users: NormalizedUsers;
  posts: NormalizedPosts;
} {
  // Throws ZodError on schema mismatch — React Query converts this to error state
  const validated = z.array(RawUserSchema).parse(raw);

  const userEntities: NormalizedUsers['entities'] = {};
  const userIds: string[] = [];
  const postEntities: NormalizedPosts['entities'] = {};
  const postIds: string[] = [];

  for (const user of validated) {
    // Replace embedded post objects with IDs (foreign-key flattening)
    userEntities[user.id] = {
      id: user.id,
      name: user.name,
      email: user.email,
      postIds: user.posts.map((p) => p.id),
    };
    userIds.push(user.id);

    for (const post of user.posts) {
      if (!postEntities[post.id]) {
        postEntities[post.id] = post;
        postIds.push(post.id);
      }
    }
  }

  return {
    users: { entities: userEntities, ids: userIds },
    posts: { entities: postEntities, ids: postIds },
  };
}

Cache Behavior Impact: React Query stores whatever the queryFn resolves with. Because normalizeUsersPayload returns plain objects and arrays, React Query’s structuralSharing (enabled by default) compares each new result with the previous one and reuses the references of every part that is deeply equal. If only one user’s name changed, only that entry and its ancestors receive new object references; every unmodified user keeps its identity, which prevents unnecessary re-renders in subscribed components. Zod’s parse runs synchronously inside the queryFn, so the returned promise rejects and a malformed response never enters the cache. React Query then applies your retry policy (three retries by default on the client) and settles in status: 'error' once the retries are exhausted. A schema mismatch is deterministic, so consider disabling retries for ZodError to surface the failure immediately.
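The reference-preserving behavior described above can be illustrated with a small recursive function. This is a simplified sketch for plain JSON-like data, not React Query's actual implementation; `shareStructure` and `isPlainObject` are hypothetical names.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns `next`, but reuses every sub-tree of `prev` that is deeply equal.
function shareStructure(prev: unknown, next: unknown): unknown {
  if (prev === next) return prev;
  const bothArrays = Array.isArray(prev) && Array.isArray(next);
  const bothObjects = isPlainObject(prev) && isPlainObject(next);
  if (!bothArrays && !bothObjects) return next;

  const prevRec = prev as Record<string, unknown>;
  const nextRec = next as Record<string, unknown>;
  const nextKeys = Object.keys(nextRec);
  const result: Record<string, unknown> | unknown[] = bothArrays ? [] : {};
  // Start optimistic: same number of keys, then verify each one.
  let unchanged = Object.keys(prevRec).length === nextKeys.length;

  for (const key of nextKeys) {
    const shared = shareStructure(prevRec[key], nextRec[key]);
    (result as Record<string, unknown>)[key] = shared;
    if (!(key in prevRec) || shared !== prevRec[key]) unchanged = false;
  }
  // If nothing differed, hand back the old reference so subscribers see no change.
  return unchanged ? prev : result;
}
```

Applied to two normalized payloads that differ in one user's name, the function returns a new root and a new entry for that user, while the other users and the ids array keep their previous references.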

Configuration trade-offs:

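  • Setting staleTime: Infinity on the normalized query stops automatic background refetches (on mount, window focus, or reconnect) from overwriting manual cache patches. It does not stop a fetch that is already in flight, so call cancelQueries in onMutate before applying an optimistic patch, and pair the setting with explicit invalidateQueries after mutation settlement.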
  • gcTime (formerly cacheTime) controls how long the normalized slice persists after all subscribers unmount. Increase it (e.g. gcTime: 10 * 60 * 1000) on low-churn entities like org settings to survive route transitions without a refetch.
  • structuralSharing: true (default) is essential here — disabling it would cause every refetch to produce new object references even when data is unchanged, re-rendering every subscriber.
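Expressed as option values, the trade-offs above might look like the following. The two object names are hypothetical, and in an application these values would be passed to useQuery together with a queryKey and queryFn; the numbers simply restate the examples in the list.

```typescript
// Normalized entity query: no automatic background refetch, rely on explicit invalidation.
const normalizedUsersOptions = {
  staleTime: Infinity,
  structuralSharing: true, // the default, stated explicitly for emphasis
};

// Low-churn data such as org settings: keep the slice for 10 minutes after the last subscriber unmounts.
const orgSettingsOptions = {
  gcTime: 10 * 60 * 1000,
};
```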

Implementation 2 — React Query adapter with select

The select option applies a read-time projection on the cached raw response without altering what is stored. This is the right pattern when you need different components to derive different shapes from the same server payload.

Steps:

  1. Write a queryFn that validates the response and returns the full normalized payload; this unprojected value is what the cache stores.
  2. Pass select a pure function that maps the raw cache value to the shape the component needs.
  3. Derive the queryKey from the logical entity scope, not from URL parameters — see designing stable query keys for React Query for the full key structure guide.
  4. For mutations, write the normalized result directly to the affected query keys via queryClient.setQueryData to avoid a round-trip.
// src/hooks/useNormalizedUsers.ts
import { useQuery, useQueryClient, useMutation } from '@tanstack/react-query';
import { normalizeUsersPayload, type NormalizedUsers } from '../mappers/users';

// Raw fetch — stores the full normalizeUsersPayload result in cache
async function fetchUsers() {
  const res = await fetch('/api/v2/users');
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return normalizeUsersPayload(await res.json());
}

// Hook A: component only needs the ID-ordered list
export function useUserIds() {
  return useQuery({
    queryKey: ['users', 'normalized'],
    queryFn: fetchUsers,
    staleTime: 30_000,
    // select runs per-subscriber at read time; does NOT alter cached value
    select: (data) => data.users.ids,
  });
}

// Hook B: component needs a single user by ID
export function useUser(userId: string) {
  return useQuery({
    queryKey: ['users', 'normalized'],
    queryFn: fetchUsers,
    staleTime: 30_000,
    select: (data) => data.users.entities[userId] ?? null,
  });
}

// Mutation: patch a user and update the cache without a refetch
export function useUpdateUser() {
  const queryClient = useQueryClient();

  return useMutation({
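    mutationFn: async (patch: { id: string; name: string }) => {
      const res = await fetch(`/api/v2/users/${patch.id}`, {
        method: 'PATCH',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(patch),
      });
      // fetch does not reject on HTTP error statuses, so check explicitly
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      // Assumes the endpoint echoes back at least the user's id and name
      return (await res.json()) as { id: string; name: string };
    },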

    onSuccess: (serverUser) => {
      // Write the single updated entity directly into the normalized cache slice
      queryClient.setQueryData(
        ['users', 'normalized'],
        (prev: ReturnType<typeof normalizeUsersPayload> | undefined) => {
          if (!prev) return prev;
          return {
            ...prev,
            users: {
              ...prev.users,
              entities: {
                ...prev.users.entities,
                [serverUser.id]: {
                  ...prev.users.entities[serverUser.id],
                  name: serverUser.name,
                },
              },
            },
          };
        },
      );
    },
  });
}

Cache Behavior Impact: Both useUserIds and useUser share a single underlying cache entry at ['users', 'normalized']. React Query calls select after retrieving the raw value from cache — not before storing it — so the network is only hit once regardless of how many subscribers are active. Each subscriber’s select result is memoized independently: useUser('u1') and useUser('u2') each compare their derived value to the previous call and skip a re-render if the entity did not change. The setQueryData call in onSuccess triggers structuralSharing internally, replacing only the modified entity reference while keeping every other user’s identity stable.

Configuration trade-offs:

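  • select re-runs only when the cached data changes or when the select function itself is a new reference, and its result is structurally shared with the previous result. An inline selector (e.g. select: (d) => d.users.ids.slice(0, 10)) is a new function on every render, so it re-executes each time; wrap it in useCallback or define it outside the component so that expensive projections are not recomputed needlessly.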
  • Avoid storing the select output in state. React Query manages the derived value internally; double-storing it creates two sources of truth.
  • When multiple query keys need the same entity (e.g. ['users', 'normalized'] and ['dashboard']), prefer a single canonical key and derive from it rather than maintaining two normalized slices — the relationship stitching in cache patterns cover cross-key entity resolution in depth.
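The effect of selector identity can be modeled in a few lines. This is a simplified sketch of per-subscriber memoization, not the library's code; `createSelectRunner` is a hypothetical name, and the model re-runs the selector only when the data reference or the selector reference changes.

```typescript
// Simplified model of one subscriber's select memoization.
function createSelectRunner<TData, TResult>() {
  let lastData: TData | undefined;
  let lastSelect: ((data: TData) => TResult) | undefined;
  let lastResult: TResult | undefined;
  let runs = 0;

  return {
    read(data: TData, select: (data: TData) => TResult): TResult {
      // Re-run only on the first read, new data, or a new selector function.
      if (runs === 0 || data !== lastData || select !== lastSelect) {
        lastResult = select(data);
        lastData = data;
        lastSelect = select;
        runs += 1;
      }
      return lastResult as TResult;
    },
    runCount: () => runs,
  };
}

const cachedData = { users: { ids: ['u1', 'u2', 'u3'] } };

// Stable selector: defined once, outside any render.
const firstTwoIds = (d: typeof cachedData) => d.users.ids.slice(0, 2);
const stableRunner = createSelectRunner<typeof cachedData, string[]>();
stableRunner.read(cachedData, firstTwoIds);
stableRunner.read(cachedData, firstTwoIds);

// Inline selector: a fresh function on every "render".
const inlineRunner = createSelectRunner<typeof cachedData, string[]>();
inlineRunner.read(cachedData, (d) => d.users.ids.slice(0, 2));
inlineRunner.read(cachedData, (d) => d.users.ids.slice(0, 2));
```

In this model the stable selector executes once across both reads, while the inline selector executes on every read even though the data did not change.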

Implementation 3 — Apollo Client InMemoryCache with keyFields

Apollo’s InMemoryCache normalizes by __typename + id automatically. keyFields overrides the identity key when your server uses a non-standard field.

Steps:

  1. Configure InMemoryCache with typePolicies that declare keyFields for each type.
  2. Add merge functions for fields that return arrays to prevent Apollo from clobbering existing list entries on a partial update.
  3. Use read functions to derive computed values at read time (equivalent to React Query’s select).
  4. For polymorphic unions, add possibleTypes so Apollo can normalize interface-typed responses to the correct concrete type.
// src/apolloClient.ts
import { ApolloClient, InMemoryCache, gql } from '@apollo/client';

const cache = new InMemoryCache({
  typePolicies: {
    User: {
      // Use 'uuid' as the cache key instead of the default 'id'
      keyFields: ['uuid'],
      fields: {
        posts: {
          // Merge incoming posts with existing cached posts (pagination / refetch safety)
          merge(existing: readonly unknown[] = [], incoming: readonly unknown[]) {
            // Build a Set of existing refs to de-duplicate on append
            const seen = new Set(existing.map((ref) => (ref as { __ref: string }).__ref));
            const merged = [...existing];
            for (const item of incoming) {
              const ref = (item as { __ref: string }).__ref;
              if (!seen.has(ref)) {
                seen.add(ref);
                merged.push(item);
              }
            }
            return merged;
          },
        },
      },
    },
    Post: {
      keyFields: ['uuid'],
      fields: {
        // Computed read field: derive display title without storing it
        displayTitle: {
          read(_, { readField }) {
            const title = readField<string>('title');
            const publishedAt = readField<string | null>('publishedAt');
            return publishedAt ? title : `[Draft] ${title}`;
          },
        },
      },
    },
  },
  possibleTypes: {
    // Polymorphic interface — Apollo needs the concrete types to normalize correctly
    ContentNode: ['Post', 'Video', 'Poll'],
  },
});

export const client = new ApolloClient({
  uri: '/graphql',
  cache,
});

// Usage in a component
export const USERS_QUERY = gql`
  query GetUsers {
    users {
      uuid
      name
      email
      posts {
        uuid
        title
        publishedAt
        displayTitle @client
      }
    }
  }
`;

Cache Behavior Impact: When Apollo writes a GetUsers result, InMemoryCache walks every object in the response, reads __typename plus the configured keyFields, and stores each entity at a stable cache ID. With keyFields: ['uuid'] that ID has the form User:{"uuid":"abc-123"}; the shorter User:abc-123 form applies only to the default id-based key. If a mutation response includes the same user with an updated name, Apollo merges only that field, and every other query that references the entity by cache ID reflects the change without an additional network request. The merge function on posts prevents the common bug where a partial list fetch replaces the full cached list; instead, it appends only new post refs. The @client directive on displayTitle triggers the local read function, which is computed from the cached fields on each read without storing a derived copy.
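To see how identity depends on keyFields, here is a simplified model of cache-ID derivation. It is an illustration only: `cacheIdFor` is a hypothetical function, it returns undefined where the real cache would report a missing key field, and in an application you would ask Apollo itself via cache.identify.

```typescript
type Entity = { __typename: string } & Record<string, unknown>;

// Simplified model: derive a cache ID from __typename plus the identifying fields.
function cacheIdFor(entity: Entity, keyFields?: string[]): string | undefined {
  if (!keyFields) {
    // Default identity: __typename plus a scalar id.
    const id = entity.id;
    return typeof id === 'string' || typeof id === 'number'
      ? `${entity.__typename}:${id}`
      : undefined;
  }
  // Custom identity: __typename plus a JSON object of the listed key fields, in order.
  const keyObject: Record<string, unknown> = {};
  for (const field of keyFields) {
    if (entity[field] === undefined) return undefined; // cannot identify without every key field
    keyObject[field] = entity[field];
  }
  return `${entity.__typename}:${JSON.stringify(keyObject)}`;
}

const customId = cacheIdFor({ __typename: 'User', uuid: 'abc-123', name: 'Ada' }, ['uuid']);
const defaultId = cacheIdFor({ __typename: 'User', id: 5, name: 'Ada' });
```

Any two responses that produce the same ID string are treated as the same entity, which is why a mutation result can update records that were first written by an unrelated query.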

Configuration trade-offs:

  • Omitting a merge function on a list field makes Apollo replace the cached array entirely on every partial refetch. Under pagination this drops previously loaded pages. Development builds may log a warning about possible cache data loss, but nothing is reported in production.
  • keyFields: false disables normalization for a type, embedding it inline as a value object. Use this deliberately for immutable value types (currency amounts, timestamps) to avoid polluting the normalized store with objects that will never be updated individually.
  • Apollo’s fetchPolicy: 'cache-and-network' always fires a network request after serving the cache, which is useful for high-churn entities but doubles request frequency. Use cache-first with explicit refetchQueries in mutations for entities that change only via user action.

Common Pitfalls & Resolutions

| Observable Issue | Root Cause Diagnostic | Resolution |
| --- | --- | --- |
| Mutation updates a user but some components still show the stale name | Mutation response bypasses the entity mapper and writes a partial object directly to a different query key | Route all mutation responses through queryClient.setQueryData using the same normalized key and mapper shape; alternatively use Apollo’s automatic cache normalization via keyFields so any mutation including the entity ID triggers a merge |
| Paginated list shows duplicate rows after fetching page 2 | New page entities are appended raw rather than merged through an ID-deduplication step | Add an ID-existence check in the mapper loop (Implementation 1 guards post insertion with if (!postEntities[post.id])) and apply the same guard when merging a new page into the existing ids; in Apollo, add a merge function that guards on __ref identity |
| select callback re-runs on every render or background refetch even when data did not change | select is declared inline (e.g. .filter(...) in the hook call), so it is a new function reference on each render | Stabilize the selector: move it outside the component, memoize it with useCallback, or use a stable selector library like reselect |
| Memory grows over a long session until the tab crashes | Normalized entities are inserted with an unbounded gcTime or are never garbage-collected; no subscriber eviction | Set a finite gcTime (default 5 min in React Query v5); for Apollo, call cache.evict({ id: cache.identify(entity) }) followed by cache.gc() after mutations that delete entities |
| Zod parse fails in production for a valid-looking payload | The server changed the contract in a way the local schema does not allow: a field became null or optional, changed type, or (only if the schema uses .strict()) a new unknown key appeared. A plain z.object() strips unknown keys by default, so an added field alone does not fail the parse | Log ZodError.issues to find the failing path; relax that field with .nullable() or .optional() to unblock, then schedule a schema sync; use .passthrough() only when unknown keys must be preserved rather than stripped |
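The ID-deduplication step from the pagination row can be sketched as a pure merge function. This is a minimal illustration; `mergePage`, `Slice`, and the sample `Row` type are hypothetical names, and the sketch assumes that the newest copy of an entity should win.

```typescript
type Slice<T extends { id: string }> = { entities: Record<string, T>; ids: string[] };

// Merge a freshly fetched page into an existing normalized slice without duplicating IDs.
function mergePage<T extends { id: string }>(existing: Slice<T>, page: T[]): Slice<T> {
  const entities: Record<string, T> = { ...existing.entities };
  const ids = existing.ids.slice();
  const seen = new Set<string>(existing.ids);

  for (const item of page) {
    if (!seen.has(item.id)) {
      seen.add(item.id);
      ids.push(item.id); // append only IDs that are not already listed
    }
    entities[item.id] = item; // newest copy of the entity wins
  }
  return { entities, ids };
}

type Row = { id: string; title: string };
const pageOne: Slice<Row> = mergePage({ entities: {}, ids: [] }, [
  { id: 'a', title: 'First' },
  { id: 'b', title: 'Second' },
]);
// Page 2 overlaps with page 1 on "b", as happens when rows shift between fetches.
const bothPages = mergePage(pageOne, [
  { id: 'b', title: 'Second (edited)' },
  { id: 'c', title: 'Third' },
]);
```

The function returns a new slice instead of mutating its input, so the previous page's slice can still serve as a rollback snapshot.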

Frequently Asked Questions

Should I normalize at the fetcher level or inside the UI component?

Always normalize at the fetcher or query adapter level. Component-level normalization runs per subscriber, producing redundant parse cycles for every component that calls the same hook. More critically, it breaks the single-source-of-truth guarantee: two components that independently normalize the same raw cache entry can produce different shapes depending on their rendering order or filter conditions, causing divergent UI state that is nearly impossible to debug.

Does the React Query select option store the transformed result in the cache?

No. The raw value returned by queryFn is stored. select is applied per subscriber at read time, after React Query retrieves the raw value from its internal store. This means: (a) multiple components can derive different shapes from a single cache entry without additional network requests; (b) invalidating the underlying key refetches and renormalizes once, then re-applies each subscriber’s select independently; (c) if you need the transformed result in another query’s queryFn, you must call queryClient.getQueryData and apply the mapper manually, because select is not visible to the query client itself.

How do optimistic updates interact with entity mapping?

Optimistic payloads must follow the exact same normalized schema as the eventual server response. In React Query, do the work in onMutate: cancel in-flight fetches for the key with queryClient.cancelQueries, snapshot the current value with queryClient.getQueryData, then write the optimistic value via queryClient.setQueryData using the same mapper output type. Restore the snapshot in onError. If the optimistic shape diverges from the server response shape (for example, the optimistic entity omits postIds), components that read the missing field see it vanish and then reappear when the server data replaces the optimistic value, causing a visible flash that structuralSharing cannot hide. In Apollo, pass optimisticResponse with __typename and the key fields included so InMemoryCache can apply the same merge policy it would use for a real server response.
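The snapshot-and-rollback flow can be shown without any library. In this sketch a module-level variable stands in for the cached value at one query key, and `applyOptimistic` and `rollback` are hypothetical functions that play the roles of onMutate and onError.

```typescript
type User = { id: string; name: string; email: string; postIds: string[] };
type UsersSlice = { entities: Record<string, User>; ids: string[] };

// Stand-in for the cached value at one query key.
let cached: UsersSlice = {
  entities: { u1: { id: 'u1', name: 'Ada', email: 'ada@example.com', postIds: ['p1'] } },
  ids: ['u1'],
};

// Plays the role of onMutate: snapshot first, then write the optimistic value.
function applyOptimistic(patch: { id: string; name: string }): UsersSlice {
  const snapshot = cached;
  const current = cached.entities[patch.id];
  if (current) {
    cached = {
      ...cached,
      entities: {
        ...cached.entities,
        // Spread the existing entity so fields like postIds keep the normalized shape.
        [patch.id]: { ...current, name: patch.name },
      },
    };
  }
  return snapshot;
}

// Plays the role of onError: restore exactly what was there before.
function rollback(snapshot: UsersSlice): void {
  cached = snapshot;
}
```

Spreading the current entity into the optimistic one is what keeps the shape identical to the mapper output; building the optimistic entity from the patch alone would drop email and postIds.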

When should I use Apollo's InMemoryCache keyFields over a manual mapper function?

Use keyFields when your API consistently provides __typename + a stable unique ID for every object type and you want normalization to be automatic across all queries and mutations. Use a manual mapper function (as in Implementations 1 and 2) when: the server omits __typename (REST, non-GraphQL); you need to rename, split, or derive fields during normalization (not just identify entities); the normalization shape differs from the wire format for business logic reasons; or you are using React Query, SWR, or RTK Query rather than Apollo. The two approaches are not mutually exclusive — you can wrap Apollo responses in a mapper for shape transformation and still rely on InMemoryCache for entity identity and merge.


  • Data Normalization & Query Key Design — the parent reference covering the full normalization system, including query key structure, entity lifecycle, and cross-framework patterns.
  • Designing Stable Query Keys for React Query — a focused implementation guide: how to structure query keys so they derive from normalized entity IDs rather than raw URL parameters, preventing cache key drift after mapping.
  • Relationship Stitching in Cache — how to resolve foreign-key references across normalized slices at read time, the step that follows entity mapping in the data pipeline.
  • Pagination Normalization Patterns — extending entity mapping to cursor-based and offset pagination, including the merge strategies that prevent duplicate records across pages.
  • Nested Data Flattening Techniques — sibling coverage for deeply nested GraphQL responses where multiple levels of embedding must be collapsed before ID-indexed storage.
  • Cache Layer Architecture — the foundational layer below entity mapping: understanding where the normalized store sits relative to the network, service worker, and rendering layers.