Pagination Normalization Patterns

Paginated APIs expose a structural tension that breaks naive cache implementations: the same entity can appear in multiple pages, traversal metadata mutates independently of entity data, and rapid scrolling generates overlapping concurrent fetches that race to write the same cache slots. Resolving this requires applying the Data Normalization & Query Key Design discipline specifically to list traversal — separating cursor state, page metadata, and entity storage into distinct cache layers with different invalidation lifecycles.

This guide addresses the concrete implementation decisions: how to build a unified adapter that handles both offset and cursor APIs, how to architect the resultIds / entities split that prevents re-render storms, and how to enforce deduplication merge semantics that survive concurrent fetches. For the cross-cutting normalization theory that underpins this work, see Entity Mapping Strategies and Relationship Stitching in Cache.

Diagnostic Checklist

You are likely hitting a pagination normalization problem if you observe:

Duplicate rows appearing in an infinite-scroll list after rapid scrolling or a page refresh
A single entity mutation (e.g. updating a user’s name) causing the entire paginated list to re-render from scratch
Cursor-based pagination silently skipping records when a backend delete shifts item positions between page fetches
useInfiniteQuery returning undefined for data.pages[n] mid-scroll when a concurrent fetch resolves out of order
Memory footprint growing unbounded during infinite scroll because full entity payloads are stored inside each page array rather than a flat entity dictionary
Inconsistent totalCount values across different list views that share the same underlying entities

Prerequisites

Before implementing the patterns below, you should be comfortable with:

Flat entity stores: understand the resultIds + entities split and why nested array storage causes re-render cascades (covered in Entity Mapping Strategies)
Query key hierarchies: know how TanStack Query v5 matches invalidation patterns against key arrays (covered in Designing Stable Query Keys for React Query)
Foreign key resolution: understand how the normalized store resolves entity references at read-time rather than write-time (covered in Relationship Stitching in Cache)
TanStack Query v5 useInfiniteQuery: familiarity with initialPageParam, getNextPageParam, fetchNextPage, and isFetchingNextPage

Implementation 1 — Unified Offset/Cursor Adapter

The first decision point is whether to write two separate fetch functions (one for offset APIs, one for cursor APIs) or to route both through a shared adapter that normalizes them to a common PageData contract. The unified approach pays off when a single UI must talk to multiple endpoints that use different pagination schemes — common in microservice architectures.

Steps:

Define a PageData<T> interface that captures both cursor and offset metadata alongside the item array.
Write two thin protocol adapters (CursorAdapter and OffsetAdapter) that each implement the shared interface.
Pass the active adapter to useInfiniteQuery so the hook remains agnostic to the underlying API shape.
Use the select projection to strip traversal metadata before the data reaches the component, preventing metadata changes from triggering entity re-renders.

import { useInfiniteQuery, InfiniteData } from '@tanstack/react-query';

// ── Shared contract ────────────────────────────────────────────────────────
interface PageData<T> {
  items: T[];
  nextCursor: string | null;
  cursor: string;        // the cursor used to fetch this page
  hasNextPage: boolean;
  totalCount?: number;
}

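// The signatures mirror what useInfiniteQuery passes in: queryFn receives a
// context object, and getNextPageParam also receives the previous page param.
type PaginationAdapter<T> = {
  queryFn: (context: { pageParam: unknown }) => Promise<PageData<T>>;
  getNextPageParam: (
    lastPage: PageData<T>,
    allPages: PageData<T>[],
    lastPageParam: unknown,
  ) => unknown; // string cursor, numeric offset, or undefined when exhausted
  initialPageParam: unknown;
};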

// ── Cursor adapter ─────────────────────────────────────────────────────────
function cursorAdapter<T>(endpoint: string): PaginationAdapter<T> {
  return {
    initialPageParam: null,
    getNextPageParam: (last) => last.nextCursor ?? undefined,
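    queryFn: ({ pageParam }) =>
      // Encode the cursor: opaque cursors often contain +, / or = characters
      fetch(`${endpoint}?cursor=${encodeURIComponent(String(pageParam ?? ''))}`).then((r) => {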
        if (!r.ok) throw new Error(`Fetch failed: ${r.status}`);
        return r.json() as Promise<PageData<T>>;
      }),
  };
}

// ── Offset adapter ─────────────────────────────────────────────────────────
const PAGE_SIZE = 25;

function offsetAdapter<T>(endpoint: string): PaginationAdapter<T> {
  return {
    initialPageParam: 0,
    getNextPageParam: (last, _, lastPageParam) =>
      last.hasNextPage ? (lastPageParam as number) + PAGE_SIZE : undefined,
    queryFn: ({ pageParam }) =>
      fetch(`${endpoint}?offset=${pageParam}&limit=${PAGE_SIZE}`).then((r) => {
        if (!r.ok) throw new Error(`Fetch failed: ${r.status}`);
        return r.json() as Promise<PageData<T>>;
      }),
  };
}

// ── Hook (adapter-agnostic) ────────────────────────────────────────────────
interface User { id: string; name: string; email: string }

export function useNormalizedList(endpoint: string, mode: 'cursor' | 'offset') {
  const adapter =
    mode === 'cursor' ? cursorAdapter<User>(endpoint) : offsetAdapter<User>(endpoint);

  return useInfiniteQuery<PageData<User>, Error, { ids: string[]; hasNext: boolean }>({
    queryKey: ['entities', endpoint, mode],
    queryFn: adapter.queryFn,
    initialPageParam: adapter.initialPageParam,
    getNextPageParam: adapter.getNextPageParam,
    staleTime: 30_000,
    gcTime: 5 * 60_000,
    structuralSharing: true,
    // select strips cursor metadata — component only sees IDs and a boolean
    select: (data: InfiniteData<PageData<User>>) => ({
      ids: data.pages.flatMap((p) => p.items.map((item) => item.id)),
      hasNext: data.pages[data.pages.length - 1]?.hasNextPage ?? false,
    }),
  });
}

Cache Behavior Impact: TanStack Query re-runs select when the cached data changes or when the select function’s identity changes; because select is an inline arrow function here, it also re-runs on each render of the hook. By returning only ids and hasNext, the component’s dependency on cursor strings is severed: when a refetch changes only nextCursor on the last page, the select output is structurally identical to the previous one, structural sharing hands back the previous reference, and the component has nothing new to render. The structuralSharing: true default means TanStack Query deep-compares each page object against its cached predecessor; only pages whose contents changed get a new object reference.

Configuration Trade-offs:

Setting staleTime: 30_000 prevents immediate background refetches when the user navigates away and returns, but means list order can be 30 seconds stale after a backend mutation — tune against your update frequency.
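gcTime: 5 * 60_000 keeps all loaded pages in memory for five minutes after the last subscriber unmounts. For very long lists (200+ pages), consider reducing gcTime, capping the number of cached pages with the maxPages option (which requires getPreviousPageParam as well, so evicted pages can be refetched), or implementing virtual scroll with page eviction.
Because select is defined inline, it re-executes on every render of the hook as well as on data changes. If ids is large (500+), give the projection a stable identity, either by hoisting it to a module-level selector function or by wrapping it in useCallback, so it only re-runs when the cached data changes. Do not call useMemo inside select: select is not a component or hook body, so the Rules of Hooks forbid it.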
Mixing cursor and offset adapters under the same queryKey prefix (['entities', endpoint]) allows queryClient.invalidateQueries({ queryKey: ['entities', endpoint] }) to flush both schemes simultaneously — useful after a mutation that might affect either list view.
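
The select trade-off above can be sketched as a module-level projection. This is a minimal illustration, not the article's hook: PageLike, PagesLike, ListView, and selectIdsAndHasNext are hypothetical names, and the page shape is redeclared locally so the snippet runs without the library.

```typescript
// Local stand-ins for the page types, redeclared so the sketch is self-contained.
interface PageLike { items: { id: string }[]; hasNextPage: boolean }
interface PagesLike { pages: PageLike[]; pageParams: unknown[] }
interface ListView { ids: string[]; hasNext: boolean }

// Hypothetical module-level projection: it is defined once, so its function
// identity never changes between renders.
function selectIdsAndHasNext(data: PagesLike): ListView {
  return {
    ids: data.pages.flatMap((p) => p.items.map((item) => item.id)),
    hasNext: data.pages[data.pages.length - 1]?.hasNextPage ?? false,
  };
}

const sample: PagesLike = {
  pages: [
    { items: [{ id: 'u1' }, { id: 'u2' }], hasNextPage: true },
    { items: [{ id: 'u3' }], hasNextPage: false },
  ],
  pageParams: [null, 'c1'],
};

const view = selectIdsAndHasNext(sample);
console.log(view.ids, view.hasNext); // [ 'u1', 'u2', 'u3' ] false
```

Passing `select: selectIdsAndHasNext` in place of the inline arrow function should let the projection be skipped on renders where the cached data has not changed.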

Implementation 2 — Normalized List State Architecture

Raw useInfiniteQuery stores full entity payloads inside each page array. When entity A appears on page 1 and page 3, two copies exist in cache. When a mutation updates entity A, you must either flush the entire list (expensive) or leave stale copies in other pages (incorrect). The resultIds / entities split eliminates both problems by ensuring each entity has exactly one canonical record.
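
As a rough illustration of the split described above, the sketch below turns a raw array into ordered IDs plus a flat dictionary. normalizeUsers and NormalizedList are hypothetical names, and the sample data is invented for the example.

```typescript
interface User { id: string; name: string }

interface NormalizedList {
  resultIds: string[];
  entities: Record<string, User>;
}

// Hypothetical helper: split a raw array response into ordered IDs and a
// flat dictionary, so each entity ends up with exactly one canonical record.
function normalizeUsers(raw: User[]): NormalizedList {
  const resultIds: string[] = [];
  const entities: Record<string, User> = {};
  const seen = new Set<string>();
  for (const user of raw) {
    if (!seen.has(user.id)) {
      seen.add(user.id);
      resultIds.push(user.id); // keep the first position only
    }
    entities[user.id] = user; // the last occurrence wins for the record itself
  }
  return { resultIds, entities };
}

// Entity "u1" appears twice, as it might when two pages overlap.
const normalized = normalizeUsers([
  { id: 'u1', name: 'Ada' },
  { id: 'u2', name: 'Grace' },
  { id: 'u1', name: 'Ada L.' },
]);
console.log(normalized.resultIds);           // [ 'u1', 'u2' ]
console.log(normalized.entities['u1'].name); // Ada L.
```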

Steps:

After fetching, extract entity records into a flat entities dictionary keyed by ID.
Store only the ordered ID array in the list slice — never the full entity objects.
Write a memoized selector that joins resultIds against entities at read-time, analogous to the lazy stitching pattern from Relationship Stitching in Cache.
On mutation, update only the entity’s record in the flat dictionary; the resultIds order is unaffected and the list does not re-render.

import { useQueryClient, useQuery, useMutation } from '@tanstack/react-query';

interface User { id: string; name: string; avatar: string }

interface NormalizedListState {
  resultIds: string[];
  entities: Record<string, User>;
  nextCursor: string | null;
  hasNextPage: boolean;
}

// ── Selector: join IDs → entities at read-time ─────────────────────────────
function selectOrderedUsers(state: NormalizedListState): User[] {
  return state.resultIds.map((id) => state.entities[id]).filter(Boolean) as User[];
}

// ── Read hook ──────────────────────────────────────────────────────────────
export function useNormalizedUsers(listKey: string) {
  return useQuery<NormalizedListState, Error, User[]>({
    queryKey: ['normalizedList', listKey],
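    queryFn: () =>
      fetch(`/api/users?key=${encodeURIComponent(listKey)}`).then((r) => {
        // Surface HTTP errors instead of caching an error payload as list data
        if (!r.ok) throw new Error(`Fetch failed: ${r.status}`);
        return r.json() as Promise<NormalizedListState>;
      }),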
    select: selectOrderedUsers,
    staleTime: 60_000,
  });
}

// ── Mutation: update entity without touching resultIds ─────────────────────
export function useUpdateUser(listKey: string) {
  const queryClient = useQueryClient();

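  // The fourth generic types the context returned by onMutate, so that
  // context?.previous type-checks in onError.
  return useMutation<User, Error, { id: string; name: string }, { previous?: NormalizedListState }>({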
    mutationFn: ({ id, name }) =>
      fetch(`/api/users/${id}`, {
        method: 'PATCH',
        body: JSON.stringify({ name }),
        headers: { 'Content-Type': 'application/json' },
      }).then((r) => r.json()),

    // Optimistic: update only the entity slot
    onMutate: async ({ id, name }) => {
      await queryClient.cancelQueries({ queryKey: ['normalizedList', listKey] });
      const previous = queryClient.getQueryData<NormalizedListState>(['normalizedList', listKey]);

      queryClient.setQueryData<NormalizedListState>(['normalizedList', listKey], (old) => {
        if (!old) return old;
        return {
          ...old,
          entities: {
            ...old.entities,
            [id]: { ...old.entities[id], name },
          },
          // resultIds is untouched — list order does not change
        };
      });

      return { previous };
    },

    onError: (_err, _vars, context) => {
      if (context?.previous) {
        queryClient.setQueryData(['normalizedList', listKey], context.previous);
      }
    },

    onSettled: () => {
      // Refetch this list to reconcile the optimistic entity patch with the server
      queryClient.invalidateQueries({ queryKey: ['normalizedList', listKey] });
    },
  });
}

Cache Behavior Impact: The updater passed to queryClient.setQueryData spreads ...old and replaces only entities[id], so resultIds keeps its previous array reference and every untouched entity keeps its previous object reference. selectOrderedUsers therefore sees the same ID sequence, and structural sharing on the select output preserves the references of unchanged User objects. The component that consumes the list still receives a new array, but row components memoized on their entity prop (for example with React.memo) skip re-rendering; only the row for the updated entity re-renders. The onSettled invalidation refetches from the server to confirm the optimistic value, but the visual update is instant.
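
The reference behaviour described above can be checked in isolation. The sketch below is a pure version of the optimistic updater; patchUserName is a hypothetical name and the sample state is invented.

```typescript
interface User { id: string; name: string; avatar: string }

interface NormalizedListState {
  resultIds: string[];
  entities: Record<string, User>;
  nextCursor: string | null;
  hasNextPage: boolean;
}

// Hypothetical pure form of the updater: patch one entity slot, leave the rest alone.
function patchUserName(state: NormalizedListState, id: string, name: string): NormalizedListState {
  const current = state.entities[id];
  if (!current) return state; // unknown ID: return the state untouched
  return { ...state, entities: { ...state.entities, [id]: { ...current, name } } };
}

const before: NormalizedListState = {
  resultIds: ['u1', 'u2'],
  entities: {
    u1: { id: 'u1', name: 'Ada', avatar: 'a.png' },
    u2: { id: 'u2', name: 'Grace', avatar: 'g.png' },
  },
  nextCursor: null,
  hasNextPage: false,
};

const after = patchUserName(before, 'u1', 'Ada Lovelace');
console.log(after.resultIds === before.resultIds);           // true: order array is reused
console.log(after.entities['u2'] === before.entities['u2']); // true: untouched entity is reused
console.log(after.entities['u1'] === before.entities['u1']); // false: patched entity is new
```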

Configuration Trade-offs:

Storing entities as a flat Record<string, User> enables O(1) mutation updates but requires the selector to perform resultIds.length lookups per render. For lists under ~1000 items this is negligible; for larger lists, use useMemo to cache the join result between renders.
The .filter(Boolean) in selectOrderedUsers silently drops any ID in resultIds that has no matching record in entities, for example an item deleted by a concurrent user. In production, make the missing-entity case explicit: return a placeholder and trigger a targeted refetch for that ID.
If entities already live in a normalized store such as Redux Toolkit’s createEntityAdapter state or Apollo InMemoryCache, you can skip manual entity extraction and read entity updates directly from that layer. RTK Query itself does not normalize responses, so it is not a substitute here. You still need an explicit, per-list resultIds slice to maintain each list’s order.
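
One way to make the missing-entity case explicit is sketched below. Row and selectRows are hypothetical names, and what the caller does with missingIds (for example, a targeted refetch) is left open.

```typescript
interface User { id: string; name: string }
interface ListState { resultIds: string[]; entities: Record<string, User> }

// Hypothetical row type: either a resolved entity or an explicit placeholder.
type Row =
  | { kind: 'entity'; user: User }
  | { kind: 'missing'; id: string };

// Join IDs to entities without silently dropping unresolved IDs.
function selectRows(state: ListState): { rows: Row[]; missingIds: string[] } {
  const rows: Row[] = [];
  const missingIds: string[] = [];
  for (const id of state.resultIds) {
    const user = state.entities[id];
    if (user) {
      rows.push({ kind: 'entity', user });
    } else {
      rows.push({ kind: 'missing', id }); // keeps the row slot visible in the list
      missingIds.push(id);                // lets the caller decide what to refetch
    }
  }
  return { rows, missingIds };
}

const result = selectRows({
  resultIds: ['u1', 'u2', 'u3'],
  entities: { u1: { id: 'u1', name: 'Ada' }, u3: { id: 'u3', name: 'Grace' } },
});
console.log(result.rows.map((r) => r.kind)); // [ 'entity', 'missing', 'entity' ]
console.log(result.missingIds);              // [ 'u2' ]
```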

Implementation 3 — Duplicate-Free Infinite Scroll Merge

Merging Paginated Lists Without Duplicates is the most operationally risky step: an offset-based backend returns the same item on consecutive pages when an insert shifts item positions between fetches, and a keyset backend can do the same when the sort key is not unique or an item’s sort value changes mid-traversal. Your client-side merge must be the last line of defence.
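
To see how consecutive pages can overlap, here is a small simulation. It assumes a position-based (offset/limit) backend, modelled as an in-memory array; the data and the fetchPage helper are hypothetical.

```typescript
// In-memory stand-in for a backend table, newest first.
let rows = ['e', 'd', 'c', 'b', 'a'];

// Position-based page fetch: slice by offset and limit.
function fetchPage(offset: number, limit: number): string[] {
  return rows.slice(offset, offset + limit);
}

const page1 = fetchPage(0, 2); // [ 'e', 'd' ]
rows = ['f', ...rows];         // a new row lands at the head between the two fetches
const page2 = fetchPage(2, 2); // [ 'd', 'c' ]: 'd' moved from index 1 to index 2

// Client-side guard: append only IDs that have not been seen yet.
const seen = new Set(page1);
const merged = [...page1, ...page2.filter((id) => !seen.has(id))];
console.log(merged);           // [ 'e', 'd', 'c' ]
```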

Steps:

Maintain the canonical resultIds set as a Set<string> to make membership checks O(1) regardless of list length.
Before appending any incoming IDs, filter them through the existing set.
Spread incoming.entities onto existing.entities to merge entity records — later pages’ data wins, which is correct because it is fresher.
Update nextCursor from the incoming page, never from the existing state.
Guard fetchNextPage with isFetchingNextPage and hasNextPage checks to prevent duplicate in-flight requests.

import { useInfiniteQuery } from '@tanstack/react-query';
import { useMemo, useCallback } from 'react';

interface Item { id: string; title: string; updatedAt: string }

interface PageData {
  items: Item[];
  nextCursor: string | null;
  hasNextPage: boolean;
}

// ── Merge function ─────────────────────────────────────────────────────────
interface MergedState {
  resultIds: string[];
  entities: Record<string, Item>;
  nextCursor: string | null;
}

function mergePages(pages: PageData[]): MergedState {
  const entities: Record<string, Item> = {};
  const seenIds = new Set<string>();
  const resultIds: string[] = [];

  for (const page of pages) {
    for (const item of page.items) {
      // Always update entities (later page = fresher data)
      entities[item.id] = item;
      // Only add to ordered list once
      if (!seenIds.has(item.id)) {
        seenIds.add(item.id);
        resultIds.push(item.id);
      }
    }
  }

  const lastPage = pages[pages.length - 1];
  return {
    resultIds,
    entities,
    nextCursor: lastPage?.nextCursor ?? null,
  };
}

// ── Hook ───────────────────────────────────────────────────────────────────
export function useInfiniteItems(endpoint: string) {
  const query = useInfiniteQuery<PageData, Error>({
    queryKey: ['infiniteItems', endpoint],
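    queryFn: ({ pageParam }) =>
      // Encode the cursor: opaque cursors often contain +, / or = characters
      fetch(`${endpoint}?cursor=${encodeURIComponent(String(pageParam ?? ''))}`).then((r) => {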
        if (!r.ok) throw new Error(`${r.status}`);
        return r.json() as Promise<PageData>;
      }),
    initialPageParam: null,
    getNextPageParam: (last) => last.nextCursor ?? undefined,
    staleTime: 20_000,
    gcTime: 10 * 60_000,
    structuralSharing: true,
    refetchOnWindowFocus: false, // avoid mid-scroll refetches resetting cursor chain
  });

  // Derive merged state outside select to keep the raw pages available
  const merged = useMemo(
    () => (query.data ? mergePages(query.data.pages) : null),
    [query.data]
  );

  const loadMore = useCallback(() => {
    if (!query.isFetchingNextPage && query.hasNextPage) {
      query.fetchNextPage();
    }
  }, [query.isFetchingNextPage, query.hasNextPage, query.fetchNextPage]);

  return { merged, loadMore, isLoading: query.isLoading, isError: query.isError };
}

Cache Behavior Impact: useInfiniteQuery stores each page as a discrete entry inside data.pages. When page N arrives, TanStack Query appends it to the array and runs structuralSharing on the full InfiniteData object: pages 0 through N-1 retain their previous object references, and only the new page slot gets a new reference. The mergePages call in useMemo then re-runs because query.data is a new reference (the pages array changed). The merged.entities dictionary itself is rebuilt, but each entry points at the same item object as before when that item has not changed, so row components memoized on the item can skip re-rendering. Setting refetchOnWindowFocus: false prevents TanStack Query from re-fetching all pages when the user alt-tabs back, which would otherwise re-run mergePages over potentially hundreds of pages.

Configuration Trade-offs:

refetchOnWindowFocus: false improves UX for long scroll sessions but means entity data can drift from the server during an extended session. Consider a targeted invalidation after 5 minutes of inactivity instead of disabling window-focus refetches entirely.
gcTime: 10 * 60_000 (10 minutes) retains all loaded pages after unmount, enabling instant hydration when the user navigates back — but 10 minutes × many pages × entity size can exhaust memory on low-end devices. Profile with the Performance tab before deploying to mobile-first audiences.
The mergePages loop runs in O(n) where n is the total item count across all loaded pages. For feeds exceeding 500 items, move this computation to a Web Worker or use a memoized reducer so the main thread is not blocked during scroll events.
Setting staleTime: 20_000 allows a 20-second window where navigating back to the list shows cached data. Adjust based on how frequently your backend pushes new items — real-time feeds should use a shorter staleTime or a push-based invalidation strategy via background refetch strategies.
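
The memoized-reducer idea mentioned in the trade-offs could look roughly like the sketch below. MergeAccumulator, emptyAccumulator, and mergeNewPages are hypothetical names, and the sketch assumes already-merged pages are unchanged; after a full refetch you would start again from an empty accumulator.

```typescript
interface Item { id: string; title: string }
interface Page { items: Item[]; nextCursor: string | null }

interface MergeAccumulator {
  resultIds: string[];
  entities: Record<string, Item>;
  seen: Set<string>;
  pagesMerged: number; // how many pages are already folded in
}

function emptyAccumulator(): MergeAccumulator {
  return { resultIds: [], entities: {}, seen: new Set<string>(), pagesMerged: 0 };
}

// Hypothetical incremental merge: per-item work runs only for pages appended
// since the previous call. The previous accumulator is never mutated.
function mergeNewPages(acc: MergeAccumulator, pages: Page[]): MergeAccumulator {
  if (pages.length <= acc.pagesMerged) return acc; // nothing new to fold in
  const resultIds = [...acc.resultIds];
  const entities = { ...acc.entities };
  const seen = new Set(acc.seen);
  for (const page of pages.slice(acc.pagesMerged)) {
    for (const item of page.items) {
      entities[item.id] = item; // later page wins for the record
      if (!seen.has(item.id)) {
        seen.add(item.id);
        resultIds.push(item.id); // first position wins for the order
      }
    }
  }
  return { resultIds, entities, seen, pagesMerged: pages.length };
}

const p1: Page = { items: [{ id: 'a', title: 'A' }, { id: 'b', title: 'B' }], nextCursor: 'c1' };
const p2: Page = { items: [{ id: 'b', title: 'B2' }, { id: 'c', title: 'C' }], nextCursor: null };

const acc1 = mergeNewPages(emptyAccumulator(), [p1]);
const acc2 = mergeNewPages(acc1, [p1, p2]);
console.log(acc2.resultIds);           // [ 'a', 'b', 'c' ]
console.log(acc2.entities['b'].title); // B2
```

The shallow copies are still linear in the number of loaded items, so this trims the per-item checks rather than the overall complexity; whether that is worth the extra state is something to measure rather than assume.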

Common Pitfalls & Resolutions

Observable Issue	Root Cause	Diagnostic Resolution
Duplicate rows in infinite scroll after rapid downward scroll	`fetchNextPage` called while the previous page is still in flight, or consecutive pages overlap because rows shifted on the backend between fetches	Add `if (isFetchingNextPage || !hasNextPage) return;` guard before `fetchNextPage`; add Set-based deduplication in `mergePages`
Entire list re-renders when a single entity name is updated via mutation	Entity payload stored directly inside the `pages` array rather than in a flat `entities` dictionary	Migrate to the `resultIds + entities` split from Implementation 2; update only the entity slot in `setQueryData`
`nextCursor` from page N is used to fetch page N instead of page N+1	`getNextPageParam` reads from `firstPage` rather than `lastPage`, or `pageParam` is not forwarded correctly to the `queryFn`	Verify `getNextPageParam` receives `(lastPage, allPages, lastPageParam)` and returns `lastPage.nextCursor ?? undefined`, not `firstPage.nextCursor`
Cursor-based list skips records after a backend delete	Keyset cursor points to a deleted record; backend advances the cursor beyond the gap	After mutations that delete items, call `queryClient.invalidateQueries({ queryKey: ['infiniteItems', endpoint] })` to restart pagination from the beginning rather than patching cursors client-side
`totalCount` on the first page is stale after new items are added	`totalCount` is cached with the page data but not invalidated when the list grows	Exclude `totalCount` from `staleTime` caching by refetching the count separately on a shorter `staleTime`, or derive it from the flat `entities` dictionary length after full load

Frequently Asked Questions

Should pagination metadata be normalized into the entity store alongside item data?

No. Metadata like nextCursor, hasNextPage, and totalCount belongs to the list traversal context, not the entity graph. Store it in a dedicated list-scoped slice keyed by the pagination query key. This prevents metadata mutations from triggering entity cache evictions, and it lets you invalidate the cursor chain independently of the entities, so a change to a single entity does not force the list to restart pagination.

How do I prevent stale cursor references after a mutation that reorders or removes items?

After any mutation that affects list ordering (deletes, re-ranks, status changes), call queryClient.invalidateQueries({ queryKey: ['entities', endpoint, 'cursor'] }) to flush all pages. Do not attempt surgical cursor patching — the backend cannot guarantee cursor stability across mutations, so a full list invalidation is the only safe recovery path. If invalidating all pages is too expensive, surface a “Refresh list” prompt to the user instead of auto-refetching.
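
To illustrate which cached lists a given invalidation key reaches, here is a simplified model of prefix matching on query keys. It handles primitive key parts only and is not TanStack Query's implementation; matchesPrefix and the '/api/teams' endpoint are hypothetical.

```typescript
type KeyPart = string | number | boolean | null;

// Simplified model: a filter key matches when it is a prefix of the query key.
function matchesPrefix(queryKey: KeyPart[], filterKey: KeyPart[]): boolean {
  return (
    filterKey.length <= queryKey.length &&
    filterKey.every((part, i) => part === queryKey[i])
  );
}

const cachedKeys: KeyPart[][] = [
  ['entities', '/api/users', 'cursor'],
  ['entities', '/api/users', 'offset'],
  ['entities', '/api/teams', 'cursor'],
];

// The three-part key reaches only the cursor-mode list for that endpoint.
const cursorOnly = cachedKeys.filter((k) => matchesPrefix(k, ['entities', '/api/users', 'cursor']));
// The two-part prefix reaches both pagination modes for that endpoint.
const bothModes = cachedKeys.filter((k) => matchesPrefix(k, ['entities', '/api/users']));
console.log(cursorOnly.length, bothModes.length); // 1 2
```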

What happens when two concurrent infinite-scroll fetches return overlapping pages in TanStack Query v5?

TanStack Query v5 derives each page request from getNextPageParam, so pages are fetched one after another, but rapid scroll events can still call fetchNextPage again before the previous page has committed. Guard against this by checking isFetchingNextPage before calling fetchNextPage, and run a Set-based deduplication merge (mergePages in Implementation 3, or an equivalent select projection) so that overlapping IDs are filtered before the result array reaches your component.

How does structuralSharing interact with merged infinite-scroll pages?

TanStack Query’s structuralSharing (enabled by default) performs a deep equality check on each page object before replacing its reference. When a new page arrives, only the changed page slice gets a new reference — existing pages retain their identity. This means React can skip reconciliation for all stable page entries, making structuralSharing critical for large lists where a single append should not re-render all prior rows.
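
The reference-retention idea can be approximated in a few lines. The sketch below is a simplified stand-in for structural sharing over JSON-like values, written for illustration only; shareStructure and isPlainObject are hypothetical names, not the library's code.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Reuse the previous reference wherever the next value is deeply equal to it.
function shareStructure(prev: Json, next: Json): Json {
  if (prev === next) return prev;
  if (Array.isArray(prev) && Array.isArray(next)) {
    const out: Json[] = [];
    let same = prev.length === next.length;
    for (let i = 0; i < next.length; i++) {
      out[i] = i < prev.length ? shareStructure(prev[i], next[i]) : next[i];
      if (out[i] !== prev[i]) same = false;
    }
    return same ? prev : out;
  }
  if (isPlainObject(prev) && isPlainObject(next)) {
    const out: { [key: string]: Json } = {};
    let same = Object.keys(prev).length === Object.keys(next).length;
    for (const key of Object.keys(next)) {
      const had = Object.prototype.hasOwnProperty.call(prev, key);
      out[key] = had ? shareStructure(prev[key], next[key]) : next[key];
      if (!had || out[key] !== prev[key]) same = false;
    }
    return same ? prev : out;
  }
  return next; // differing primitives or mismatched shapes: take the new value
}

const prevPages: Json[] = [{ items: [{ id: 'a' }] }, { items: [{ id: 'b' }] }];
const nextPages: Json[] = [{ items: [{ id: 'a' }] }, { items: [{ id: 'b' }] }, { items: [{ id: 'c' }] }];

const shared = shareStructure(prevPages, nextPages) as Json[];
console.log(shared[0] === prevPages[0]); // true: unchanged page keeps its reference
console.log(shared[2] === nextPages[2]); // true: the appended page is taken as-is
console.log(shared === prevPages);       // false: the pages array itself is new
```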

Data Normalization & Query Key Design — the parent section covering entity mapping, query key hierarchies, and cache topology decisions that this work builds on.
Merging Paginated Lists Without Duplicates — deep-dive into the Set-based merge algorithm, including edge cases for real-time feeds where items can shift between pages between fetches.
Normalizing Cursor-Based Pagination — specific recipes for opaque cursor formats, Base64-encoded keysets, and cursor validation at the adapter boundary.
Entity Mapping Strategies — the upstream normalization pass that extracts entity records from raw API responses before they reach the pagination layer.
Relationship Stitching in Cache — how to resolve foreign keys between entities at read-time so that paginated lists composed of relational records stay consistent without re-fetching related resources.
Background Refetch Strategies — when and how to configure refetchInterval, refetchOnWindowFocus, and SWR-style revalidation for paginated endpoints that must stay fresh without blocking scroll.