Normalizing Cursor-Based Pagination

Implementing cursor-based pagination without a disciplined normalization layer produces three predictable failure modes: cache fragmentation (pages stored as disconnected slices rather than a unified entity graph), cursor drift (stale or mutated tokens that shift page boundaries on every refetch), and duplicate entity hydration (the same record appearing twice because overlapping page ranges were merged naively). This page shows how to fix all three by applying the Pagination Normalization Patterns your team already relies on for list management, using production-ready TypeScript with TanStack Query v5 and Apollo Client v3. If you are also fighting duplicates produced by concurrent fetchNextPage calls, read Merging Paginated Lists Without Duplicates alongside this guide — the two techniques are complementary.


Before you start: diagnostic checklist

Check all five items before writing code. Most cursor bugs are traceable to exactly one of these triggers.

  • Cursor tokens are transformed before the next request: double URI-encoding, JSON-stringifying, or trimming an opaque base64 cursor changes its byte sequence and silently breaks the API contract. (Percent-encoding the raw token exactly once when building the query string is required, not harmful, because the server decodes it back to the original bytes.)
  • Pagination metadata lives inside the entity array: if nextCursor is a sibling of entity fields inside the same object, structural sharing treats it as part of that object, so every cursor change gives the object a new reference and pollutes the primary data graph.
  • Query keys depend on the entire cursor chain: a key like ['feed', cursor1, cursor2, cursor3] turns a change to any one cursor into a brand-new, empty cache entry, so React Query discards the whole list instead of just the affected slice.
  • Concurrent fetchNextPage calls share no deduplication guard: rapid scrolling triggers multiple in-flight requests, and without request cancellation or a version vector, the oldest response may overwrite the newest normalized state.
  • Normalized entity map is never pruned: an unbounded entities object retains every record ever fetched, causing JS heap growth that DevTools heap snapshots can confirm.
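The first item is easiest to see with the standard URL API. The sketch below uses a made-up base64 cursor (real cursors are opaque, so the value is an assumption) and shows what a server parsing the query string would read back: raw interpolation turns `+` into a space, a single encodeURIComponent pass round-trips the token exactly, and a second pass leaves percent escapes in it.

```typescript
// Hypothetical opaque cursor containing base64 characters that are significant in URLs
const rawCursor = "ab+c/d==";

// Returns the cursor value a server would read back from the query string
function cursorSeenByServer(url: string): string | null {
  return new URL(url, "https://api.example.test").searchParams.get("cursor");
}

// Raw interpolation: form-style query parsing decodes "+" as a space
const unencoded = cursorSeenByServer(`/api/feed?cursor=${rawCursor}`);

// One encoding pass for transport: the server recovers the exact original token
const encodedOnce = cursorSeenByServer(
  `/api/feed?cursor=${encodeURIComponent(rawCursor)}`,
);

// Double encoding (e.g. the stored token was already encoded): escapes survive decoding
const encodedTwice = cursorSeenByServer(
  `/api/feed?cursor=${encodeURIComponent(encodeURIComponent(rawCursor))}`,
);

console.log({ unencoded, encodedOnce, encodedTwice });
```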

Data-flow: cursor page lifecycle

The diagram below traces the full lifecycle, from API response through the normalization reducer to the cache, and marks where cursor drift and duplicate hydration are injected; the unified cache at the end is what removes fragmentation.

[Figure: Cursor page lifecycle: normalization pipeline. An API response (entities: [...], nextCursor: "abc123") is split into a cursor slice (nextCursor, hasNextPage) and an entity array (id, fields…). Both pass through a normalization reducer (Set-based dedup + merge) into a unified cache (entities{} + pagination{}), with cursor metadata isolated and invalidation targetable. Warning markers: mutating the cursor slice causes drift; merging the entity array without dedup causes duplicates.]

Step 1 — Separate cursor metadata from entity data at the boundary

The normalization pipeline starts at the point where the API response enters your code. The single most impactful change you can make is to split the response into two independent shapes before touching the cache: a pagination object that stores cursor tokens, and an entities record that stores normalized data keyed by ID.

// types.ts
export interface CursorPage<T> {
  items: T[];
  nextCursor: string | null;
  hasNextPage: boolean;
}

export interface NormalizedFeedState<T extends { id: string }> {
  entities: Record<string, T>;
  pageIds: string[];        // ordered list of entity IDs in scroll order
  pagination: {
    nextCursor: string | null;
    hasNextPage: boolean;
    lastFetchedAt: number;  // derived from response header when possible
  };
}

// normalizePageResponse.ts
import type { CursorPage, NormalizedFeedState } from "./types";

export function normalizePageResponse<T extends { id: string }>(
  incoming: CursorPage<T>,
  existing: NormalizedFeedState<T>,
): NormalizedFeedState<T> {
  const entities = { ...existing.entities };
  const seen = new Set(existing.pageIds);
  const nextIds = [...existing.pageIds];

  for (const item of incoming.items) {
    // Shallow-merge incoming fields onto any existing record with the same ID
    entities[item.id] = { ...entities[item.id], ...item };
    // Append only net-new IDs (including repeats within this page) to preserve insertion order
    if (!seen.has(item.id)) {
      seen.add(item.id);
      nextIds.push(item.id);
    }
  }

  return {
    entities,
    pageIds: nextIds,
    pagination: {
      nextCursor: incoming.nextCursor,
      hasNextPage: incoming.hasNextPage,
      lastFetchedAt: Date.now(), // replace with server Date header when available
    },
  };
}

Cache Behavior Analysis. TanStack Query’s structuralSharing option (enabled by default) compares the previous and incoming query data and keeps the previous object reference for every subtree that is deeply equal, so only the branches that actually changed receive new references. The same sharing is applied to the output of select. When cursor tokens live inside the same objects as entity data, any object carrying a changing cursor gets a fresh reference on every page fetch, which defeats referential-equality checks (React.memo, useMemo dependencies) in the components that receive it. With cursors confined to a separate pagination object, unchanged entity objects keep their identity across fetches, so memoized list items can skip re-rendering while only consumers of pagination see a new value.

Trade-offs.

  • structuralSharing: true (the default) keeps object references stable for any part of the data that did not change between fetches. If you set structuralSharing: false to simplify merge logic, every fetchNextPage hands components entirely new objects, so even memoized list items re-render.
  • Using Date.now() for lastFetchedAt is a reasonable client-side proxy, but staleness calculations become unreliable when the client clock drifts. Prefer the Date response header when your API sets it.
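One way to follow that advice is to parse the Date header and fall back to the client clock when it is missing or malformed. parseServerTime below is a hypothetical helper, not a library API; keep in mind that the header has only one-second resolution.

```typescript
// Hypothetical helper: prefer the server's Date header (HTTP-date format,
// e.g. "Wed, 21 Oct 2015 07:28:00 GMT") over the client clock.
function parseServerTime(
  dateHeader: string | null,
  fallback: number = Date.now(),
): number {
  if (dateHeader === null) return fallback;
  const parsed = Date.parse(dateHeader);
  // Date.parse returns NaN for strings it cannot interpret
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Usage inside a fetcher, before normalizing:
//   const lastFetchedAt = parseServerTime(res.headers.get("Date"));
```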

Step 2 — Wire the reducer into TanStack Query useInfiniteQuery

TanStack Query v5 exposes getNextPageParam to extract the cursor from each page and select to transform the raw page data into your normalized shape before it reaches components.

// useFeed.ts
import { useInfiniteQuery } from "@tanstack/react-query";
import type { CursorPage, NormalizedFeedState } from "./types";
import { normalizePageResponse } from "./normalizePageResponse";

interface FeedItem { id: string; title: string; createdAt: string }

async function fetchFeedPage(
  cursor: string | null,
  signal: AbortSignal,
): Promise<CursorPage<FeedItem>> {
  // Percent-encode the raw cursor exactly once for transport;
  // the server decodes it back to the original token
  const url = cursor ? `/api/feed?cursor=${encodeURIComponent(cursor)}` : "/api/feed";
  const res = await fetch(url, { signal });
  if (!res.ok) throw new Error(`Feed fetch failed: ${res.status}`);
  return res.json();
}

export function useFeed() {
  return useInfiniteQuery({
    queryKey: ["feed"] as const,     // base key only — cursor is NOT in the key
    queryFn: ({ pageParam, signal }) =>
      fetchFeedPage(pageParam ?? null, signal),  // signal wires AbortController
    initialPageParam: null as string | null,
    getNextPageParam: (lastPage) => lastPage.nextCursor ?? undefined,
    staleTime: 30_000,               // pages are fresh for 30 s; no background refetch
    gcTime: 5 * 60_000,             // keep inactive pages 5 min before GC
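    // An inline select re-runs on every render; for large feeds, hoist it to a
    // stable function (module scope or useCallback) so it re-runs only when data changes
    select: (data) =>
      // Reduce all raw pages into one unified normalized state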
      data.pages.reduce<NormalizedFeedState<FeedItem>>(
        (acc, page) => normalizePageResponse(page, acc),
        { entities: {}, pageIds: [], pagination: { nextCursor: null, hasNextPage: true, lastFetchedAt: 0 } },
      ),
  });
}

Cache Behavior Analysis. Keeping queryKey: ["feed"] without embedding the cursor is deliberate. useInfiniteQuery stores every fetched page, together with the pageParam that produced it, in a single cache entry (data.pages and data.pageParams) under that one key. Embedding the cursor in the key would instead create a separate cache entry per cursor value, which is exactly the fragmentation symptom: each entry holds one disconnected slice, and a changed cursor yields a new, empty entry. A base key also means invalidateQueries({ queryKey: ["feed"] }) refetches the list as one unit, page by page starting from the first page. The signal parameter forwards the AbortSignal that TanStack Query creates for each queryFn invocation, so when a fetch is cancelled (for example, superseded by another fetchNextPage call, or abandoned because the last observer unmounted) the underlying HTTP request is aborted as well instead of running to completion.

Trade-offs.

  • staleTime: 30_000 prevents background refetches during rapid scrolling but means users see data that is up to 30 seconds old. Lower this to 0 on feeds where real-time order matters.
  • gcTime: 5 * 60_000 (formerly cacheTime) keeps the inactive query in memory for 5 minutes. For very large feeds this adds memory pressure; consider gcTime: 60_000 on lower-end devices.

Step 3 — Apollo Client: isolate cursor pages with typePolicies

In Apollo Client v3, the InMemoryCache normalizes by __typename:id by default. Cursor pagination requires a custom merge policy so Apollo treats each page append as an update to the same cached list rather than a replacement.

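// apolloClient.ts
import { ApolloClient, InMemoryCache } from "@apollo/client";
import type { Reference } from "@apollo/client";

// Cached edges are stored inline; each edge.node is a Reference to a normalized FeedItem
interface FeedEdge { cursor?: string; node: Reference }
interface FeedConnection { edges: FeedEdge[]; pageInfo: Record<string, unknown> }

export const client = new ApolloClient({
  uri: "/graphql",
  cache: new InMemoryCache({
    typePolicies: {
      Query: {
        fields: {
          feed: {
            keyArgs: [],              // no cursor in cache key → one list slot
            // Merges incoming edges onto the existing list without duplicates
            merge(
              existing: FeedConnection = { edges: [], pageInfo: {} },
              incoming: FeedConnection,
              { readField },
            ) {
              // readField resolves the id through the Reference; e.node.id would be undefined
              const seen = new Set(existing.edges.map((e) => readField<string>("id", e.node)));
              const novelEdges = incoming.edges.filter((e) => {
                const id = readField<string>("id", e.node);
                if (seen.has(id)) return false; // already cached, or repeated within this page
                seen.add(id);
                return true;
              });
              return {
                ...incoming,
                // pageInfo comes from incoming; never merge it with stale values
                pageInfo: incoming.pageInfo,
                edges: [...existing.edges, ...novelEdges],
              };
            },
          },
        },
      },
    },
  }),
});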

Cache Behavior Analysis. Setting keyArgs: [] collapses all cursor variants of feed(cursor: "…") onto a single cache slot for the feed field. Apollo passes every incoming page to the merge function, which appends it to the list already stored in that slot; no custom read function is needed, because queries read the accumulated list back as stored. Without keyArgs: [], Apollo keys the field by all of its arguments, so each distinct cursor value creates a separate cache entry, producing the fragmentation symptom: the cache holds many small disconnected page slices instead of one unified list.

Trade-offs.

  • keyArgs: [] means a refetch() or cache.evict on feed always resets the entire list, not just the affected page. For filtered feeds (e.g., feed(category: "tech")), include filter args in keyArgs while still omitting the cursor: keyArgs: ["category"].
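  • Apollo’s cache.gc() removes normalized objects that are no longer reachable from the cache’s root objects (such as ROOT_QUERY), but it only runs when you call it, and every FeedItem referenced by the accumulated feed list stays reachable. Very long scroll sessions may therefore retain thousands of FeedItem objects. On route change, evict the list with client.cache.evict({ fieldName: "feed" }) and then call client.cache.gc() to recover memory.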
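To unit-test the deduplication rule outside Apollo, you could extract it into a pure function that receives an ID reader. mergeEdgesById below is a hypothetical helper, not an Apollo API; inside the Step 3 merge function you would pass `(edge) => readField("id", edge.node)` (readField is supplied in the merge options), while a test can use a plain property accessor.

```typescript
// Hypothetical helper: append only edges whose node ID has not been seen yet.
// readId abstracts how the ID is obtained (readField in Apollo, a property in tests).
function mergeEdgesById<E>(
  existing: readonly E[],
  incoming: readonly E[],
  readId: (edge: E) => string | undefined,
): E[] {
  const seen = new Set<string>();
  for (const edge of existing) {
    const id = readId(edge);
    if (id !== undefined) seen.add(id);
  }
  const merged = [...existing];
  for (const edge of incoming) {
    const id = readId(edge);
    // Keep edges whose ID cannot be read instead of treating them as duplicates
    if (id === undefined) {
      merged.push(edge);
      continue;
    }
    if (!seen.has(id)) {
      seen.add(id);
      merged.push(edge);
    }
  }
  return merged;
}

// Test-style usage: plain objects stand in for cached edges
type Edge = { node: { id: string } };
const page1: Edge[] = [{ node: { id: "1" } }, { node: { id: "2" } }];
const page2: Edge[] = [{ node: { id: "2" } }, { node: { id: "3" } }, { node: { id: "3" } }];
const merged = mergeEdgesById(page1, page2, (e) => e.node.id);
```

Keeping edges with unreadable IDs is a deliberate choice: if every ID resolved to undefined and undefined were tracked in the set, all incoming edges after the first page would be silently discarded.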

Edge cases and gotchas

Gotcha 1 — cursor token mutation

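Opaque cursors (typically base64-encoded strings) break silently when modified. A JSON.stringify → JSON.parse round-trip returns the identical string, so the real danger is an asymmetric transform: sending JSON.stringify(cursor) without parsing it back adds literal quote characters, trimming or truncating alters the token, and encoding a token that is already percent-encoded double-encodes it. Store cursors as raw strings in a dedicated pagination slice and never pass them through any transform other than a single encodeURIComponent when building the request URL. Validate with a type guard: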

function isValidCursor(value: unknown): value is string {
  return typeof value === "string" && value.length > 0 && value.length <= 512;
}

Reject anything that fails the guard and fall back to a full refetch from the initial cursor rather than propagating a corrupted token.
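The length check does not catch a token that was wrapped in quotes by JSON.stringify or had a `+` turned into a space. If your API documents its cursors as base64 or base64url (an assumption; opaque cursors may use any alphabet), a stricter hypothetical guard can also check the character set:

```typescript
// Hypothetical stricter guard, assuming cursors use the base64 or base64url alphabet
// (A-Z, a-z, 0-9, "+", "/", "-", "_") followed by at most two "=" padding characters.
const BASE64ISH_CURSOR = /^[A-Za-z0-9+\/_-]+={0,2}$/;

function isBase64Cursor(value: unknown): value is string {
  return (
    typeof value === "string" &&
    value.length > 0 &&
    value.length <= 512 &&
    BASE64ISH_CURSOR.test(value)
  );
}
```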

Gotcha 2 — race condition on rapid scroll

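When a user scrolls fast enough to outpace in-flight requests, two fetchNextPage calls can fire back to back. If the older request resolves later because of network jitter, its response may overwrite the normalized state produced by the newer one. In TanStack Query v5, a second fetchNextPage call cancels the in-flight fetch by default (cancelRefetch: true) and the cancelled result is ignored; passing the signal to fetch (as shown in Step 2) additionally aborts the underlying HTTP request instead of letting it run to completion. The TanStack Query docs still recommend guarding the trigger with hasNextPage && !isFetching so a scroll handler cannot start overlapping page fetches. If you use axios (v0.22 or later), pass the signal through its signal request option; with a custom HTTP client, subscribe to the signal’s abort event and abort the underlying request there.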
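Outside TanStack Query, for example in a hand-rolled Redux or useReducer store, nothing drops the stale response for you. Below is a minimal sketch of the checklist’s version-vector idea, simplified to a single monotonically increasing request counter (an assumption that each feed has one writer): every request takes a ticket, and its response is applied only if no newer request has started since. createLatestOnlyGate is a hypothetical helper, not a library API.

```typescript
// Hypothetical helper: tag each request with a sequence number and apply
// its response only if no newer request has started in the meantime.
function createLatestOnlyGate() {
  let latest = 0;
  return async function run<R>(
    request: () => Promise<R>,
    apply: (result: R) => void,
  ): Promise<boolean> {
    const ticket = ++latest;
    const result = await request();
    if (ticket !== latest) return false; // superseded: drop the stale response
    apply(result);
    return true;
  };
}
```

Pair the gate with an AbortController so the superseded request is also cancelled at the network level; the gate only guarantees that a stale result never reaches the store.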

Gotcha 3 — unbounded memory during long sessions

An entity map that retains every record ever fetched climbs monotonically. Implement a sliding window eviction that preserves cursor metadata while removing entities outside the visible viewport:

// evictDistantPages.ts
import type { NormalizedFeedState } from "./types";

export function evictDistantPages<T extends { id: string }>(
  state: NormalizedFeedState<T>,
  visibleStartIndex: number,
  windowSize = 200,
): NormalizedFeedState<T> {
  // Keep entities only for the IDs inside the retention window
  const retainedIds = state.pageIds.slice(visibleStartIndex, visibleStartIndex + windowSize);
  const prunedEntities: Record<string, T> = {};
  for (const id of retainedIds) {
    const entity = state.entities[id];
    if (entity) prunedEntities[id] = entity;
  }
  // pageIds and pagination are kept intact so evicted ranges can be refetched later
  return { ...state, entities: prunedEntities };
}

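The critical detail: pageIds is preserved in full even though entities is pruned. Scroll order and list length stay stable, and a user who scrolls back into an evicted range hits a detectable cache miss (an ID present in pageIds with no entry in entities) that can drive a targeted refetch for those IDs, assuming your API can fetch records by ID, rather than a full list reset.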
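Detecting that miss can be as simple as listing which IDs in a range have no entity. findEvictedIds below is a hypothetical helper built on the NormalizedFeedState shape from Step 1 (redeclared so the block stands alone); how you refill the gap (placeholders, a by-ID batch request, refetching the surrounding page) depends on your API.

```typescript
// Minimal redeclaration of the Step 1 state shape so this sketch is self-contained
interface NormalizedFeedState<T extends { id: string }> {
  entities: Record<string, T>;
  pageIds: string[];
  pagination: { nextCursor: string | null; hasNextPage: boolean; lastFetchedAt: number };
}

// Hypothetical helper: IDs in [start, end) that are still ordered in pageIds
// but whose entity has been evicted from the map
function findEvictedIds<T extends { id: string }>(
  state: NormalizedFeedState<T>,
  start: number,
  end: number,
): string[] {
  return state.pageIds
    .slice(start, end)
    .filter((id) => !Object.prototype.hasOwnProperty.call(state.entities, id));
}
```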


Common Pitfalls & Resolutions

Observable Issue | Root Cause Diagnostic | Resolution
--- | --- | ---
Duplicate entities appear in the UI after scrolling back to a previously visited section | Overlapping cursor boundaries cause the same entity to be fetched in two consecutive pages; naive array concatenation appends without ID checking | Add a Set-based deduplication step in the reducer before updating pageIds; compare against existingSet before appending
Infinite scroll hangs even though hasNextPage is true | Cursor token was URI-encoded or JSON-stringified during normalization; the API receives a transformed string and returns 0 results for the “unknown” cursor | Store the raw cursor string in an isolated pagination slice; add an isValidCursor type guard and log the exact token value sent to the API
invalidateQueries({ queryKey: ["feed"] }) resets the list to page 1 | Query key includes cursor segments, so every cursor value is its own cache entry; invalidation refetches those entries independently (by default only those with active observers) instead of one page chain, leaving the remaining pages stale | Use a base key (["feed"]) without cursor segments; let TanStack Query keep all pages and their page params in the single infinite-query cache entry

Frequently Asked Questions

Should cursor tokens be stored inside the normalized entity map?

No. Cursor tokens belong in a separate pagination metadata slice. Mixing them into the entity graph pollutes structural sharing, causes cache key collisions, and makes targeted invalidation much harder. The pagination object should be the only consumer of cursor state; entity components should never read it.

How do I handle cursor pagination when the API returns nested relationships?

Flatten nested arrays into top-level normalized entities first using your normalization reducer, then stitch relationships via foreign key references — for example, store authorId: string on a Post entity rather than embedding the full author object. Apply cursor metadata to the store only after the entity graph is fully flattened. This mirrors the entity mapping strategies the rest of your normalization layer already uses.
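As an illustration, assume a hypothetical API that returns posts with an embedded author object. The sketch below flattens one page into separate posts and authors maps, replacing each embedded author with an authorId reference; all type and field names are assumptions.

```typescript
// Hypothetical API shapes: each post embeds its full author object
interface ApiAuthor { id: string; name: string }
interface ApiPost { id: string; title: string; author: ApiAuthor }

// Normalized post: stores only a foreign key to its author
interface Post { id: string; title: string; authorId: string }

interface FlattenedPage {
  posts: Record<string, Post>;
  authors: Record<string, ApiAuthor>;
  postIds: string[]; // scroll order, without duplicates
}

function flattenPostsPage(items: ApiPost[]): FlattenedPage {
  const posts: Record<string, Post> = {};
  const authors: Record<string, ApiAuthor> = {};
  const postIds: string[] = [];
  const seen = new Set<string>();
  for (const { author, ...rest } of items) {
    // One record per author, shared by every post that references it
    authors[author.id] = author;
    // Replace the embedded author with an authorId reference
    posts[rest.id] = { ...rest, authorId: author.id };
    if (!seen.has(rest.id)) {
      seen.add(rest.id);
      postIds.push(rest.id);
    }
  }
  return { posts, authors, postIds };
}
```

Only after this step would the page’s cursor metadata be written to the separate pagination slice.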

How can I detect cursor drift in production without manual inspection?

Log cursor sequences alongside request timestamps and entity counts. Compare expected page sizes against actual merged totals: a mismatch of more than 1–2 entities signals boundary overlap or stale token propagation. For critical feeds, emit a custom performance.mark() at each cursor boundary and use the PerformanceObserver API to stream these events to your observability backend.
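A minimal sketch of that check, assuming you know how many items each page returned and how many net-new IDs were actually appended after deduplication (for example, the before/after lengths of pageIds in normalizePageResponse). recordCursorBoundary, isSuspiciousBoundary, and the mark name are hypothetical; performance.mark with a detail payload is part of the standard User Timing API in modern browsers and Node.js, and a PerformanceObserver watching "mark" entries can forward these to your backend.

```typescript
interface CursorBoundaryReport {
  cursor: string | null; // cursor that produced this page (null for the first page)
  fetched: number;       // items returned by the API for this page
  appended: number;      // net-new IDs added after deduplication
  overlap: number;       // fetched - appended: items already present in the list
}

// Hypothetical helper: compute the overlap at one cursor boundary and emit a
// User Timing mark carrying the details for a PerformanceObserver to pick up.
function recordCursorBoundary(
  cursor: string | null,
  fetched: number,
  appended: number,
): CursorBoundaryReport {
  const report: CursorBoundaryReport = {
    cursor,
    fetched,
    appended,
    overlap: fetched - appended,
  };
  if (typeof performance !== "undefined" && typeof performance.mark === "function") {
    performance.mark("feed:cursor-boundary", { detail: report });
  }
  return report;
}

// Flag boundaries whose overlap exceeds the tolerance (1-2 entities per the text above)
function isSuspiciousBoundary(report: CursorBoundaryReport, tolerance = 2): boolean {
  return report.overlap > tolerance;
}
```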