Normalizing Cursor-Based Pagination
Implementing cursor-based pagination without a disciplined normalization layer produces three predictable failure modes: cache fragmentation (pages stored as disconnected slices rather than a unified entity graph), cursor drift (stale or mutated tokens that shift page boundaries on every refetch), and duplicate entity hydration (the same record appearing twice because overlapping page ranges were merged naively). This page shows how to fix all three by applying the Pagination Normalization Patterns your team already relies on for list management, using production-ready TypeScript with TanStack Query v5 and Apollo Client v3. If you are also fighting duplicates produced by concurrent fetchNextPage calls, read Merging Paginated Lists Without Duplicates alongside this guide — the two techniques are complementary.
Before you start: diagnostic checklist
Check all five items before writing code. Most cursor bugs are traceable to exactly one of these triggers.
- Cursor tokens are transformed before the next request: URI-encoding, JSON-stringifying, or trimming an opaque base64 cursor changes its byte sequence and breaks the API contract silently.
- Pagination metadata lives inside the entity array: if nextCursor is a sibling of entity fields inside the same object, structural sharing algorithms treat it as an entity property and pollute the primary data graph.
- Query keys depend on the entire cursor chain: a key like ['feed', cursor1, cursor2, cursor3] causes React Query to invalidate every page when any one cursor changes, not just the affected slice.
- Concurrent fetchNextPage calls share no deduplication guard: rapid scrolling triggers multiple in-flight requests; without request cancellation or a version vector, the oldest response may overwrite the newest normalized state.
- Normalized entity map is never pruned: an unbounded entities object retains every record ever fetched, causing JS heap growth that DevTools heap snapshots can confirm.
Data-flow: cursor page lifecycle
The diagram below shows the full lifecycle — from API response through the normalization reducer to the cache — and highlights where each of the three failure modes is injected.
Step 1 — Separate cursor metadata from entity data at the boundary
The normalization pipeline starts at the point where the API response enters your code. The single most impactful change you can make is to split the response into two independent shapes before touching the cache: a pagination object that stores cursor tokens, and an entities record that stores normalized data keyed by ID.
// types.ts
export interface CursorPage<T> {
items: T[];
nextCursor: string | null;
hasNextPage: boolean;
}
export interface NormalizedFeedState<T extends { id: string }> {
entities: Record<string, T>;
pageIds: string[]; // ordered list of entity IDs in scroll order
pagination: {
nextCursor: string | null;
hasNextPage: boolean;
lastFetchedAt: number; // derived from response header when possible
};
}
// normalizePageResponse.ts
import type { CursorPage, NormalizedFeedState } from "./types";
export function normalizePageResponse<T extends { id: string }>(
incoming: CursorPage<T>,
existing: NormalizedFeedState<T>,
): NormalizedFeedState<T> {
// Build entity map: shallow-merge incoming fields onto existing record
const entities = { ...existing.entities };
for (const item of incoming.items) {
entities[item.id] = { ...entities[item.id], ...item };
}
// Append only net-new IDs to preserve insertion order
const existingSet = new Set(existing.pageIds);
const nextIds = [
...existing.pageIds,
...incoming.items.filter((i) => !existingSet.has(i.id)).map((i) => i.id),
];
return {
entities,
pageIds: nextIds,
pagination: {
nextCursor: incoming.nextCursor,
hasNextPage: incoming.hasNextPage,
lastFetchedAt: Date.now(), // replace with server Date header when available
},
};
}
Cache Behavior Analysis. TanStack Query’s structuralSharing setting (enabled by default) compares the previous and incoming query data recursively and reuses the previous object references wherever the values are unchanged. When cursor tokens are stored on the same objects as entity data, every page fetch changes those objects, so their references cannot be reused and every list item that receives one re-renders. Splitting the shapes keeps entity references stable, so only components that subscribe to pagination.nextCursor re-render when the cursor changes.
Trade-offs.
- structuralSharing: true (the default) keeps prior page object references stable when fields are unchanged. If you set structuralSharing: false to simplify merge logic, every fetchNextPage causes a full re-render.
- Using Date.now() for lastFetchedAt is a reasonable client-side proxy, but staleness calculations become unreliable when the client clock drifts. Prefer the Date response header when your API sets it.
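The Date-header preference in the trade-offs above can be sketched as a small helper. This is a minimal sketch, not part of the reducer shown in this step: resolveFetchedAt and HeaderReader are hypothetical names, and it assumes the API sends a standard HTTP Date header in a format that Date.parse can read.

```typescript
// Hypothetical helper: derive lastFetchedAt from the response's Date header,
// falling back to the client clock when the header is missing or unparseable.
// HeaderReader is structurally compatible with the Fetch API's Headers object.
type HeaderReader = { get(name: string): string | null };

function resolveFetchedAt(
  headers: HeaderReader,
  clientNow: number = Date.now(),
): number {
  const raw = headers.get("date");
  if (raw === null) return clientNow;
  const parsed = Date.parse(raw);
  // Date.parse yields NaN for strings it cannot interpret
  return Number.isNaN(parsed) ? clientNow : parsed;
}
```

To use it, the query function would have to return the timestamp alongside the parsed body (for example by reading res.headers before calling res.json()), and the reducer would need an extra parameter to receive it. Both changes are assumptions about how you wire the pipeline.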
Step 2 — Wire the reducer into TanStack Query useInfiniteQuery
TanStack Query v5 exposes getNextPageParam to extract the cursor from each page and select to transform the raw page data into your normalized shape before it reaches components.
// useFeed.ts
import { useInfiniteQuery } from "@tanstack/react-query";
import type { CursorPage, NormalizedFeedState } from "./types";
import { normalizePageResponse } from "./normalizePageResponse";
interface FeedItem { id: string; title: string; createdAt: string }
async function fetchFeedPage(
cursor: string | null,
signal: AbortSignal,
): Promise<CursorPage<FeedItem>> {
const url = cursor ? `/api/feed?cursor=${cursor}` : "/api/feed";
const res = await fetch(url, { signal });
if (!res.ok) throw new Error(`Feed fetch failed: ${res.status}`);
return res.json();
}
export function useFeed() {
return useInfiniteQuery({
queryKey: ["feed"] as const, // base key only — cursor is NOT in the key
queryFn: ({ pageParam, signal }) =>
fetchFeedPage(pageParam ?? null, signal), // signal wires AbortController
initialPageParam: null as string | null,
getNextPageParam: (lastPage) => lastPage.nextCursor ?? undefined,
staleTime: 30_000, // pages are fresh for 30 s; no background refetch
gcTime: 5 * 60_000, // keep inactive pages 5 min before GC
select: (data) =>
// Reduce all raw pages into one unified normalized state
data.pages.reduce<NormalizedFeedState<FeedItem>>(
(acc, page) => normalizePageResponse(page, acc),
{ entities: {}, pageIds: [], pagination: { nextCursor: null, hasNextPage: true, lastFetchedAt: 0 } },
),
});
}
Cache Behavior Analysis. Keeping queryKey: ["feed"], without embedding the cursor, is deliberate. TanStack Query stores every page and its page parameter inside the single cache entry that the InfiniteQueryObserver manages. If you also embed the cursor in the query key, each cursor value becomes its own cache entry: prefix-matched calls such as invalidateQueries({ queryKey: ["feed"] }) still reach all of them, but any call that passes a full key (or exact: true) touches only that cursor's slice and leaves the other pages dangling. A base key also enables refetching from the first page when the user triggers a full cache invalidation. The signal parameter threads the native AbortSignal from each queryFn invocation, so TanStack Query can cancel in-flight requests when the component unmounts or when a superseding fetch fires.
Trade-offs.
- staleTime: 30_000 prevents background refetches during rapid scrolling but means users see data that is up to 30 seconds old. Lower this to 0 on feeds where real-time order matters.
- gcTime: 5 * 60_000 (formerly cacheTime) keeps the inactive query in memory for 5 minutes. For very large feeds this adds memory pressure; consider gcTime: 60_000 on lower-end devices.
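Step 2's getNextPageParam relies on nextCursor alone. As a hedged variant, the sketch below also honors hasNextPage, so an API that keeps echoing a cursor on its last page does not trigger an extra request. getNextCursor is a hypothetical name, and the CursorPage shape is restated from Step 1 so the block stands alone.

```typescript
interface CursorPage<T> {
  items: T[];
  nextCursor: string | null;
  hasNextPage: boolean;
}

// Returns the cursor for the next request, or undefined to tell
// useInfiniteQuery that there are no further pages.
function getNextCursor<T>(lastPage: CursorPage<T>): string | undefined {
  if (!lastPage.hasNextPage) return undefined;
  return lastPage.nextCursor ?? undefined;
}
```

It can be passed directly as getNextPageParam: getNextCursor, because only the first argument (the last page) is used.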
Step 3 — Apollo Client: isolate cursor pages with typePolicies
In Apollo Client v3, the InMemoryCache normalizes by __typename:id by default. Cursor pagination requires a custom merge policy so Apollo treats each page append as an update to the same cached list rather than a replacement.
// apolloClient.ts
import { ApolloClient, InMemoryCache } from "@apollo/client";
export const client = new ApolloClient({
uri: "/graphql",
cache: new InMemoryCache({
typePolicies: {
Query: {
fields: {
feed: {
// Merges incoming edges onto the existing list without duplicates
keyArgs: [], // no cursor in cache key → one list slot
merge(
existing: { edges: any[]; pageInfo: any } = { edges: [], pageInfo: {} },
incoming: { edges: any[]; pageInfo: any },
) {
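// Normalized nodes arrive as cache references ({ __ref }), so key on the
// reference first and fall back to a raw id only for un-normalized nodes
const edgeKey = (e: any) => e.__ref ?? e.node?.__ref ?? e.node?.id;
const existingIds = new Set(existing.edges.map(edgeKey));
const novelEdges = incoming.edges.filter(
(e: any) => !existingIds.has(edgeKey(e)),
);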
return {
// pageInfo forwarded from incoming — never merge with stale values
pageInfo: incoming.pageInfo,
edges: [...existing.edges, ...novelEdges],
};
},
},
},
},
},
}),
});
Cache Behavior Analysis. Setting keyArgs: [] collapses all cursor variants of feed(cursor: "…") onto a single normalized cache slot. Apollo merges incoming edges into that slot via the merge function, and an optional read function (omitted here for brevity) can shape how the accumulated list is exposed to queries. Without keyArgs: [], each distinct cursor value creates a separate cache entry, producing the fragmentation symptom: the cache holds many small disconnected page slices instead of one unified list.
Trade-offs.
- keyArgs: [] means a refetch() or cache.evict on feed always resets the entire list, not just the affected page. For filtered feeds (e.g., feed(category: "tech")), include filter args in keyArgs while still omitting the cursor: keyArgs: ["category"].
- Apollo's garbage collection removes only normalized objects that are no longer reachable from the cache roots, and it runs only when you call it. Very long scroll sessions may retain thousands of FeedItem references, and objects still referenced by the merged feed list stay reachable. Trim or evict the list, then call client.cache.gc() periodically or on route change to recover memory.
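Because the merge function in Step 3 is plain data-in, data-out logic, its deduplication can be exercised outside Apollo. The sketch below is a standalone approximation, not Apollo's API: mergeEdges, Edge, Connection, and edgeKey are hypothetical names, and it assumes edges are stored un-normalized with node already replaced by a { __ref } reference, which is how the cache typically presents normalized nodes to merge.

```typescript
// Shape of an edge as a merge function typically sees it: the node has
// already been normalized into a cache reference such as { __ref: "FeedItem:1" }.
interface Edge {
  cursor?: string;
  node: { __ref: string };
}

interface Connection {
  edges: Edge[];
  pageInfo: { endCursor: string | null; hasNextPage: boolean };
}

const edgeKey = (edge: Edge): string => edge.node.__ref;

function mergeEdges(existing: Connection | undefined, incoming: Connection): Connection {
  const prior = existing?.edges ?? [];
  const seen = new Set(prior.map(edgeKey));
  const novel: Edge[] = [];
  for (const edge of incoming.edges) {
    const key = edgeKey(edge);
    if (seen.has(key)) continue; // overlaps an earlier page or repeats within this one
    seen.add(key);
    novel.push(edge);
  }
  // pageInfo always comes from the newest page
  return { edges: [...prior, ...novel], pageInfo: incoming.pageInfo };
}
```

Unlike the inline version, this variant also drops repeats inside a single incoming page, which is a design choice rather than something the original policy requires.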
Edge cases and gotchas
Gotcha 1 — cursor token mutation
Opaque cursors (typically base64-encoded strings) break silently when modified. Anything that re-encodes the cursor on its way through your state layer changes the bytes the API receives: for example, a JSON.stringify in a Redux reducer without a matching JSON.parse wraps the token in quotes, and a stray trim() can strip characters the server expects. Store cursors as raw strings in a dedicated pagination slice and never pass them through any transform. Validate with a type guard:
function isValidCursor(value: unknown): value is string {
return typeof value === "string" && value.length > 0 && value.length <= 512;
}
Reject anything that fails the guard and fall back to a full refetch from the initial cursor rather than propagating a corrupted token.
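The reject-and-fall-back rule above can be sketched as a thin wrapper around the guard. cursorOrRestart is a hypothetical name, and the guard is restated so the block stands alone. Note the limit it illustrates: the guard checks type and length only, so it cannot detect a token that was mutated into another non-empty string.

```typescript
function isValidCursor(value: unknown): value is string {
  return typeof value === "string" && value.length > 0 && value.length <= 512;
}

// Hypothetical wrapper: returns the untouched cursor when it passes the guard,
// or null to signal "restart from the first page".
function cursorOrRestart(value: unknown): string | null {
  return isValidCursor(value) ? value : null;
}

const rawToken = "eyJpZCI6NDJ9"; // example opaque token, made up for illustration

const kept = cursorOrRestart(rawToken);                    // "eyJpZCI6NDJ9", unchanged
const restarted = cursorOrRestart(undefined);              // null: refetch from the start
const slipped = cursorOrRestart(JSON.stringify(rawToken)); // passes the guard, but is now wrapped in quotes
```

The last line is why the guard complements, rather than replaces, the rule of never transforming the token.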
Gotcha 2 — race condition on rapid scroll
When a user scrolls fast enough to outpace in-flight requests, two fetchNextPage calls fire simultaneously. The older response, which resolves later due to network jitter, may overwrite the normalized state set by the newer response. TanStack Query cancels superseded requests automatically via AbortSignal, but only when the queryFn passes the signal to fetch (as shown in Step 2). If you use axios or a custom HTTP client, forward the same signal to it; axios accepts it through the signal request option (v0.22 and later).
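Where cancellation is not available, the stale-response overwrite described above can also be prevented with a request counter, a simple form of the version guard mentioned in the checklist. This is a minimal sketch under that assumption; createLatestGuard is a hypothetical helper, not a TanStack Query API.

```typescript
// Each request takes a ticket before it starts. When its response arrives,
// the response is applied only if no newer request has been issued since.
function createLatestGuard() {
  let latest = 0;
  return {
    issue(): number {
      latest += 1;
      return latest;
    },
    isCurrent(ticket: number): boolean {
      return ticket === latest;
    },
  };
}

const guard = createLatestGuard();
const ticketA = guard.issue(); // request A starts
const ticketB = guard.issue(); // request B starts before A resolves

const applyA = guard.isCurrent(ticketA); // false: drop A's late response
const applyB = guard.isCurrent(ticketB); // true: apply B's response
```

In a query function, the ticket would be taken before the request and checked just before the response is handed to the reducer.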
Gotcha 3 — unbounded memory during long sessions
An entity map that retains every record ever fetched climbs monotonically. Implement a sliding window eviction that preserves cursor metadata while removing entities outside the visible viewport:
function evictDistantPages(
state: NormalizedFeedState<any>,
visibleStartIndex: number,
windowSize = 200,
): NormalizedFeedState<any> {
const visibleIds = new Set(
state.pageIds.slice(visibleStartIndex, visibleStartIndex + windowSize),
);
const prunedEntities: Record<string, any> = {};
for (const id of visibleIds) {
if (state.entities[id]) prunedEntities[id] = state.entities[id];
}
// pageIds is kept intact so forward navigation can refetch evicted pages
return { ...state, entities: prunedEntities };
}
The critical detail: pageIds is preserved in full even though entities is pruned. This means a user who scrolls back into an evicted range will trigger a cache miss and a targeted refetch for those IDs, rather than a full list reset.
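After eviction, pageIds can reference IDs that no longer have an entry in entities. One way to spot that gap before rendering is sketched below. findEvictedIds is a hypothetical helper, and FeedStateLike is a trimmed restatement of NormalizedFeedState so the block stands alone; how the missing range is refetched depends on your API and is not shown.

```typescript
interface FeedStateLike<T> {
  entities: Record<string, T>;
  pageIds: string[];
}

// Returns the IDs inside [start, end) that are still listed in pageIds but
// whose entity has been pruned, preserving scroll order.
function findEvictedIds<T>(
  state: FeedStateLike<T>,
  start: number,
  end: number,
): string[] {
  return state.pageIds
    .slice(start, end)
    .filter((id) => !Object.prototype.hasOwnProperty.call(state.entities, id));
}
```

An empty result means the requested window can render entirely from the cache.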
Common Pitfalls & Resolutions
| Observable Issue | Root Cause | Diagnostic Resolution |
|---|---|---|
| Duplicate entities appear in the UI after scrolling back to a previously visited section | Overlapping cursor boundaries cause the same entity to be fetched in two consecutive pages; naive array concatenation appends without ID checking | Add a Set-based deduplication step in the reducer before updating pageIds; compare against existingSet before appending |
| Infinite scroll hangs even though hasNextPage is true | Cursor token was URI-encoded or JSON-stringified during normalization; the API receives a transformed string and returns 0 results for the “unknown” cursor | Store the raw cursor string in an isolated pagination slice; add an isValidCursor type guard and log the exact token value sent to the API |
| invalidateQueries({ queryKey: ["feed"] }) resets the list to page 1 | Query key includes cursor segments, so invalidation targets only the first matching key; remaining pages are left stale | Use a base key (["feed"]) without cursor segments; let React Query manage per-page slots internally via InfiniteQueryObserver |
Frequently Asked Questions
Should cursor tokens be stored inside the normalized entity map?
No. Cursor tokens belong in a separate pagination metadata slice. Mixing them into the entity graph pollutes structural sharing, causes cache key collisions, and makes targeted invalidation much harder. The pagination object should be the only consumer of cursor state; entity components should never read it.
How do I handle cursor pagination when the API returns nested relationships?
Flatten nested arrays into top-level normalized entities first using your normalization reducer, then stitch relationships via foreign key references — for example, store authorId: string on a Post entity rather than embedding the full author object. Apply cursor metadata to the store only after the entity graph is fully flattened. This mirrors the entity mapping strategies the rest of your normalization layer already uses.
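The flattening step described above might look like the following for a post that embeds its author. This is a sketch only: ApiPost, Post, Author, FlatEntities, and flattenPosts are hypothetical names chosen for illustration, not types from your API.

```typescript
interface Author { id: string; name: string }
interface ApiPost { id: string; title: string; author: Author } // nested, as the API returns it
interface Post { id: string; title: string; authorId: string }  // flattened, as the store keeps it

interface FlatEntities {
  posts: Record<string, Post>;
  authors: Record<string, Author>;
}

function flattenPosts(items: ApiPost[]): FlatEntities {
  const posts: Record<string, Post> = {};
  const authors: Record<string, Author> = {};
  for (const { author, ...rest } of items) {
    // Shallow-merge so a later, fuller copy of the same author wins per field
    authors[author.id] = { ...authors[author.id], ...author };
    // Replace the embedded object with a foreign key reference
    posts[rest.id] = { ...rest, authorId: author.id };
  }
  return { posts, authors };
}
```

Two posts by the same author produce one entry in authors and two entries in posts that both point at it through authorId.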
How can I detect cursor drift in production without manual inspection?
Log cursor sequences alongside request timestamps and entity counts. Compare expected page sizes against actual merged totals — a mismatch of more than 1–2 entities signals boundary overlap or stale token propagation. For critical feeds, emit a custom performance.mark() at each cursor boundary and use the Performance Observer API to stream these events to your observability backend.
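The size comparison above can be sketched as a pure check that runs after each merge. detectBoundaryOverlap and DriftReport are hypothetical names, and the default tolerance of 2 mirrors the threshold of 1 to 2 entities mentioned above rather than any measured value.

```typescript
interface DriftReport {
  expected: number; // sum of raw page sizes as returned by the API
  actual: number;   // unique IDs present after merging
  mismatch: number; // absolute difference between the two
  drifted: boolean; // true when the mismatch exceeds the tolerance
}

function detectBoundaryOverlap(
  pageSizes: number[],
  mergedIdCount: number,
  tolerance = 2,
): DriftReport {
  const expected = pageSizes.reduce((sum, n) => sum + n, 0);
  const mismatch = Math.abs(expected - mergedIdCount);
  return { expected, actual: mergedIdCount, mismatch, drifted: mismatch > tolerance };
}
```

With the Step 2 state, pageSizes would come from the raw pages and mergedIdCount from pageIds.length; the resulting report is what you would attach to the logged cursor sequence.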
Related
- Pagination Normalization Patterns: the parent topic covering offset, cursor, and infinite-scroll normalization as a unified discipline.
- Merging Paginated Lists Without Duplicates: deep dive into O(1) Map-based deduplication for overlapping page boundaries, directly complementing the reducer shown here.
- Data Normalization & Query Key Design: covering entity mapping, query key structure, and normalization strategy across React Query, Apollo, SWR, and RTK Query.
- Entity Mapping Strategies: how to design the normalized entity graph that cursor pages write into, including foreign key stitching and structural sharing rules.