[multi-vector] Detect L1/L2 cache sizes for tile budget by wuw92 · Pull Request #1134 · microsoft/DiskANN

wuw92 · 2026-06-05T03:54:46Z

Why

TileBudget::default() was hardcoded to L2 = 1.25 MB and L1d = 48 KB (from PR #863), so the tile planner mis-sized tiles on every other microarchitecture. This PR detects the host's real L1d/L2 sizes and feeds them to TileBudget.

Detection

Arch-first — CPUID where it exists, an OS API only where it doesn't:

Arch / OS	Mechanism
x86_64 (Windows, Linux, macOS)	CPUID via `raw-cpuid`
aarch64 Linux	sysfs
aarch64 macOS	sysctl `hw.perflevel0.*`, L2 ÷ `cpusperl2`
anything else	`CacheInfo::FALLBACK` (32 KB / 256 KB)

Windows is covered by the x86_64 CPUID path, so there is no Win32 codepath and no windows-sys dependency. Only Windows-on-ARM falls back.

macOS divides the cluster L2 by cpusperl2: Apple Silicon shares one L2 per P-core cluster, so using the raw value would over-budget each core's L2 share by ~4×.
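
As a rough illustration of the per-core division above, here is a minimal sketch. The function and parameter names are hypothetical (not taken from the PR), and the 16 MiB / 4-core figures in the usage note are made-up inputs chosen only to match the "~4×" remark.

```rust
/// Per-core L2 budget on a machine where several cores share one L2.
///
/// `cluster_l2_bytes` stands in for the cluster-wide L2 size read from
/// `hw.perflevel0.*`, and `cpus_per_l2` for the `cpusperl2` value.
/// Both names are assumptions made for this sketch.
pub fn per_core_l2(cluster_l2_bytes: usize, cpus_per_l2: usize) -> Option<usize> {
    // A zero divisor means the read failed or returned nonsense;
    // report "unknown" so the caller can use its fallback.
    if cpus_per_l2 == 0 {
        return None;
    }
    let share = cluster_l2_bytes / cpus_per_l2;
    // A zero-byte share is not a usable budget either.
    (share > 0).then_some(share)
}
```

With a hypothetical 16 MiB cluster L2 shared by 4 cores, `per_core_l2(16 << 20, 4)` yields `Some(4 << 20)`, i.e. a quarter of the raw value.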

Benchmark

The multi-vector benchmark on my devbox (Windows x86_64), where detection reports L1d = 32 KB and L2 = 512 KB versus the hardcoded 48 KB / 1.25 MB. Values are ns per inner product, min of 50 measurements; AVX-512 jobs are excluded (host is V3). (Q, D, Dim) = (queries, docs, dim).

f32 — neutral, all shapes within ±2 %:

Q	D	Dim	Hardcoded	Detected	Δ
8	32	128	5.281	5.227	−1.0 %
16	64	256	5.137	5.176	+0.8 %
32	128	384	7.654	7.690	+0.5 %
32	16	256	5.244	5.322	+1.5 %
64	32	264	5.361	5.293	−1.3 %
32	1250	128	2.632	2.618	−0.5 %
64	1250	512	10.438	10.600	+1.6 %
64	32	128	2.539	2.537	−0.1 %
32	32	512	10.273	10.293	+0.2 %

f16 — improves on 8 of 9 shapes, the last neutral:

Q	D	Dim	Hardcoded	Detected	Δ
8	32	128	41.844	8.164	−80.5 %
16	64	256	14.863	6.523	−56.1 %
32	16	256	23.730	8.398	−64.6 %
64	32	128	7.507	3.374	−55.1 %
32	32	512	22.324	13.320	−40.3 %
64	32	264	10.166	6.611	−35.0 %
32	128	384	10.669	9.180	−14.0 %
32	1250	128	3.093	2.735	−11.6 %
64	1250	512	10.800	10.825	+0.2 %
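
For readers checking the Δ columns, the values are consistent with a plain relative change against the hardcoded baseline. A minimal sketch, with a hypothetical function name:

```rust
/// Relative change of the detected-budget timing versus the hardcoded one,
/// in percent. Negative means the detected budget is faster
/// (fewer ns per inner product).
pub fn delta_percent(hardcoded_ns: f64, detected_ns: f64) -> f64 {
    (detected_ns - hardcoded_ns) / hardcoded_ns * 100.0
}
```

For the first f16 row, `delta_percent(41.844, 8.164)` is about −80.5, matching the table.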

…tile budget

Wires runtime L1d/L2 detection into TileBudget::default() so the multi-vector tile planner sizes A/B tiles against the host's actual cache geometry instead of hardcoded Skylake-X estimates (1.25 MB L2, 48 KB L1d from PR #863).

Detection lives in diskann-quantization, alongside the existing ISA-capability probe in isa.rs. Cache size is fundamentally a CPU/arch property: the OS-API is a discovery mechanism, not the concept being captured. Putting the module here mirrors the existing diskann-wide / diskann-vector / diskann-quantization stack, which handles all arch dispatch internally without depending on diskann-platform.

Detection strategy follows what gemm-common / faer / OpenBLAS do: CPUID where available, OS API where required.

- x86_64 (any OS): CPUID via the `raw-cpuid` crate (one path)
- aarch64 Linux: sysfs (/sys/devices/system/cpu/cpu0/cache/...)
- aarch64 macOS: sysctl (hw.perflevel0.*, P-core L2 / cpusperl2)
- Anything else: CacheInfo::FALLBACK (32 KB L1d, 256 KB L2)

On Apple Silicon the per-cluster L2 is divided by cpusperl2 to give a per-core budget. Windows-on-ARM falls back to the conservative defaults: CI doesn't cover that target and DiskANN production doesn't deploy there; dropping it removes a Win32 codepath and lets the crate avoid pulling windows-sys.

Equality between CPUID and Win32 GetLogicalProcessorInformationEx was verified on Windows x86_64 (32 KB L1d / 512 KB L2 on the test host) during development. Final commit removes the side-by-side test along with the temporary dependency on diskann-platform.

Closes #1062.
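
The probe-once-then-fall-back shape described in this commit could look roughly like the sketch below. It is not the PR's code: the field names, `probe_platform`, `plausible`, and `cache_info` are hypothetical, the plausibility filter is an assumption, and the per-arch probe is stubbed out so the example needs only the standard library. Only `CacheInfo::FALLBACK` and its 32 KB / 256 KB values come from the description.

```rust
use std::sync::OnceLock;

/// Detected cache sizes in bytes (field names are assumptions for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CacheInfo {
    pub l1d_bytes: usize,
    pub l2_bytes: usize,
}

impl CacheInfo {
    /// Conservative defaults named in the PR: 32 KB L1d, 256 KB L2.
    pub const FALLBACK: CacheInfo = CacheInfo {
        l1d_bytes: 32 * 1024,
        l2_bytes: 256 * 1024,
    };
}

/// Hypothetical stand-in for the per-arch probes (CPUID, sysfs, sysctl).
/// It always returns `None` here so the sketch stays dependency-free.
fn probe_platform() -> Option<CacheInfo> {
    None
}

/// Reject values that cannot describe a real cache hierarchy.
fn plausible(info: &CacheInfo) -> bool {
    info.l1d_bytes > 0 && info.l2_bytes >= info.l1d_bytes
}

/// Probe at most once per process; later calls return the stored value.
pub fn cache_info() -> CacheInfo {
    static CACHE: OnceLock<CacheInfo> = OnceLock::new();
    *CACHE.get_or_init(|| {
        probe_platform()
            .filter(plausible)
            .unwrap_or(CacheInfo::FALLBACK)
    })
}
```

Because the stub probe reports nothing, `cache_info()` in this sketch always returns `CacheInfo::FALLBACK`; a real probe would replace `probe_platform` behind the appropriate `cfg` gates.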

Copilot

Pull request overview

This PR updates the multi-vector distance tiling planner in diskann-quantization to derive its tile budgets from runtime-detected L1d/L2 cache sizes (with per-arch fallbacks), replacing the prior hardcoded Skylake-X assumptions.

Changes:

Added a memoized cache-size probe (L1d, L2) with x86_64 CPUID detection and aarch64 Linux/macOS OS-specific probes, plus a conservative fallback.
Updated TileBudget::default() to use detected cache sizes when computing L1/L2-derived budgets for tile planning.
Introduced target-specific dependencies (raw-cpuid on x86_64; libc on aarch64 macOS) to support probing.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

File	Description
diskann-quantization/src/multi_vector/distance/mod.rs	Wires in the new `cache` submodule.
diskann-quantization/src/multi_vector/distance/kernels/mod.rs	Switches `TileBudget::default()` to runtime-detected cache sizes.
diskann-quantization/src/multi_vector/distance/cache/mod.rs	Adds memoized cache probing API and basic plausibility/memoization tests.
diskann-quantization/src/multi_vector/distance/cache/cpuid.rs	Implements x86_64 cache detection using `raw-cpuid`.
diskann-quantization/src/multi_vector/distance/cache/linux.rs	Implements aarch64 Linux detection via sysfs cache entries.
diskann-quantization/src/multi_vector/distance/cache/macos.rs	Implements aarch64 macOS detection via sysctl `hw.perflevel0.*`.
diskann-quantization/Cargo.toml	Adds target-specific dependencies for the detection paths.
Cargo.lock	Records the new dependency resolutions.

codecov-commenter · 2026-06-05T04:10:04Z

Codecov Report

❌ Patch coverage is 95.91837% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.41%. Comparing base (6168ef0) to head (3be0354).

Files with missing lines	Patch %	Lines
...uantization/src/multi_vector/distance/cache/mod.rs	90.47%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1134      +/-   ##
==========================================
+ Coverage   89.40%   89.41%   +0.01%     
==========================================
  Files         485      487       +2     
  Lines       92079    92126      +47     
==========================================
+ Hits        82324    82376      +52     
+ Misses       9755     9750       -5

Flag	Coverage Δ
miri	`89.41% <95.91%> (+0.01%)`	⬆️
unittests	`89.07% <95.91%> (+0.01%)`	⬆️

Files with missing lines	Coverage Δ
...ntization/src/multi_vector/distance/cache/cpuid.rs	`100.00% <100.00%> (ø)`
...ntization/src/multi_vector/distance/kernels/mod.rs	`100.00% <100.00%> (ø)`
...uantization/src/multi_vector/distance/cache/mod.rs	`90.47% <90.47%> (ø)`

... and 5 files with indirect coverage changes

- cpuid.rs: reword the module doc so the CPUID 0x4 / 0x8000001D leaf selection is attributed to raw-cpuid's cache-parameter enumeration. The vendor dispatch is internal to the crate, not visible at our call site, which the original wording obscured.
- linux.rs: parse_size uses checked_mul for the K/M/G suffixes so an oversized sysfs value returns None instead of silently wrapping in release builds, where overflow checks are off. Add a regression test.
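
The overflow fix described for linux.rs can be sketched as follows. This is an illustrative reimplementation under assumptions, not the PR's code: only the name `parse_size`, the K/M/G suffixes, and the use of `checked_mul` come from the commit message; the exact input handling (trimming whitespace, accepting a bare byte count) is guessed.

```rust
/// Parse a sysfs-style cache size such as "32K" or "1M" into bytes.
/// Returns `None` for malformed input or when the byte count would
/// overflow `usize`, instead of wrapping silently.
pub fn parse_size(s: &str) -> Option<usize> {
    let s = s.trim();
    // Split off a trailing K/M/G suffix, if any. The suffix is ASCII,
    // so slicing off the last byte stays on a char boundary.
    let (digits, multiplier) = match *s.as_bytes().last()? {
        b'K' => (&s[..s.len() - 1], 1usize << 10),
        b'M' => (&s[..s.len() - 1], 1usize << 20),
        b'G' => (&s[..s.len() - 1], 1usize << 30),
        _ => (s, 1),
    };
    // `checked_mul` yields `None` on overflow in both debug and release builds.
    digits.parse::<usize>().ok()?.checked_mul(multiplier)
}
```

A plain `*` here would panic in debug builds but wrap in release builds, which is the silent failure the commit describes; with `checked_mul` the caller sees `None` and can use the fallback sizes.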

wuw92 requested review from a team and Copilot June 5, 2026 03:54

Copilot started reviewing on behalf of wuw92 June 5, 2026 03:54

wuw92 changed the title ~~[diskann-quantization] Detect L1/L2 cache sizes for tile budget~~ [multi-vector] Detect L1/L2 cache sizes for tile budget Jun 5, 2026

Copilot AI reviewed Jun 5, 2026

Comment thread diskann-quantization/src/multi_vector/distance/cache/cpuid.rs Outdated

Comment thread diskann-quantization/src/multi_vector/distance/cache/linux.rs

wuw92 linked an issue Jun 5, 2026 that may be closed by this pull request

Add support for reading L1/L2 cache sizes for different platforms(Windows, Linux, MacOS) for our efficient cache aware multi-vector distance functions. #1062

Open

wuw92 wants to merge 2 commits into main from users/wuw92/cache-size-detection
