Decoupling Vector Data and Index Storage for Space Efficiency

[USENIX FAST 2026] DecoupleVS: Reclaiming 58% Storage via Vector-Metadata Decoupling

Abstract

DecoupleVS is a decoupled vector storage management framework designed for disk-based Approximate Nearest Neighbor Search (ANNS). By separating high-dimensional vector data from auxiliary index metadata, it introduces tailored lossless compression and latency-aware I/O scheduling, achieving up to 58.7% storage reduction while improving search throughput by up to 2.18x compared to state-of-the-art monolithic systems like DiskANN.

TL;DR

As vector datasets scale to billions and even trillions of vectors, "monolithic" storage (where vectors and their indices are packed together) is hitting a wall. DecoupleVS breaks this paradigm. By decoupling vector data from index metadata, it enables tailored lossless compression (XOR-delta + Huffman for vectors, Elias-Fano for neighbor lists) and a latency-aware search pipeline. The result? Up to a 58.7% reduction in storage footprint and up to 2.18x higher search throughput, without losing a single bit of precision.

The Problem: The High Cost of Co-location

Modern disk-based ANNS systems like DiskANN and SPANN typically store full-precision vectors and their neighbor lists together. While this simplifies I/O, it introduces three "hidden" costs:

Space Inefficiency: Neighbor lists are often as large as the vectors themselves, and page-alignment requirements leave unused space at the end of every page (internal fragmentation).
Read Amplification: Each graph hop reads the full-precision vector along with the neighbor list, yet only a small fraction of those vectors is needed for the final re-ranking.
Write Amplification: Updating a single vector requires rewriting entire index sections to keep the data co-located.
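To make the space argument concrete, here is a back-of-the-envelope calculation. It assumes a DiskANN-style co-located record (the full vector, a 4-byte neighbor count, and up to R 4-byte neighbor IDs) packed into 4 KB sectors, with no record allowed to straddle a sector boundary. The dimension and degree are illustrative parameters, not figures from the paper, and `layout_stats` is a hypothetical helper.

```python
SECTOR_BYTES = 4096  # assumed SSD page/sector size


def node_bytes(dim: int, elem_bytes: int, max_degree: int) -> int:
    """Size of one co-located record: vector + uint32 neighbor count + uint32 neighbor IDs."""
    return dim * elem_bytes + 4 + 4 * max_degree


def layout_stats(dim: int, elem_bytes: int, max_degree: int, sector: int = SECTOR_BYTES) -> dict:
    """Fragmentation and index share for records that may not straddle a sector."""
    node = node_bytes(dim, elem_bytes, max_degree)
    if node > sector:
        raise ValueError("this sketch only covers records smaller than one sector")
    per_sector = sector // node               # whole records that fit in one sector
    wasted = sector - per_sector * node       # padding left at the end of every sector
    index_share = (4 + 4 * max_degree) / node # fraction of each record spent on the neighbor list
    return {
        "node_bytes": node,
        "nodes_per_sector": per_sector,
        "wasted_bytes_per_sector": wasted,
        "index_share": index_share,
    }


# SIFT-like record: 128-dim uint8 vector, maximum degree 64 (hypothetical parameters)
print(layout_stats(128, 1, 64))
# node_bytes=388, nodes_per_sector=10, wasted 216 B per sector, index share ~0.67
```

With these parameters the neighbor list takes about two thirds of every record, which is why compressing the index side matters even for already-quantized vectors.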

As SSD prices surge due to AI demand, these inefficiencies translate directly into massive operational costs.

Methodology: The Power of Decoupling

The core insight of DecoupleVS is that vectors and indices have different semantics and access patterns.

1. Tailored Lossless Compression

Instead of using general-purpose tools like LZ4, DecoupleVS exploits the physical properties of the data:

For Vectors: It uses XOR-based delta compression. Because the bytes at each position of a normalized vector's encoding tend to follow similar distributions across vectors, XORing a vector against a "base vector" leaves many zero or low-entropy bytes, which Huffman coding then encodes compactly.
For Indices: Neighbor IDs are sorted and compressed with Elias-Fano encoding, a succinct representation of monotone integer sequences that achieves high compression while still supporting fast random access.
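The vector-side idea can be sketched as follows, assuming float32 vectors and a caller-chosen base vector. The function names are hypothetical, and the paper's actual base-vector selection, byte grouping, and code-table storage are not described here. The sketch only illustrates why XOR against a similar vector leaves low-entropy bytes, and it computes the Huffman-coded size of those bytes; the XOR step is exactly invertible, so the round trip is lossless.

```python
import heapq
import struct
from collections import Counter


def f32_bits(x: float) -> int:
    """Return the IEEE-754 float32 bit pattern of x as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def xor_delta(vec, base) -> bytes:
    """XOR each float32 of vec against the matching float32 of base (little-endian words)."""
    return b"".join(struct.pack("<I", f32_bits(v) ^ f32_bits(b)) for v, b in zip(vec, base))


def undo_xor_delta(delta: bytes, base):
    """Lossless inverse: XOR the delta words with base again to recover the float32 values."""
    words = struct.unpack(f"<{len(base)}I", delta)
    return [struct.unpack("<f", struct.pack("<I", w ^ f32_bits(b)))[0] for w, b in zip(words, base)]


def huffman_code_lengths(data: bytes) -> dict:
    """Huffman code length (in bits) for every byte value that occurs in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                      # degenerate case: a single symbol still needs 1 bit
        return {next(iter(freq)): 1}
    # heap entries: (weight, unique tiebreak, {symbol: depth so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}  # both subtrees move one level deeper
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_bits(data: bytes) -> int:
    """Total Huffman-encoded size of data in bits (code table not included)."""
    lengths = huffman_code_lengths(data)
    return sum(count * lengths[s] for s, count in Counter(data).items())


base = [0.5, -0.25, 0.125, 1.0]   # hypothetical base vector
vec = [0.51, -0.26, 0.13, 1.01]   # a nearby vector with the same signs and exponents
delta = xor_delta(vec, base)
print(delta.hex())                # every 4-byte word ends in 00 (its most significant byte)
print(huffman_bits(delta), "bits vs", 8 * len(delta), "bits raw")
```

Similar vectors share sign and exponent bits, so the high bytes of the XOR delta collapse to zero; the entropy coder is what turns that skew into actual savings.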
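For the index side, a textbook Elias-Fano encoder for one sorted neighbor list looks roughly like this. It is a generic sketch rather than DecoupleVS's code: the universe size and neighbor IDs are hypothetical, the upper bit vector is a plain Python list, and `ef_get` uses a linear scan where production implementations keep a select index. Each ID costs about 2 + log2(U/n) bits instead of 32.

```python
def ef_encode(values, universe):
    """Elias-Fano encode a sorted list of non-negative ints, each < universe.

    Returns (low_bits, lows, upper): the low `low_bits` bits of each value, plus an
    upper bit vector in which the i-th value sets bit (value >> low_bits) + i.
    """
    n = len(values)
    if any(a > b for a, b in zip(values, values[1:])):
        raise ValueError("input must be sorted")
    low_bits = max(0, (universe // n).bit_length() - 1) if n else 0  # floor(log2(U / n))
    upper = [0] * (n + (universe >> low_bits) + 1)
    lows = []
    for i, v in enumerate(values):
        lows.append(v & ((1 << low_bits) - 1))  # low part stored verbatim
        upper[(v >> low_bits) + i] = 1          # high part stored in unary
    return low_bits, lows, upper


def ef_get(low_bits, lows, upper, i):
    """Random access to the i-th value: select the i-th set bit of `upper`."""
    seen = -1
    for pos, bit in enumerate(upper):
        if bit:
            seen += 1
            if seen == i:
                return ((pos - i) << low_bits) | lows[i]
    raise IndexError(i)


def ef_decode(low_bits, lows, upper):
    """Decode the whole sequence in one pass over the upper bit vector."""
    out, i = [], 0
    for pos, bit in enumerate(upper):
        if bit:
            out.append(((pos - i) << low_bits) | lows[i])
            i += 1
    return out


def ef_size_bits(low_bits, lows, upper):
    """Encoded size in bits: n * low_bits for the lows plus the upper bit vector."""
    return len(lows) * low_bits + len(upper)


neighbors = [3, 17, 42, 108, 250, 901]  # hypothetical sorted neighbor IDs
enc = ef_encode(neighbors, universe=1024)
print(ef_get(*enc, 4))                  # random access to the 5th ID: 250
print(ef_size_bits(*enc), "bits vs", 32 * len(neighbors), "bits as uint32")  # 57 vs 192
```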

2. Hierarchical Storage Layout

To handle the variable-sized blocks created by compression, DecoupleVS introduces a segment -> chunk -> block hierarchy.

Figure: The DecoupleVS architecture, showing the separation of Vector Data (Segments) and the Auxiliary Index.
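One way to picture a segment -> chunk -> block hierarchy is the following sketch. It is purely hypothetical: the chunk size (4 KB), the segment size (256 chunks), the `SegmentStore`/`Location` names, and the rule that a block never straddles a chunk are illustrative assumptions, not details given for DecoupleVS. It shows how variable-sized compressed blocks can be packed densely while an in-memory table still locates each one.

```python
from dataclasses import dataclass

CHUNK_BYTES = 4096        # hypothetical: one SSD page per chunk
CHUNKS_PER_SEGMENT = 256  # hypothetical: 1 MiB segments


@dataclass(frozen=True)
class Location:
    segment: int
    chunk: int
    offset: int
    length: int


class SegmentStore:
    """Packs variable-sized compressed blocks into chunks, and chunks into segments."""

    def __init__(self):
        self.segments = [[bytearray()]]  # segments -> chunks -> packed block bytes
        self.index = {}                  # vector id -> Location of its block

    def append(self, vec_id, block: bytes) -> Location:
        if len(block) > CHUNK_BYTES:
            raise ValueError("block larger than a chunk")
        seg = self.segments[-1]
        if len(seg[-1]) + len(block) > CHUNK_BYTES:  # block would straddle the chunk boundary
            if len(seg) == CHUNKS_PER_SEGMENT:        # segment full: open a new segment
                seg = []
                self.segments.append(seg)
            seg.append(bytearray())                   # open a new chunk
        chunk = seg[-1]
        loc = Location(len(self.segments) - 1, len(seg) - 1, len(chunk), len(block))
        chunk += block                                # bytearray extends in place
        self.index[vec_id] = loc
        return loc

    def read(self, vec_id) -> bytes:
        loc = self.index[vec_id]
        chunk = self.segments[loc.segment][loc.chunk]
        return bytes(chunk[loc.offset:loc.offset + loc.length])


store = SegmentStore()
for vid, size in [(1, 1500), (2, 1500), (3, 1500)]:  # hypothetical compressed block sizes
    print(vid, store.append(vid, bytes([vid]) * size))
# block 3 does not fit after blocks 1 and 2, so it starts a new chunk at offset 0
```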

3. Latency-Aware Search Pipeline

DecoupleVS removes vector data I/Os from the critical path of graph traversal. It explores the graph using only the index and prefetches full-precision vectors only when the search "stabilizes." This adaptive prefetching ensures that the SSD's bandwidth is used for the most promising candidates.
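The control flow of such a pipeline can be sketched as a best-first search that navigates using only the index and a cheap approximate distance, and requests full-precision vectors once the candidate top-k stops changing. The stabilization rule (top-k unchanged for `patience` consecutive expansions), the `approx_dist`/`exact_dist` callables, and the synchronous `prefetched` set are hypothetical stand-ins; a real system would issue asynchronous SSD reads at that point and overlap them with further traversal.

```python
import heapq


def decoupled_search(graph, approx_dist, exact_dist, entry, k, beam=8, patience=2):
    """Best-first search over the index only; full vectors are requested once top-k stabilizes.

    graph:       dict node id -> list of neighbor ids (the auxiliary index)
    approx_dist: node id -> float, cheap distance used for navigation (hypothetical)
    exact_dist:  node id -> float, full-precision distance after an SSD read (hypothetical)
    """
    visited = {entry}
    frontier = [(approx_dist(entry), entry)]  # min-heap of unexpanded candidates
    best = []                                 # up to `beam` closest (dist, id) pairs expanded so far
    last_topk, stable_steps = None, 0
    prefetched = set()                        # ids whose full vectors were requested early
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) == beam and d > best[-1][0]:
            break                             # nothing left in the frontier can enter the beam
        best = sorted(best + [(d, node)])[:beam]
        topk = tuple(i for _, i in best[:k])
        stable_steps = stable_steps + 1 if topk == last_topk else 0
        last_topk = topk
        if stable_steps >= patience:
            prefetched.update(topk)           # a real system would issue async reads here
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (approx_dist(nb), nb))
    # Re-rank with exact distances; top-k ids not prefetched yet are fetched now.
    candidates = prefetched | set(last_topk or ())
    return sorted(candidates, key=exact_dist)[:k], prefetched


points = {i: float(i) for i in range(10)}                      # hypothetical 1-D "vectors"
graph = {i: [j for j in (i - 1, i + 1) if j in points] for i in points}
dist = lambda i: abs(points[i] - 7.2)                          # query at 7.2
result, early = decoupled_search(graph, dist, dist, entry=0, k=3, patience=1)
print(result)  # [7, 8, 6]
```

Keeping vector reads off the navigation loop means each hop touches only the small, cache-friendly index; full vectors are read once, for the few candidates that survive.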

Experimental Results: SOTA Performance

DecoupleVS was tested on billion-scale public and proprietary datasets.

Storage Savings: On a proprietary billion-scale dataset, DecoupleVS reduced storage by 58.7% compared to DiskANN. Even on pre-quantized datasets like SIFT100M, it saved over 40% by eliminating internal fragmentation and compressing the index.

Search Throughput: By optimizing cache efficiency (more compressed neighbor lists fit in RAM) and pipelining I/Os, DecoupleVS achieved up to 2.58x throughput gain.

Figure: Search throughput (QPS) vs. Recall@10. DecoupleVS (blue) consistently stays on the Pareto frontier.

Updates: Through a log-structured append-only strategy for vectors and batched merges for indices, DecoupleVS maintained search performance during heavy update workloads, where monolithic systems often see latency spikes.
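A minimal sketch of this update path, simulated in memory: vector updates are appended to a log (superseding older versions instead of rewriting them in place), while neighbor-list changes are buffered and applied to the index in batches. `UpdateBuffer`, `merge_threshold`, and the dict standing in for the on-disk index are hypothetical, and garbage collection of stale log entries is omitted.

```python
class UpdateBuffer:
    """Append-only vector log plus batched neighbor-list merges (in-memory simulation)."""

    def __init__(self, merge_threshold: int = 4):
        self.log = bytearray()   # append-only vector log
        self.offsets = {}        # vector id -> (offset, length) of its latest version
        self.pending = {}        # vector id -> new neighbor list awaiting merge
        self.index = {}          # stands in for the on-disk neighbor lists
        self.merge_threshold = merge_threshold
        self.merges = 0

    def upsert(self, vec_id, vec_bytes: bytes, neighbors):
        # Older versions stay in the log as garbage; nothing is rewritten in place.
        self.offsets[vec_id] = (len(self.log), len(vec_bytes))
        self.log += vec_bytes
        self.pending[vec_id] = sorted(neighbors)
        if len(self.pending) >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Apply all buffered neighbor-list changes in one batch."""
        self.index.update(self.pending)
        self.pending.clear()
        self.merges += 1

    def neighbors(self, vec_id):
        # Searches see buffered changes first, then the merged index.
        return self.pending.get(vec_id, self.index.get(vec_id))

    def vector(self, vec_id) -> bytes:
        off, n = self.offsets[vec_id]
        return bytes(self.log[off:off + n])


buf = UpdateBuffer(merge_threshold=2)
buf.upsert(1, b"a", [3, 2])
buf.upsert(2, b"bb", [1])   # second pending update triggers one batched merge
buf.upsert(1, b"c", [5])    # new version appended; old b"a" becomes garbage
print(buf.vector(1), buf.neighbors(1), buf.merges)
```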

Deep Insight: Why This Matters

The "co-location" strategy is a relic of an era in which disk seeks were the primary bottleneck. With high-speed NVMe SSDs and billion-scale embeddings, the bottleneck has shifted to data volume and memory bandwidth.

DecoupleVS proves that by treating the vector database like a specialized file system—one that understands the mathematical distribution of vectors—we can achieve massive cost savings without the accuracy loss associated with lossy compression (like Product Quantization).

Conclusion

DecoupleVS is a masterclass in storage systems design for AI. It demonstrates that as models grow, our storage architectures must become "context-aware," treating different parts of the data object with specialized logic. For any organization running massive RAG (Retrieval-Augmented Generation) clusters, these 58% savings are game-changing.

Limitations: There is a slight increase in CPU overhead due to decompression, though this is largely mitigated by high-performance Huffman implementations and multi-threading.
