pasta::bit_vector
This header-only library contains a highly tuned (uncompressed) bit vector implementation with multiple space efficient rank and select support data structures. Our fastest rank and select support has a space overhead of only ~3.51% and makes use of data level parallelism via SIMD instructions.
Cite this software
Description
pasta::bit_vector
This header-only library contains a highly tuned (uncompressed) bit vector implementation with multiple space efficient rank and select support data structures.
Our fastest rank and select support has a space overhead of only ~3.51% and makes use of data level parallelism via SIMD instructions.
If you use this code in a scientific context, please cite our paper.
@inproceedings{Kurpicz2022CompactRankSelect,
author = {Florian Kurpicz},
title = {Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors},
booktitle = {{SPIRE}},
series = {Lecture Notes in Computer Science},
volume = {13617},
pages = {257--272},
publisher = {Springer},
year = {2022},
doi = {10.1007/978-3-031-20643-6\_19}
}
Contents
This repository contains the following algorithms and data structures.
Our documentation contains in-depth on the usage of all these algorithms and data structures including easy to follow examples.
You can find the example in the screenshot below as text, too.
Bit Vectors
Bit vectors play an important role in many compressed text indices, e.g., the FM-index.
This repository contains the following bit vector implementations:
- highly tuned uncompressed bit vector with access operator
- compact rank and select support for the uncompressed bit vector based on
Dong Zhou and David G. Andersen and Michael Kaminsky,
Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences,
SEA 2013.
- improved rank and select support requiring the same amount of memory but providing faster rank (up to 8% speedup) and select (up to 16.5% speedup) queries, and
- a very fast rank support that can also answer select queries.
Easy to Use
Since this is a header-only library, you have to simply add it to your projects include path to use it.
A small example can be found below.
We refer to the documentation for more information.
#include <pasta/bit_vector/bit_vector.hpp>
#include <pasta/bit_vector/support/flat_rank_select.hpp>
// Create a bit vector of size 1000 containing only zeros and flip every other bit.
pasta::BitVector bv(1000, 0);
for (size_t i = 0; i < bv.size(); ++i) {
if (i % 2 == 0) { bv[i] = 1; }
}
// Print the bit vector to see that everything worked ;-)
for (auto it = bv.begin(); it != bv.end(); ++it) {
std::cout << ((*it == true) ? '1' : '0');
}
std::cout << std::endl;
// Create rank and select support and print the result of some queries.
pasta::FlatRankSelect rs(bv);
std::cout << rs.rank0(250) << ", " << rs.rank1(250)
<< ", "
<< rs.select0(250) << ", " << rs.rank1(250)
<< std::endl;
Benchmarks and Tests
There exist an easy to use benchmark, which helps to compare the implementations in this repository.
To build the benchmark, run the CMake command with -DPASTA_BIT_VECTOR_BUILD_BENCHMARKS=On.
Our tests are contained in the folder tests.
To build the tests, run the CMake command with -DPASTA_BIT_VECTOR_BUILD_TESTS=On.
We also conducted an extensive experimental evaluation.
To this end, we use our rank and select benchmark where we compare our implementations with many other compact rank and select data structures.
We refer to our paper for a full description of the results, i.e., hardware, inputs, and competitors.
Below, you can find some of the figures we present in the paper.




How to Get This
Below, we list all commands that are required to build the code in this repository.
To this end, we provide three CMake presets (debug, release, and release with debug information).
- The debug preset creates a
debugfolder and uses the compiler flags-DDEBUG -O0 -g -ggdb -fsanitize=address. - The release preset creates a
buildfolder and uses the compiler flags-DNDEBUG -march=native -O3. - The release with debug information preset creates a
build_with_debug_infofolder and uses the compiler flags-DDEBUG -g -march=native -O3.
Per default, we use the following compiler flags: -Wall -Wextra -Wpedantic -fdiagnostics-color=always.
Requirements
pasta::bit_vector is written in C++20 and requires a compiler that supports it.
We use Ninja as build system.
For more information on how to use this library, please refer to our documentation.
tl;dr
To just clone the source code, use the following.
git clone git@github.com:pasta-toolbox/bit_vector
cd bit_vector
git submodule update --init --recursive
If you also want to build the test, please continue with the following commands.
cmake --preset=[debug|build|relwithdeb]-DPASTA_BIT_VECTOR_BUILD_TESTS=On
cmake --build --preset=[debug|release|relwithdeb]
ctest --test-dir [debug|build|relwithdeb]
Reference papers
Mentions
- 1.Author(s): Paolo Ferragina, Mariagiovanna Rotundo, Giorgio VinciguerraPublished in Lecture Notes in Computer Science, String Processing and Information Retrieval by Springer Nature Switzerland in 2023, page: 203-21710.1007/978-3-031-43980-3_16
- 2.Author(s): Florian Kurpicz, Hans-Peter Lehmann, Peter SandersPublished in 2023 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX) by Society for Industrial and Applied Mathematics in 2023, page: 162-17510.1137/1.9781611977561.ch14
- 1.Author(s): Patrick Dinklage, Johannes Fischer, Florian Kurpicz, Jan-Philipp TarnowskiPublished in Proceedings of the 2024 ACM Workshop on Highlights of Parallel Computing by ACM in 2024, page: 37-3810.1145/3670684.3673419
- 2.Author(s): Matteo Ceregini, Florian Kurpicz, Rossano VenturiniPublished in 2024 Data Compression Conference (DCC) by IEEE in 202410.1109/dcc58796.2024.00030
- 1.Author(s): Florian Kurpicz, Angelo Savino, Rossano VenturiniPublished in Software: Practice and Experience by Wiley in 2025, page: 1931-194610.1002/spe.70013
- 2.Author(s): Diego Arroyuelo, Gabriel Carmona, Héctor Larrañaga, Francisco Riveros, Carlos Eugenio Rojas-Morales, Erick SepúlvedaPublished in The Computer Journal by Oxford University Press (OUP) in 202510.1093/comjnl/bxaf102
- 3.Author(s): Antonio Boffa, Paolo Ferragina, Francesco Tosoni, Giorgio VinciguerraPublished in Information Systems by Elsevier BV in 2024, page: 10231610.1016/j.is.2023.102316
- 4.Author(s): Marco Costa, Paolo Ferragina, Giorgio VinciguerraPublished in Proceedings of the ACM on Management of Data by Association for Computing Machinery (ACM) in 2024, page: 1-2310.1145/3639258
- 5.Author(s): Lipeng Xu, Zhizhou Wu, Yinhai Wang, Jinjun Tang, Yunyi LiangPublished in IEEE Transactions on Intelligent Transportation Systems by Institute of Electrical and Electronics Engineers (IEEE) in 2024, page: 5389-540110.1109/tits.2024.3355744
Related projects
Core Informatics
A Helmholtz Pilot Program
