Pure speed

WhiteDB is a lightweight NoSQL database library written in C, operating fully in main memory. There is no server process: data is read and written directly from/to shared memory, with no sockets between WhiteDB and the application program.


Project goals

  • speed
  • portability
  • simplicity and small footprint
  • low memory usage
  • easy to use in embedded systems
  • graph database applications
  • extended rdf database applications
  • fast interprocess communication
  • seamless integration with a wGandalf rule engine (work in progress)

Data storage

Data is kept in shared memory by default, making all the data accessible to separate processes.

Each database record is a tuple of N elements, encoded in WhiteDB's simple, compact format. You can store both conventional datatypes and direct pointers to records: the latter enables highly efficient traversal of complex data.

Supported features

  • indexes (T-tree)
  • persistence through logging and memory dumps
  • concurrency through locking
  • limited queries (conjunctive only)
  • JSON, CSV and RDF support
  • Linux and Windows
  • Python bindings
  • command line utility tools
  • JSON REST tools

Built with WhiteDB

Roboswarm

An early version of WhiteDB was used in the Roboswarm EU project, which enhanced the cooperative intelligence of iRobot Roombas operating as a swarm.

All external commands and data arriving from the robot sensors are stored in a WhiteDB database onboard the Roomba, running on a tiny Linux computer. The reasoner generates new tasks for the Roomba reactively, in real time, using rules and the WhiteDB contents.

Roboswarm videoclips »

Travel planner

The personalized tourism travel planner of a major national tourism site uses WhiteDB as the storage layer for its computationally hard search tasks.

The planner has to sift through, match and evaluate all the tourism objects in the country during a heuristics-guided search for the best available plan to visit interesting sites.

Travel planner »

Telemedicine

Intelligent telemedicine systems developed at Eliko use WhiteDB running on a small MIPS-type CPU for storing and analysing sensor data.

The principles of using whiteboard systems with WhiteDB as a core tool for fast interprocess communication in multi-agent systems are described in the Ph.D. thesis "Whiteboard Architecture for the Multi-agent Sensor Systems".

Thesis pdf »

Technology

Direct memory access

Each record is stored as an array (N-tuple) of integers, configurable as either 32 or 64 bits wide. The integers in the tuple encode values directly or as pointers. Columns have no type: any encoded value can be stored in any field.

You can always get a direct pointer to a record, store it into a field of another record, or use it directly in your own program. A record pointer can thus serve as an automatically assigned id of the record: accessing the record through it requires no search at all.

To search for a record, either scan the chain of all records, scan a sublist or tree you have built yourself, or perform an index search on an indexed field.

Data encoding

The low bits of an integer in a record indicate the type of the data. Anything that does not fit into the remaining bits is allocated separately and pointed to by the same integer.

The datatypes are null, record pointer, integer, double, string, xml literal, uri, blob, char, date and time.

Long strings are allocated uniquely, i.e. using the same string in many fields does not take up additional space and allows fast string equality checks.

A record pointer is a persistent offset of the record, usable as an automatic id of the record. Pointers allow fast traversal of complex data without search.

Allocation and garbage collection

Conventional malloc does not work in shared memory: the segment may be mapped at different addresses in different processes, so offsets must be used instead of conventional pointers. Hence WhiteDB uses its own malloc implementation for shared memory.

A record or a uniquely kept long string can be pointed to from several fields. Hence, when deleting records and long strings, we use reference-counting garbage collection embedded into the allocation algorithm. Reference counting is incremental and does not cause long pauses.

Locking

For concurrency control we use a database-level lock implemented as a task-fair atomic spinlock queue; alternatively, faster and simpler preference policies can be configured: either a reader-preference or a writer-preference spinlock.

Generally, a database-level lock has very low overhead but the maximum possible contention: processes should therefore hold the lock for as short a time as possible between acquiring and releasing it.

We provide safe atomic updates of simple values without taking a write lock.

Indexes

The simplest index provided is a T-tree index on any field containing any mixture of objects (integers, strings, etc.). The index is automatically maintained when records are added, deleted or changed.

The efficiency of indexing can be greatly enhanced by using template indexes, which create an index only for records having a given value in a given field. For example, an index on column 0 can be restricted to records whose second column equals 6.

Deduplicating storage of long strings, xml literals and uris uses hash indexes internally.

Persistent storage

Two mechanisms are available for storing the shared-memory database to disk. First, the whole database can be dumped and restored. Since the database uses offsets instead of conventional pointers, the absolute memory addresses do not matter.

Second, all inserts, deletions and updates can be logged to a file. The compact log thus created can be played back to restore the contents of the database (normally after the last dump). Logging can be switched on and off, depending on the data criticality/performance requirements.