Tutorial

A good place to start

1. Introduction

This tutorial will cover the basic usage of WhiteDB’s C API. Most examples you will encounter here will also be available in the Examples directory of the WhiteDB source distribution.

A thorough reference of the API is available in the Manual. If you’re looking for information on how to use WhiteDB from Python, please see Python documentation (also in semi-tutorial form).

2. Compiling the examples

Before we can get started with the tutorial, we need to know how to compile and run programs that use WhiteDB.

If you invoked the standard ./configure; make; make install, things are quite simple. Let’s try to compile Examples/demo.c (it’s already compiled, but let’s do it again).

gcc -o Examples/mydemo Examples/demo.c -lwgdb

That’s it! This is how you’d normally compile and link a WhiteDB program. The -lwgdb tells the linker to use libwgdb.a that sits somewhere in your library path. If you get an error at this point, it may be that your computer has libraptor installed. In that case WhiteDB automatically decides that it wants to use it; just add -lraptor to the command line and all should be fine.

You can now run Examples/mydemo and see what it does. Also, you may skip the rest of this section and go directly to Section 3, “Connecting to the database”.

In the event that you haven’t run make install (let’s say you’re still evaluating and getting acquainted with WhiteDB) or you’ve installed it in a location that’s not in your standard library path, you’ll need to add a few things.

gcc -L/alternate/libpath -o Examples/mydemo Examples/demo.c -lwgdb

Alternate libpath is where libwgdb.a is on your machine. If you didn’t make install at all, it is still in Main/.libs. Also, your system probably does not know where to look for the libraries, so if you just run Examples/mydemo, it will exit with "error while loading shared libraries".

LD_LIBRARY_PATH=/alternate/libpath Examples/mydemo

will take care of that. However, things will definitely be easier once you’ve done make install. Another issue that you might run into is that the compiler does not know where the API header files are. The programs in the Examples directory work around that by referring to the headers in the Db directory directly, but you might prefer a more flexible way. In this case, try

gcc -L/alternate/libpath -I/path/to/headers -o myprog myprog.c -lwgdb

where /path/to/headers is where dbapi.h is located.

Finally, in the event that your system does not have the make program or some other part of the toolchain required for the standard installation, but you still have the C compiler, such as gcc, you may directly compile the examples or your own programs with the WhiteDB sources. Have a look at Examples/compile_demo.sh to see what source files should be compiled.

2.1. So you’re on Windows

So far the there hasn’t been a peep about following the tutorial on a Windows computer, but don’t feel left out - the compilation is different enough that it deserves it’s own section.

You need the MSVC compiler (provided by Microsoft Visual Studio, Express Edition, for example). Set it up so you can run cl.exe from the command prompt. Visual Studio includes it’s own command prompt menu entry that has the environment set up correctly for you.

First, we recommend that you compile the wgdb.lib. If you followed the installation documentation, you probably have it in WhiteDB’s directory already. If not, run compile.bat. This produces everything you’ll need for the tutorial. Now try:

cl.exe /FeMYDEMO Examples\demo.c wgdb.lib

This produces mydemo.exe in the current directory. As long as the file wgdb.dll is also in the same directory, you can run mydemo.exe and see the output.

The above command works for the distributed examples, but it should be pointed out that the following way is more flexible, once you’ve started creating your own programs:

cl.exe /I"\path\to\whitedb\headers" yourprog.c \path\to\wgdb.lib

Replace the \path\to..-s with the actual directories where the WhiteDB files are on your computer.

3. Connecting to the database

Before you can read from or write to the WhiteDB database, your program needs to connect to it. Let’s look at how we might do that (Examples/tut1.c):

#include <whitedb/dbapi.h> /* or #include <dbapi.h> on Windows */

int main(int argc, char **argv) {
  void *db;
  db = wg_attach_database("1000", 2000000);
  return 0;
}

First, the program needs to include the API headers. There are a few other header files distributed with WhiteDB, but dbapi.h is the one we’ll need for now.

!Note

The programs in the Examples directory use a different way of including the headers, by referring to their location directly. This is so that the examples can be compiled before the installation of the database and it is perfectly acceptable - but let’s stick to the standard way of using library headers in this tutorial.

void *db is the database handle. Once we have the handle, we can use it in all the subsequent database operations - it will always point to the same database we originally attached to. Why stress that it is the same database? WhiteDB allows using multiple databases in parallel, without any prior configuration. The number "1000" we give to the wg_attach_database() function is the key that refers to the shared memory segment containing our database.

Observe that when using wg_attach_database(), it does not matter whether the database already exists or not - it will be created, if necessary. The size of the database will be the one we supplied, 2MB in this case. When the program exits, the database will remain in memory.

When you have already created a database in shared memory, you can later use wg_attach_existing_database(dbname) which functions exactly as wg_attach_database(dbname,...) but does not create a new database. If no database with the name dbname is found, it simply returns NULL. This is quite handy when you want to avoid creating a new base or just want to check whether it exists already.

4. Adding data to the database

An empty database isn’t usually much of a practical use, so we need to learn how to populate it with data. It is actually a three-step process: creating a record, encoding the data and writing to the fields of the records.

4.1. Records

A WhiteDB record is a n-tuple of encoded data. The n refers to the length of the record and there is no specific limit except that it must fit inside the database memory segment (of course, the size is given as wg_int, the universal datatype of WhiteDB, which itself has a maximum value, but this is quite large, especially on a 64-bit system).

void *rec = wg_create_record(db, 10);

The datatype of the record is void *, just like the database handle. Now we can use rec any time we need to do something with the record we’ve created. By the way, the records do not all need to be the same size, so we could do

void *rec2 = wg_create_record(db, 2);

and have two records, one of them 10 fields and the other 2 fields in size. However, the size is final and cannot be changed later.

4.2. Data in WhiteDB

An important distinction between WhiteDB and traditional databases is that the user can and in some cases must pop the hood open and get their hands dirty. Data encoding is one of such cases.

Everything inside the database is a "WhiteDB int", or a wg_int when we’re writing C code. These are basically numbers (32-bit or 64-bit integers, depending on your system), but for WhiteDB they contain encoded pieces of information - type of a value and the value itself or some way to access the value.

So whenever we need to write something, be it a string, a number or a date to the database, first we have to encode it so that WhiteDB is ready to handle it.

wg_int enc = wg_encode_int(db, 443);
wg_int enc2 = wg_encode_str(db, "this is my string", NULL);

The first line should be self-explanatory - enc is now 443 in WhiteDB’s internal format. When encoding a string, be aware that the string itself will be written to the database memory segment at that point - the encoded value enc2 will merely contain a reference to it. Also, there is a third parameter which we can ignore for simple applications.

4.3. Setting field values

You may be asking yourself why do we need to bother with encoding the values when we could simply write things like integers or character arrays directly. The main reason for that is that WhiteDB is schemaless. When we created records, we did not specify what type any of the fields were - they can be of any type. The encoded value is how WhiteDB can tell what type of data it is dealing with, since field 1 could be an integer in one record, a floating-point number in another one and so on.

With that out of the way, let’s take our encoded data and store it properly in the database:

wg_set_field(db, rec, 7, enc);
wg_set_field(db, rec2, 0, enc2);

Field 7 of the first record now contains 443 and field 0 of the second record (which has two fields, field 0 and field 1) contains "this is my string".

We didn’t touch any of the other fields and if we were to look at the contents of the records now, these would be filled with NULL values. Each time a new record is created, it initially contains a row of NULL-s which the user can then overwrite with their own data.

Here is our complete example (Examples/tut2.c):

#include <whitedb/dbapi.h>

int main(int argc, char **argv) {
  void *db, *rec, *rec2;
  wg_int enc, enc2;

  db = wg_attach_database("1000", 2000000);
  rec = wg_create_record(db, 10);
  rec2 = wg_create_record(db, 2);

  enc = wg_encode_int(db, 443);
  enc2 = wg_encode_str(db, "this is my string", NULL);

  wg_set_field(db, rec, 7, enc);
  wg_set_field(db, rec2, 0, enc2);

  return 0;
}

It is likely that you need to deal with more types than just strings and integers. The manual will provide a full list of supported types.

5. The wgdb utility

Once you’ve started working with WhiteDB, the wgdb tool may come in handy to manage the databases, so let’s take a quick look at it. First we deal with database persistence, you may skip to Section 5.2, “Looking at data” if you’re not on Windows.

5.1. Database persistence on Windows

The way shared memory works on Windows is that it is only present as long as there is a program holding a handle to it. So when we compile and run the previous example, the data gets written to the memory but then the program terminates and the database immediately disappears. To get around that, run

wgdb.exe 1000 server 2000000

in another window. That will keep the shared memory present, until you press CTRL+C. You can now run the tutorial programs and the following examples should work.

5.2. Looking at data

If you ran the program from the previous section, there should be some records in memory now. Let’s take a look:

wgdb 1000 select 20

It should return something like this:

[NULL,NULL,NULL,NULL,NULL,NULL,NULL,443,NULL,NULL]
["this is my string",NULL]

The "1000" in the command is the same shared memory key we used earlier. "select" prints records from the database and "20" limits the maximum number of records that will be shown. There is also a query command that lets you specify which records you are interested in:

wgdb 1000 query 7 = 443

That will only return the first record, the one where field 7 equals 443. There are other comparison operators: "!=" for not equal, "<" for less than, "<=" for less than or equal and so forth. Currently the query command does not have a row limit parameter.

5.3. Modifying data

The command line tools allows some data manipulation: deleting and adding records. The "del" command has the same syntax as the query command, so

wgdb 1000 del 7 = 443

will delete the first row from the database. We can also add records, but only integer and string values are recognized this way - dealing with other types unambiguously would become complicated.

wgdb 1000 add 1 2 3

This created a record with the length 3 and inserted three integer values in it. Let’s see what the database now contains. By the way, since "1000" is the default key, we may omit it:

wgdb select 20

The entire contents of the database will now be:

["this is my string",NULL]
[1,2,3]

5.4. Freeing the memory

At some point we may need to delete the database for whatever reason. The wgdb tool will help:

wgdb 1000 free

The database with the given key will be freed. Again, "1000" may be omitted as it is the default.

6. Making queries

6.1. Finding matching records

Finding records that match some condition is easy:

void *rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, NULL);

This returns the first record that has the integer 443 in field 7. That much is obvious, but some of the parameters might need extra explanation.

First, just a reminder that db is the database handle we’ve been using each time we call a WhiteDB function. As a second parameter we give the number of the field that the database engine should check against the value we’ve given.

The third parameter is the condition: we need some way of stating that we want records where "field 7" "equals" "443" so that is the "equals" part. There are other conditions, for example, if we substituted WG_COND_LESSTHAN there, we would receive a record where the value in field 7 is less than 443. The full list of possibe conditions is given in the manual.

The fourth parameter is, of course, the value. The function we called ended with _int and that parameter should also be of the type int. There is a function for most of WhiteDB’s datatypes, so if we were looking for a string we would use wg_find_record_str() instead.

Now let’s turn our attention to the mysterious NULL parameter. Remember that our example function call returned the first record that matched our parameters? What if there are more matching records and we want to find those too? That can be done:

void *nextrec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, rec);

Instead of the NULL, we can give the record that the function returned last time and WhiteDB will return the next one that matches the same condition.

The following example will call wg_find_record_int() in a cycle, finding all the matching record from the database. We’re adding some records, so it will print "Found a record…" at least once. Run it multiple times and see the number of matching records increase (Examples/tut3.c):

#include <stdio.h>
#include <whitedb/dbapi.h>

int main(int argc, char **argv) {
  void *db, *rec;
  wg_int enc;

  db = wg_attach_database("1000", 2000000);

  /* create some records for testing */
  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 443); /* will match */
  wg_set_field(db, rec, 7, enc);

  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 442);
  wg_set_field(db, rec, 7, enc); /* will not match */

  /* now find the records that match our condition
   * "field 7 equals 443"
   */
  rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, NULL);
  while(rec) {
    printf("Found a record where field 7 is 443\n");
    rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, rec);
  }

  return 0;
}

6.2. Full query interface

The above method of finding data is convinient, but it can be too limited (and in some specific cases inefficient) so eventually we may need to make use of full queries. The query API has a number of features we will not be discussing here, instead we’ll look at only the basic steps.

Running a query in WhiteDB consists of preparing the argument list, creating the query itself and then fetching the matching records from the query.

wg_query_arg arglist[2];
arglist[0].column = 7;
arglist[0].cond = WG_COND_EQUAL;
arglist[0].value = wg_encode_query_param_int(db, 443);

The wg_query_arg type is where we store one condition that the returned records should match (or, a "clause" of the query, if you like more exact terminology). Here we’ve specified again that we’d like to find records where "field 7 equals 443".

Notice that we declared arglist as an array of 2 elements? Well, this is because we can give more than one argument:

arglist[1].column = 6;
arglist[1].cond = WG_COND_EQUAL;
arglist[1].value = wg_encode_query_param_null(db, NULL);

Now we’re looking for records where "field 7 equals 443 and field 6 equals NULL". The value for both arguments is encoded and it’s recommended to use the special wg_encode_query_param_*() functions for that purpose.

wg_query *query = wg_make_query(db, NULL, 0, arglist, 2);

We pass the argument list and it’s size, which is 2, to WhiteDB (ignore the other parameters for now, they’re not used if you use the argument list). We receive a query object in return and are finally ready to start fetching records:

void *rec = wg_fetch(db, query);

The wg_fetch() function will return a different record each time you call it and eventually it will return NULL, meaning that you’ve already fetched all the records that match our argument list. Finally we should do some housekeeping, as queries may take up quite a bit of memory:

wg_free_query(db, query);
wg_free_query_param(db, arglist[0].value);
wg_free_query_param(db, arglist[1].value);

That was quite a bit of work to do essentialy the same thing we achieved with the help of just one function earlier, but it will also give you more power and flexibility. The following program (Examples/tut4.c) summarizes what we’ve looked at here:

#include <stdio.h>
#include <whitedb/dbapi.h>

int main(int argc, char **argv) {
  void *db, *rec;
  wg_int enc;
  wg_query_arg arglist[2]; /* holds the arguments to the query */
  wg_query *query;         /* used to fetch the query results */

  db = wg_attach_database("1000", 2000000);

  /* just in case, create some records for testing */
  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 443); /* will match */
  wg_set_field(db, rec, 7, enc);

  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 442);
  wg_set_field(db, rec, 7, enc); /* will not match */

  /* now find the records that match the condition
   * "field 7 equals 443 and field 6 equals NULL". The
   * second part is a bit redundant but we're adding it
   * to show the use of the argument list.
   */
  arglist[0].column = 7;
  arglist[0].cond = WG_COND_EQUAL;
  arglist[0].value = wg_encode_query_param_int(db, 443);

  arglist[1].column = 6;
  arglist[1].cond = WG_COND_EQUAL;
  arglist[1].value = wg_encode_query_param_null(db, NULL);

  query = wg_make_query(db, NULL, 0, arglist, 2);

  while((rec = wg_fetch(db, query))) {
    printf("Found a record where field 7 is 443 and field 6 is NULL\n");
  }

  /* Free the memory allocated for the query */
  wg_free_query(db, query);
  wg_free_query_param(db, arglist[0].value);
  wg_free_query_param(db, arglist[1].value);
  return 0;
}

You may run this program a couple of times, then run wgdb select 20 to verify that the tutorial program prints the correct number of rows.

7. Doing things properly

The examples we’ve been following up to now have been a bit sloppy. We haven’t bothered to check whether the WhiteDB functions fail or succeed, nor to clean up after after we were done - there was only the bit about freeing queries which was just too important to ignore.

First thing you should consider is that attaching to a database can sometimes fail. For example, it is possible that you requested more shared memory than the system configuration allows. So do a check like this:

void *db = wg_attach_database("1000", 1000000);
if(db == NULL) { /* do something to handle the error */ }

Creating records or encoding data can fail if the database is full - actually a common occurence since memory databases are naturally smaller than the traditional disk-based ones.

void *rec = wg_create_record(db, 1000);
if(rec == NULL) { /* record was not created, can't use it */ }
wg_int enc = wg_encode_str(db, "This could fail", NULL);
if(enc == WG_ILLEGAL) { /* string encoding failed */ }

Notice the WG_ILLEGAL value? This is a special encoded value that WhiteDB uses to tell you that something went wrong. All wg_encode_*() functions return that so it is easy to check for encode errors.

Decoding values can also fail. The most obvious case is when you expect some field in a record to be of a certain type, but some other program or user has written a different value there. This can sometimes be tricky to detect so you should consult the Manual how a particular wg_decode_*() function behaves.

If the database is used in a way that makes it difficult to predict what type of data a field contains, there is a way to find out:

if(wg_get_field_type(db, rec, 0)) == WG_STRTYPE) {
  printf("Field 0 in rec is a string\n");
}

Finally, here’s something that is nice to do whenever a program stops using a database:

wg_detach_database(db);

This detaches us from the shared memory and also frees any memory that may be allocated for the database handle.

Let’s try to apply all of those things in practice (Examples/tut5.c). The program creates records in a loop and writes a short string value in each of them. We have a small database, so soon we’ll run out of space, causing some errors which we are now able to cope with:

#include <stdio.h>
#include <stdlib.h>
#include <whitedb/dbapi.h>

int main(int argc, char **argv) {
  void *db, *rec, *lastrec;
  wg_int enc;
  int i;

  db = wg_attach_database("1000", 1000000); /* 1MB should fill up fast */
  if(!db) {
    printf("ERR: Could not attach to database.\n");
    exit(1);
  }

  lastrec = NULL;
  for(i=0;;i++) {
    char buf[20];
    rec = wg_create_record(db, 1);
    if(!rec) {
      printf("ERR: Failed to create a record (made %d so far)\n", i);
      break;
    }
    lastrec = rec;
    sprintf(buf, "%d", i); /* better to use snprintf() in real applications */
    enc = wg_encode_str(db, buf, NULL);
    if(enc == WG_ILLEGAL) {
      printf("ERR: Failed to encode a string (%d records currently)\n", i+1);
      break;
    }
    if(wg_set_field(db, rec, 0, enc)) {
      printf("ERR: This error is less likely, but wg_set_field() failed.\n");
      break;
    }
  }

  /* For educational purposes, let's pretend we're interested in what's
   * stored in the last record.
   */
  if(lastrec) {
    char *str = wg_decode_str(db, wg_get_field(db, lastrec, 0));
    if(!str) {
      printf("ERR: Decoding the string field failed.\n");
      if(wg_get_field_type(db, lastrec, 0) != WG_STRTYPE) {
        printf("ERR: The field type is not string - "
          "should have checked that first!\n");
      }
    }
  }

  wg_detach_database(db);
  return 0;
}

To try this out, let’s first try to cause an error on attaching. Type these commands:

wgdb free
wgdb create 999999

Then run the example program. Since we’ve already created a database and it’s smaller than what the progam is requesting, it should complain and exit early. When done with this, type wgdb free again to delete the smaller database. Run the example program again, this time there is nothing in the way so it can create the database named "1000" itself.

The results can be a bit unpredictable, but you should either see a record creation error or a number of errors related to a string field. Of course, this does not mean that the program failed or did nothing useful - a number of records were created successfully, we just eventually ran out of database space. Everything our program printed has "ERR:" in front of it - the rest of the messages come from WhiteDB.

8. Parallel use

WhiteDB never locks the database for you. Whenever something is read or written, the engine just goes and does it without checking whether some other program (or user) is currently using the database. This goes with the philosophy of speed and simplicity.

But there are many use cases where parallel use of the database is needed and in those cases everybody cannot just crowd the database at the same time and start making changes to the shared memory area - that could result in inconsistent data, or worse, a corrupt and useless database. Fortunately, WhiteDB does provide the user with the tools to handle that.

The rule of thumb is that you need these concurrency control functions whenever the database is both read and written by several processes and possibly at the same time. For example, if a database is serving data to a webserver and there are occasional updates to the data (without shutting down the webserver), that would qualify as needing concurrency control.

To implement this concurrency control, we first request permission to read whenever we are about to read something from the database and similarly declare our intention to write to the database. Once we’re done we inform the database engine that we’re finished so others may proceed. So,

wg_int lock_id = wg_start_read(db);
/* do some reading */
wg_end_read(db, lock_id);

requests permission to read. There may be some time until the function wg_start_read() returns - it may need to wait for some other process to finish whatever it is doing with the database. Once it returns the lock_id, we have shared access to the database - we may read it safely and so may other processes, but no one can write anything. wg_end_read() declares that we no longer need the read access.

It is quite possible that wg_start_read() fails - it can happen under heavy load or if some other process is hogging the database for a long time. We should always check:

lock_id = wg_start_read(db);
if(!lock_id) {
  printf("wg_start_read() timed out\n");
  exit(1); /* or go and retry, whatever is appropriate */
}

Getting write access is similar, the major difference is that once we get the permission, we have exclusive access - everyone else has to wait until we’re done adding or updating the data:

lock_id = wg_start_write(db);
if(!lock_id) {
  printf("wg_start_write() timed out\n");
  exit(1);
}
/* do some writing */
wg_end_write(db, lock_id);
!Note

At present time these functions behave exactly like operating on a single big database-level lock. This tutorial does not make a secret of it, however, the future direction may be that they start behaving more like transactions. The important thing to remember is that the purpose is to allow you to read and write data safely, without corruption and inconsistency. By following the pattern described here, your program will continue to work unmodified, no matter how WhiteDB implements things internally.

To illustrate parallel use, we will implement a counter that is incremented from two programs simultaneously. This kind of example is frequently used in parallel programming tutorials, when done naively the counter counts incorrectly because the processes end up ignoring some of the increments done by the other process. We will place this counter value inside a WhiteDB database (Examples/tut6.c).

#include <stdio.h>
#include <stdlib.h>
#include <whitedb/dbapi.h>

#define NUM_INCREMENTS 100000

void die(void *db, int err) {
  wg_detach_database(db);
  exit(err);
}

int main(int argc, char **argv) {
  void *db, *rec;
  wg_int lock_id;
  int i, val;

  if(!(db = wg_attach_database("1000", 1000000))) {
    exit(1); /* failed to attach */
  }

  /* First we need to make sure both counting programs start at the
   * same time (otherwise the example would be boring).
   */
  lock_id = wg_start_read(db);
  rec = wg_get_first_record(db); /* our database only contains one record,
                                  * so we don't need to make a query.
                                  */
  wg_end_read(db, lock_id);

  if(!rec) {
    /* There is no record yet, we're the first to run and have
     * to set up the counter.
     */
    lock_id = wg_start_write(db);
    if(!lock_id) die(db, 2);
    rec = wg_create_record(db, 1);
    wg_end_write(db, lock_id);

    if(!rec) die(db, 3);
    printf("Press a key when all the counter programs have been started.");
    fgetc(stdin);

    /* Setting the counter to 0 lets each counting program know it can
     * start counting now.
     */
    lock_id = wg_start_write(db);
    if(!lock_id) die(db, 2);
    wg_set_field(db, rec, 0, wg_encode_int(db, 0));
    wg_end_write(db, lock_id);
  } else {
    /* Some other program has started first, we wait until the counter
     * is ready.
     */
    int ready = 0;

    while(!ready) {
      lock_id = wg_start_read(db);
      if(!lock_id) die(db, 2);
      if(wg_get_field_type(db, rec, 0) == WG_INTTYPE)
        ready = 1;
      wg_end_read(db, lock_id);
    }
  }

  /* Now start the actual counting. */
  for(i=0; i<NUM_INCREMENTS; i++) {
    lock_id = wg_start_write(db);
    if(!lock_id) die(db, 2);

    /* This is the "critical section" for the counter. we read the value,
     * increment it and write it back.
     */
    val = wg_decode_int(db, wg_get_field(db, rec, 0));
    wg_set_field(db, rec, 0, wg_encode_int(db, ++val));

    wg_end_write(db, lock_id);
  }

  printf("\nCounting done. My last value was %d\n", val);
  return 0;
}

Before running the program, type wgdb free. We’ve tried to keep the example as simple as possible, so it needs a completely empty database to start. Next start the program in one window; it will prompt you to press a key. Now go to another window and start the program one more time - this one will just wait and do nothing.

Both programs are now ready to start counting, but the first program is waiting for a key press and the second one is waiting for the counter to get it’s initial value. Go back to the first window and press a key. The counter will be initialized and both programs proceed to increment it 100000 times. Type wgdb select 1 to check what the value of the counter is - it should be 200000.

If those programs were for different sensors in a real-life program, counting some event, we would have managed to get the correct number of events in the end.

9. Network database

If you can picture the content of a WhiteDB database, there isn’t necessarily much structure there - just many, many records which may be of different sizes. Because you can query records by the field contents, the fields can be thought of as columns and the entire database as a flat table where the records form the rows.

To fit any kind of data model in WhiteDB, the user needs to create the structure themselves. Carrying over the RDBMS mindset and creating, for example employee, address and customer records, which can be referred to by some id field, would certainly work, but there is another way.

Before there were relational databases, there was the network model. It is a fairly simple concept where instead of referring to to something by key, you include it directly: if I want to add an address to a customer, I just insert the address into the customer record. The difference may seem superficial, but the underlying mechanics are very different.

9.1. Record linking in WhiteDB

WhiteDB implements the network model in a straightforward way. This tutorial has already shown how to write an integer or a string into a field inside a record, putting another record there works exactly the same:

void *rec = wg_create_record(db, 2); /* this is some record */
void *rec2 = wg_create_record(db, 3); /* this is another record */
wg_int enc = wg_encode_record(db, rec);  /* encode as a WhiteDB value */
wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL));
wg_set_field(db, rec2, 2, enc);

Assuming we don’t add any other fields, the first record now looks like this: [NULL, "hello"]. But the second one is more interesting: [NULL, NULL, [NULL, "hello"]]. Of course, the data isn’t actually duplicated inside the second record, its field 2 just contains a reference. But if your program already holds rec2, getting to the contents of rec is very fast.

So how does one make use of such data? If I have the second record, I can get to the message inside the first record like this (checking for the type isn’t mandatory of course, this is just to show that the record is a datatype just like an integer or a double):

if(wg_get_field_type(db, rec2, 2) == WG_RECORDTYPE) {
  void *rec = wg_decode_record(db, wg_get_field(db, rec2, 2));
  char *message = wg_decode_str(db, wg_get_field(db, rec, 1));
  /* we find that the message is "hello" */
}

Linking records together can produce a graph, a tree or whatever is needed. Or we can treat the records as containing other records inside, so our initial view of them as flat table rows was a bit deceptive and a set would perhaps be a more accurate description.

!Note

A field containing a reference to a record isn’t different from other fields. It can still be indexed and you could write a query like "give me a record where field 3 is [ "owner", [ "name", "Peter", "age", 25 ]]" and it would work. However, while record references are fast, keeping track of such recursion can be expensive so it is best to avoid deep chains of records on indexed fields.

The next program (Examples/tut7.c) creates some records and links them together. Run it, then use wgdb select 20 to examine the database contents. The records will appear nested inside each other:

#include <stdlib.h>
#include <whitedb/dbapi.h>

int main(int argc, char **argv) {
  void *db, *rec, *rec2, *rec3;
  wg_int enc;

  if(!(db = wg_attach_database("1000", 2000000)))
    exit(1); /* failed to attach */

  rec = wg_create_record(db, 2); /* this is some record */
  rec2 = wg_create_record(db, 3); /* this is another record */
  rec3 = wg_create_record(db, 4); /* this is a third record */

  if(!rec || !rec2 || !rec3)
    exit(2);

  /* Add some content */
  wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL));
  wg_set_field(db, rec2, 0, wg_encode_str(db,
    "I'm pointing to other records", NULL));
  wg_set_field(db, rec3, 0, wg_encode_str(db,
    "I'm linked from two records", NULL));

  /* link the records to each other */
  enc = wg_encode_record(db, rec);
  wg_set_field(db, rec2, 2, enc); /* rec2[2] points to rec */
  enc = wg_encode_record(db, rec3);
  wg_set_field(db, rec2, 1, enc); /* rec2[1] points to rec3 */
  wg_set_field(db, rec, 0, enc); /* rec[0] points to rec3 */

  wg_detach_database(db);
  return 0;
}

9.2. When to use the network model

!Note

the code included in this section isn’t here to show you how to do things but rather to illustrate what would happen if things were done this way.

Consider our earlier example with two records, one of them containing the important message "hello". Let’s try to use techniques that work in relational databases to accomplish the same thing. Storing the data isn’t complicated:

void *rec = wg_create_record(db, 2);
void *rec2 = wg_create_record(db, 3);
wg_int hello_id = wg_encode_int(db, 11913); /* an arbitrary record id */
wg_set_field(db, rec, 0, hello_id); /* assign the id */
wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL));
wg_set_field(db, rec2, 2, hello_id); /* reference by id */

So far so good. The records should now contain [11913, "hello"] and [NULL, NULL, 11913]. Pretend again that we have just the rec2 (as a result of a query, for example) and need the contents of rec:

int hello_id = wg_decode_int(db, wg_get_field(db, rec2, 2));
/* this will search the database for 11913 */
void *rec = wg_find_record_int(db, 0, WG_COND_EQUAL, hello_id, NULL);
char *message = wg_decode_str(db, wg_get_field(db, rec, 1));

If you have an index on field 0, this doesn’t look that bad. Sure, the value "11913" needs to be looked up, but wg_find_record_int() can manage it reasonably fast. What if you had to do this inside a loop, though? How about a nested loop? Executing a million queries isn’t that fun anymore, even if they don’t take up time individually - wg_decode_record() would have taken a fraction of that.

The lesson to learn from here is that whenever you have a data model where objects have relations to each other and need to perform JOIN type queries on them, it can be much faster in WhiteDB to implement the "join" in advance by linking the object records to each other. Those links can then be navigated rapidly to collect all the data.

Another use case is storing semi-structured data. We have used the network model in our implementation of JSON documents in WhiteDB (a work in progress) where for example an array can contain other arrays or objects: the record linking feature allows us to accomplish this elegantly by making the array a record whose fields contain links to other records holding the child elements.

10. More examples

There are a few more examples distributed with WhiteDB that were not covered in this tutorial. You may look at Examples/demo.c and Examples/query.c that should be commented well enough to be understandable by now.

Examples in Examples/speed are geared towards speed testing and are covered in http://whitedb.org/speed.html

A bit more involved example to look at is Server/dserve.c: making queries from WhiteDB with a simple REST cgi program giving json or csv output. dserve is useful both as it is and as an example/template for making your own tools for WhiteDB data handling: see http://whitedb.org/tools.html