This tutorial will cover the basic usage of WhiteDB’s C API. Most examples you will encounter here will also be available in the Examples directory of the WhiteDB source distribution.
A thorough reference of the API is available in the Manual. If you’re looking for information on how to use WhiteDB from Python, please see Python documentation (also in semi-tutorial form).
Before we can get started with the tutorial, we need to know how to compile and run programs that use WhiteDB.
If you invoked the standard ./configure; make; make install
, things are
quite simple. Let’s try to compile Examples/demo.c (it’s already compiled,
but let’s do it again).
gcc -o Examples/mydemo Examples/demo.c -lwgdb
That’s it! This is how you’d normally compile and link a WhiteDB program.
The -lwgdb
tells the linker to use libwgdb.a that sits somewhere in your
library path. If you get an error at this point, it may be that your
computer has libraptor installed. In that case WhiteDB automatically decides
that it wants to use it; just add -lraptor
to the command line and all
should be fine.
You can now run Examples/mydemo
and see what it does. Also, you may skip the
rest of this section and go directly to Section 3, “Connecting to the database”.
In the event that you haven’t run make install
(let’s say you’re still
evaluating and getting acquainted with WhiteDB) or you’ve installed it
in a location that’s not in your standard library path, you’ll need to
add a few things.
gcc -L/alternate/libpath -o Examples/mydemo Examples/demo.c -lwgdb
Alternate libpath is where libwgdb.a is on your machine. If you didn’t
make install
at all, it is still in Main/.libs. Also, your system probably
does not know where to look for the libraries, so if you just run
Examples/mydemo
, it will exit with "error while loading shared libraries".
LD_LIBRARY_PATH=/alternate/libpath Examples/mydemo
will take care of that. However, things will definitely be easier once you’ve
done make install
. Another issue that you might run into is that the compiler
does not know where the API header files are. The programs in the Examples
directory work around that by referring to the headers in the Db directory
directly, but you might prefer a more flexible way. In this case, try
gcc -L/alternate/libpath -I/path/to/headers -o myprog myprog.c -lwgdb
where /path/to/headers is where dbapi.h is located.
Finally, in the event that your system does not have the make program or some other part of the toolchain required for the standard installation, but you still have the C compiler, such as gcc, you may directly compile the examples or your own programs with the WhiteDB sources. Have a look at Examples/compile_demo.sh to see what source files should be compiled.
So far the there hasn’t been a peep about following the tutorial on a Windows computer, but don’t feel left out - the compilation is different enough that it deserves it’s own section.
You need the MSVC compiler (provided by Microsoft Visual Studio, Express
Edition, for example). Set it up so you can run cl.exe
from the command
prompt. Visual Studio includes it’s own command prompt menu entry that has
the environment set up correctly for you.
First, we recommend that you compile the wgdb.lib. If you followed the
installation documentation, you probably have it in WhiteDB’s directory
already. If not, run compile.bat
. This produces everything you’ll
need for the tutorial. Now try:
cl.exe /FeMYDEMO Examples\demo.c wgdb.lib
This produces mydemo.exe in the current directory. As long as the file
wgdb.dll is also in the same directory, you can run mydemo.exe
and see
the output.
The above command works for the distributed examples, but it should be pointed out that the following way is more flexible, once you’ve started creating your own programs:
cl.exe /I"\path\to\whitedb\headers" yourprog.c \path\to\wgdb.lib
Replace the \path\to..-s with the actual directories where the WhiteDB files are on your computer.
Before you can read from or write to the WhiteDB database, your program needs to connect to it. Let’s look at how we might do that (Examples/tut1.c):
#include <whitedb/dbapi.h> /* or #include <dbapi.h> on Windows */ int main(int argc, char **argv) { void *db; db = wg_attach_database("1000", 2000000); return 0; }
First, the program needs to include the API headers. There are a few other header files distributed with WhiteDB, but dbapi.h is the one we’ll need for now.
! | Note |
---|---|
The programs in the Examples directory use a different way of including the headers, by referring to their location directly. This is so that the examples can be compiled before the installation of the database and it is perfectly acceptable - but let’s stick to the standard way of using library headers in this tutorial. |
void *db
is the database handle. Once we have the handle, we can use it in
all the subsequent database operations - it will always point to the same
database we originally attached to. Why stress that it is the same database?
WhiteDB allows using multiple databases in parallel, without any prior
configuration. The number "1000" we give to the wg_attach_database()
function
is the key that refers to the shared memory segment containing our database.
Observe that when using wg_attach_database()
, it does not matter whether the
database already exists or not - it will be created, if necessary. The size
of the database will be the one we supplied, 2MB in this case. When the
program exits, the database will remain in memory.
When you have already created a database in shared memory, you can later use
wg_attach_existing_database(dbname)
which functions exactly as
wg_attach_database(dbname,...)
but does not create a new database.
If no database with the name dbname
is found, it simply returns NULL.
This is quite handy when you want to avoid creating a new base or just want
to check whether it exists already.
An empty database isn’t usually much of a practical use, so we need to learn how to populate it with data. It is actually a three-step process: creating a record, encoding the data and writing to the fields of the records.
A WhiteDB record is a n-tuple of encoded data. The n refers to the length of
the record and there is no specific limit except that it must fit inside
the database memory segment (of course, the size is given as wg_int
, the
universal datatype of WhiteDB, which itself has a maximum value, but this is
quite large, especially on a 64-bit system).
void *rec = wg_create_record(db, 10);
The datatype of the record is void *
, just like the database handle. Now we
can use rec
any time we need to do something with the record we’ve created.
By the way, the records do not all need to be the same size, so we could do
void *rec2 = wg_create_record(db, 2);
and have two records, one of them 10 fields and the other 2 fields in size. However, the size is final and cannot be changed later.
An important distinction between WhiteDB and traditional databases is that the user can and in some cases must pop the hood open and get their hands dirty. Data encoding is one of such cases.
Everything inside the database is a "WhiteDB int", or a wg_int
when we’re
writing C code. These are basically numbers (32-bit or 64-bit integers,
depending on your system), but for WhiteDB they contain encoded pieces of
information - type of a value and the value itself or some way to access the
value.
So whenever we need to write something, be it a string, a number or a date to the database, first we have to encode it so that WhiteDB is ready to handle it.
wg_int enc = wg_encode_int(db, 443); wg_int enc2 = wg_encode_str(db, "this is my string", NULL);
The first line should be self-explanatory - enc
is now 443 in WhiteDB’s
internal format. When encoding a string, be aware that the string itself
will be written to the database memory segment at that point - the encoded
value enc2
will merely contain a reference to it. Also, there is a third
parameter which we can ignore for simple applications.
You may be asking yourself why do we need to bother with encoding the values when we could simply write things like integers or character arrays directly. The main reason for that is that WhiteDB is schemaless. When we created records, we did not specify what type any of the fields were - they can be of any type. The encoded value is how WhiteDB can tell what type of data it is dealing with, since field 1 could be an integer in one record, a floating-point number in another one and so on.
With that out of the way, let’s take our encoded data and store it properly in the database:
wg_set_field(db, rec, 7, enc); wg_set_field(db, rec2, 0, enc2);
Field 7 of the first record now contains 443 and field 0 of the second record (which has two fields, field 0 and field 1) contains "this is my string".
We didn’t touch any of the other fields and if we were to look at the contents of the records now, these would be filled with NULL values. Each time a new record is created, it initially contains a row of NULL-s which the user can then overwrite with their own data.
Here is our complete example (Examples/tut2.c):
#include <whitedb/dbapi.h> int main(int argc, char **argv) { void *db, *rec, *rec2; wg_int enc, enc2; db = wg_attach_database("1000", 2000000); rec = wg_create_record(db, 10); rec2 = wg_create_record(db, 2); enc = wg_encode_int(db, 443); enc2 = wg_encode_str(db, "this is my string", NULL); wg_set_field(db, rec, 7, enc); wg_set_field(db, rec2, 0, enc2); return 0; }
It is likely that you need to deal with more types than just strings and integers. The manual will provide a full list of supported types.
Once you’ve started working with WhiteDB, the wgdb
tool may come in handy
to manage the databases, so let’s take a quick look at it. First we deal with
database persistence, you may skip to Section 5.2, “Looking at data” if you’re not on
Windows.
The way shared memory works on Windows is that it is only present as long as there is a program holding a handle to it. So when we compile and run the previous example, the data gets written to the memory but then the program terminates and the database immediately disappears. To get around that, run
wgdb.exe 1000 server 2000000
in another window. That will keep the shared memory present, until you press CTRL+C. You can now run the tutorial programs and the following examples should work.
If you ran the program from the previous section, there should be some records in memory now. Let’s take a look:
wgdb 1000 select 20
It should return something like this:
[NULL,NULL,NULL,NULL,NULL,NULL,NULL,443,NULL,NULL] ["this is my string",NULL]
The "1000" in the command is the same shared memory key we used earlier. "select" prints records from the database and "20" limits the maximum number of records that will be shown. There is also a query command that lets you specify which records you are interested in:
wgdb 1000 query 7 = 443
That will only return the first record, the one where field 7 equals 443. There are other comparison operators: "!=" for not equal, "<" for less than, "<=" for less than or equal and so forth. Currently the query command does not have a row limit parameter.
The command line tools allows some data manipulation: deleting and adding records. The "del" command has the same syntax as the query command, so
wgdb 1000 del 7 = 443
will delete the first row from the database. We can also add records, but only integer and string values are recognized this way - dealing with other types unambiguously would become complicated.
wgdb 1000 add 1 2 3
This created a record with the length 3 and inserted three integer values in it. Let’s see what the database now contains. By the way, since "1000" is the default key, we may omit it:
wgdb select 20
The entire contents of the database will now be:
["this is my string",NULL] [1,2,3]
Finding records that match some condition is easy:
void *rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, NULL);
This returns the first record that has the integer 443 in field 7. That much is obvious, but some of the parameters might need extra explanation.
First, just a reminder that db
is the database handle we’ve been using
each time we call a WhiteDB function. As a second parameter we give the
number of the field that the database engine should check against the
value we’ve given.
The third parameter is the condition: we need some way of stating that we want records where "field 7" "equals" "443" so that is the "equals" part. There are other conditions, for example, if we substituted WG_COND_LESSTHAN there, we would receive a record where the value in field 7 is less than 443. The full list of possibe conditions is given in the manual.
The fourth parameter is, of course, the value. The function we called ended
with _int and that parameter should also be of the type int
. There is
a function for most of WhiteDB’s datatypes, so if we were looking for a
string we would use wg_find_record_str()
instead.
Now let’s turn our attention to the mysterious NULL parameter. Remember that our example function call returned the first record that matched our parameters? What if there are more matching records and we want to find those too? That can be done:
void *nextrec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, rec);
Instead of the NULL, we can give the record that the function returned last time and WhiteDB will return the next one that matches the same condition.
The following example will call wg_find_record_int()
in a cycle, finding
all the matching record from the database. We’re adding some records, so it
will print "Found a record…" at least once. Run it multiple times and see
the number of matching records increase (Examples/tut3.c):
#include <stdio.h> #include <whitedb/dbapi.h> int main(int argc, char **argv) { void *db, *rec; wg_int enc; db = wg_attach_database("1000", 2000000); /* create some records for testing */ rec = wg_create_record(db, 10); enc = wg_encode_int(db, 443); /* will match */ wg_set_field(db, rec, 7, enc); rec = wg_create_record(db, 10); enc = wg_encode_int(db, 442); wg_set_field(db, rec, 7, enc); /* will not match */ /* now find the records that match our condition * "field 7 equals 443" */ rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, NULL); while(rec) { printf("Found a record where field 7 is 443\n"); rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, rec); } return 0; }
The above method of finding data is convinient, but it can be too limited (and in some specific cases inefficient) so eventually we may need to make use of full queries. The query API has a number of features we will not be discussing here, instead we’ll look at only the basic steps.
Running a query in WhiteDB consists of preparing the argument list, creating the query itself and then fetching the matching records from the query.
wg_query_arg arglist[2]; arglist[0].column = 7; arglist[0].cond = WG_COND_EQUAL; arglist[0].value = wg_encode_query_param_int(db, 443);
The wg_query_arg
type is where we store one condition that the returned
records should match (or, a "clause" of the query, if you like more exact
terminology). Here we’ve specified again that we’d like to find records where
"field 7 equals 443".
Notice that we declared arglist
as an array of 2 elements? Well, this is
because we can give more than one argument:
arglist[1].column = 6; arglist[1].cond = WG_COND_EQUAL; arglist[1].value = wg_encode_query_param_null(db, NULL);
Now we’re looking for records where "field 7 equals 443 and field 6 equals
NULL". The value for both arguments is encoded and it’s recommended to use the
special wg_encode_query_param_*()
functions for that purpose.
wg_query *query = wg_make_query(db, NULL, 0, arglist, 2);
We pass the argument list and it’s size, which is 2, to WhiteDB (ignore the other parameters for now, they’re not used if you use the argument list). We receive a query object in return and are finally ready to start fetching records:
void *rec = wg_fetch(db, query);
The wg_fetch()
function will return a different record each time you call it
and eventually it will return NULL, meaning that you’ve already fetched all
the records that match our argument list. Finally we should do some
housekeeping, as queries may take up quite a bit of memory:
wg_free_query(db, query); wg_free_query_param(db, arglist[0].value); wg_free_query_param(db, arglist[1].value);
That was quite a bit of work to do essentialy the same thing we achieved with the help of just one function earlier, but it will also give you more power and flexibility. The following program (Examples/tut4.c) summarizes what we’ve looked at here:
#include <stdio.h> #include <whitedb/dbapi.h> int main(int argc, char **argv) { void *db, *rec; wg_int enc; wg_query_arg arglist[2]; /* holds the arguments to the query */ wg_query *query; /* used to fetch the query results */ db = wg_attach_database("1000", 2000000); /* just in case, create some records for testing */ rec = wg_create_record(db, 10); enc = wg_encode_int(db, 443); /* will match */ wg_set_field(db, rec, 7, enc); rec = wg_create_record(db, 10); enc = wg_encode_int(db, 442); wg_set_field(db, rec, 7, enc); /* will not match */ /* now find the records that match the condition * "field 7 equals 443 and field 6 equals NULL". The * second part is a bit redundant but we're adding it * to show the use of the argument list. */ arglist[0].column = 7; arglist[0].cond = WG_COND_EQUAL; arglist[0].value = wg_encode_query_param_int(db, 443); arglist[1].column = 6; arglist[1].cond = WG_COND_EQUAL; arglist[1].value = wg_encode_query_param_null(db, NULL); query = wg_make_query(db, NULL, 0, arglist, 2); while((rec = wg_fetch(db, query))) { printf("Found a record where field 7 is 443 and field 6 is NULL\n"); } /* Free the memory allocated for the query */ wg_free_query(db, query); wg_free_query_param(db, arglist[0].value); wg_free_query_param(db, arglist[1].value); return 0; }
You may run this program a couple of times, then run wgdb select 20
to
verify that the tutorial program prints the correct number of rows.
The examples we’ve been following up to now have been a bit sloppy. We haven’t bothered to check whether the WhiteDB functions fail or succeed, nor to clean up after after we were done - there was only the bit about freeing queries which was just too important to ignore.
First thing you should consider is that attaching to a database can sometimes fail. For example, it is possible that you requested more shared memory than the system configuration allows. So do a check like this:
void *db = wg_attach_database("1000", 1000000); if(db == NULL) { /* do something to handle the error */ }
Creating records or encoding data can fail if the database is full - actually a common occurence since memory databases are naturally smaller than the traditional disk-based ones.
void *rec = wg_create_record(db, 1000); if(rec == NULL) { /* record was not created, can't use it */ }
wg_int enc = wg_encode_str(db, "This could fail", NULL); if(enc == WG_ILLEGAL) { /* string encoding failed */ }
Notice the WG_ILLEGAL value? This is a special encoded value that WhiteDB uses
to tell you that something went wrong. All wg_encode_*()
functions return
that so it is easy to check for encode errors.
Decoding values can also fail. The most obvious case is when you expect some
field in a record to be of a certain type, but some other program or user has
written a different value there. This can sometimes be tricky to detect
so you should consult the Manual how a particular wg_decode_*()
function behaves.
If the database is used in a way that makes it difficult to predict what type of data a field contains, there is a way to find out:
if(wg_get_field_type(db, rec, 0)) == WG_STRTYPE) { printf("Field 0 in rec is a string\n"); }
Finally, here’s something that is nice to do whenever a program stops using a database:
wg_detach_database(db);
This detaches us from the shared memory and also frees any memory that may be allocated for the database handle.
Let’s try to apply all of those things in practice (Examples/tut5.c). The program creates records in a loop and writes a short string value in each of them. We have a small database, so soon we’ll run out of space, causing some errors which we are now able to cope with:
#include <stdio.h> #include <stdlib.h> #include <whitedb/dbapi.h> int main(int argc, char **argv) { void *db, *rec, *lastrec; wg_int enc; int i; db = wg_attach_database("1000", 1000000); /* 1MB should fill up fast */ if(!db) { printf("ERR: Could not attach to database.\n"); exit(1); } lastrec = NULL; for(i=0;;i++) { char buf[20]; rec = wg_create_record(db, 1); if(!rec) { printf("ERR: Failed to create a record (made %d so far)\n", i); break; } lastrec = rec; sprintf(buf, "%d", i); /* better to use snprintf() in real applications */ enc = wg_encode_str(db, buf, NULL); if(enc == WG_ILLEGAL) { printf("ERR: Failed to encode a string (%d records currently)\n", i+1); break; } if(wg_set_field(db, rec, 0, enc)) { printf("ERR: This error is less likely, but wg_set_field() failed.\n"); break; } } /* For educational purposes, let's pretend we're interested in what's * stored in the last record. */ if(lastrec) { char *str = wg_decode_str(db, wg_get_field(db, lastrec, 0)); if(!str) { printf("ERR: Decoding the string field failed.\n"); if(wg_get_field_type(db, lastrec, 0) != WG_STRTYPE) { printf("ERR: The field type is not string - " "should have checked that first!\n"); } } } wg_detach_database(db); return 0; }
To try this out, let’s first try to cause an error on attaching. Type these commands:
wgdb free wgdb create 999999
Then run the example program. Since we’ve already created a database and it’s
smaller than what the progam is requesting, it should complain and exit early.
When done with this, type wgdb free
again to delete the smaller database.
Run the example program again, this time there is nothing in the way so it
can create the database named "1000" itself.
The results can be a bit unpredictable, but you should either see a record creation error or a number of errors related to a string field. Of course, this does not mean that the program failed or did nothing useful - a number of records were created successfully, we just eventually ran out of database space. Everything our program printed has "ERR:" in front of it - the rest of the messages come from WhiteDB.
WhiteDB never locks the database for you. Whenever something is read or written, the engine just goes and does it without checking whether some other program (or user) is currently using the database. This goes with the philosophy of speed and simplicity.
But there are many use cases where parallel use of the database is needed and in those cases everybody cannot just crowd the database at the same time and start making changes to the shared memory area - that could result in inconsistent data, or worse, a corrupt and useless database. Fortunately, WhiteDB does provide the user with the tools to handle that.
The rule of thumb is that you need these concurrency control functions whenever the database is both read and written by several processes and possibly at the same time. For example, if a database is serving data to a webserver and there are occasional updates to the data (without shutting down the webserver), that would qualify as needing concurrency control.
To implement this concurrency control, we first request permission to read whenever we are about to read something from the database and similarly declare our intention to write to the database. Once we’re done we inform the database engine that we’re finished so others may proceed. So,
wg_int lock_id = wg_start_read(db); /* do some reading */ wg_end_read(db, lock_id);
requests permission to read. There may be some time until the function
wg_start_read()
returns - it may need to wait for some other process to
finish whatever it is doing with the database. Once it returns the lock_id
,
we have shared access to the database - we may read it safely and so may
other processes, but no one can write anything. wg_end_read()
declares
that we no longer need the read access.
It is quite possible that wg_start_read()
fails - it can happen under heavy
load or if some other process is hogging the database for a long time. We
should always check:
lock_id = wg_start_read(db); if(!lock_id) { printf("wg_start_read() timed out\n"); exit(1); /* or go and retry, whatever is appropriate */ }
Getting write access is similar, the major difference is that once we get the permission, we have exclusive access - everyone else has to wait until we’re done adding or updating the data:
lock_id = wg_start_write(db); if(!lock_id) { printf("wg_start_write() timed out\n"); exit(1); } /* do some writing */ wg_end_write(db, lock_id);
! | Note |
---|---|
At present time these functions behave exactly like operating on a single big database-level lock. This tutorial does not make a secret of it, however, the future direction may be that they start behaving more like transactions. The important thing to remember is that the purpose is to allow you to read and write data safely, without corruption and inconsistency. By following the pattern described here, your program will continue to work unmodified, no matter how WhiteDB implements things internally. |
To illustrate parallel use, we will implement a counter that is incremented from two programs simultaneously. This kind of example is frequently used in parallel programming tutorials, when done naively the counter counts incorrectly because the processes end up ignoring some of the increments done by the other process. We will place this counter value inside a WhiteDB database (Examples/tut6.c).
#include <stdio.h> #include <stdlib.h> #include <whitedb/dbapi.h> #define NUM_INCREMENTS 100000 void die(void *db, int err) { wg_detach_database(db); exit(err); } int main(int argc, char **argv) { void *db, *rec; wg_int lock_id; int i, val; if(!(db = wg_attach_database("1000", 1000000))) { exit(1); /* failed to attach */ } /* First we need to make sure both counting programs start at the * same time (otherwise the example would be boring). */ lock_id = wg_start_read(db); rec = wg_get_first_record(db); /* our database only contains one record, * so we don't need to make a query. */ wg_end_read(db, lock_id); if(!rec) { /* There is no record yet, we're the first to run and have * to set up the counter. */ lock_id = wg_start_write(db); if(!lock_id) die(db, 2); rec = wg_create_record(db, 1); wg_end_write(db, lock_id); if(!rec) die(db, 3); printf("Press a key when all the counter programs have been started."); fgetc(stdin); /* Setting the counter to 0 lets each counting program know it can * start counting now. */ lock_id = wg_start_write(db); if(!lock_id) die(db, 2); wg_set_field(db, rec, 0, wg_encode_int(db, 0)); wg_end_write(db, lock_id); } else { /* Some other program has started first, we wait until the counter * is ready. */ int ready = 0; while(!ready) { lock_id = wg_start_read(db); if(!lock_id) die(db, 2); if(wg_get_field_type(db, rec, 0) == WG_INTTYPE) ready = 1; wg_end_read(db, lock_id); } } /* Now start the actual counting. */ for(i=0; i<NUM_INCREMENTS; i++) { lock_id = wg_start_write(db); if(!lock_id) die(db, 2); /* This is the "critical section" for the counter. we read the value, * increment it and write it back. */ val = wg_decode_int(db, wg_get_field(db, rec, 0)); wg_set_field(db, rec, 0, wg_encode_int(db, ++val)); wg_end_write(db, lock_id); } printf("\nCounting done. My last value was %d\n", val); return 0; }
Before running the program, type wgdb free
. We’ve tried to keep the example
as simple as possible, so it needs a completely empty database to start. Next
start the program in one window; it will prompt you to press a key. Now go
to another window and start the program one more time - this one will just
wait and do nothing.
Both programs are now ready to start counting, but the first program is
waiting for a key press and the second one is waiting for the counter to get
it’s initial value. Go back to the first window and press a key. The counter
will be initialized and both programs proceed to increment it 100000 times.
Type wgdb select 1
to check what the value of the counter is - it should be
200000.
If those programs were for different sensors in a real-life program, counting some event, we would have managed to get the correct number of events in the end.
If you can picture the content of a WhiteDB database, there isn’t necessarily much structure there - just many, many records which may be of different sizes. Because you can query records by the field contents, the fields can be thought of as columns and the entire database as a flat table where the records form the rows.
To fit any kind of data model in WhiteDB, the user needs to create the structure themselves. Carrying over the RDBMS mindset and creating, for example employee, address and customer records, which can be referred to by some id field, would certainly work, but there is another way.
Before there were relational databases, there was the network model. It is a fairly simple concept where instead of referring to to something by key, you include it directly: if I want to add an address to a customer, I just insert the address into the customer record. The difference may seem superficial, but the underlying mechanics are very different.
WhiteDB implements the network model in a straightforward way. This tutorial has already shown how to write an integer or a string into a field inside a record, putting another record there works exactly the same:
void *rec = wg_create_record(db, 2); /* this is some record */ void *rec2 = wg_create_record(db, 3); /* this is another record */ wg_int enc = wg_encode_record(db, rec); /* encode as a WhiteDB value */ wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL)); wg_set_field(db, rec2, 2, enc);
Assuming we don’t add any other fields, the first record now looks like this:
[NULL, "hello"]
. But the second one is more interesting: [NULL, NULL,
[NULL, "hello"]]
. Of course, the data isn’t actually duplicated inside the
second record, its field 2 just contains a reference. But if your program
already holds rec2
, getting to the contents of rec
is very fast.
So how does one make use of such data? If I have the second record, I can get to the message inside the first record like this (checking for the type isn’t mandatory of course, this is just to show that the record is a datatype just like an integer or a double):
if(wg_get_field_type(db, rec2, 2) == WG_RECORDTYPE) { void *rec = wg_decode_record(db, wg_get_field(db, rec2, 2)); char *message = wg_decode_str(db, wg_get_field(db, rec, 1)); /* we find that the message is "hello" */ }
Linking records together can produce a graph, a tree or whatever is needed. Or we can treat the records as containing other records inside, so our initial view of them as flat table rows was a bit deceptive and a set would perhaps be a more accurate description.
! | Note |
---|---|
A field containing a reference to a record isn’t different from other fields. It can still be indexed and you could write a query like "give me a record where field 3 is [ "owner", [ "name", "Peter", "age", 25 ]]" and it would work. However, while record references are fast, keeping track of such recursion can be expensive so it is best to avoid deep chains of records on indexed fields. |
The next program (Examples/tut7.c) creates some records and links them
together. Run it, then use wgdb select 20
to examine the database contents.
The records will appear nested inside each other:
#include <stdlib.h> #include <whitedb/dbapi.h> int main(int argc, char **argv) { void *db, *rec, *rec2, *rec3; wg_int enc; if(!(db = wg_attach_database("1000", 2000000))) exit(1); /* failed to attach */ rec = wg_create_record(db, 2); /* this is some record */ rec2 = wg_create_record(db, 3); /* this is another record */ rec3 = wg_create_record(db, 4); /* this is a third record */ if(!rec || !rec2 || !rec3) exit(2); /* Add some content */ wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL)); wg_set_field(db, rec2, 0, wg_encode_str(db, "I'm pointing to other records", NULL)); wg_set_field(db, rec3, 0, wg_encode_str(db, "I'm linked from two records", NULL)); /* link the records to each other */ enc = wg_encode_record(db, rec); wg_set_field(db, rec2, 2, enc); /* rec2[2] points to rec */ enc = wg_encode_record(db, rec3); wg_set_field(db, rec2, 1, enc); /* rec2[1] points to rec3 */ wg_set_field(db, rec, 0, enc); /* rec[0] points to rec3 */ wg_detach_database(db); return 0; }
! | Note |
---|---|
the code included in this section isn’t here to show you how to do things but rather to illustrate what would happen if things were done this way. |
Consider our earlier example with two records, one of them containing the important message "hello". Let’s try to use techniques that work in relational databases to accomplish the same thing. Storing the data isn’t complicated:
void *rec = wg_create_record(db, 2); void *rec2 = wg_create_record(db, 3); wg_int hello_id = wg_encode_int(db, 11913); /* an arbitrary record id */ wg_set_field(db, rec, 0, hello_id); /* assign the id */ wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL)); wg_set_field(db, rec2, 2, hello_id); /* reference by id */
So far so good. The records should now contain [11913, "hello"]
and [NULL,
NULL, 11913]
. Pretend again that we have just the rec2
(as a result of a
query, for example) and need the contents of rec
:
int hello_id = wg_decode_int(db, wg_get_field(db, rec2, 2)); /* this will search the database for 11913 */ void *rec = wg_find_record_int(db, 0, WG_COND_EQUAL, hello_id, NULL); char *message = wg_decode_str(db, wg_get_field(db, rec, 1));
If you have an index on field 0, this doesn’t look that bad. Sure, the value
"11913" needs to be looked up, but wg_find_record_int()
can manage it
reasonably fast. What if you had to do this inside a loop, though? How about
a nested loop? Executing a million queries isn’t that fun anymore, even if
they don’t take up time individually - wg_decode_record()
would have taken
a fraction of that.
The lesson to learn from here is that whenever you have a data model where objects have relations to each other and need to perform JOIN type queries on them, it can be much faster in WhiteDB to implement the "join" in advance by linking the object records to each other. Those links can then be navigated rapidly to collect all the data.
Another use case is storing semi-structured data. We have used the network model in our implementation of JSON documents in WhiteDB (a work in progress) where for example an array can contain other arrays or objects: the record linking feature allows us to accomplish this elegantly by making the array a record whose fields contain links to other records holding the child elements.
There are a few more examples distributed with WhiteDB that were not covered in this tutorial. You may look at Examples/demo.c and Examples/query.c that should be commented well enough to be understandable by now.
Examples in Examples/speed are geared towards speed testing and are covered in http://whitedb.org/speed.html
A bit more involved example to look at is Server/dserve.c: making queries
from WhiteDB with a simple REST cgi program giving json or csv output.
dserve
is useful both as it is and as an example/template for making your own
tools for WhiteDB data handling: see http://whitedb.org/tools.html