A White Paper by David Baka

# Modeling Data Structures with a Nadair

At a fundamental level a program is made up of operators and operands. Operands are areas in memory that hold data. Operators are the instructions that modify the data.

Consider the following example in pseudo-assembly:

Here the operator is add, the operands are 5 and 7. In more sophisticated programs data can be stored in many complex ways such as arrays, tables, lists, linked lists, queues, dequeues, records, collections, vectors, b trees, hash tables, sparse arrays, and few others. In creating an initial design determining which data structure can become a surmountable task because each has it’s own pros and cons.

First of all one has to consider how the data will be operated on. The data is influenced by code which may change it's value or scope depending on what code is applied to manipulate the data. For example SQL that contains a join will change the scope of data that can be viewed in the resulting record set.

To help determine the data design of more complex programs a mathematical model called a Nadair can be used. The Nadair has two main parts, the data structure and the influences on the data structure. Here is a simple example:

T1 is a table containing rows and columns, T2 is a table containing rows and columns.

J represents the join between the two tables. K is used to denote a list of Influences. the Symbol >=> means “Acts On” denoting that the data structure follows will be acted on by the proceeding influences,.

After the Symbol >=> is a complex set of symbols which represents the Baka Matrix. The number at the top left corner denotes the number of layers of the matrix. A matrix can represent any data structure and can be as abstract or defined as needed. In this case the matrix  has one layer so the data structure appears to be two dimensional. It contains two elements both in brackets. The brackets denote that these elements are a Matrix as well but the structure is defined or will be defined somewhere else. In this case the two elements are tables. However if we wanted to use arrays instead of tables for our structure the model would not have to change only the meaning of the symbols would change. However the code that we would write to implement this would be considerably different.

What if we wanted our model to represent multiple join between several sets of two tables that we might use to create a view, what would our model look like?  Look at the formula below.

For the Influences J was replaced with J1 - Jn meaning that we could have 1 though n number of joins involved.  In the matrix then number  1 was replaced by N meaning there could be several layers of matrix depending on what N is equal to.

In database management systems tables are have key fields which are used to relate tables in a hierarchy of tables. For example you may have a table for Departments and a table for Employee

Each department has many employees. There is a one to many relationship between departments and employees. The department table has the key field Department Number and the Employee Table has the key field Employee Number.  Each employee record will have a field which contains a department number. This relates the record in the Department table with the same department number to those records in the employee table. The department record is said to be the parent record of the child employee record. In the Nadair model this is denoted by putting the Department element (represented by D) first followed by a semi-colon. The Employee element (represented by E) is placed after it as in the following diagram.

A common element in a data structure is called cardinality[1] and is represented by 2 pipe symbols on both sides of the element. This data element contains the number of elements in a structure. For example if  we have an array S  the number of elements in S would be represented as ||S|| .

This is very useful when we have a data structure such as an array where we are adding values to the elements of the array and want to keep track of where the last element was written to. For example let say we are designing a program that will load records into a database. For this database, only 3,000 records can be loaded at a time. We need to load 50,000 records so there will have to be some mechanism by which we only load batches of 3,000 records at a time.

To do this we will assign a batch number  to each load. The  records are currently on a legacy system and must be formatted to fit the data definitions of the new database. A new set of tables will be created to hold the formatted data before sending it to the new database. During this formatting Batch Numbers will be assigned to the records and when the 3,000th record is assigned the batch number will be incremented by 1.

These batch numbers will have to be stored in a structure so that when the records are loaded the batch numbers can be recalled to load each batch in Sequence and can keep track of the batches that are rolled back.

To do this we might have an array with 2 X 20 elements running from 0 to 1 and 0 to 19. We can think of this array as two rows of numbers with row 0 being the top row and row 1 being the bottom row as in the following diagram:

The first column will hold the number of elements that are filled with batch numbers in element 0, 0 ( the top left most element). The bottom element of the first column will hold a number representing the state of the process, 0 is for creating batch numbers, 1 is for loading batches in the database. As the new batch numbers are assigned the element 0,0 will hold the index (the numbers on top) of the last element (last column) assigned a batch number.

The diagram below shows what happens after  batch number 1000 was assigned.

Now Element 0,0 contains 1 because the last column to be assigned a batch number is column 1.

So row 0 could be represented like this:  ||B||, B where B represents columns 1 through 19.

In a Baka Matrix the first row of our array would look like this:

In this case we use |[B]| instead of ||B|| because B is now defined as it’s own matrix [B].

It does not matter that both elements |[B]| and [B} are going to be in the same array because we are not defining what data structure we are using in the Baka Matrix. We are only defining what elements make up the structure and how they are related. It’s an abstract model not an exact definition. We could have the same functionality by using a pointer and linked lists if we are using a language that supports them  such as Pascal or C.

In the bottom row  have a status indicator for the entire structure that represents whether the structure is being assigned batch numbers or the batch numbers are being read to load records for each batch. This type of element which desolates a state for the structure is symbolized by adding a sharp sign (#) after the symbol used to represent that status. In this case we will use S#

|[B]| and S# are elements that describe the structure and are used by influences to determine what actions should be taken.  These elements are usually at the beginning or “Head” of the structure and are referred to collectively as the Header. To further denote the header information

the header element are usually placed at the top of the Baka Matrix and may be separated from the rest of the elements by a line. The following example shows this arrangement.

In this example L* represents the status for each column for Columns 1 - 19 which are stored on the bottom row of the array. In this case the Symbol * is used to denote that the element is a status indicator because it only indicates status for the column not the entire structure. L*  will hold the status for loaded, rolled back or not loaded. The actual value stored will be a number that is used to represent these states.

Let’s continue with our example of assigning record batches to an array for tracking batch numbers. If another batch number is assigned then the data will look like this:

Element 0,0 has been changed to 2 because the last element to be assigned a batch number has the index number 2. If my array is called batcharray then I can indicated the last batch number by batcharray[0, batcharray[0,0] ][2] because batcharray[0,0] holds the index of the last batch number assigned. This is also important in my program because if I read past this column I will no longer be accessing a valid batch number. Batcharray[0,0] is used while loading batches to tell the program when to stop. It could be said that Batcharray[0,0] “Points” to the last column of the array.

This is demonstrated in the diagram below.

Technically Batcharray[0,0] is not really a pointer is more like a handle. A pointer is variable type that contains the memory address of another variable or structure. Batcharray[0,0] contains a number or an index which hold the memory address of the array elements. This is a method know as indirect addressing.

What changes will have to be made to our Baka Matrix to indicate if the structure is loading or assigning batches?  The answer is None! The assigning of values belong to functions not data structures. In order to complete the design there will need to be two influences. One will assign batch numbers and one will load records. These influence will have a lot of smaller steps to them such as changing the status of indicators and setting the index of  the array in batcharray[0,0].

These details can be documented later. They could even be documented as a flow chart, pseudo code, UML or whatever methodology you prefer.

Here AB represents Assigning Batches and LB represents Loading Batches.  To complete the design for this level of definition we will add layers because there are a several tables that must be loaded each with there own set of batch numbers. We will also show the definitions of the elements by listing them underneath the Nadair. The following diagram illustrates this.

AB: Assign Batch (See Flowchart 1)

LB: Load Batch (See Flowchart 2)

-

N: The number of table to load

B: The list of batch numbers

As you can see this is a high level design The element descriptions are listed below the Nadair. First the elements of the Influences and then separated by a dash are the elements of the Baka Matrix.

The N in the upper left hand corner stands for the number of tables that may be loaded. But what if N = 0? Would that be an invalid state? What if we wanted to specify in our design that there will always be at least one table that will be used at any time the program is ran?

A new symbol called the existential[3], which is in the diagram below to represent the existence of at least one case.

In this case it means “There is at least one x such that x has property R”. This can be applied to a Nadair as in the following diagram.

Now there is at least one N where the Nadair exists according to this updated design.

Up until now we have looked at the concept of our program as what data structure we will use and then designed our Nadair from it. Although this is useful for explaining the concepts behind the Nadair, in practice we should design the Nadair first then use it to figure out from the Nadair what data structure and functions best suit the purpose.

Let’s start off on a new idea for a program. This program will mimick SQL by using a Graphical User Interface where you select fields and commands from drop down lists on a form. The program will be able to do everything SQL can do but does not require the user to write any SQL code.

First let’s list all the things we can do with SQL.

1) We can select a list of fields which will display all the values for those fields for all records selected

2) We can limit our selection by specifying that for some field only certain values are desired

3) we can count, get the sum of, get highest number, get lowest number, etc for a numeric field

4) We can sort records, ordering them by field values in ascending or descending order.

5) We can insert records into a table

6) we can create a table from a list of values

7) we can update fields in a record

8) We can join tables together

There are many other functions as well but this is enough to keep us busy for awhile.

[1] Knuth, The Art of Computer Programming Series (Information Structures)

[2]  Visual Basic uses parenthesis ( ) instead of brackets in arrays

[3] The Existential is commonly used in inductive math and logic.