The Architecture of Lotus Notes
This is it! A comprehensive explanation of what Lotus Notes really is and how it works.
Editor's Note: This article is an oldie but a goodie. Among our most requested, this article by Lotus engineer Hugh Pyle comprehensively explains the basic structures and concepts of Lotus Notes. While written in the context of Notes 3 with some specifics about 4, the core of Notes and Domino is essentially the same in Notes 5 and 6. Historians should note that this article was written before IBM purchased Lotus Development Corp.
Lotus Notes technology is a departure from traditional database or messaging products. Traditional database systems are the backbones of most corporate information systems. They are designed to manage the tabular information generated by business operations, such as order processing, inventory control, and payroll management. These applications are data-centric, organizing information by breaking it into basic elements. Knowledge and information are gained only by sorting and querying these data elements in various ways. They are transaction-oriented, built to reflect the most current state of the data. They are generally not good at reflecting the changing states of information over time. If they are distributed, they require a single-system image so that if a number is debited in one place, it's debited everywhere in a single transaction. In this sense they can't easily handle access by disconnected users.
Traditional electronic mail and messaging systems, on the other hand, are designed for efficient transmission of messages from one place to another. They can handle simple and complex information, and they can deliver it to specific individuals or applications. However, they generally have no facility for capturing or tracking the information; they simply provide reliable delivery.
Most organizations need robust and scalable strategies for both databases and messaging, to manage and disseminate knowledge. This is the mission of Lotus Notes.
Because it is unique, fully exploiting Notes requires a new perspective. This article explains the structure of the Notes programs on the client and server and the structure of the Notes database (figure 1). This article bridges the gap between general architectural descriptions and detailed information which can only be found in the API documentation.
For groups of people to work together, it's important that technology doesn't obstruct communication. Therefore, Notes software is provided for a wide variety of operating systems, and its drivers support all the major network protocols. The result is an any-to-any connection matrix, allowing freedom of choice. Users can communicate anytime, anywhere, irrespective of their particular working environments.
Descriptions in this article relate to Lotus Notes version 3. However, the basic structures have been in place since version 1. layout, Notes still preserves backward compatibility. Future versions will continue to add features without changing the essential design. Documentation of the C data structures is in the Notes API Toolkit and isn't duplicated here.
Lotus Notes is a true client-server system. The Notes server, running on Windows NT, OS/2, NetWare NLM, Solaris, SCO, IBM AIX, or HP-UX, does more than simply take queries from database users and return result sets. It maintains application-defined indexes, stores and forwards electronic mail, replicates databases with other servers or workstations, performs agent tasks, and is extensible to accommodate user-created agents or third-party add-ins. The Notes client, running on Windows, OS/2, Macintosh, Solaris, SCO, IBM AIX, or HP-UX, also contains most of the server functionality. This lets users replicate databases from a server and take those files away from the network; send and receive e-mail; create, modify, and index documents locally; and synchronize replica databases over a LAN or WAN connection.
In addition to giving you choice, client-server is inherently more secure and efficient than simple file-sharing. Access to the data is controlled by the server at a fine level of granularity, resulting in smaller amounts of data being passed between the user's workstation and the remote server.
From the outset, Lotus Notes was developed as a secure system. To provide security in a manner that works across vendors' operating environments, Notes uses technology licensed from RSA Data Security, Inc.
Notes provides four classes of security:
• Authentication -- Identifies users using the industry-standard X.500 hierarchical naming syntax. Authentication in Notes is bi-directional. Servers authenticate the identity of users and users authenticate the identity of servers. Authentication is used whenever a user and a server or two servers are communicating with each other.
• Digital Signatures -- Guarantee that a given message is from whom it says it's from, essentially a user-to-user form of authentication. In addition, this technology enables the computer to notarize all, or a portion, of a message and guarantees that the message has neither been forged nor altered in transit.
• Access Control -- Allows or denies access to shared databases, documents, views, forms, and fields. Server access can also be controlled for individual users by either allowing or denying access to specific Notes servers within the organization.
• Encryption -- Involves ciphering or scrambling information so that even if accessed by the wrong individuals, it can't be understood. Encryption is available at three levels in the system. At the message level, individual messages can be encrypted for one or more intended recipients. At the network level, encryption prevents someone from promiscuously sniffing (tapping into) traffic on a LAN or dial-in line, because they can't see anything intelligible. At the field level, databases can be designed to encrypt document fields so that only specified users can read them.
There are no back doors into the system. Neither Lotus nor any hacker, foreign or domestic government agency, nor competitor has or will have a super-user capability in your information system.
Lotus Notes includes an e-mail system that uses its native objects, Notes documents, and provides store-and-forward delivery. The Notes server network intelligently provides best-path routing analysis and fault tolerance. The Notes kernel and mail transfer agent (MTA, also referred to as ROUTER) implement features such as blind copies, delivery confirmation, return receipts, encrypted messaging, and sender authentication. Electronic mail is tightly integrated into Notes applications.
For those who wish to retain an existing e-mail system, the Notes workstation software lets users send mail to any VIM-compliant system and (with the forthcoming VIM-to-MAPI converter) to any MAPI-compliant e-mail transport. The coexistence of Notes with alternate mail systems is transparent to the users. To provide back-end integration between different mail systems, Lotus and its business partners have developed a large number of mail gateways, connecting to all the popular LAN- and host-based e-mail products.
E-mail standards are central to this kind of integration. The coming Lotus Communications Server (LCS, based on the Notes 4.0 server) provides native X.400 and SMTP/MIME MTAs in addition to the existing Notes and cc:Mail transport protocols. LCS will also provide three standard mail interfaces for programmers: VIM, CMC, and MAPI SPI. This integrated architecture gives a robust and flexible solution to enterprise mail requirements.
Store-and-forward e-mail is necessary but insufficient for many applications. When used as a method of database synchronization, a unidirectional message route is not robust. There needs to be a connection-oriented, bi-directional method of document sharing.
Lotus Notes' unique replication facility means that databases on different servers are automatically synchronized and change conflicts are resolved. This scales easily from two to thousands of replicas of a database. Replication does not require administrators to configure the individual database connections. Nor does it require a particular server-connection topology. When any two copies of a database are replicated, each exchanges the latest modifications with the other, guaranteeing that they both finish in synchronization. If the network link breaks during this process, there's no need for commit and rollback types of protection, since replication will continue when any link between replica databases on different servers is re-established.
The combination of true replication with the Notes object store, where documents and application design elements are self-contained note objects, releases workgroups from constraints of time and geography (figure 2).
Notes servers can be configured to communicate with one another in peer-to-peer, hub-and-spoke, and ad hoc topologies. Due to the flexible nature of Notes, topology planning can be independent of application or information strategy planning, thus simplifying deployment within a large organization.
Also, because of the peer nature of the Notes architecture, interconnection of servers to other servers or users to servers is treated identically. This is why it's so easy and natural for mobile users to use a modem to connect their computers into the system. Notes is designed for occasionally-connected communications, whether the entity communicating is a user or another server.
Lotus Development Corp. is committed to making a Notes network manageable. Version 3 includes remote console administration and actively generates alerts that can be sent to network administrators through e-mail. Alerts indicate such things as low disk space, failed network connections, and communication bottlenecks. In early 1995, Lotus will provide NotesView, an SNMP-based application to manage all aspects of the messaging system. Lotus messaging management facilities will support industry standard protocols such as SNMP, allowing the Lotus Communications Server to be managed by other vendors' management systems.
Notes kernel services
The core modules of Notes are shared between client and server code. This kernel is exposed as the Notes C-language API. One function of the kernel is to insulate its clients from the underlying hardware and operating system. This approach is necessary in any architecture that must span existing and future operating-system choices. Insulation of the groupware application developer from the platform's quirks is the only way to create a truly open system, where choice of software is not constrained by a particular hardware or OS vendor.
Application developers, whether working at the API level or in the Notes user interface, gain three key isolation layers: from the operating system, from the network transport, and from physical location of the object store (via a transparent form of remote procedure call).
Other kernel functions include security, database access, document structures, indexes, and full-text search capabilities.
All these core modules are available to higher layers of Notes, including API programs, in the client and in the server environments, on all the platforms supported by Notes. For example, Notes API code can be written to be completely portable between Windows, OS/2, various flavors of UNIX, and NetWare NLM. Figure 3 shows many of the components.
Figure 3 -- The Notes kernel, the major constituents of the Notes workstation and server, and the integration options available to API developers.
Important modules in the Notes kernel are:
• OS -- Operating system isolation layer. Provides platform-independent access to memory, shared resources, semaphores, environment information, and so on. Beneath the isolation layer are highly optimized implementations of these services for each individual operating system.
• SEC -- Security module. Gives access to user information, certificates, and encryption keys, based on the BSAFE security package from RSA Security, Inc.
• NSF -- Notes storage facility. Manages the Notes database, allowing its users to create, open, and delete databases; create, open, and delete documents; and store and retrieve information. Part of this function includes on-disk structure (ODS) management, which ensures portability of the .NSF file format across platforms and on the network. All the published NSF interfaces are independent of the database location; a remote procedure call (RPC) system transparently redirects requests to the local disk or to the appropriate server. This RPC layer is also used by other kernel modules, such as NIF (the indexer), to retrieve a pre-built index from the appropriate server.
• NET -- Network transport layer. Provides a single interface to drivers for many networking protocols. It can initiate and receive phone calls or create LAN-protocol sessions for communication over ports defined by connection documents in the Name & Address Book.
• COMPUTE -- Performs calculations, using the Notes formula language. Notes API programs can ask COMPUTE to create and evaluate formulas and also to implement custom @ functions which are called by the COMPUTE module.
• NIF -- Notes index facility. Manages indexing of Notes documents into views. Views define the selection of particular documents, and columns containing information from those documents (or calculated values based on the documents). Columns can be sorted, or categorized (where similar values appear under one heading), and can use the hierarchy of main and response documents in a database. These collections of documents are indexed in a B-tree structure for presentation to the user as a Notes view. NIF is responsible for maintaining and using the indexes, adding and removing information as documents are modified.
• FT -- Full text index facility. Provides content-based retrieval and weighting with Boolean-logic search through the full text of any document.
• NAME -- User directory service. Not strictly part of the kernel code, but has some direct APIs which give access to the Name & Address Book. This is a special Notes database which contains user names and e-mail addresses. User identification uses an X.500-based naming model and conforms to X.509 certificate and authentication standards.
The Notes client workstation software includes the Notes kernel and four major additional modules:
• DESK -- Notes desktop manager. Handles the layout of icons and the desktop, storage of user preferences, and general coordination of the Notes user interface.
• EDIT -- Notes document and forms editor. Also creates and renders compound documents.
• VIEW -- Notes view manager. Manages the user's interaction with indexes of the database.
• NEM -- Graphical-environment manager. Provides high-level abstractions of the graphical user interfaces present on various platforms.
The Notes server consists of one core server program. It manages the server's other processes and threads, and users' connections with the server. The server functionality is largely implemented with modules. Some are necessary; others are optional or developed by third parties. Important modules include the database replicator (which schedules and connects to other servers and workstations to replicate databases), the indexer (which keeps indexes up-to-date for immediate access by workstations), the mail router (which directs mail between mailboxes and between servers), and Chronos (which schedules agents to perform background tasks in Notes databases at the application designer's request).
This server platform gives powerful, extensible intelligence. It can automatically connect to other servers to exchange mail and replicate databases. It uses the host operating system to exploit its particular features, such as SMP multiprocessor systems to spread the workload. It lets the clients be mobile, so a workstation can use local Notes databases when not connected to a network, and can connect to any server by LAN or dial up for direct server access or to replicate and exchange mail.
The transport between client and server is implemented in a number of drivers under a common network layer. Selection of the appropriate driver for any named port is controlled by the user. Network port selection is also user-definable if the machine runs multiple network protocols and has a preferred hunting order. Selection of the appropriate serial port for a WAN connection between any two machines is controlled through connection records in the Name & Address Book.
When a connection is established, the client's identity must be verified. The server creates a random number and encrypts it using the alleged client's public key, then asks the client to decrypt it. This decryption can only be performed using the client's private key; if the resulting value is returned to the server correctly, the user is authenticated. Then a secret key can be created for a secure channel of communication between client and server. The server starts a thread to process this client's requests.
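The challenge-response exchange described above can be sketched with textbook RSA. This is a toy illustration only: the key sizes, the unpadded RSA arithmetic, and the function names `server_make_challenge`, `client_answer`, and `server_verify` are assumptions made for the example and say nothing about how Notes actually implements the step.

```python
import secrets

# Toy RSA key pair built from two known (Mersenne) primes. Real keys are far
# larger and use padding; these values are for illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)

CLIENT_PUBLIC = (N, E)
CLIENT_PRIVATE = (N, D)


def server_make_challenge(client_public):
    """Server side: pick a random number and encrypt it for the alleged client."""
    n, e = client_public
    nonce = secrets.randbelow(n - 2) + 2
    return nonce, pow(nonce, e, n)


def client_answer(ciphertext, client_private):
    """Client side: only the holder of the private key can recover the number."""
    n, d = client_private
    return pow(ciphertext, d, n)


def server_verify(nonce, answer):
    """Server side: the client is authenticated if the number comes back intact."""
    return nonce == answer


nonce, challenge = server_make_challenge(CLIENT_PUBLIC)
answer = client_answer(challenge, CLIENT_PRIVATE)
authenticated = server_verify(nonce, answer)
```

Because authentication in Notes is bi-directional, the same exchange would also run in the other direction, with the client challenging the server.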
The client-server request protocol lets the client ask the server to perform various bulk actions: create a full text index, open or compact a database, categorize or search a number of documents. This is the first level of delegation of processing by the client.
The second level, a form of remote procedure call, occurs inside the kernel modules. For instance, if the workstation calls NIFOpenCollection to open an index, the internal RPC layer transfers this request to the server, which updates the index if it is out of date. And when retrieving a view, only the necessary portions are transmitted for display on screen. Due to the RPC layer, these processes are independent of whether the database is on the local hard disk or on a server, and of whether the server is connected by LAN or over a telephone line.
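A minimal sketch of this location transparency follows. The classes `LocalStore` and `RemoteStub`, the `open_store` helper, the `server!!file` path convention, and the server name `HUB01` are all hypothetical names chosen for the example; the point is only that the caller issues the same request whether the store is local or remote, and that only the requested window of a view is returned.

```python
class LocalStore:
    """Holds view collections that live on the local disk (modelled as a dict)."""

    def __init__(self, collections):
        self.collections = collections

    def read_entries(self, view, start, count):
        # Return only the requested window of the view, not the whole index.
        return self.collections[view][start:start + count]


class RemoteStub:
    """Stands in for a network session: forwards the same call to a server."""

    def __init__(self, server_store):
        self.server_store = server_store
        self.requests = 0

    def read_entries(self, view, start, count):
        self.requests += 1  # one round trip per window fetched
        return self.server_store.read_entries(view, start, count)


def open_store(path, local_store, sessions):
    """Pick a store from a hypothetical 'server!!file' style path."""
    server, separator, _file = path.partition("!!")
    return sessions[server] if separator else local_store


server_side = LocalStore({"By Author": [f"doc {i}" for i in range(1000)]})
local_side = LocalStore({"By Author": ["doc a", "doc b", "doc c"]})
sessions = {"HUB01": RemoteStub(server_side)}

# The caller's code is identical for both locations.
remote_page = open_store("HUB01!!sales.nsf", local_side, sessions).read_entries("By Author", 20, 3)
local_page = open_store("sales.nsf", local_side, sessions).read_entries("By Author", 0, 2)
```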
Users compose, read, manipulate, forward, and interpret Notes documents. At the user interface, there are two main tools for document access. Forms give structure to documents and provide the editing environment. Views collate, categorize, and report on many documents in a database. Other tools are available. Macros can run on the workstation and the server, on individual documents and on batches of documents. The full-text-retrieval engine is also an important tool.
Different users have differing perceptions of how this all works. The parable of the elephant in a darkened room seems appropriate. It is sometimes difficult for software professionals to "place" Notes in relation to other development tools, document management and workflow applications, relational databases, and legacy messaging systems.
Here's a user view of Notes, using terms from object-oriented design:
• The user interacts with objects in the Notes system. The data of these object instances is held in Notes documents, in fields of various data types. The methods with which they manipulate the data are defined in Forms. Views are browsable collections of summary information from many object instances, allowing the user to search, sort, and relate the information in a database.
• Notes' formula language provides ways to manipulate the object data. These methods are augmented by OLE and by DataLens and ODBC drivers.
• There are two ways to relate the document "instances" and the form "methods". First, Notes stores in each document the name of the form used to create or modify the document; this is a loose linkage. Second, an alternative tight linkage can be established by storing the form in the document. This is appropriate where a document and its form (an object instance and its methods) must be sent together through e-mail. The loose linkage is natural in most Notes applications, where each database contains both data (documents) and the associated design elements (forms and views).
This is a valid description of Notes as a document-object database. But internally, there's really little distinction between data documents, form designs, view definitions, and other design elements. They are just different Notes Classes. This abstraction means that replicated databases not only distribute the object data, but also the object methods, using one consistent infrastructure. Distribution of applications in this manner is one of Notes' key strengths.
The simplified diagram in figure 4 illustrates how users interact with these documents, forms and views; and in the database, how the data and design elements are just different-colored instances of a more abstract Note.
Figure 4 -- Notes database, simplified, showing how users interact with major components.
Structure of the note
A note, whether a document, a form definition, or whatever, has a consistent and flexible structure. The reserved structures which are used to create the design notes are described in more detail later. A Note always consists of:
• NOTEID -- Identifies this Note in this particular database.
• ITEMS -- One or more fields containing data. In the Notes API, and this article, the fields in a Notes document are referred to as Items; the term field is reserved for describing field definitions on a Notes form.
• ORIGINATORID -- Identifies this Note in a globally unique way, so the same document can be identified in any replica of the database, and identifies the last modification, so replication conflicts due to simultaneous edits can be resolved.
The ORIGINATORID deserves more explanation. It is the structure that identifies all replicas of the same note. Here are its members and what they contain:
• File -- Unique random number, generated at the time the note is created.
• Note -- Date/time when the first copy of the note was stored into the first .NSF. (The structure to this point is also referred to as the UNID.)
• Sequence -- Sequence number used to keep track of the most recent version of the note for replicated data purposes.
• SequenceTime -- Sequence number qualifier that allows the replicator to determine which note is later, given identical Sequence values. Both are required. The sequence number is needed to prevent someone from locking out future edits by setting the time/date to the future. The sequence time qualifies the sequence number for two reasons: it prevents two concurrent updates from looking like no update at all, and it forces all systems to reach the same decision as to which update is the latest version. These are the key decisions on which replication is based.
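The comparison rule described for Sequence and SequenceTime can be sketched as follows. This is a simplified model under stated assumptions: the `OriginatorId` class, its integer fields, and the `later_version` and `replicate` helpers are illustrative names, not the C structures from the API Toolkit, and the sketch does not model anything beyond picking the later version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OriginatorId:
    file: int           # random number assigned when the note is created
    note: int           # creation time of the first stored copy
    sequence: int       # incremented on each modification
    sequence_time: int  # time of the modification that set `sequence`

    @property
    def unid(self):
        # File + Note identify the same note across all replicas.
        return (self.file, self.note)


def later_version(a, b):
    """Pick the later of two versions of the same note.

    Sequence is compared first, so a clock set far into the future cannot
    outrank a note that has genuinely been edited more often; SequenceTime
    breaks ties so every replica reaches the same decision.
    """
    if a.unid != b.unid:
        raise ValueError("not replicas of the same note")
    return max(a, b, key=lambda oid: (oid.sequence, oid.sequence_time))


def replicate(replica_a, replica_b):
    """Bring two replicas (dicts keyed by UNID) into agreement."""
    for unid in set(replica_a) | set(replica_b):
        versions = [r[unid] for r in (replica_a, replica_b) if unid in r]
        winner = versions[0] if len(versions) == 1 else later_version(*versions)
        replica_a[unid] = replica_b[unid] = winner


v3_early = OriginatorId(file=7, note=100, sequence=3, sequence_time=500)
v3_late = OriginatorId(file=7, note=100, sequence=3, sequence_time=620)
v2_future = OriginatorId(file=7, note=100, sequence=2, sequence_time=999999)
```

With these values, `later_version(v3_early, v3_late)` selects `v3_late` on the time qualifier, while `v2_future` loses to either of them despite its later timestamp.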
Every note in the database can have a different number of items, of different data types. Contrast this self-contained flexible document format to traditional relational database systems, in which the structure of a table is defined for all records. Relationships between documents are provided by some internal structures, such as the relationship between main documents and responses, and by any application-defined structures, such as lookups into Notes views or external databases, to give referential integrity.
The NOTEID is a compact and efficient identifier for a Note. It provides a record relocation vector (RRV), essentially a file position pointer to this note in this database. When you need to store lists of document IDs or to identify collections of documents or lists of unread documents, there's a structure called IDTABLE, which is a compressed list of NOTEIDs.
Neither the ORIGINATORID nor its subset the UNID is a file pointer; each is rather a unique identifier constructed from random numbers and timestamps. An internal index in the database gives fast access to a document by knowing its UNID (for example, so that the replicator can identify documents in common between database replicas). In relational database terms, the UNID is the primary key. To the user, the Views provide the important document indexes, each having application-defined keys.
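One common way to compress a list of integer identifiers is to store runs of consecutive values. The sketch below uses that technique purely to illustrate the idea of a compressed ID list; the actual IDTABLE layout is not described in this article, and the helper names `compress_ids`, `expand_ids`, and `contains_id` are hypothetical.

```python
def compress_ids(note_ids):
    """Collapse a collection of IDs into sorted (start, length) runs."""
    runs = []
    for note_id in sorted(set(note_ids)):
        if runs and note_id == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += 1            # extends the current run
        else:
            runs.append([note_id, 1])   # starts a new run
    return [tuple(run) for run in runs]


def expand_ids(runs):
    """Recover the full sorted list of IDs from the runs."""
    return [start + offset for start, length in runs for offset in range(length)]


def contains_id(runs, note_id):
    """Membership test that never expands the runs."""
    return any(start <= note_id < start + length for start, length in runs)


unread = compress_ids([5, 3, 4, 10, 11, 20])
```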
A Notes database contains documents and design elements (Notes) which each contain and define their own structure. This structure can include an enormous variety of data from different sources. Certain internal structures build on this open object store. Notes applications provide the user with tools for more- or less-structured interaction.
Summary and non-summary items
Some data items contain simple data types and are used to contain summary information about documents: author name, creation date, title, e-mail recipients, and so on. Others contain compound information: rich text, embedded objects, graphics, and the like. This distinction is important in any object management system that has to index and sort documents for user accessibility. In Notes, this distinction is made by flagging items as summary or non-summary.
Summary items are available for computation in Notes formulas, for lookup into or from external data sources, and for indexing and presentation in views. Non-summary items cannot be evaluated in formulas, nor can they be used for collation. They provide the storage for compound objects.
Object linking and embedding (OLE) is a powerful method of packaging and referring to documents from a server application in such a way that they can be embedded or linked in other client applications. For example, a Freelance presentation or a Paintbrush graphic can be embedded in a word-processing document. The embedded object can be edited by clicking on the object, launching the OLE server it is associated with. Various verbs implement the different modes in which an OLE object can be manipulated; for a sound or video object, verbs might be Play and Edit.
This packaging is important because it provides a way for applications to include data types they don't directly handle. The client software need know nothing about the embedded object except its server name. The Registration Database in Windows gives more information about the server program, such as its location on disk and its verbs. OLE objects are opaque to the container application. A product such as Notes, designed to be a flexible object container, can store many OLE objects in many documents, but this gives little benefit to users unless they know something about the documents. It is not sufficient simply to know "This is a spreadsheet".
For this reason, Lotus developed the Notes/FX protocol. This enables bi-directional field exchange of summary information between the server and client. Using Notes/FX, a Notes application can store OLE objects and can also find important information about their contents, such as the number of pages, the values of spreadsheet ranges, and so on. In the Notes database, this summary information can be indexed and searched by users. So, a database of expense-report spreadsheets can be summarized and totaled by Notes. Field values from Notes documents can be placed into the spreadsheet, word processor, or server. For example, this can avoid duplicate storage or typing of addresses in a customer-service application. This gives true value to embedded objects. Microsoft has recently adopted Notes/FX in their product suite.
Document structures are usually defined by the forms with which they are edited. Additionally, there are important items for system use. These items include:
• Form -- Contains the name of the form used to create or edit this document. This may be overridden by specifying a form formula in a view, which can define the form used to display documents according to context. Additionally, the form definition can be stored as system items in the document itself.
• $REF -- Contains a reference to the document's parent, if it is a response document. This item creates the response hierarchy, used in threaded discussions and to link (for example) correspondence documents with a customer-details document. The indexer uses this item to reflect the response hierarchy in Views.
• $Header and $Footer -- Contain print headers and footers for this document.
• $File -- Contains summary information about attached files and OLE objects, and pointers to their object RRVs.
• $Readers -- Contains the names of people authorized to read the particular document. This is in addition to other security provided by the ACL and encryption.
• Other reserved item names are documented in the C API header file STDNAMES.H.
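As an illustration of how a parent reference such as $REF yields a threaded hierarchy, the sketch below groups documents by their parent and walks the result depth-first. Documents are modelled as plain dictionaries; the `unid` and `Subject` keys and the `build_threads` function are assumptions made for the example, not the indexer's actual algorithm.

```python
from collections import defaultdict


def build_threads(documents):
    """Return (depth, subject) pairs in threaded order.

    A document without a "$REF" item is treated as a main document;
    otherwise "$REF" names the parent's identifier.
    """
    children = defaultdict(list)
    for doc in documents:
        children[doc.get("$REF")].append(doc)

    def walk(parent_id, depth):
        for doc in children.get(parent_id, []):
            yield depth, doc["Subject"]
            yield from walk(doc["unid"], depth + 1)

    return list(walk(None, 0))


discussion = [
    {"unid": "A", "Subject": "Customer details"},
    {"unid": "B", "Subject": "Letter 1", "$REF": "A"},
    {"unid": "C", "Subject": "Reply to letter 1", "$REF": "B"},
    {"unid": "D", "Subject": "Letter 2", "$REF": "A"},
    {"unid": "E", "Subject": "Another customer"},
]
threaded = build_threads(discussion)
```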
An e-mail message is simply a Notes document that includes particular reserved items and a flag indicating that the document is to be mailed immediately. The reserved item names include:
• SendTo -- The name(s) of the primary recipients of this mail.
• CopyTo -- The name(s) of the secondary recipients of this mail.
• BlindCopyTo -- The name(s) of people who are to be sent blind carbon copies of the mail.
• DeliveryPriority -- Contains High, Medium or Low as appropriate.
To send mail, Notes transfers the document into a special database named MAIL.BOX on the server. The mail router agent picks up the document, determines routing, and forwards the mail either to another server or to the recipient's mail database as defined in the user information.
Form definition notes define the layout of a form on screen, including fields (through which the document items are displayed), formulas for field default values, input translation and validation, and form security. Form items include:
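The routing step can be sketched as a loop over the pending documents. Everything here is a simplified model: the `route` function, the directory mapping of names to a home server and mail file, and the server and file names are hypothetical, and real routing decisions (priorities, connection schedules, failures) are not modelled.

```python
def route(mail_box, this_server, directory, local_mail_files, outbound):
    """Drain the pending queue, delivering locally or queueing for another server.

    directory maps a recipient name to (home_server, mail_file), standing in
    for the user information held in the Name & Address Book.
    """
    while mail_box:
        message = mail_box.pop(0)
        recipients = (
            message.get("SendTo", [])
            + message.get("CopyTo", [])
            + message.get("BlindCopyTo", [])
        )
        for name in recipients:
            home_server, mail_file = directory[name]
            if home_server == this_server:
                local_mail_files.setdefault(mail_file, []).append(message)
            else:
                outbound.setdefault(home_server, []).append((name, message))


directory = {
    "Ann": ("SERVER_A", "mail/ann.nsf"),
    "Bob": ("SERVER_B", "mail/bob.nsf"),
}
memo = {"SendTo": ["Ann"], "CopyTo": ["Bob"], "DeliveryPriority": "Medium"}
mail_box = [memo]
delivered, forwarded = {}, {}
route(mail_box, "SERVER_A", directory, delivered, forwarded)
```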
• $Body -- Most of the form definition in a rich text item.
• $Title -- The title of the form.
• $WindowTitle -- A formula which defines the title of the form. The title appears in the Notes user interface when the user is creating or reading documents with the form.
• $FormUsers -- The list of people who are authorized to use the form. This is in addition to other security provided by the ACL and encryption.
View definition notes define the selection of documents, and columns specifying calculation or field-retrieval formulas and sorting. Specific items in the view definition are:
• $Title -- The title of the view.
• $Formula -- The formula which defines selection of documents to appear in this view.
• $ViewFormat -- A detailed structure defining the structure of the view, its display options, and for each column: the column's name, the formula which computes values to display in the column, and sorting/categorization flags.
• $Collection -- A pointer to the object which holds the index.
The view definition note doesn't contain an index of documents in the database; the index is held as an object in the database, and maintained by the NIF subsystem. Thus, when view definitions replicate to other servers, the index collection is not replicated but rebuilt locally. This reduces connection cost and allows for selective replication and planned differences between replicas of one database.
The Design note is an internal index of important information about all the forms, views, and other design elements of the database. This provides quick access, for example when forms must be loaded to display documents.
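To make the split between view definition and index concrete, here is a small sketch in which the definition is a selection function plus column functions, and the index is rebuilt from whatever documents a replica holds. The functions `build_view_index` and `categorize` are hypothetical, Python callables stand in for Notes formulas, and a sorted list stands in for the B-tree that NIF maintains.

```python
def build_view_index(documents, selection, columns, sort_column=0):
    """Build index rows of (column values, noteid) for the selected documents."""
    rows = []
    for doc in documents:
        if selection(doc):
            rows.append((tuple(column(doc) for column in columns), doc["noteid"]))
    rows.sort(key=lambda row: row[0][sort_column])
    return rows


def categorize(rows, column=0):
    """Group note IDs under one heading per distinct value of a column."""
    categories = {}
    for values, noteid in rows:
        categories.setdefault(values[column], []).append(noteid)
    return categories


documents = [
    {"noteid": 1, "Form": "Memo", "Author": "Lee", "Total": 40},
    {"noteid": 2, "Form": "Expense", "Author": "Kim", "Total": 75},
    {"noteid": 3, "Form": "Expense", "Author": "Ann", "Total": 20},
    {"noteid": 4, "Form": "Expense", "Author": "Kim", "Total": 5},
]

# The "definition" is small and could replicate; the index is derived locally.
select_expenses = lambda doc: doc["Form"] == "Expense"
view_columns = [lambda doc: doc["Author"], lambda doc: doc["Total"]]

index = build_view_index(documents, select_expenses, view_columns)
by_author = categorize(index)
```

A replica holding only some of the documents would run the same definition and obtain a smaller index, which is why the index itself need not travel with the definition.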
Other design elements are classes of note, including:
• Selective Replication Definition -- Contains a formula used in replicating between pairs of servers and/or users, so that, for example, a mobile user need only replicate relevant documents to the laptop.
• Shared Field -- Holds the definition of a field which can be inherited by more than one form.
• Icon -- Contains the database's icon.
• Help-About -- Also called the Policy Document, a special document to give users information on the database's purpose.
• Help-Using -- A special document to give users information on the database's usage.
• ACL -- Holds the access control information for the database.
Many of these note classes are single-instance; they occur once per database. Such notes are marked with a special flag so they can be identified quickly, and so that the replicator can behave accordingly, not storing more than one copy per database.
Item data structures
The Summary items (text, text-list, number, number-range, timedate, timedate-range) are stored in very compact, efficient structures; this enables their contents to be indexed and computed very quickly. This also provides enormous flexibility in multi-value field storage.
The Compound Document (CD) data structure is designed for maximum flexibility and extensibility to incorporate the variety of rich objects which must be included in Notes rich-text. It also includes nesting of paired structures. For example, "hotspot begin" and "hotspot end", between which may be "graphic begin"..."end" and so on. The CD structure's comprehensive rich-text support means that duplicate information may be stored if required. For example, a single graphic image can be stored in bitmap, metafile and PICT formats, for quick rendering on any Notes workstation on any operating system.
The extensibility of the Compound Document structure makes it possible to incorporate support for future objects in Notes documents. For example, a major part of the Notes 4.0 development is to provide excellent OLE 2 capability. Figure 5 shows the internal structure of these data types.
Figure 5 -- Internal structure of various Notes data types.
The common data types include:
• TYPE_TEXT -- Plain text, stored in LMBCS (Lotus Multi-Byte Character Set), a variable-length character encoding. In its European and North American form, this corresponds to Codepage 850 with the exception of codes less than x20.
• TYPE_TEXT_LIST -- A multi-value item where each value is plain LMBCS text. In the Notes user interface, no great distinction is made between single-value and multi-value items. A single-element text list is transparently converted to a plain text item. In a Notes API program, you must treat these as two distinct data types. This also applies to multi-value number and timedate items.
• TYPE_NUMBER -- A 64-bit IEEE floating point number.
• TYPE_NUMBER_RANGE -- A compound multi-value structure: a list of numbers, and/or a list of number pairs.
• TYPE_TIMEDATE -- A comprehensive time stamp structure, containing the number of days since Julian Day 0 (January 1, 4713 BC), or the constant ANYDAY; the number of ticks (hundredths of a second) since midnight, or the constant ALLDAY; the time zone (with quarter-hour granularity); and the daylight-savings flag.
• TYPE_TIMEDATE_RANGE -- A list of TIMEDATEs, and/or a list of TIMEDATE pairs.
• TYPE_COMPOSITE -- The rich-text (compound document) field structure.
These highly flexible and efficient data types give enormous potential to the application developer.
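As a rough illustration of the TIMEDATE components listed above, the following Python sketch derives a Julian day number and a tick count from an ordinary timestamp. The class name, the field names, and the quarter-hour zone encoding are assumptions made for illustration; this is not the packed binary layout that Notes uses internally.

```python
from dataclasses import dataclass
from datetime import datetime

# Offset between Python's proleptic Gregorian ordinal and the Julian day number.
JULIAN_DAY_OFFSET = 1721425


@dataclass(frozen=True)
class TimeDateSketch:
    """Illustrative stand-in for a TIMEDATE (not the real binary layout)."""
    julian_day: int          # days since Julian Day 0
    ticks: int               # hundredths of a second since midnight
    zone_quarter_hours: int  # zone offset in quarter-hour units (assumed encoding)
    dst: bool                # daylight-savings flag


def to_timedate(ts: datetime, zone_hours: float = 0.0, dst: bool = False) -> TimeDateSketch:
    """Break a naive datetime into the components described in the text."""
    julian_day = ts.date().toordinal() + JULIAN_DAY_OFFSET
    seconds = (ts.hour * 60 + ts.minute) * 60 + ts.second
    ticks = seconds * 100 + ts.microsecond // 10_000
    return TimeDateSketch(julian_day, ticks, round(zone_hours * 4), dst)


stamp = to_timedate(datetime(2000, 1, 1, 12, 0, 0), zone_hours=-5.0)
print(stamp)
```

Keeping the date and time parts separate is what makes constants such as ANYDAY and ALLDAY possible: either half can be replaced by a wildcard without disturbing the other.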
The above description shows how Notes makes a distinction between summary (computable, indexable) information and non-summary (compound, relatively opaque) information. These two types of information have different access requirements. Access to summary information about a note must be highly optimized. Notes accomplishes this by physically separating summary items from a note into a summary buffer, which is stored for high-speed access, and objects, which contain non-summary information and may be retrieved at lower speed. Thus, rich-text fields are stored as objects (BLObs) and other fields are held in a summary buffer.
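The separation described above can be sketched in a few lines of Python. The item type names and the `split_note` helper are hypothetical; the sketch only shows the idea of routing summary items to one store and everything else to another, not how an NSF file is actually laid out.

```python
# Item types treated as summary data in this sketch (names are illustrative).
SUMMARY_TYPES = {
    "text", "text_list", "number", "number_range", "timedate", "timedate_range",
}


def split_note(items):
    """Split (name, type, value) items into a summary buffer and an object store."""
    summary, objects = {}, {}
    for name, item_type, value in items:
        target = summary if item_type in SUMMARY_TYPES else objects
        target[name] = value
    return summary, objects


note_items = [
    ("Subject", "text", "Quarterly plan"),
    ("Categories", "text_list", ["Sales", "Planning"]),
    ("Body", "composite", b"<rich text records>"),
]
summary, objects = split_note(note_items)
print(sorted(summary), sorted(objects))
```

A view indexer working from this split would only ever need to read `summary`, which is the point of keeping it small and separate.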
Summary data can be manipulated very efficiently, by the Compute engine and the NIF indexer, for example. The Notes formula language, executed through the Compute module, allows application designers to build intelligence into forms, for computation at the workstation; into views, for selection and collation; and into macros, for agent or rules functionality at the client or scheduled on the server. The formula language is highly optimized and tailored for building group-collaboration and workflow applications.
Notes' formula language will be supplemented with LotusScript in Notes 4.0. LotusScript is an object-oriented, BASIC-compatible language which is an important part of Lotus' programmability strategy for the entire suite of desktop and communication products. It interacts with each host product (Notes, for example) by having the hosts publish various classes of object, which can be created and manipulated with LotusScript code. In Notes, developers will be able to write LotusScript programs, executed on forms (in buttons and other form elements) or as server-based agents. These programs will be able to manipulate Notes documents and data items at a very high level.
The combination of the Compute engine for optimized formula evaluation, and the LotusScript compiler for procedural and object-oriented scripting, results in a highly efficient and powerful development environment. The developer's code is stored in the same Notes databases which it defines and manipulates.
Indexing and searching
The application designer can create multiple views of the data in a Notes database. Users have the ability to create private views where they want to see a particular selection or presentation of the documents.
Views are defined by the selection of documents which match a formula; by the hierarchy of parent and response documents, if this is required; and by columns which structure summary information. A column displays the contents of a field, or the result of a formula (which may involve one or more fields). Columns can be categorized in multiple levels, to group related documents together; they can be sorted, by date or alphabetically or by the results of any formula; and they can be totaled.
Views operate internally by the creation of a view definition note (which replicates with the database) and the maintenance of an index by the NIF subsystem. NIF uses B-tree index algorithms to hold coherent, browsable collections of summary buffers with the column calculation results, collated according to specifications in the view definition note.
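The relationship between a view definition and its index can be sketched as follows. This is a minimal model under stated assumptions: Python callables stand in for the selection and column formulas, and a sorted list maintained with `bisect` stands in for the B-tree that NIF uses. The class and method names are hypothetical.

```python
import bisect


class ViewIndexSketch:
    """Toy view: a selection predicate, column formulas, and a sorted collection."""

    def __init__(self, selects, columns):
        self.selects = selects  # stand-in for the selection formula
        self.columns = columns  # stand-ins for the column formulas
        self.entries = []       # sorted list of (column values, note id)

    def update(self, note_id, summary):
        """Index one note's summary items if the selection matches."""
        if self.selects(summary):
            row = tuple(formula(summary) for formula in self.columns)
            bisect.insort(self.entries, (row, note_id))

    def rebuild(self, notes):
        """Rebuild the whole collection from the local notes."""
        self.entries = []
        for note_id, summary in notes.items():
            self.update(note_id, summary)


notes = {
    101: {"Form": "Memo", "Author": "Pat", "Subject": "Budget"},
    102: {"Form": "Memo", "Author": "Alex", "Subject": "Agenda"},
    103: {"Form": "Task", "Author": "Alex", "Subject": "Not selected"},
}
view = ViewIndexSketch(
    selects=lambda s: s["Form"] == "Memo",
    columns=[lambda s: s["Author"], lambda s: s["Subject"]],
)
view.rebuild(notes)
print(view.entries)
```

Only the definition (the predicate and the column formulas) would need to travel with the database; the `entries` list can always be regenerated from whatever notes are present locally.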
If a column is sorted, it must be collated in the index. This is done in one of three ways.
• KEY sorting is collation by the value of a column, based on a summary item's value.
• TUMBLER is collation of list item data types. Each value of the list corresponds to a level in a hierarchical (outline) index. As collation is performed, only the Nth list value is collated if collating the Nth level of the hierarchical index. This causes the new index entry to be placed as many levels deep as there are list values. For example, a number list value of 1:2:3 places the index entry three levels down in the hierarchy. If the new index entry requires a subtree which does not yet exist, a ghost entry is created to act as a parent for the new entry at intermediate levels. The result is a hierarchical outline where index entries are created at a variety of different levels in the index depending on the number of values in their list data type.
• CATEGORY is collation of any data types (list or non-list) to create a hierarchical index. For each CATEGORY in the collating spec, ghost entries are created for each unique value of the specified item (only as many ghost entries as there are unique values), and all duplicate values are placed at the next lower level.
Unlike TUMBLER, CATEGORY collates list data types exactly the same way as KEY: Each list value is compared; if equal, the next one is compared to break ties; and so on until the list is exhausted.
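To make the TUMBLER behavior concrete, here is a small Python sketch of the placement rule: an entry goes as many levels deep as its list has values, and missing parents are filled in with ghost entries. The nested-dictionary tree and the function names are assumptions for illustration, and the sketch keeps only one note per position for brevity.

```python
def new_node(ghost=True):
    """Create an outline node; nodes start as ghosts until a note claims them."""
    return {"ghost": ghost, "note_id": None, "children": {}}


def tumbler_insert(root, values, note_id):
    """Place note_id len(values) levels deep, creating ghost parents as needed."""
    node = root
    for value in values:
        node = node["children"].setdefault(value, new_node())
    node["ghost"] = False
    node["note_id"] = note_id


def outline(node, depth=0):
    """Return (depth, value, note id or 'ghost') rows in collated order."""
    rows = []
    for value in sorted(node["children"]):
        child = node["children"][value]
        label = "ghost" if child["ghost"] else child["note_id"]
        rows.append((depth + 1, value, label))
        rows.extend(outline(child, depth + 1))
    return rows


root = new_node()
tumbler_insert(root, [1, 2, 3], "doc-A")  # three values: three levels down
tumbler_insert(root, [1], "doc-B")        # one value: replaces the level-1 ghost
rows = outline(root)
print(rows)
```

Here the level-2 entry remains a ghost because no note carries the list value 1:2, while the level-1 ghost created for doc-A is later taken over by doc-B.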
As an example, if a collating spec consists of CATEGORY "Folder", CATEGORY "Author", and KEY "Date", the top level of the index only contains as many ghost entries as there are unique Folder names. Below it, the next level contains all unique Author names within the folder, and below each Author, the next level contains all the index entries for each Author sorted by Date. The result is a three-level hierarchical outline where all index entries are always at the Nth level, and all intermediate category levels always contain only ghost entries.
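The Folder/Author/Date example can be sketched in Python as follows. The `collate` function and the row format are hypothetical, and a flat list of rows stands in for the real index structure; the sketch only demonstrates that category levels hold one ghost entry per unique value and that real entries always land at the deepest level.

```python
def collate(notes, categories, key):
    """Build ghost category rows over notes sorted by category items, then key."""
    ordered = sorted(notes, key=lambda n: tuple(n[c] for c in categories) + (n[key],))
    rows = []
    previous = [None] * len(categories)
    for note in ordered:
        for level, item in enumerate(categories):
            if note[item] != previous[level]:
                rows.append((level + 1, "ghost", note[item]))
                previous[level] = note[item]
                # A new category value invalidates every deeper level.
                previous[level + 1:] = [None] * (len(categories) - level - 1)
        rows.append((len(categories) + 1, "entry", note[key]))
    return rows


notes = [
    {"Folder": "Inbox", "Author": "Pat", "Date": "1995-03-02"},
    {"Folder": "Inbox", "Author": "Alex", "Date": "1995-03-01"},
    {"Folder": "Inbox", "Author": "Pat", "Date": "1995-02-01"},
    {"Folder": "Archive", "Author": "Pat", "Date": "1994-12-31"},
]
rows = collate(notes, categories=["Folder", "Author"], key="Date")
for level, kind, value in rows:
    print("  " * (level - 1) + f"{value} ({kind})")
```

Four notes produce two top-level ghosts (one per folder), three author ghosts, and four entries, all of which sit at level three.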
The Topic Full Text Retrieval engine, licensed from Verity Inc., creates associative indexes of the full text of Notes documents. Full text search lets users search for words, phrases, numbers, and dates, as well as perform queries using wildcards, logical and proximity operators, and other advanced features. Search results are ranked for relevance and displayed in the view.
Full text search also allows Query By Form, entering logical queries into specific fields on a form.
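The general idea behind a full-text index can be illustrated with a toy inverted index. This is explicitly not the Verity Topic engine or its algorithm: it is a minimal Python sketch with hypothetical function names, supporting only whole-word matching and a logical AND across query terms, with no ranking, wildcards, or proximity operators.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of note IDs whose text contains it."""
    index = defaultdict(set)
    for note_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(note_id)
    return index


def search_all(index, query):
    """Return the note IDs containing every word of the query (logical AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


documents = {
    1: "Budget review for the sales team",
    2: "Sales forecast and budget",
    3: "Team offsite agenda",
}
index = build_index(documents)
print(search_all(index, "budget sales"))
```

Because the index maps words to notes rather than notes to words, a query touches only the entries for its own terms instead of scanning every document.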
Sequential search and replace is also available within a database, a view or a single document. This may be performed on the server or the workstation as efficiency dictates.
These flexible methods provide users with many ways to manage and arrange the contents of multiple databases. Their generic nature allows powerful applications to be built utilizing their various features.
Structure of the Notes database
A Notes database is the container for notes: documents, form designs, view definitions, and all the other classes of note previously described (figure 6). The flexible structures and features of the notes are tailored to the need to transport and identify them: unique identifiers, last-modified timestamp, summary information and so on. The container database also reflects its function: the need to quickly retrieve appropriate information, the need for extreme reliability, and the need to support replication.
Figure 6 -- Structure of a Notes database.
The Notes database, a file that usually has the extension .NSF, begins with some header information and an allocation map. Important parts of the header include:
• Database ID -- Uniquely identifies the file.
• Replica ID -- Uniquely identifies all replica copies of this file on all servers.
• Creation timestamp.
• Last Modification timestamp -- The last time that any of the database's contents were modified.
• Category -- For easy identification.
• Design Class -- Indicates that the design elements should be inherited from a central template database and maintained in synchronization with that template.
The replicator task and others use this information to efficiently decide whether a database should be replicated, and which database replicas exist on any particular server. The database contents can be scanned extremely quickly for the notes which have been added or modified since a certain time. This again is optimized for the replicator.
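The way the header fields support this decision can be sketched as follows. The `DatabaseSketch` class, the integer timestamps, and the `notes_to_send` function are assumptions for illustration; the real replicator also handles deletions, conflicts, and selective replication formulas, none of which are modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseSketch:
    """Toy database header plus a table of notes with modification times."""
    replica_id: str
    last_modified: int                          # header timestamp (any monotonic unit)
    notes: dict = field(default_factory=dict)   # note key -> (modified time, content)


def notes_to_send(source, target, last_sync):
    """Return keys of notes changed in source since last_sync, or [] if nothing to do."""
    if source.replica_id != target.replica_id:
        return []  # not replicas of the same database
    if source.last_modified <= last_sync:
        return []  # the header alone shows there is nothing new
    return sorted(
        key for key, (modified, _) in source.notes.items() if modified > last_sync
    )


hub = DatabaseSketch(
    "REPLICA-1",
    last_modified=25,
    notes={"n1": (10, "old memo"), "n2": (25, "new memo")},
)
laptop = DatabaseSketch("REPLICA-1", last_modified=5)
other = DatabaseSketch("REPLICA-2", last_modified=0)
changed = notes_to_send(hub, laptop, last_sync=20)
print(changed)
```

The two early returns correspond to the cheap header checks: most databases can be skipped without looking at a single note.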
The rest of the database contains notes and other objects: summary buffers are stored for rapid access, and non-summary items are held in another area of the file. There is another internal abstraction: Notes themselves are a specialized class of object. Other object classes include file attachments, OLE packages, and packed lists of documents (unread lists, for example). Every object is identified by a Record Relocation Vector (RRV), which is a file-position pointer; a NOTEID is simply the note's RRV.
Some objects, such as the collections which constitute an index of the database, are never replicated, since they must be built appropriate to each local copy of the database.
Certain Notes databases are used by the system for specific tasks. These are no different from other databases, but the system configuration indicates that they have special uses. The major system databases are:
• Name & Address Book -- Usually called NAMES.NSF, this database holds the user directory, which is modeled on X.500 specifications. It contains various classes of document, named by their forms:
    • Person -- Information about a user, such as name, e-mail database or forwarding address, public key, and so on.
    • Group -- Named list of people, used for access control lists and for electronic mail.
    • Connection -- Information about the ports and connection details (phone number or other network address) for other Notes servers and remote systems; also, scheduling specifications to connect between particular servers.
• Mail Router Mailbox -- Usually called MAIL.BOX, this database is the forwarding store for Notes mail on a server. The ROUTER task (the mail transfer agent) is responsible for polling this database and forwarding the documents to other servers' mailboxes or to users' mail files. In version 4 of Notes, the router task will consist of various MTAs for native X.400 and SMTP/MIME connectivity, in addition to the existing Notes and cc:Mail transfer protocols. Gateway add-in tasks usually have their own mailbox, where mail is deposited by the Notes router according to definitions of the gateway in the Name & Address Book. The gateway task polls its mailbox and transfers mail accordingly.
• Database Catalog -- Usually called CATALOG.NSF, this database is a catalog of all databases on the server and includes the policy document from each database. If replicated through a Notes domain, it becomes the company's catalog of Notes databases. It is maintained by a server task called CATALOG, which usually runs once per day.
• Notes Log -- Usually called LOG.NSF, this database contains the activity log of the Notes server. Server tasks report their activity to the database, where they can be summarized and reported in views for the administrator. This is the most basic level of system administration log. Other administration utilities include administrator-defined events and alerts, and system-wide network management tools from Lotus and other vendors.
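The interplay between the mailbox and the directory described for the Mail Router Mailbox can be sketched as a simple polling step. Everything here is a simplified assumption: the dictionary-based directory, the field names, and the "dead-letter" queue are hypothetical, and the real ROUTER task also deals with groups, gateways, and connection schedules.

```python
def route_mailbox(mailbox, directory, local_server, mail_files, outbound):
    """Drain the mailbox: deliver locally or queue for the recipient's home server."""
    while mailbox:
        message = mailbox.pop(0)
        person = directory.get(message["to"])
        if person is None:
            # Unknown recipient: park the message (hypothetical queue name).
            outbound.setdefault("dead-letter", []).append(message)
        elif person["server"] == local_server:
            mail_files.setdefault(person["mail_file"], []).append(message)
        else:
            outbound.setdefault(person["server"], []).append(message)


directory = {
    "Pat": {"server": "HUB1", "mail_file": "mail/pat.nsf"},
    "Alex": {"server": "HUB2", "mail_file": "mail/alex.nsf"},
}
mailbox = [
    {"to": "Pat", "subject": "Hi"},
    {"to": "Alex", "subject": "Hello"},
]
mail_files, outbound = {}, {}
route_mailbox(mailbox, directory, "HUB1", mail_files, outbound)
print(mail_files, outbound)
```

The directory lookup is the only thing that distinguishes local delivery from forwarding, which is why the Name & Address Book is central to mail routing.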
Notes' inherent flexibility and extensibility ensure that the core databases can accommodate all present and future needs (as mandated in the X.500 specification, for example).
Relationships between documents
Lotus Notes recognizes that an object container only becomes useful when its contents can be indexed and related. The concept of summary fields, used to collate documents in views, is one part of the solution. The other part is to create structured methods of relating documents to one another.
One key relationship is between members of a hierarchy. The parent, response, and response-to-response hierarchy in Notes is used to relate topics in a discussion, to relate documents in a correspondence database, and so on for all the major types of workgroup application.
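The parent, response, and response-to-response hierarchy amounts to each document carrying a reference to its parent. A minimal Python sketch, with hypothetical document IDs and function names, shows how a threaded outline can be derived from those references alone:

```python
from collections import defaultdict


def build_threads(documents):
    """Group documents under their parents; documents maps ID -> parent ID or None."""
    children = defaultdict(list)
    for doc_id, parent_id in documents.items():
        children[parent_id].append(doc_id)
    return children


def render(children, parent=None, depth=0):
    """Return (depth, doc_id) rows in depth-first order, main topics first."""
    rows = []
    for doc_id in sorted(children.get(parent, [])):
        rows.append((depth, doc_id))
        rows.extend(render(children, doc_id, depth + 1))
    return rows


docs = {
    "topic-1": None,          # main topic
    "reply-1": "topic-1",     # response
    "reply-1a": "reply-1",    # response to response
    "topic-2": None,
}
rows = render(build_threads(docs))
for depth, doc_id in rows:
    print("  " * depth + doc_id)
```

Because the relationship is stored on the child, a response can be created on one replica without modifying the parent document.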
Other Notes applications can define their own structures for document relationships. To avoid duplication of storage, forms can perform lookups into other views, using the @DbColumn command to retrieve a column's contents and the @DbLookup command to retrieve individual values from key documents.
Version control is included in Notes through the ability to store versions as responses: either prior versions are stored as responses to the current version, or each new version is stored as a response to the previous one.
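The two kinds of lookup can be illustrated with Python analogies. These functions are not the signatures of the Notes @functions; they are hypothetical helpers over a list of tuples standing in for a view whose first column is the sorted key.

```python
def db_column(view_rows, column):
    """Return every value in one view column (1-based)."""
    return [row[column - 1] for row in view_rows]


def db_lookup(view_rows, key, column):
    """Return values from `column` for rows whose first column equals `key`."""
    return [row[column - 1] for row in view_rows if row[0] == key]


# Stand-in for a view of customer documents: (name, city, account manager).
customers = [
    ("ACME", "Boston", "Pat"),
    ("Globex", "Chicago", "Alex"),
]
print(db_column(customers, 1))
print(db_lookup(customers, "Globex", 3))
```

A form field could use the first style of lookup to populate a list of choices and the second to pull related details once a choice is made, so the customer data is stored only once.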
Doclinks are hypertext links between documents and (recently, with the introduction of SmarText 3.0) from outside Notes altogether. They give users some ad-hoc ability to knit related documents together where there is no formal relationship.
If the structure of an application mandates referential integrity or normalized data structures, the design of the application is crucial. This might be the case if the application is to be your single database of all customer contact, or if the database must use key values to perform lookups into a relational database system. The flexible, diverse structure of Notes databases need not make your data any less structured than the data in a relational system -- everything is a consequence of the design. Notes is as suitable for those applications requiring standardized data structures, such as inter-enterprise workflow and EDI, as it is for those needing little structure.
Relationships outside the database
External linkages are essential to integration with transactional or legacy databases and other data sources. There are a number of these methods included in Notes, and open interfaces for third-party providers.
The highest level of integration is found at the platform-specific compound document format. Notes is the perfect container for OLE embedded objects, allowing live application objects to be embedded seamlessly in documents for sharing with other users.
In addition to its inter-application data exchange facilities, Notes achieves a second level of data integration through its rich set of import/export filters. By using these filters, users can copy information from application files directly into Notes databases or documents, retaining much of the original application's formatting information, or from Notes documents to application formats, again retaining the original look.
From Notes forms (field formulas, buttons and other elements), smarticons and macros, the @DbLookup and @DbColumn commands allow lookup into Notes databases. They also support the DataLens driver technology, which allows retrieval from external databases including Oracle, Sybase and Microsoft SQL Server, Informix, IBM Database Manager, Paradox, and dBASE. The @DbCommand function allows database queries in SQL and proprietary command languages, and it allows access to stored queries on the database server. DataLens also supports ODBC. These functions can be run from the workstation or, in background macros, on the server for agent-type integration.
The NotesSQL driver provides an ODBC interface to Lotus Notes data. Thus a wide range of third-party applications have access to Notes databases without special APIs.
Using the Notes API, many developers have created specific integration tools for their customers. Powerful tools available include Trinzic's InfoPump, a scripted database integration utility which can migrate and synchronize data from a wide variety of sources, on an event-driven or scheduled basis.
Front-end development tools which can access data from Notes and integrate with other data sources include Lotus Notes ViP, Powersoft PowerBuilder, Gupta SQLWindows, Revelation OpenInsight, and many others. Lotus Development Corp. also sells a range of companion products, including the image-capable Lotus Notes:Document Imaging product, incoming and outgoing FAX gateways, PhoneNotes for developing interactive voice response applications integrated with the Notes environment, and an optical character recognition (OCR) server for scanning faxes or images into Notes documents.
Using these open tool sets, the application developer and the systems integrator can relate Notes information with other systems for integrity across the enterprise.
The combination of form and view design tools means that Notes applications are developed in much less time than with traditional 3GL and 4GL tools. Design is flexible even after the application has been placed into production because changes roll out through replication, at the same time as the data documents replicate. This unique feature, inherent in the design of Notes, means that applications can be refined continually to conform to user requirements, with no prototype-test-deploy-evaluate cycle; rather, continuous proto-cycling iterations can be used until everyone is happy.
The database templates that ship with Notes can be used to create identical applications or customizable applications. Design inheritance from standard templates and shared field definitions within a database provide data dictionary capabilities, enabling leverage and reuse of design elements, maintainability of database design over time, and the ability to set corporate design standards.
Figure 7 -- Overall Notes architecture.
Lotus Notes and the Lotus Communications Server
Notes' robust and scalable server environment is also the basis of Lotus' next generation of messaging servers. cc:Mail users will have the option of keeping the shared-file PostOffice environment or of migrating to a client-server system if this is more suited to their long-term needs. The integrated platform is called the Lotus Communications Server. It includes cc:Mail integration (with all the advantages of the cc:Mail system), multiple standards support in APIs, and a single platform for external gateways. It is also the fourth generation of the Lotus Notes server. This gives a uniquely powerful environment for communication.
Lotus Notes is a strategic system. Businesses deploy the Notes product for reasons which dramatically affect their performance. With any strategic software purchase, it is important for IT planners to understand the foundations of that software. The architecture of Lotus Notes is of long-term importance to the way people collaborate and do business. With this information you can make truly informed decisions about the enterprise strategies of Lotus Development Corp. and other software vendors.