*** Description of the project ***

The project is the construction of a BNF-like parser that reads a set of files from a list of specified directories (*1) into a set of structures at program start-up. The structures remain in memory (RAM) throughout program execution for faster execution. The program is a mixed perl/(C or C++) program (*2). It is called as a perl program that in turn calls a C/C++ function each time it receives a query-string from a user on the Internet. The program uses 5 types of global structures, described below. Those structures are global in the sense that they are accessible from both the C/C++ part and the perl part of the program.

Structures of the program:

There are 5 structure types in the program (described below):
  structure_GLOBAL
  structure_SPEC_STRING
  structure_CACHE_STRING
  structure_TIME_STRING
  structure_MEM_STRING

The structure_SPEC_STRING (http://cppcode.angelfire.com/diagram/structure_SPEC_STRING.zip) type of structure holds the specification strings. Each file read from the set of specified directories (*1) is loaded into 1 node of the structure, in alphabetical order; each line of those files contains one specification string. For example, after loading 3 files (method.txt, function.txt, procedure.txt) from the cpp directory, the content of the structure is as shown in http://cppcode.angelfire.com/diagram/diagram_00.zip. All 3 nodes (whose respective 'fn' fields are labeled 'method', 'function', 'procedure') are linked together, and the 'lang' field of all 3 nodes is 'cpp'. The 'fn' field holds the name of the file. There can be N lines in each file, and therefore N specification strings per file. There can be N files read from the set of directories. The grammar of those strings is described below (see Specification string grammar). Once the files are loaded the structure does not change. It can be any kind of structure (b-tree, avl-list, etc ...).
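As a rough illustration of the structure_SPEC_STRING idea, here is a minimal C++ sketch. It assumes a std::map keyed by (lang, fn); the names SpecKey, SpecStructure, add_spec_file and file_names are hypothetical and not part of the specification, and any other container (b-tree, list, ...) would do as well.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: one entry per (lang, fn) pair, holding the
// specification strings of that file. std::map keeps keys sorted, which
// gives the alphabetical order mentioned in the text.
using SpecKey = std::pair<std::string, std::string>;  // (lang, fn)
using SpecStructure = std::map<SpecKey, std::vector<std::string>>;

// Store the lines of one file, e.g. lang "cpp" and fn "method".
void add_spec_file(SpecStructure& specs, const std::string& lang,
                   const std::string& fn,
                   const std::vector<std::string>& lines) {
    assert(!lang.empty() && !fn.empty());
    specs[SpecKey(lang, fn)] = lines;
}

// List the 'fn' labels stored for one language, in alphabetical order.
std::vector<std::string> file_names(const SpecStructure& specs,
                                    const std::string& lang) {
    std::vector<std::string> names;
    for (const auto& entry : specs) {
        if (entry.first.first == lang) {
            names.push_back(entry.first.second);
        }
    }
    return names;
}
```

With the three files of the example loaded under 'cpp', file_names would list 'function', 'method', 'procedure' in that order.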
The 'lang' label holds the directory (*9) name minus the '/' at the end (some Linux/Unix systems always append '/' to directory (*9) names).

The structure_SPEC_STRING is made up of a set of node_SPEC_STRING nodes (http://cppcode.angelfire.com/diagram/node_SPEC_STRING.bmp). It can be any kind of structure (b-tree, avl-list, etc ...). Each node_SPEC_STRING node is made up of:
  2 fields:
    'fn' : the type of specification string. Ex: (function, method, procedure). It corresponds to the file name read.
    'lang' : language of the specification string. It corresponds to the directory name read.
  1 pointer:
    'ptr' : pointer to another node (node_SPEC_STRING)
  a set of specification strings (spec_str_0, spec_str_1, spec_str_2, ... spec_str_N). It can be any kind of structure (b-tree, avl-list, etc ...).

The structure_CACHE_STRING (http://cppcode.angelfire.com/diagram/structure_CACHE_STRING.zip (see *4)) type of structure is used to cache the strings returned from calls to the 'make_spec' function (C/C++). This structure DOES change during program execution. When the program begins those structures are empty. They change when the string (str) is not found by the 'lookup_structure_CACHE_STRING' function: in this case the output string generated by the 'make_spec' function is added to the structure. It can be any kind of structure (b-tree, avl-list, etc ...). For illustration purposes I've chosen the b-tree in http://cppcode.angelfire.com/diagram/structure_CACHE_STRING.zip.

The structure has a field 'fn' whose label corresponds to a file name read (*1), and a field 'lang' whose label corresponds to a directory (*9) name read (*1). The program searches for the node_CACHE_STRING_2 node (http://cppcode.angelfire.com/diagram/node_CACHE_STRING_2.bmp) in which the 'fn' field corresponds to the file name and the 'lang' field corresponds to the directory (*9) name.
For example, after reading 3 files (method.txt, function.txt, procedure.txt) from directory_00 at program start-up, and after being called (by a user on the Internet) for all the methods, functions and procedures listed in directory_00.txt, the program contains 3 of those nodes, whose root node 'fn' fields contain the strings 'method', 'function' and 'procedure' respectively. A more detailed description of structure_CACHE_STRING is given below. This can be any type of structure (b-tree, avl-list, etc ...).

The structure_CACHE_STRING is made up of N nodes of type node_CACHE_STRING_2 (see http://cppcode.angelfire.com/diagram/node_CACHE_STRING_2.bmp). This type of node is the top node in the structure_CACHE_STRING diagram (see http://cppcode.angelfire.com/diagram/structure_CACHE_STRING.zip). This node is made up of:
  2 fields:
    'fn' : file name corresponding to the file read at program start-up into structure_SPEC_STRING
    'lang' : directory (*9) corresponding to the directory (*9) name from which the file was read during program start-up. Each programming language is in a different directory (*9).
  1 pointer:
    'ptr' : pointer to a node of type node_CACHE_STRING_1 (see http://cppcode.angelfire.com/diagram/node_CACHE_STRING_1.bmp).

The node_CACHE_STRING_0 (http://cppcode.angelfire.com/diagram/node_CACHE_STRING_0.bmp) type of node is made up of:
  2 fields:
    'tm' : the creation time of the cached string (time since EPOCH) (tm in http://cppcode.angelfire.com/diagram/structure_CACHE_STRING.zip)
    'str_ch' : the cached string. One cached string, which can be of any length. A more detailed description of this cached string is given below (*5).

The node_CACHE_STRING_1 (http://cppcode.angelfire.com/diagram/node_CACHE_STRING_1.bmp) type of node is made up of:
  2 fields:
    'str' : item string corresponding to the string to be searched in the insert_structure_CACHE_STRING function.
    'sz' : the total size (in bytes) of all cached strings (below this node) added together.
  NOTE: The 'L' and 'R' fields are pointers to the left and right nodes (of the b-tree) I chose for the structure_CACHE_STRING structure, for illustration purposes only. They don't need to be present if you choose another type of structure.
  N extent pointers to nodes of type node_CACHE_STRING_3 (http://cppcode.angelfire.com/diagram/node_CACHE_STRING_3.bmp). For each extent pointer there is an 'ext' field holding the number of the extent. These can be linked lists or b-trees, labeled 0 to N. For example, in the http://cppcode.angelfire.com/diagram/structure_CACHE_STRING.zip diagram there are 3 nodes (left, middle, right).

The node_CACHE_STRING_2 (http://cppcode.angelfire.com/diagram/node_CACHE_STRING_2.bmp) type of node is made up of:
  2 fields:
    'fn' : corresponds to the file name
    'lang' : corresponds to the directory
  1 pointer:
    'ptr' : pointer to another node of type node_CACHE_STRING_2

The node_CACHE_STRING_3 (http://cppcode.angelfire.com/diagram/node_CACHE_STRING_3.bmp) type of node is made up of:
  1 field:
    'seq_num' : sequence number of the cached string
  1 pointer to nodes of type node_CACHE_STRING_0 (see http://cppcode.angelfire.com/diagram/node_CACHE_STRING_0.bmp). These can be linked lists or b-trees, labeled 0 to N.

The structure of type structure_GLOBAL (http://cppcode.angelfire.com/diagram/structure_GLOBAL.zip) holds the globals of the program, i.e. the globals used in the specification strings. The 'name' label is the name of the file read, minus the '.txt' extension and prepended by the '%' character. Ex. NUM.txt is the %NUM global, INT.txt is the %INT global. The 'lang' label holds the directory (*9) name minus the '/' at the end (some Linux/Unix systems always append '/' to directory (*9) names). Those files are read from a set of directories, and therefore the same file name may be duplicated (in different directories) without collision.
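The labeling rules for globals can be sketched in C++ as follows. This is only an illustration under assumptions: the helper names global_name, lang_label and add_global, and the choice of a std::map keyed by (lang, name), are hypothetical. Keying on both labels is what lets the same file name appear in several directories without collision.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// "NUM.txt" -> "%NUM": drop a trailing ".txt" and prepend '%'.
std::string global_name(const std::string& file_name) {
    assert(!file_name.empty());
    const std::string ext = ".txt";
    std::string base = file_name;
    if (base.size() >= ext.size() &&
        base.compare(base.size() - ext.size(), ext.size(), ext) == 0) {
        base.erase(base.size() - ext.size());
    }
    return "%" + base;
}

// "cpp/" -> "cpp": drop one trailing '/' if present.
std::string lang_label(const std::string& directory) {
    if (!directory.empty() && directory.back() == '/') {
        return directory.substr(0, directory.size() - 1);
    }
    return directory;
}

// Hypothetical container: (lang, name) -> item-strings of the global.
using GlobalKey = std::pair<std::string, std::string>;
using GlobalStructure = std::map<GlobalKey, std::vector<std::string>>;

// Register the items of one global file read from one directory.
void add_global(GlobalStructure& globals, const std::string& directory,
                const std::string& file_name,
                const std::vector<std::string>& items) {
    globals[std::make_pair(lang_label(directory), global_name(file_name))] = items;
}
```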
Then there may be N items (item-strings) of type node_GLOBAL_STRING (http://cppcode.angelfire.com/diagram/node_GLOBAL_STRING.bmp). It can be any kind of structure (b-tree, avl-list, etc ...).
NOTE: The 'L' and 'R' fields are pointers to the left and right nodes (of the b-tree) I chose for the structure_GLOBAL structure, for illustration purposes only. They don't need to be present if you choose another type of structure.

Each node in the structure_GLOBAL structure is made up of:
  2 fields:
    'name' : name of the global. It corresponds to the file name read.
    'lang' : language of the global. It corresponds to the directory name read.
  1 pointer:
    'ptr' : pointer to a set of nodes (node_GLOBAL_STRING) that contain the item-strings of the global. It can be any kind of structure (b-tree, avl-list, etc ...).

node_GLOBAL_STRING nodes hold the item-strings of the structure_GLOBAL structure. Each node has:
  1 field:
    'item' : item name in string format.
  1 pointer:
    'ptr' : pointer to the next node_GLOBAL_STRING

The structure of type structure_TIME_STRING (http://cppcode.angelfire.com/diagram/structure_TIME_STRING.bmp) holds the creation times of the cached strings. This can be any type of structure (b-tree, avl-list, etc ...). Each node in the structure_TIME_STRING structure is made up of:
  1 field:
    'tm' : the creation time of the cached string: number of seconds since EPOCH (int)
  1 pointer:
    'ptr' : pointer to the corresponding node in structure_CACHE_STRING.
A diagram of the node is shown in http://cppcode.angelfire.com/diagram/node_TIME_STRING.bmp
A diagram of the structure is shown in http://cppcode.angelfire.com/diagram/structure_TIME_STRING.bmp
NOTE: The 'L' and 'R' fields are pointers to the left and right nodes (of the b-tree) I chose for the structure_TIME_STRING structure, for illustration purposes only. They don't need to be present if you choose another type of structure.

The structure of type structure_MEM_STRING (http://cppcode.angelfire.com/diagram/structure_MEM_STRING.zip) holds the memory used, in bytes, by the cached strings that are descendants of node_CACHE_STRING_1 (http://cppcode.angelfire.com/diagram/node_CACHE_STRING_1.bmp) type of nodes. This is the total size of all those cached strings added together. This can be any type of structure (b-tree, avl-list, etc ...). Each node in the structure_MEM_STRING structure is made up of:
  1 field:
    'sz' : total size in bytes of all cached strings added together that are descendants of a node_CACHE_STRING_1 type of node
  1 pointer:
    'ptr' : pointer to the corresponding node in structure_CACHE_STRING.
A diagram of the node is shown in http://cppcode.angelfire.com/diagram/node_MEM_STRING.zip
A diagram of the structure is shown in http://cppcode.angelfire.com/diagram/structure_MEM_STRING.zip
NOTE: The 'L' and 'R' fields are pointers to the left and right nodes (of the b-tree) I chose for the structure_MEM_STRING structure, for illustration purposes only. They don't need to be present if you choose another type of structure.

Program execution:

When the program starts up it reads the files that contain the globals of the program. Those files are read into structure_GLOBAL, from the set of directories given in the perl command line argument $ARGV[1]. It then reads a set of files, from the set of directories given in the perl command line argument $ARGV[0], into a structure of type structure_SPEC_STRING. Each file corresponds to one node labeled 'fn' in the structure. The 'lang' label holds the directory (*9) name minus the '/' at the end (some Linux/Unix systems always append '/' to directory (*9) names). When start-up is finished the program waits for query-strings coming from the Internet.

It must be possible to dynamically load (in the middle of program execution) new specification strings into structure_SPEC_STRING and new globals into structure_GLOBAL, from files in specified directories named in a query-string.
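Splitting the two directory lists at start-up could look like the C++ sketch below. It assumes the lists arrive as two strings (the values of $ARGV[0] and $ARGV[1]) separated by ';' as described in (*1); the names split_directories, StartupDirs and parse_startup_args are hypothetical, and the directory names other than 'cpp/' are made up for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a ';'-separated directory list, skipping empty entries.
std::vector<std::string> split_directories(const std::string& list) {
    std::vector<std::string> dirs;
    std::string current;
    for (char c : list) {
        if (c == ';') {
            if (!current.empty()) dirs.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) dirs.push_back(current);
    return dirs;
}

// The two directory lists handed over by the perl side.
struct StartupDirs {
    std::vector<std::string> spec_dirs;    // from $ARGV[0]
    std::vector<std::string> global_dirs;  // from $ARGV[1]
};

StartupDirs parse_startup_args(const std::string& argv0,
                               const std::string& argv1) {
    StartupDirs dirs;
    dirs.spec_dirs = split_directories(argv0);
    dirs.global_dirs = split_directories(argv1);
    return dirs;
}

// Usage example with hypothetical directory names.
void startup_examples() {
    const StartupDirs dirs =
        parse_startup_args("cpp/;perl/", "globals_cpp/;globals_perl/");
    assert(dirs.spec_dirs.size() == 2);
    assert(dirs.spec_dirs[0] == "cpp/");
    assert(dirs.global_dirs[1] == "globals_perl/");
}
```

The same splitting routine could be reused for directories named in a query-string when files are loaded dynamically.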
On receiving such a query-string (from the Internet) the program loads (adds) the files into their respective structures.

It must be possible to debug the program during its execution. In debug mode, offer a menu to print 1 of the 5 structures (structure_GLOBAL, structure_SPEC_STRING, structure_CACHE_STRING, structure_TIME_STRING and structure_MEM_STRING) through an HTML web page.

Query-strings:

There are 8 query-strings:
  'ext' : extent
  'fn' : file name
  'lang' : programming language name
  'mult' : used to tell the make_spec function whether it should use the Numbering application (see *7)
  'nbr_seq' : sequence number
  'nbr_str' : number of strings to generate
  'nom' : nomenclature (name of the function, method, statement, ...)
  'num_sch' : numbering scheme to apply (*7 (numbering scheme)) (legal values 0 - 4)

For each of the strings to generate, the perl side of the program looks up 'nom' (corresponding to the 'str' field in structure_CACHE_STRING), 'fn', 'lang', 'nbr_seq' and 'ext' in the structure_CACHE_STRING type of structures to see if the string is already cached. If it is found, the corresponding string in structure_CACHE_STRING is simply returned. If it is not, the specification strings structure (structure_SPEC_STRING) is looked up using 'nom' (corresponding to the 'spec_str_0', 'spec_str_1', 'spec_str_2', ... specification string up to the first space/tab (or end of string)) and 'lang' (which corresponds to the 'lang' field). See lookup_structure_SPEC_STRING (*8). The string returned by lookup_structure_SPEC_STRING is used as parameter to the 'make_spec' function; the result is inserted into the structure_CACHE_STRING structure (for later use) and returned to the user (who sits in front of his/her browser on the Internet). If the function (make_spec) throws an 'OUT_OF_BOUND' exception, this means that it was called with an 'nbr_seq' that was out of bound; we simply stop appending result strings and send the result to the user.
sample-code:
[code]
while receiving requests from Internet
    initialize number_of_string_responded to 0
    if number_of_string_to_serve > max_sv_str then
        number_of_string_to_serve = max_sv_str
    end if
    while number_of_string_responded < number_of_string_to_serve
        lookup (fn, lang, item_str, seq_num, ext) in structure_CACHE_STRING (*3) and give it to str0
        if found then
            append str0 to gen_page_buffer
        else
            lookup 'fn', 'lang' and 'item_str' in structure_SPEC_STRING and give it to str1 (*8)
            translate nbr_seq from (alphabetic or alpha-numeric) numbering scheme (base 26 or 36) depending on the value of query-string 'num_sch' to decimal numbering scheme (base 10)
            call make_spec(str1, nbr_seq, ext) and give it to str0 (*7)
            insert the generated string (str0) in structure_CACHE_STRING (for later use) (*6)
            append str0 to gen_page_buffer
        end if
        add 1 to number_of_string_responded
    end while
    append a 'next' and 'prev' link for the next and previous pages (respectively)
end while
[/code]

Note: for the next and previous links the program would generate
  a) the next sequence number for the specification string (if not at the last case of the specification string)
  b) the prev sequence number for the specification string (if not at the first case of the specification string, i.e. nbr_seq = 0)
For the prev case we want to generate the specification strings starting at the top of the previous page; therefore we have to subtract number_of_string_to_serve from nbr_seq, or less (if fewer are available). For example, if the program started the current page at nbr_seq = 10 and generated 20 elements (concrete strings), then the last element number (converted to base 10) is 29 and the next page starts at 30; on that next page, the link to the previous page (the next page's previous page) starts at 10, i.e. the same as the current page.
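The page arithmetic of this note can be sketched in C++ as below. It is a minimal sketch under assumptions: the PageLinks struct, the page_links function and the 'total' parameter (the number of concrete strings the specification string can produce) are hypothetical; in the program itself the end of the sequence is detected through the OUT_OF_BOUND exception. Since nbr_seq is 0-based and cannot be negative, the start of the previous page is clamped at 0.

```cpp
#include <cassert>

struct PageLinks {
    bool has_prev;        // false on the first page (nbr_seq == 0)
    unsigned prev_start;  // nbr_seq at the top of the previous page
    bool has_next;        // false when no concrete strings remain
    unsigned next_start;  // nbr_seq at the top of the next page
};

// nbr_seq   : sequence number at the top of the current page (base 10)
// served    : number of concrete strings actually generated on this page
// page_size : number_of_string_to_serve
// total     : hypothetical count of concrete strings available in all
PageLinks page_links(unsigned nbr_seq, unsigned served,
                     unsigned page_size, unsigned total) {
    assert(served <= page_size);
    PageLinks links;
    links.has_prev = nbr_seq > 0;
    // Go back one full page, but never below sequence number 0.
    links.prev_start = nbr_seq > page_size ? nbr_seq - page_size : 0;
    links.next_start = nbr_seq + served;
    links.has_next = links.next_start < total;
    return links;
}
```

For the example in the text (current page starting at 10, 20 strings served), next_start is 30, and calling the function again for the page starting at 30 gives prev_start 10.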
The start element of the previous page (of the current page) would be the current start minus 20, which is -10 (10 - 20), i.e. (nbr_seq - number_of_string_to_serve). This is not possible since negative numbers in nbr_seq are not allowed, so in this case it simply starts at 0. The same goes for the next page (of the current page): if it is not possible to generate number_of_string_to_serve strings (because there are not that many left according to the specification string) we simply generate the concrete strings that are left.

Variable correspondence:
  descr : boolean variable that decides whether the creation time or the size of cached strings is used in case of not enough memory (PARSER_DESCR environment variable)
  ext : extent (only relevant in a specification string with ellipses)
  fn : file name (type of specification string)
  gen_page_buffer : generated page buffer
  item_str : item string
  lang : the programming language directory (*9). Each directory (*9) is specific to a programming language
  max_mem : maximum available memory
  max_sv_str : maximum value allowed for number_of_string_to_serve (to prevent DOS attacks)
  number_of_string_responded : (loop variable)
  number_of_string_to_serve : nbr_str
  seq_num : sequence number
  str0 : the string returned from looking up (fn, lang, seq_num, ext) in structure_CACHE_STRING (*4)
  str1 : the string returned from looking up fn in structure_SPEC_STRING
  structure_CACHE_STRING : cache string structure
  structure_GLOBAL : holds the globals of the program for all languages
  structure_MEM_STRING : size (total number of bytes) of each node of structure_CACHE_STRING
  structure_SPEC_STRING : specification strings structure
  structure_TIME_STRING : creation time of each cached string of structure_CACHE_STRING
  used_mem : total size of all cached strings added together

Exceptions:
  CACHED_STRING_NOT_FOUND : the lookup_structure_CACHE_STRING function throws this exception if no cached string is found.
  ELEMENT_NON_EXISTANT : raised when reading a specification string in structure_SPEC_STRING that excludes an element that isn't part of the global (in structure_GLOBAL) from which it is excluded.
  GLOBAL_NON_EXISTANT : raised when reading a specification string in structure_SPEC_STRING that uses a global that isn't part of structure_GLOBAL.
  GLOBAL_NON_SUBSET : raised when trying to exclude a non-proper subset from a set of elements in structure_GLOBAL.
  OUT_OF_BOUND : make_spec throws this exception if the 'nbr_seq' parameter specifies a sequence number that can't be generated by the function for the specification string and extent.
  SPEC_STRING_NOT_FOUND : the lookup_structure_SPEC_STRING function throws this exception if no specification string is found.
  UNBALANCED_ROUND_BACKET : the number of '{' doesn't match the number of '}' in a specification string
  UNBALANCED_SQUARE_BACKET : the number of '[' doesn't match the number of ']' in a specification string

Environment Variables:
  PARSER_DESCR : boolean variable that decides whether the creation time or the size of cached strings is used in case of not enough memory
  MAX_MEM : don't create new nodes in structure_CACHE_STRING when the number of bytes used by the program is above this threshold

Specification string grammar:

There are 9 special characters in this syntax: '|' '%' '!' '\' '{' '}' '[' ']' '.' . They have to be preceded by '\' to be represented literally. For example, to output a literal '\' you have to write '\\'. Tokens are separated by one or more spaces/tabs. Lines are separated by one or more newlines/formfeeds. Newlines may follow any convention (unix/windows/mac).

1) alternative elements are specified by the '|' char and have to be grouped with the '{' and '}' chars
   Ex: foo or bar or baz => {foo | bar | baz} (see Ex. 0010)
2) grouped elements are preceded and followed by the '{' and '}' chars. These types of elements HAVE to be grouped:
   - those part of an alternative (Ex. 0010)
   - those part of an exclusion (Ex. 0014)
   - those part of a repeated group (Ex.
0011)
   Extra '{' or '}' don't cause an error/exception even if they are not significant (don't change the semantics). For example "{{foo | bar | baz}}" or "{{{foo | bar | baz}}}" is the same as "{foo | bar | baz}". But the '{' and '}' have to be balanced: "{{foo | bar | baz}", "{foo | bar | baz}}", "{{{foo | bar | baz}", "{foo | bar | baz}}}", etc... raise the UNBALANCED_ROUND_BACKET error.
3) optional elements are preceded and followed by the '[' and ']' chars
   Ex: foo or foo bar => foo [bar] (see Ex. 0012)
   Extra '[' or ']' don't cause an error/exception even if they are not significant (don't change the semantics). For example "foo [[baz]]" or "foo [[[baz]]]" is the same as "foo [baz]". But the '[' and ']' have to be balanced: "[[foo | bar | baz]", "[foo | bar | baz]]", "[[[foo | bar | baz]", "[foo | bar | baz]]]", etc... raise the UNBALANCED_SQUARE_BACKET error.
   Note: The ellipses have to be included inside the option ([]) (Ex. [foo ...]), i.e. it cannot be [foo] ... since this would allow an empty element ("") to be extended N times.
4) global elements are preceded by the '%' char
   Ex: foo %NUM =>
       foo integer_0
       foo real_0
       foo double_0
   * when the %NUM global holds the elements (integer, real, double) (see Ex. 0013)
5) exclusion elements are preceded by the '!' char and have to be grouped with the '{' and '}' chars
   Ex: foo {%NUM ! real} =>
       foo integer_0
       foo double_0
   (see Ex. 0014)
6) escape elements are preceded by the '\' char
   Ex: foo \%NUM => foo %NUM (say I want to include the '%' char as part of the string, and therefore not have it interpreted)
   Note: the escape character '\' has to precede a character that can meaningfully be escaped ('|' '%' '!' '\' '{' '}' '[' ']' '.'). For example one cannot write "\foo" but can write "\\", "\%", "\!", "\|", "\{", "\}", "\[", "\]", "\..." (see the single concrete string group of examples).

The nbr_seq parameter is the sequence number (0 based).
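The balancing requirement of rules 2 and 3 can be sketched in C++ as follows. This is an illustration only: the Balance enum and check_balance function are hypothetical names, the check compares counts (as the exception descriptions do) rather than verifying nesting order, and it assumes a '\' always escapes the single character that follows it. A real implementation would raise UNBALANCED_ROUND_BACKET or UNBALANCED_SQUARE_BACKET instead of returning a status.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Balance { Ok, UnbalancedRound, UnbalancedSquare };

// Count unescaped '{' '}' and '[' ']' in a specification string.
Balance check_balance(const std::string& spec) {
    long curly = 0;
    long square = 0;
    for (std::size_t i = 0; i < spec.size(); ++i) {
        const char c = spec[i];
        if (c == '\\') {
            ++i;  // skip the escaped character, it is a literal
            continue;
        }
        if (c == '{') ++curly;
        else if (c == '}') --curly;
        else if (c == '[') ++square;
        else if (c == ']') --square;
    }
    if (curly != 0) return Balance::UnbalancedRound;
    if (square != 0) return Balance::UnbalancedSquare;
    return Balance::Ok;
}

// Usage example with strings taken from the rules above.
void balance_examples() {
    assert(check_balance("{{foo | bar | baz}}") == Balance::Ok);
    assert(check_balance("{{foo | bar | baz}") == Balance::UnbalancedRound);
    assert(check_balance("foo [[baz]]") == Balance::Ok);
    assert(check_balance("[[foo | bar | baz]") == Balance::UnbalancedSquare);
    assert(check_balance("foo \\{bar") == Balance::Ok);  // escaped '{'
}
```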
Let's say we have str = "{foo | bar | baz}":
  if nbr_seq equals 0 then the element would be foo
  if nbr_seq equals 1 then the element would be bar
  if nbr_seq equals 2 then the element would be baz

The ext parameter is the extent. It tells us to which extent to produce element strings with ellipses (...). To make this clear I'll show 9 examples with specific values of the function's input parameters str, ext and nbr_seq. I'll show 3 cases per element type ("foo ...", "baz foo ...", "foo %NUM ..."). For illustration purposes only I've expanded the global element array to all its cases in the examples below. The value returned by the function is the string corresponding to the numerical position of the letter; for example if nbr_seq = 1 we return 'b', if nbr_seq = 25 we return 'z'.

Note: Examples (1,2,3) are with a simple element (alone) with ellipses. Examples (4,5,6) are with simple element ellipses. Examples (7,8,9) are with global element ellipses.

See http://cppcode.angelfire.com/input_files/input_cases.txt for a list of input cases.

Group of examples:

The examples are grouped to illustrate different aspects of the syntax. Each group of examples links to examples of the illustrated concept.
Note: applying numbering, non-applying numbering and exception are mutually exclusive.
  - alternative : http://cppcode.angelfire.com/ge/ge_00.txt
    Those examples illustrate the use of the '|' operator
  - ellipse : http://cppcode.angelfire.com/ge/ge_01.txt
    Those examples illustrate the use of the '.' operator
  - escaped : http://cppcode.angelfire.com/ge/ge_02.txt
    Those examples illustrate the use of the '\' operator
  - exclusion : http://cppcode.angelfire.com/ge/ge_03.txt
    Those examples illustrate the use of the '!'
    operator
  - global : http://cppcode.angelfire.com/ge/ge_04.txt
    Those examples illustrate the use of the '%' operator
  - group-related : http://cppcode.angelfire.com/ge/ge_05.txt
    Those examples illustrate the use of the '{}' operator
  - optional : http://cppcode.angelfire.com/ge/ge_06.txt
    Those examples illustrate the use of the '[]' operator
  - unnumbered concrete string (non-applying numbering) : http://cppcode.angelfire.com/ge/ge_07.txt
    Those examples only generate one concrete string. So we don't number them.
  - numbered concrete string (applying numbering) : http://cppcode.angelfire.com/ge/ge_08.txt
    Those examples generate more than one concrete string. So we number them.
  - exception : http://cppcode.angelfire.com/ge/ge_09.txt
    Those examples generate exceptions. Each describes why the exception is raised.
  - extra notes : http://cppcode.angelfire.com/ge/ge_10.txt
    Those examples need extra explanations (to make them clearer)
  - simple : http://cppcode.angelfire.com/ge/ge_11.txt
    Specification strings on which the positioning counts from the first element. (All other cases start from the second.)
  - more than 1 '{' '}' : http://cppcode.angelfire.com/ge/ge_12.txt
  - more than 1 '.' : http://cppcode.angelfire.com/ge/ge_13.txt

====================

(*1) Command line arguments:
  - ARGV[0] specifies a list of directories, separated by the ';' char, from which to read the specification string files.
  - ARGV[1] specifies a list of directories, separated by the ';' char, from which to read the global files.

(*2) This is done using the perl Inline module http://search.cpan.org/CPAN/authors/id/S/SI/SISYPHUS/Inline-0.45.tar.gz. This module allows you to put source code from other programming languages directly "inline" in a Perl script or module. The code is automatically compiled as needed, and then loaded for immediate access from Perl.
doc/readme : http://search.cpan.org/~sisyphus/Inline-0.45/

(*3) Looking up (fn, lang, item_str, seq_num, ext) in the structure_CACHE_STRING is a function that takes 6 parameters.
prototype : lookup_structure_CACHE_STRING(structure_CACHE_STRING, fn, lang, item_str, seq_num, ext)
parameters:
  'ext' : extent (1 to N)
  'fn' : file name corresponding to the file read at program start-up into structure_SPEC_STRING
  'item_str' : string to be searched
  'lang' : directory (*9) corresponding to the directory (*9) name from which the file was read during program start-up.
  'seq_num' : sequence number of the cached string (0 to N)
  'structure_CACHE_STRING' : the structure in which we are searching
The function throws a CACHED_STRING_NOT_FOUND exception if no cached string ('str_ch' field in the structure_CACHE_STRING) is found for item_str.

sample-code:
[code]
while 'fn' field in the structure not NULL
    if 'fn' field in the structure equals fn parameter and 'lang' field in the structure equals lang parameter
        if 'str' field in the structure equals item_str parameter then
            while 'ext' field in the structure not NULL
                if 'ext' field in the structure equals ext parameter
                    while 'nbr_seq' field in the structure not NULL
                        if 'nbr_seq' field in the structure equals seq_num parameter
                            if in the node_CACHE_STRING_3 the 'ptr' pointer is not NULL (pointing to a node_CACHE_STRING_0 node)
                                return the 'str_ch' field of that node_CACHE_STRING_0 node
                            else
                                raise CACHED_STRING_NOT_FOUND exception
                            end if
                        end if
                        go to the next 'nbr_seq' node
                    end while
                end if
                go to the next 'extent' node
            end while
        end if
    end if
    go to the next 'fn' and 'lang' node
end while
raise CACHED_STRING_NOT_FOUND exception
[/code]

(*5) The cached string is made up of successive token elements numbered from 0 to N.
Those elements are separated by a space, or by another character and a space (see below (Ex. 10-29)).

space separated:
  (Ex.0 : foo integer_0)
  (Ex.1 : foo integer_0 integer_1)
  (Ex.2 : foo integer_0 integer_1 integer_2)
  (Ex.3 : foo integer_0 integer_1 integer_2 integer_3)
  They can be mixed
  (Ex.4 : foo integer_0 real_1)
  (Ex.5 : foo integer_0 real_1 integer_2)
  (Ex.6 : foo integer_0 real_1 integer_2 integer_3)
  They can contain string and other types of elements
  (Ex.7 : foo integer_0 real_1)
  (Ex.8 : foo integer_0 real_1 string_2)
  (Ex.9 : foo integer_0 real_1 string_2 symbol_3)

comma and space separated:
  (Ex.10 : foo integer_0)
  (Ex.11 : foo integer_0, integer_1)
  (Ex.12 : foo integer_0, integer_1, integer_2)
  (Ex.13 : foo integer_0, integer_1, integer_2, integer_3)
  They can be mixed
  (Ex.14 : foo integer_0 real_1)
  (Ex.15 : foo integer_0 real_1, integer_2)
  (Ex.16 : foo integer_0 real_1, integer_2, integer_3)
  They can contain string and other types of elements
  (Ex.17 : foo integer_0 real_1)
  (Ex.18 : foo integer_0 real_1, string_2)
  (Ex.19 : foo integer_0 real_1, string_2, symbol_3)

ampersand and space separated:
  (Ex.20 : foo integer_0)
  (Ex.21 : foo integer_0& integer_1)
  (Ex.22 : foo integer_0& integer_1& integer_2)
  (Ex.23 : foo integer_0& integer_1& integer_2& integer_3)
  They can be mixed
  (Ex.24 : foo integer_0 real_1)
  (Ex.25 : foo integer_0 real_1& integer_2)
  (Ex.26 : foo integer_0 real_1& integer_2& integer_3)
  They can contain string and other types of elements
  (Ex.27 : foo integer_0 real_1)
  (Ex.28 : foo integer_0 real_1& string_2)
  (Ex.29 : foo integer_0 real_1& string_2& symbol_3)

NOTE: The other character and the space are never appended if there are no more tokens to add to the string, i.e. they serve as separators, not as terminators.

(*6) Inserting (fn, lang, item_str, seq_num, ext, str_ch) in structure_CACHE_STRING is a function that takes 7 parameters.
prototype : insert_structure_CACHE_STRING(structure_CACHE_STRING, fn, lang, item_str, seq_num, ext, str_ch) parameters: 'ext' : extent (1 to N) 'fn' : file name corresponding to the file read at the program start-up into structure_SPEC_STRING 'item_str' : string to be searched 'lang' : directory (*9) corresponding to the directory (*9) name from which the file was read during the program start-up. 'seq_num' : sequence number of the cached string (0 to N) 'str_ch' : the string to be cached. One cached string which can be any length. A more detailed description of this cached string is given just below (*5) 'structure_CACHE_STRING' : the structure into which we are inserting, sample-code: [code] if used_mem < max_mem then while 'fn' field in the structure not NULL if 'fn' field in the structure equals fn parameter and 'lang' field in the structure equals lang parameter if field in the structure 'str' equals item_str parameter then increment 'sz' field in the structure of the node by the size of the string to be cached (str_ch parameter) while 'ext' field in the structure not NULL if 'ext' field in the structure equals ext parameter while 'nbr_seq' field in the structure not NULL go to the next 'nbr_seq' node end while else a) create a new node of type node_CACHE_STRING_3 with 'nbr_seq' label equal to seq_num parameter b) create a new node of type node_CACHE_STRING_0, cache the string str_ch parameter and give 'tm' label the value of time since EPOCH c) link a and b together d) insert the value of time since EPOCH in structure_TIME_STRING e) insert size of the cached string (str_ch parameter) in structure_MEM_STRING structure f) link d and b together h) link e and b together end if go to the next 'extent' node end while else a) create a new node of type node_CACHE_STRING_1 (see http://cppcode.angelfire.com/diagram/node_CACHE_STRING_1.bmp) b) create a new node with 'ext' label equal to ext parameter c) create a new node with 'nbr_seq' label equal to seq_num parameter 
d) create a new node of type (see http://cppcode.angelfire.com/diagram/node_CACHE_STRING_0.bmp) and cache the string str_ch parameter into it and give 'tm' label the value of time since EPOCH e) link c and d together f) link b and c together g) insert the value of time since EPOCH in structure_TIME_STRING h) insert size of the cached string (str_ch parameter) in structure_MEM_STRING structure i) link g and d together j) link h and d together end if else a) create a new node with 'ext' label equal to ext parameter b) create a new node with 'nbr_seq' label equal to seq_num parameter c) create a new node of type (see http://cppcode.angelfire.com/diagram/node_CACHE_STRING_0.bmp) and cache the string str_ch parameter into it and give 'tm' label the value of time since EPOCH d) link b and c together e) link a and b together f) insert the value of time since EPOCH in structure_TIME_STRING g) insert size of the cached string in structure_MEM_STRING structure h) link f and c together i) link g and c together end if else a) create a new node with 'fn' label equal to fn parameter b) create a new node with 'extent' label equal to ext parameter c) create a new node with 'nbr_seq' label equal to seq_num parameter d) create a new node of type (see http://cppcode.angelfire.com/diagram/node_CACHE_STRING_0.bmp) and cache the string str_ch into it and give 'tm' label the value of time since EPOCH e) link c and d together f) link b and c together g) insert the value of time since EPOCH in structure_TIME_STRING h) insert size of the cached string in structure_MEM_STRING structure i) link g and d together j) link h and d together end if go to the next 'fn' and 'lang' node end while else if desc equals 0 then replace the oldest cached string by str_ch else delete the set of nodes corresponding to the nodes pointed by the (watch-out memory leek/null pointers we're in C/C++ !) 
    update structure_MEM_STRING
  end if
end if
[/code]

(*7) make_spec function is the most important function of the whole program. Its job is to transform specification strings into concrete strings. The concrete strings are then stored (for later use) and sent to the user (over the Internet). It takes 3 parameters: spec_str, nbr_seq, ext. It is called as follows:

make_spec(spec_str, nbr_seq, ext)
spec_str : specification string
nbr_seq : sequence number
ext : extent

Numbering style: this function returns the concrete string in which the returned elements are prepended by a letter. The numbering scheme is alphabetic, alpha-numeric or numeric, in the perl style: (a,b,c,...z,aa) corresponding to (0,1,2,...25,26), or (0,1,2,...9,a,b,c,...z,10) corresponding to (0,1,2,...,9,10,11,12,...,35,36). When the function is called, nbr_seq is in the decimal numbering scheme (base 10). It has to be translated into the alphabetic or alpha-numeric numbering scheme (base 26 or 36), depending on the value of the 'num_sch' query-string, before being prepended to the output string.

Numbering application: The numbering scheme applies in the following cases:
a) ellipses (Ex: 'foo ...')
b) explicit globals (Ex: '%NUM')
c) alternatives (Ex: '{foo | bar | baz}')
d) optionals (Ex: 'foo [bar]')
e) the 'mult' query-string has a value of 1
f) exclusions (Ex: '{%NUM ! real}'), except in the case where the exclusion leaves only one element and that element is neither combined with another case nor followed by any other element that might generate more than one case.

For example '{%NUM ! {real | double}}' only leaves 'integer'; it is therefore a single element and MUST NOT be preceded by a numbering scheme. But if we have '{%NUM ! {real | double}} {integer | double}' the numbering scheme applies because we have 2 cases ('integer_0 integer_1' and 'integer_0 double_1'). Also if we have '{%NUM !
{real | double}} %NUM' the numbering scheme also applies because we have 3 cases ('integer_0 integer_1', 'integer_0 real_1' and 'integer_0 double_1'). The same holds for '{%NUM ! {real | double}} [string]' because we have the cases 'integer_0' and 'integer_0 string'. Finally, when combined with ellipses it always applies, because in the worst-case scenario, for example '{%NUM ! {real | double}} integer ...' with an extent of 1, we get 2 cases ('integer_0 integer_1' and 'integer_0 integer_1 {...} integer_N'). Any other case MUST NOT be preceded by a numbering scheme.

Numbering scheme: there are 5 types of numbering scheme, depending on the value of the 'num_sch' query-string:
- numerical numbering scheme (num_sch = 0) (0 - 9)
- uppercase alphabetic numbering scheme (num_sch = 1) (A - Z)
- lowercase alphabetic numbering scheme (num_sch = 2) (a - z)
- uppercase alpha-numeric numbering scheme (num_sch = 3) (0 .. 9 - A .. Z)
- lowercase alpha-numeric numbering scheme (num_sch = 4) (0 .. 9 - a .. z)

Ellipse: In the special case of a specification string with an ellipse (...), the extent parameter becomes important: it tells us how many strings to generate for each particular case. In the above example (Ex. ellipse group of examples) the ext parameter has a value of 3, so we generate 3 expansions of foo, 3 expansions of bar and 3 expansions of baz. Each expansion has one more element than the preceding one, i.e. we start with 'a) foo', which has 1 element, then we produce 'b) foo foo', which has 2 elements, then 'c) foo foo foo', which has 3 elements. Finally we produce an extra fourth string, 'd) foo foo foo {...} foo'. This string contains ' {...} ' before the last foo, meaning that there can be N foos. The same goes for bar and baz ('d) foo foo foo {...} foo', 'h) bar bar bar {...} bar', 'l) baz baz baz {...} baz'). This is to give the user a visual cue that there may be N cases.
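The numbering schemes and the plain (non-global) ellipse expansion described above can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the required implementation: the names to_label, expand_ellipse and first_seq are hypothetical, the 'label) ' prefix format is copied from the 'a) foo' examples, and the alphabetic schemes are assumed to roll over from z to aa while the alpha-numeric schemes are assumed to be plain base 36, as the perl-style examples suggest.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: translate a decimal sequence number into the label
// for one of the five num_sch values (0 to 4) listed in the text.
// Assumption: any other num_sch value is treated like scheme 4.
std::string to_label(unsigned long n, int num_sch) {
    if (num_sch == 0) return std::to_string(n);      // numeric, base 10
    std::string out;
    if (num_sch == 1 || num_sch == 2) {               // alphabetic, perl style
        const char first = (num_sch == 1) ? 'A' : 'a';
        // Bijective base 26: 0 -> a, 25 -> z, 26 -> aa.
        for (unsigned long v = n + 1; v > 0; v = (v - 1) / 26)
            out += static_cast<char>(first + (v - 1) % 26);
    } else {                                          // alpha-numeric, base 36
        const std::string digits = (num_sch == 3)
            ? "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            : "0123456789abcdefghijklmnopqrstuvwxyz";
        // Plain base 36: 10 -> a, 35 -> z, 36 -> 10.
        do { out += digits[n % 36]; n /= 36; } while (n > 0);
    }
    std::reverse(out.begin(), out.end());             // digits were built last-first
    return out;
}

// Hypothetical helper: expand "elem ..." for an element that is not a global.
// Produces ext growing strings plus one extra string containing " {...} ".
// first_seq is the sequence number of the first generated string.
std::vector<std::string> expand_ellipse(const std::string& elem, unsigned ext,
                                        unsigned long first_seq, int num_sch) {
    assert(ext >= 1);                                 // the text gives 1 as the minimum extent
    std::vector<std::string> out;
    std::string run;
    for (unsigned i = 0; i < ext; ++i) {
        if (i > 0) run += ' ';
        run += elem;                                  // one more element each round
        out.push_back(to_label(first_seq + i, num_sch) + ") " + run);
    }
    // Extra case: visual cue that the repetition may continue.
    out.push_back(to_label(first_seq + ext, num_sch) + ") " + run + " {...} " + elem);
    return out;
}
```

With these assumptions, expand_ellipse("foo", 3, 0, 2) yields 'a) foo', 'b) foo foo', 'c) foo foo foo' and 'd) foo foo foo {...} foo', and calling it for bar with first_seq = 4 ends on 'h) bar bar bar {...} bar', matching the examples above.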
The minimum value the ext parameter can have is 1, so with the extra case (the case with '{...}') there are always at least 2 elements.

Ellipse with globals: When a specification string has an explicit global followed by ellipses (Ex: '%NUM ...'), each item in structure_GLOBAL is expanded to the following concrete strings:
a) integer_0
b) integer_0 integer_1
c) integer_0 integer_1 integer_2
d) integer_0 integer_1 integer_2 integer_3
e) integer_0 integer_1 integer_2 integer_3 {...} integer_N
f) real_0
g) real_0 real_1
h) real_0 real_1 real_2
i) real_0 real_1 real_2 real_3
j) real_0 real_1 real_2 real_3 {...} real_N
k) double_0
l) double_0 double_1
m) double_0 double_1 double_2
n) double_0 double_1 double_2 double_3
o) double_0 double_1 double_2 double_3 {...} double_N
Notice that the last element is N (e, j, o).

Globals: The elements that are present in globals (structure_GLOBAL) must be suffixed with their respective position number within the string, starting from 0 and incrementing by 1 at each position. They must be suffixed with a '_' and a number even if they are not written as globals in the specification string. For example, if one of the nodes of structure_GLOBAL has a 'name' field with the value '%NUM' and the NUM.txt file contains 3 elements (integer, real, double), then whenever an integer, real or double is present in a specification string it is suffixed with the positional number, up to the extent. For example an extent of 4 would produce 'foo double_0 double_1 double_2 double_3 {...} double_N'. Note the extra ' {...} double_N'. Note that we drop the dollar sign '$' from this spec unless it is explicitly written in the specification string. For example 'foo integer real' would be output as 'foo integer_0 real_1'. As another example 'foo %NUM'. Numbering positions start at 0 and increment by 1 at each position. As in the above example, the specification string '%NUM ...'
would generate the concrete string 'double_0 double_1 double_2 double_3 {...} double_N' if the first element is part of a global. If it is not, as in the case of 'foo %NUM', then we start numbering from 0 at the second element of the string. In this case the specification string 'foo %NUM' would generate the concrete string 'foo double_0 double_1 double_2 double_3 {...} double_N'.

Ellipse with alternatives: When a specification string has an alternative followed by ellipses (Ex: '{integer | real} ...'), each item in the alternative is expanded to:
a) integer_0
b) integer_0 integer_1
c) integer_0 integer_1 integer_2
d) integer_0 integer_1 integer_2 integer_3
e) integer_0 integer_1 integer_2 integer_3 {...} integer_N
f) real_0
g) real_0 real_1
h) real_0 real_1 real_2
i) real_0 real_1 real_2 real_3
j) real_0 real_1 real_2 real_3 {...} real_N
Notice that the last element is N (e, j).

Ellipse with grouping elements: When elements are grouped inside '{' and '}' and they are not alternatives (there is no '|' within '{' and '}'), then those elements repeat as a block. For example '{integer real} ...' would be expanded to:
a) integer_0 real_0
b) integer_0 real_0 integer_1 real_1
c) integer_0 real_0 integer_1 real_1 integer_2 real_2
d) integer_0 real_0 integer_1 real_1 integer_2 real_2 integer_3 real_3
e) integer_0 real_0 integer_1 real_1 integer_2 real_2 integer_3 real_3 {...} real_N-1 integer_N
Note: In the last line the next-to-last element (real_N-1) is numbered 'N-1'.

(*8) looking up (fn, lang and item_str) in the structure_SPEC_STRING is done by a function that takes 4 parameters.

prototype : lookup_structure_SPEC_STRING(structure_SPEC_STRING, fn, lang, item_str)

parameters:
'fn' : file name corresponding to the file read at program start-up into structure_SPEC_STRING
'item_str' : string to be searched
'lang' : directory (*9) name from which the file was read at program start-up
'structure_SPEC_STRING' : the structure in which we are searching

The function throws a SPEC_STRING_NOT_FOUND exception if no specification string is found.

sample-code:
[code]
set found_elem to false
while 'fn' field in the structure not NULL
  if 'fn' field in the structure equals fn parameter and 'lang' field in the structure equals lang parameter
    while 'spec_str' field in the structure not NULL
      if 'spec_str' field in the structure (up to the first space/tab) equals item_str parameter then
        set found_elem to true
        return spec_str (the whole string (this string is space/tab separated))
      end if
      go to the next 'spec_str' node
    end while
  end if
  go to the next 'fn' and 'lang' node
end while
if found_elem equals false then
  raise SPEC_STRING_NOT_FOUND
end if
[/code]

(*9) a directory can be:
- relative (ex: foo/bar): this directory is relative to the current directory. Say we are in directory /baz, then foo/bar is /baz/foo/bar
- absolute (ex: /foo/bar): this directory is always /foo/bar. Note: it starts with '/'
- user (ex: ~/foo/bar): this directory is relative to the current user's directory. Say the user is Joe, then ~/foo/bar is /Joe/foo/bar

Note:
1) some Linux/Unix systems always append '/' to directory names, so foo/bar/ is the same as foo/bar
2) ./ is the current directory (Unix)
3) ../ is the parent directory (Unix)
4) .\\ is the current directory (Windows) (notice the double back-slash: the back-slash must be escaped)
5) ..\\ is the parent directory (Windows) (notice the double back-slash: the back-slash must be escaped)
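
The lookup described in (*8), combined with note 1 above on the trailing '/', could look roughly like the C++ sketch below. It is an illustration only: the NodeSpecString layout (a flat vector of nodes instead of a b-tree or linked list), the SpecStringNotFound class and the normalize_lang helper are assumptions made for this example, and the function takes the four parameters listed in the parameter description.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory layout for one node of structure_SPEC_STRING.
struct NodeSpecString {
    std::string fn;                      // file name, e.g. "function"
    std::string lang;                    // directory name without trailing '/'
    std::vector<std::string> spec_strs;  // one specification string per line
};

// Hypothetical exception type standing in for SPEC_STRING_NOT_FOUND.
struct SpecStringNotFound : std::runtime_error {
    SpecStringNotFound() : std::runtime_error("SPEC_STRING_NOT_FOUND") {}
};

// Drop trailing '/' characters so that "cpp/" and "cpp" give the same
// 'lang' label. A lone "/" (the root directory) is left untouched.
std::string normalize_lang(std::string dir) {
    while (dir.size() > 1 && dir.back() == '/') dir.pop_back();
    return dir;
}

// Return the whole specification string whose first space/tab-delimited
// token equals item_str, searching only nodes that match fn and lang.
// Throws SpecStringNotFound when nothing matches.
const std::string& lookup_structure_SPEC_STRING(
        const std::vector<NodeSpecString>& structure,
        const std::string& fn, const std::string& lang,
        const std::string& item_str) {
    // An empty key would match any line that starts with a space or tab.
    assert(!item_str.empty());
    const std::string wanted = normalize_lang(lang);
    for (const NodeSpecString& node : structure) {
        if (node.fn != fn || node.lang != wanted) continue;
        for (const std::string& spec : node.spec_strs) {
            // find_first_of returns npos when there is no space/tab, in which
            // case substr(0, npos) is the whole line.
            if (spec.substr(0, spec.find_first_of(" \t")) == item_str)
                return spec;
        }
    }
    throw SpecStringNotFound();
}
```

The returned reference points into the structure itself, which fits the statement that structure_SPEC_STRING does not change once the files are loaded. A caller that needs an owned copy can simply assign the result to a std::string.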