Li Zhang's Homepage

The publication of "Equidistant Letter Sequences in the Book of Genesis" by Witztum, Rips and Rosenberg in Statistical Science in 1994 set off the phenomenon of Bible Code Fever: the hunt for hidden messages, encoded as equidistant letter sequences, in sacred (and not-so-sacred) texts. The fallout included a New York Times bestseller (which you can buy in hardcover at McNally-Robinson for $6.90) and many statisticians and computer scientists devoting computer cycles to the pursuit.

Given a string S, an equidistant letter sequence in S is a set of letters at positions n, n + d, n + 2d, n + 3d, ..., n + md = k that spells out a word of interest. Here n is called the start, d the skip, and k the end of the word.
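A minimal C++ sketch of the definition, assuming 0-based offsets into an already-cleaned text. The function name isEls is hypothetical. It walks the word with a pointer, reads the letters at n, n + d, n + 2d, ..., and stops as soon as a letter differs or a position falls outside the text, so negative skips work too.

```cpp
#include <cassert>

// Hypothetical helper: does `word` occur in `text` as an equidistant letter
// sequence with 0-based start n and skip d (d may be negative)?
// Letters are read at text[n], text[n + d], text[n + 2d], ...
bool isEls(const char* text, long textLen, const char* word, long n, long d) {
    assert(d != 0);                                     // a zero skip is not a sequence
    long pos = n;
    for (const char* w = word; *w != '\0'; ++w, pos += d) {
        if (pos < 0 || pos >= textLen) return false;    // ran off either end
        if (*(text + pos) != *w) return false;          // letter mismatch
    }
    return true;                                        // every letter matched
}
```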

For example, in the following string:

THEQUICKBROWNFOXJUMPEDOVERTHELAZYDOG

the word "coo" is an equidistant letter sequence with n = 7, d = 4, and "he" is one with n = 28, d = -3, counting positions from 1.
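To check examples like these, one can simply brute-force every start and every nonzero skip in a short string. The helper allEls below is a hypothetical sketch; it reports 1-based starts to match the numbering of the example, and since it tries about 2N² (start, skip) pairs for a string of N letters it is only sensible for short strings.

```cpp
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical helper: list every (start, skip) at which `word` appears as an
// equidistant letter sequence in `text`, with 1-based starts and both
// positive and negative skips.
std::vector<std::pair<long, long>> allEls(const char* text, const char* word) {
    std::vector<std::pair<long, long>> hits;
    const long len = static_cast<long>(std::strlen(text));
    if (std::strlen(word) < 2) return hits;             // skip is meaningless for 0/1 letters
    for (long n = 0; n < len; ++n) {
        for (long d = -(len - 1); d <= len - 1; ++d) {
            if (d == 0) continue;
            const char* w = word;
            long pos = n;
            // Advance while letters match and the position stays inside the text.
            while (*w != '\0' && pos >= 0 && pos < len && *(text + pos) == *w) {
                ++w;
                pos += d;
            }
            if (*w == '\0') hits.push_back({n + 1, d}); // report a 1-based start
        }
    }
    return hits;
}
```

On the string above, allEls(..., "coo") should return two hits: (7, 4) and also (7, 8), because c, o, o also sit at positions 7, 15 and 23.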

The article mentioned above found that equidistant letter sequences for the names of famous rabbis and their dates of birth and death in a Hebrew edition of the Book of Genesis were improbably close together.

This set off "Bible Code Fever", statisticians' euphemism for a lot of nonsense about hidden messages in sacred texts. One chap even made a fortune with a New York Times bestseller full of such "findings". When statisticians objected that such "findings" could be "hidden" in any text, he replied that he would believe them when someone found a prediction of a modern assassination in Moby Dick.

This will be your first C++ assignment! Below, you are provided with the raw text of Melville's novel. First, write a program that takes the text and converts all capital letters to lower case and strips it of all blanks, punctuation and numerals.
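One way to write the Part 1 filter with pointer logic, assuming the raw text has already been read into a NUL-terminated char array (reading the file is left out). The name normalize is hypothetical. It compacts the array in place with a read pointer and a write pointer, so no second buffer is needed; the write pointer never overtakes the read pointer, which is what makes the in-place version safe.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical Part 1 filter: lowercases letters and removes everything else
// (blanks, punctuation, digits) from a NUL-terminated array, in place.
// In the default "C" locale std::isalpha accepts only A-Z and a-z.
// Returns the number of letters kept.
std::size_t normalize(char* text) {
    assert(text != nullptr);
    char* write = text;                                      // next slot for a kept letter
    for (const char* read = text; *read != '\0'; ++read) {
        unsigned char c = static_cast<unsigned char>(*read); // avoid UB on negative chars
        if (std::isalpha(c)) {
            *write++ = static_cast<char>(std::tolower(c));
        }
    }
    *write = '\0';                                           // terminate the shorter text
    return static_cast<std::size_t>(write - text);
}
```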

Then, write another program that searches for equidistant letter sequences. Your program should ask the user for a string and then print the start location and skip for the first occurrence of the string. Only use positive skip values.
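A brute-force sketch of the Part 2 search, using pointers throughout. The assignment does not say which occurrence counts as "first"; this sketch assumes smallest start, then smallest skip, and reports 0-based positions. The name findFirstEls and its out-parameters are hypothetical. A cheap first-letter test skips most starts before any skip is tried.

```cpp
#include <cstddef>

// Hypothetical Part 2 search: first equidistant letter sequence of `word` in
// the cleaned `text`, positive skips only. "First" = smallest start, then
// smallest skip (an assumption). Writes 0-based start/skip; false if none.
bool findFirstEls(const char* text, std::size_t textLen, const char* word,
                  std::size_t* start, std::size_t* skip) {
    std::size_t wlen = 0;
    for (const char* w = word; *w != '\0'; ++w) ++wlen;     // strlen via pointers
    if (wlen == 0 || wlen > textLen) return false;
    const char* end = text + textLen;                       // one past the last letter
    for (const char* s = text; s < end; ++s) {
        if (*s != *word) continue;                          // first letter must match
        if (wlen == 1) { *start = static_cast<std::size_t>(s - text); *skip = 1; return true; }
        // Largest skip that keeps the last letter inside the text.
        std::size_t maxSkip = (static_cast<std::size_t>(end - s) - 1) / (wlen - 1);
        for (std::size_t d = 1; d <= maxSkip; ++d) {
            const char* p = s;
            const char* w = word;
            while (*w != '\0' && *p == *w) {
                ++w;
                if (*w != '\0') p += d;                     // step only if a letter remains
            }
            if (*w == '\0') {
                *start = static_cast<std::size_t>(s - text);
                *skip = d;
                return true;
            }
        }
    }
    return false;
}
```

On the cleaned example string thequickbrownfoxjumpedoverthelazydog, searching for "coo" should give start 6, skip 4 in 0-based terms, which is start 7 when counted from 1. A prompt loop would call this once per string the user enters.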

Here is some hypothetical output:

Enter a search String (null to terminate):sandy

Length is 5

Match starts at 22, skip is 32027
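The sample output also gives a quick check on loop bounds. With start n, skip d and a word of L letters, the end is k = n + (L - 1)d (the m in the definition is L - 1), so for "sandy" the last letter sits at 22 + 4 × 32027 = 128130. Solving the same expression for d gives the largest skip worth trying from a given start. Both helper names below are hypothetical.

```cpp
#include <cassert>

// End position k = n + (L - 1) * d of a sequence with start n, skip d and
// word length L. Works the same for 0- or 1-based n.
long endPosition(long n, long d, long wordLen) {
    assert(wordLen >= 1);
    return n + (wordLen - 1) * d;
}

// Largest positive skip for which a word of length L starting at 0-based
// offset n still fits inside a text of textLen letters:
// n + (L - 1) * d <= textLen - 1  =>  d <= (textLen - 1 - n) / (L - 1).
long maxPositiveSkip(long textLen, long n, long wordLen) {
    assert(wordLen >= 2 && n >= 0 && n < textLen);
    return (textLen - 1 - n) / (wordLen - 1);
}
```

Summed over all starts, that bound allows roughly N²/(2(L - 1)) (start, skip) pairs in a text of N letters, which is why the search is costly on a full novel.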

Please ensure you do all of the following:

Because the search is costly, trim the first 90000 characters from the original text, before stripping out spaces, digits, etc. (You can include this logic in Part1.)
You can declare fixed arrays for the text and search string, but all code must use POINTER logic.
Submit three files: part1.cpp, part2.cpp (your two programs) and some captured output giving the first location of the target strings.
DO NOT SUBMIT TO THE I:DRIVE IN ALIBABA. Use the WebCT dropbox!!!
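The trimming rule in the list above counts raw characters, so it has to happen before the filter runs. With pointer logic it amounts to starting the filter at an offset rather than copying anything. skipPrefix below is a hypothetical helper that does exactly that and clamps at the end of the buffer for inputs shorter than the drop count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for the trimming step: skip the first `drop` raw
// characters (90000 in the assignment) before any filtering, so the count
// includes the blanks, punctuation and digits that Part 1 later strips.
// Returns a pointer into the same buffer; nothing is copied.
const char* skipPrefix(const char* raw, std::size_t rawLen, std::size_t drop) {
    assert(raw != nullptr);
    return raw + (drop < rawLen ? drop : rawLen);   // never step past the end
}
```

Because the drop is measured in raw characters, the cleaned text starts at a different letter than it would if 90000 letters were dropped after stripping.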

The target strings are

"ladydiana" (there is only one occurrence)
"dodi"
"royal"
"henri"