News Press Release for immediate release -- all media

Microsoft Excel Corrupts Its Spreadsheet Files

by Maj. Hog

"The popular Excel spreadsheet component of Microsoft Office corrupts its own data files," said Doctor Electron at the Net Census lab, in what may be a somewhat shocking revelation to many who rely on Microsoft Excel. Since the corruption schemes used by Microsoft Excel appear to be designed to prevent software writers from creating applications that read and process Excel files, this discovery may place another card on the table in considerations of monopolistic practices by the company. Another explanation might be the "time-bomb theory" described below, where rogue programmers within Microsoft may have planted instructions in Office software to corrupt the disk storage of user data.

While writing a program to read the "xls" files produced by Microsoft Excel, Doctor Electron found what appears to be a deliberate scheme to systematically corrupt the data in certain Excel files. For example, files containing macros or Basic-language programs are corrupted by Excel if they exceed a certain file length. "I deciphered the corruption schemes used by Excel which seem to be implemented precisely to prevent non-Microsoft programs from reading these files," said Doctor Electron. The full technical details are available in qxls.zip, in two subroutines that undo the corruption written by Excel.

The meat of the spreadsheet data is stored in segments of the file in the BIFF8 format, which is used by Microsoft Excel 97 and more recent versions. The corruption occurs in this core BIFF8 portion of the Excel data file.

First, in some files, the order of 2048 bytes of BIFF8 information may be reversed at the beginning of the workbook segment of the files. Doctor Electron wrote a subroutine called Uncorrupt to undo this apparent effort to make Excel files readable only by Excel itself. The Uncorrupt routine simply copies the reversed segments to the correct order.
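The reordering step can be illustrated with a minimal Python sketch. It is not the Uncorrupt routine from qxls.zip, which is not reproduced in this release. The function name restore_block_order is hypothetical, and the sketch assumes that the 2048 bytes consist of four 512-byte blocks stored in reverse block order; the release does not state the exact layout.

```python
def restore_block_order(data: bytes, start: int = 0,
                        span: int = 2048, block: int = 512) -> bytes:
    """Return a copy of data with the blocks inside one region put back in order.

    Assumption (not stated in the release): the region of `span` bytes that
    begins at `start` is made of `block`-sized pieces stored last-to-first.
    """
    if block <= 0 or span % block != 0:
        raise ValueError("span must be a positive multiple of block")
    if start < 0 or start + span > len(data):
        raise ValueError("region lies outside the data")

    region = data[start:start + span]
    # Cut the region into equal blocks, then emit them in the opposite order.
    blocks = [region[i:i + block] for i in range(0, span, block)]
    restored = b"".join(reversed(blocks))
    return data[:start] + restored + data[start + span:]
```

Bytes outside the chosen region are copied through unchanged, so the output always has the same length as the input.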

Second, at regular and predictable intervals in longer files, 512 bytes of trash information is inserted using several schemes which have now been successfully analyzed. The RemoveTrash routine will undo these forms of corruption by Excel. So far two separate corruption schemes in this category have been identified as well as the kinds of Excel files in which these schemes are deployed.
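A minimal Python sketch of removing blocks inserted at a fixed interval follows. It is not the RemoveTrash routine from qxls.zip. The name strip_inserted_blocks and the parameters first_offset and interval are hypothetical; the release gives only the 512-byte block size, so the caller must supply the position of the first inserted block and the spacing between blocks.

```python
def strip_inserted_blocks(data: bytes, first_offset: int,
                          interval: int, block: int = 512) -> bytes:
    """Drop `block` bytes at first_offset, then again after every
    `interval` bytes of kept data.

    Assumption: first_offset and interval are illustrative parameters;
    the release does not publish the actual positions.
    """
    if first_offset < 0 or interval <= 0 or block <= 0:
        raise ValueError("offsets and sizes must be positive")

    kept = bytearray(data[:first_offset])   # data before the first insertion
    pos = first_offset
    while pos < len(data):
        pos += block                         # skip one inserted block
        kept += data[pos:pos + interval]     # keep the data that follows it
        pos += interval
    return bytes(kept)
```

The routine only removes bytes by position. It does not inspect the contents of a block, so it cannot tell whether the bytes it drops were really inserted.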

Even a child could recognize that something is wrong if the top of a picture appeared somewhere in the middle and horizontal sections of another picture were inserted at intervals in the image. Similarly, imagine a simple story where the first lines are actually cut from the text below, perhaps beginning and ending in the middle of sentences. Then, later in the story, lines of text from another story are inserted at bizarre intervals. In both examples, the picture and the story, the presence of corruption of the material is clear and obvious.

Third and perhaps less serious are various changes in BIFF8 standards which are found in Microsoft Excel files. For example, another scheme to make life difficult for programmers is the changing of BIFF8 record type codes by Microsoft Excel. Several changes have already been identified. Without knowledge of these changes, programmers would not be able to find certain types of key records.
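To see which record type codes a file actually contains, a program can walk the BIFF8 record stream. The Python sketch below relies on the BIFF record layout of a 2-byte type code and a 2-byte payload length, both little-endian, followed by the payload. The function names and the set of "known" codes passed in by the caller are hypothetical, since the release does not list the changed codes.

```python
import struct
from collections import Counter


def iter_records(stream: bytes):
    """Yield (type_code, payload) for each record in a BIFF record stream."""
    pos = 0
    while pos + 4 <= len(stream):
        # Record header: 2-byte type code, 2-byte payload length.
        code, size = struct.unpack_from("<HH", stream, pos)
        payload = stream[pos + 4:pos + 4 + size]
        if len(payload) < size:
            raise ValueError("truncated record at offset %d" % pos)
        yield code, payload
        pos += 4 + size


def unfamiliar_codes(stream: bytes, known: set) -> Counter:
    """Count record type codes that are not in the caller's known set."""
    return Counter(code for code, _ in iter_records(stream)
                   if code not in known)
```

A reader would call unfamiliar_codes with the codes its own parser handles and inspect whatever remains.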

Excel is widely used around the world. Thus, it is likely that software companies and independent programmers might want to read Excel files in their own applications. "The various measures by Microsoft to corrupt certain Excel files would definitely hamper these creative efforts," said Doctor Electron. "However, the Net Census documentation might remedy that problem. It is a pity that Microsoft would use such low-class and crude methods to prevent even a little success by others, especially when third-party programs are based on the premise that users will need Excel to create the files."

Users of Microsoft Office report that the program does not notify them that their data will be saved to disk in a corrupted manner, nor does it give them an option to save a true and faithful version of their data to disk. Meanwhile, Excel users await an explanation from the company concerning the corruption schemes applied to user data.

According to the "time-bomb theory," rogue programmers within Microsoft may have used the special OLE format of Office files to deploy the data corruption schemes while recording in the FAT (file allocation table) section of the file the information necessary to undo the corruption if Excel reads the file again. The theory would explain how the corruption schemes could be hidden, provided that no one looked at the actual contents of the files, as might occur in forensic or disk failure recovery work. Since the OLE format goes back to at least 1997, the amount of corrupted user data in computer systems around the world could be truly enormous. The time-bomb exploded, so to speak, when the corruption schemes were eventually discovered and reported by Net Census. In this "time-bomb theory," there are two targets and victims: users of Office software and Microsoft Corporation itself.
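How FAT information can be used to reassemble data in its intended order may be sketched in Python as follows. This is a simplified illustration, not Net Census code. It assumes the FAT has already been parsed into a list of next-sector numbers, that sectors are 512 bytes long, and that sector n begins at byte offset (n + 1) * 512, after a 512-byte file header. The name read_chain is hypothetical.

```python
ENDOFCHAIN = 0xFFFFFFFE  # FAT marker for the last sector of a chain


def read_chain(file_bytes: bytes, fat: list, start_sector: int,
               sector_size: int = 512) -> bytes:
    """Concatenate the sectors of one chain in the order the FAT gives.

    Assumption: `fat` is an already-parsed list in which fat[n] holds the
    number of the sector that follows sector n.
    """
    out = bytearray()
    seen = set()
    sector = start_sector
    while sector != ENDOFCHAIN:
        if sector in seen or not 0 <= sector < len(fat):
            raise ValueError("broken sector chain")
        seen.add(sector)
        # Sector 0 starts right after the header, hence the + 1.
        offset = (sector + 1) * sector_size
        out += file_bytes[offset:offset + sector_size]
        sector = fat[sector]
    return bytes(out)
```

A reader that ignores the FAT and reads the file from start to end would see the same sectors in their on-disk order instead.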

An internal investigation within Microsoft might be required to determine when user data was first subjected to corruption in disk storage. When did management know that user data was being systematically corrupted by Microsoft Office? How did these corruption schemes become part of this widely used product? What system does Microsoft use to check the work of its programmers to prevent this type of "time-bomb" insertion in other Microsoft products? Will Microsoft release a fixed version of Office software with the user data corruption schemes removed?

Copyright © 2002 Global Services

Original Publication: October 25, 2002
