Disk Utility Programming
© 1998, 1999 By Colin Davis
Introduction
This document describes common data structures, concepts and programming techniques you need to know in order to write and debug disk utility software. A disk utility program could be anything from a program like FDISK to a virus detector to a complete disk imaging or backup application. In the last section I present a few simple disk utilities to demonstrate the use of my PDISK library. Throughout this document pieces of code appear in order to precisely explain a key point or just to illustrate a good way to program disk software.
Most of the code examples are written in Borland style Pascal and compile with Free Pascal, Delphi, or TMT. The differences between these implementations are trivial but I will discuss them. In addition I will show how to write a library similar to PDISK in C using DJGPP. That is slightly simpler because some of the work has already been done in BIOS.H. Remember that the language you use is unimportant compared with how easily your compiler's libraries let you access the system hardware and operating system calls.
I assume the reader has some experience with basic PC programming including assembly language and using BIOS and DOS functions from an assembler or high level languages.
0 Disk Access and Addressing
0.1 LBA and CHS
All the data on disks is divided into sectors which are blocks of 512 bytes. Any address in this documentation refers to a sector’s location on a disk. A sector address or Logical Block Address (LBA) needs only one number to specify a unique location on a disk, the logical distance in blocks from the start of the disk. While in reality a disk system may use head, track and sector locations to form an address you can use LBA to simplify thinking about partition and file system layout and it eliminates an unneeded layer of complexity.
Addressing a disk using cylinder, head and sector information is called CHS addressing. You only need to know about it if you write programs for older hard disks or need to use the PC BIOS functions to access disks. In those cases you can usually find or write libraries that let you do most of your work in LBA. In the next sections I will discuss how to convert CHS to LBA and how a disk library may do this for the programmer.
To say something comes "after" something else on a disk just means it starts at a higher LBA. This simplicity makes it easy to think about the arrangement of data on a disk. It’s harder to talk about a sector’s location using CHS addressing unless you keep in mind the geometry of the particular drive you’re dealing with. You can think of the cylinder as the distance of the data from the center of the disk, the head as the side of the platter the data is on (when more than two heads exist, think of several platters stacked in the same drive), and the sector as how far around the track the disk must turn to reach the data. The cylinder is the most significant part of the CHS address, then the head, then the sector. If you keep this in mind you won’t get higher and lower addresses mixed up on the same drive. When you are comparing two drives with different numbers of sectors per track (a track is a cylinder on only one side of a platter), it’s easier to think in terms of LBA.
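The ordering described above (cylinder most significant, then head, then sector) is all that is needed to convert between the two forms. Here is a minimal sketch in C, assuming the drive's logical geometry is already known. The Geometry type and both function names are illustrative and are not part of PDISK or BIOS.H.

```c
#include <stdint.h>

/* Logical geometry of a drive. Illustrative type, not a library one. */
typedef struct {
    uint32_t heads;    /* number of heads, e.g. 16 */
    uint32_t sectors;  /* sectors per track, e.g. 63 */
} Geometry;

/* Cylinders and heads count from 0, sectors count from 1. */
uint32_t chs_to_lba(const Geometry *g, uint32_t c, uint32_t h, uint32_t s)
{
    return (c * g->heads + h) * g->sectors + (s - 1);
}

/* Inverse of chs_to_lba for the same geometry. */
void lba_to_chs(const Geometry *g, uint32_t lba,
                uint32_t *c, uint32_t *h, uint32_t *s)
{
    *s = lba % g->sectors + 1;   /* position within the track */
    lba /= g->sectors;           /* now counts whole tracks */
    *h = lba % g->heads;
    *c = lba / g->heads;
}
```

With 16 heads and 63 sectors per track, cylinder 0, head 0, sector 1 is LBA 0, and cylinder 1, head 0, sector 1 is LBA 1008.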
0.2 BIOS Disk Services and LCHS
The BIOS disk services use logical CHS addresses, which can be translated into an actual CHS location on a disk called a PCHS or physical CHS address. In fact it's not even that easy, because the CHS address will be in the format presented in the BIOS setup utility which may need further translation at the hardware level to get the actual sector location inside the drive. When you give an address to the BIOS disk functions in terms of LCHS parameters the BIOS has to translate it to a CHS address, which the drive or drive controller then translates into the actual physical address. The PC BIOS allows LCHS parameters in the range of 0-1023 cylinders, 0-255 heads, and 1-63 sectors. Obviously no drive has 256 heads—the BIOS and then the drive translates these numbers into the physical location of the sectors you want to address. They are just a way to let you continue to use the antiquated addressing scheme. In the BIOS setup utility program you may notice that the number of heads only goes up to 16 and the number of cylinders goes up to 65536. This actually allows for addressing more sectors than the 1024/256/63 system the BIOS gives to the programmer (LCHS), but for compatibility reasons the BIOS disk services do not use an addressing system with more than 1024 cylinders. One consequence of this is the 8.4GB size limit on drive addressing with LCHS addresses.
You can find the number of cylinders, heads, and sectors a disk uses (the drive parameters) printed on the fixed disk itself, and usually your BIOS setup program will have an auto-detect function or it will automatically detect the drive parameters every time the system boots if the fixed disk is a SCSI drive, or you have a newer BIOS. For programming at the BIOS level these parameters won't help -- you need the LCHS parameters. These parameters don't typically reflect the true number of cylinders, heads, and sectors per track in the drive but for most programming purposes this doesn’t matter at all—you only need a logical equivalent set of parameters.
I have over-simplified CHS addressing in order to move on to the programming examples. There are in fact several intermediate implementations of LCHS addressing schemes in some BIOS versions which allow for addressing drives larger than 504 megabytes, but which do not address a full 8.4GB. For the most part you don't need to worry about these if you want to use only systems built after 1995. And before I move on, you should know there is a completely different system of disk addressing in BIOS for disks larger than 8.4GB which works using LBA. I will discuss this more later on.
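The 504 megabyte and 8.4GB figures follow directly from the field ranges given earlier. As a small worked example in C (the function name is only illustrative):

```c
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Bytes addressable with the given counts of cylinders, heads and
   sectors per track. */
uint64_t chs_capacity(uint32_t cylinders, uint32_t heads, uint32_t sectors)
{
    return (uint64_t)cylinders * heads * sectors * SECTOR_BYTES;
}
```

chs_capacity(1024, 16, 63) gives 528,482,304 bytes, which is 504 megabytes when a megabyte is taken as 1,048,576 bytes. chs_capacity(1024, 256, 63) gives 8,455,716,864 bytes, the 8.4GB limit of LCHS addressing.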
0.3 Short Summary of BIOS Disk Services
You need to know about the BIOS disk services if you intend to write disk software that runs under DOS or a DOS extender. The following code examples show the use of some common disk services and how you can call them from your programs.
Before doing anything else, get the logical CHS parameters for the drive you want to work with so you can form addresses to it. BIOS service 13h function 8 returns the LCHS parameters of a drive.
Call with: AH=8, DL = BIOS drive number (80h, 81h …for hard disks, 0, 1, .. for floppies)
Returns: DH = highest head number (number of heads minus one), CX = highest cylinder and sector numbers (packed together), DL = number of drives attached
Here is an example in Borland's built-in assembler for Borland real mode Pascal:
function heads(drive:byte):byte; assembler;
asm
  push ds
  mov ah,08h
  mov dl,drive
  int 13h
  mov al,dh    { assembler functions return a byte result in AL }
  pop ds
end;
This returns the highest head number, which is one less than the number of heads. To get the cylinder and sector values you have to split the CX register into its two parts. The low eight bits of the cylinder value are in CH, the high byte of CX, and its two high bits are in the two high bits of CL. The sector value is in the six low bits of CL. Here is a procedure to do that:
procedure convertCX(valcx:word; var sect:byte; var cyl:word);
var
  c, sec: byte;
  x: word;
begin
  asm
    push ds
    mov cx,valcx
    mov bx,cx                  { keep a copy of CX }
    mov c,ch                   { low eight bits of the cylinder }
    and cx,0000000011000000b   { two high bits of the cylinder }
    shl cx,2                   { move them up to bits 9 and 8 }
    mov x,cx
    mov cx,bx
    and cl,00111111b           { six low bits are the sector }
    mov sec,cl
    pop ds
  end;
  sect:=sec;
  cyl:=x+c;
end;

Don't get the idea that you must use assembly language. Sometimes it makes things easier but relying on it may make code less portable and harder to read.
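The same unpacking can be written with plain shifts and masks. Here is a sketch in C. The function split_cx mirrors convertCX, and make_cx is my assumption of what a packing helper such as the makeCX call used later would do; neither is taken from an existing library.

```c
#include <stdint.h>

/* CH holds cylinder bits 7..0, CL bits 7..6 hold cylinder bits 9..8,
   CL bits 5..0 hold the sector number. */
void split_cx(uint16_t cx, uint16_t *cyl, uint8_t *sect)
{
    uint8_t ch = (uint8_t)(cx >> 8);
    uint8_t cl = (uint8_t)(cx & 0xFF);
    *cyl  = (uint16_t)(ch | ((cl & 0xC0) << 2));
    *sect = (uint8_t)(cl & 0x3F);
}

/* Pack a sector (1..63) and cylinder (0..1023) back into CX form. */
uint16_t make_cx(uint8_t sect, uint16_t cyl)
{
    uint8_t ch = (uint8_t)(cyl & 0xFF);
    uint8_t cl = (uint8_t)(((cyl >> 2) & 0xC0) | (sect & 0x3F));
    return (uint16_t)((ch << 8) | cl);
}
```

Keeping the packing and unpacking in one pair of functions means the rest of a program never has to handle the packed form directly.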
Remember if you're working with a 32 bit protected mode compiler like TMT or DJGPP or FPC you probably need to use the package's DPMI library to access the BIOS services and to read and write the lowest megabyte of memory. The principle of using the BIOS is the same -- you still use the same registers.
Here's the same function written for TMT Pascal:
function heads(drive:byte):byte;
var
  regs: trmregs;
begin
  clearrmregs(regs);
  with regs do
  begin
    ah:=$08;
    dl:=drive;
    realmodeint($13,regs);
    heads:=dh;
  end;
end;
The Realmodeint() procedure and similar functions in other DOS 32-bit development packages let you use the BIOS services and most other real mode interrupts such as MS-DOS function calls. With the DJGPP package you already have the BIOS.H library available to do the dirty work for you. All you need to know is what the arguments to biosdisk() mean. Look at BIOS.H for how to use it.
Function 03 writes up to an entire track at once.
Call with: AH = 3, ES:BX = segment:offset of buffer to write from, DL = drive, AL = number of sectors to write, CX = sector + cylinder, DH = head
Returns: AL = number of sectors written, AH = 0 and carry flag clear if no error; otherwise the carry flag is set and AH holds an error code
procedure writeTrack(drive, num:byte; track, head:word; sector:byte; dest:pbuf);
var
  sc: word;
begin
  sc:=makeCX(sector, track);
  asm
    push ds
    mov ah, 3
    mov al, num
    mov dl, drive
    mov cx, sc
    mov bx, head
    mov dh, bl
    les bx, dest
    int 13h
    pop ds
  end;
end;
Here is the same function for TMT Pascal:
procedure writeTrack(drive, num:byte; track:word; head, sector:byte; dest:pbuf);
var
  r: trmregs;
  sc: word;
begin
  clearRMregs(r);
  sc:=makeCX(sector,track);
  with r do
  begin
    ah:=3;
    al:=num;
    dl:=drive;
    cx:=sc;
    dh:=head;
    bx:=0;                           { offset 0 within the DOS buffer }
    es:=dosbufseg;                   { predetermined segment for temporary DOS/BIOS work }
    move(dest^,dosbufptr^,num*512);  { copy the data to write into the DOS buffer }
    realmodeint($13,r);
  end;
end;

If you don't understand the ES:=dosbufseg assignment, never mind for now. BIOS code uses regular real mode segment:offset addressing and can only access memory in the lowest megabyte, because that's how the BIOS code is written, to be compatible with MS-DOS. A 32-bit program therefore has to pass its data through a buffer in the first megabyte that the BIOS function can read and write. You use a DPMI function to find a free block of DOS memory and return a segment pointing to it; the offset is assumed to be zero. DOSBufSeg holds this segment. For more information read the examples at the end of this document.
0.4 History
The two different addressing forms exist because of an attempt to keep the BIOS disk functions backward compatible. At one time cylinder numbers never exceeded 1023, so the 10 bit field the BIOS used to pass the cylinder number to functions needing a CHS address could accommodate any drive’s parameters. Then drives got more than 1024 cylinders and drive capacity grew beyond 504 megabytes. Since the BIOS 13h disk services used an 8 bit field for storing the head number in its CHS addresses, it was simple to allow the newer BIOS versions to use all eight bits for the head number instead of only four, and this allowed for accessing drives up to 8.4GB. In fact, some BIOS versions had intermediate limits such as 2.1GB or 3.1GB, but for different reasons. In order to use a drive larger than 504MB on a computer with an older BIOS version, OnTrack designed a replacement set of BIOS routines for disk access which would load as soon as the computer booted off the hard disk. Another solution was to use a replacement disk controller that had replacement BIOS 13h services in ROM. During the transition from the 504MB limit to the 8.4GB limit some BIOS makers used different translation systems to give access to larger drives. One system gave 12 bits to the cylinder field while keeping the head field at 4 bits, thus giving a 2.1GB limit. You can read all about this problem in other places but this writer doesn't want to think about it. For more information on this subject see the "How it Works" series.
Recent BIOS versions such as Phoenix 4.05 contain extensions to the interrupt 13h disk services that allow LBA addressing of disks larger than 8.4GB, which is the size limit using LCHS. If you don't need compatibility with pre-Windows 95 era computers you could probably get away with coding only for the LBA BIOS extensions, but it's a bad idea because of how much old hardware is still out there.
For information on CHS and LBA and disk addressing read the "How it Works" series part 1.
1 Partitions
Hard disks may contain more than one file system or more than one logical drive using the same file system. Each file system or logical drive resides on a single contiguous block of space on the disk. These blocks are known as partitions. Floppy disks and removable media in general don’t use more than one partition. Since a fixed disk commonly contains more than one partition, most operating systems come with utilities to partition a drive and place a table or tables on the drive describing where those partitions start and end. A partition is defined by a start and end address, or by a start address and a size in sectors (which you use to calculate the end address). A partition table doesn’t need much space since a constant amount of information will describe any partition. The standard table takes 64 bytes. It has four records of 16 bytes each. Usually only one or two of these records are actually used.
The first sector of a hard disk usually contains a master boot record. The MBR has some executable code and a partition table with four entries. When you turn on the PC or reset it, the PC BIOS reads this sector and executes the code. The code examines the table and looks for a partition marked as active. The first sector of the active partition gets read into memory and executed as code. This code in turn attempts to load some sort of operating system based on information found in the first sector of the active partition, such as where the kernel loader program resides on the active partition. If no active partition exists the code calls interrupt 18h, which invokes ROM BASIC on old PCs and on most others just gives a message about no operating system.
Since partition tables and the MBR aren’t contained within an operating system they must be used and acted on by many different operating systems. While there is no formal standard everyone usually follows Microsoft’s way of doing things. Every disk partitioning program uses this structure for partition entries in the partition table:
TpartRec = record
  active, StartHD: byte;
  StartCylSect: word;
  PartType, EndHd: byte;
  EndCylSect: word;
  StartLBA, size: dword;
end;

Figure 1.1 A partition table record (Dword is an unsigned longint)
The MBR sector holds four of these records starting at offset 446 decimal. Each record takes 16 bytes. The last two bytes of the MBR hold the system signature AA55h, stored on disk as the bytes 55h, AAh. Without this signature the BIOS won’t identify the sector as an MBR, and so won’t boot the system from it.
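Reading the table out of a sector buffer takes only a few lines. The following C sketch assumes the 512-byte MBR has already been read into memory. PartInfo and parse_mbr are illustrative names, and only the fields needed for LBA work are extracted from each 16-byte record.

```c
#include <stdint.h>

#define MBR_TABLE_OFFSET 446
#define MBR_ENTRY_SIZE   16

typedef struct {
    uint8_t  active;     /* 80h = active, 0 = inactive */
    uint8_t  type;       /* the PartType field */
    uint32_t start_lba;  /* the StartLBA field */
    uint32_t size;       /* size in sectors */
} PartInfo;

/* Read a 32-bit little-endian value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fills out[0..3] and returns 1 if the sector carries the 55h, AAh
   signature; returns 0 otherwise. */
int parse_mbr(const uint8_t sector[512], PartInfo out[4])
{
    int i;
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return 0;
    for (i = 0; i < 4; i++) {
        const uint8_t *e = sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;
        out[i].active    = e[0];
        out[i].type      = e[4];
        out[i].start_lba = le32(e + 8);
        out[i].size      = le32(e + 12);
    }
    return 1;
}
```

Reading the fields byte by byte avoids any dependence on how a particular compiler pads or orders a record in memory.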
The Active field can either be set to 0 for inactive or 80h for active. 80h is also the BIOS number of the first hard disk, which is normally the boot drive. 0 is the BIOS drive number for floppy A: which is the other default boot device.
The PartType field indicates the file system type, version and operating system from which the partition originated. Here’s a partial list of partition types:
$00: Unknown
$01: FAT12
$02: XENIX
$04: DOS FAT16
$05: Extended DOS
$06: FAT16
$07: NTFS
$0B: FAT32
$0C: Win95b+ FAT32 using LBA
$0E: FAT16 using LBA
$0F: Win95 Extended DOS using LBA
$51: Ontrack extended partition
$64: Novell
$75: PCIX
$A0: Phoenix Save To Disk
$DB: CP/M
$E0: DBFS
$FF: BBT

Figure 1.2: Common partition types ($ is Pascal hex notation)

The STARTHD and STARTCYLSECT fields in the partition record make up the LCHS address where the partition starts. The STARTLBA field gives the same location in LBA form. If you're writing a disk utility it's a good idea to fill both of these with correct values when possible, even though you may not think one or the other is needed, because the operating system may use the LBA values in the partition table. When a DOS system boots it looks at only the CHS addresses in the partition records. This is one reason a boot drive in DOS/Windows cannot exceed 8.4GB.

2 The FAT File Systems
In this section I discuss some concepts important to understanding the FAT file system, both FAT16 and FAT32, as well as FAT12. The boot sector, FAT, and root directory are the important parts of this type of file system. Understanding these subjects will help you better use the utilities here and help you read the sample source code.
2.1 Overview
The boot sector contains information needed to locate everything else in the FAT partition. The boot sectors for FAT12, FAT16, and FAT32 file systems differ slightly, because their layouts also differ slightly and locating the root directory works differently, among other things.
FAT stands for File Allocation Table. Every FAT type file system uses at least one, and usually keeps a second as a back-up. The different varieties of FAT file systems, FAT12, FAT16 and FAT32, get their names from the size of each location in the table. For instance in FAT12 each location is 12 bits wide. That also means the largest FAT in FAT12 can have 2^12 locations, because a 12 bit entry can name no more than 2^12 different locations. Every location in a FAT corresponds to a different area on the partition. This is how FAT allocates space for files on the disk. Each file has a cluster number (disk location) where it starts, and this also equates to a location in the FAT. That location will contain either another cluster number, where the next part of the file is, or a number indicating this is the last cluster of the file. The last cluster marker is FFFFh in FAT16, and FAT32 uses 0FFFFFFFh. FAT32 actually uses only 28 bits of the 32 bits available, so the end of file marker is only a 28 bit number. According to Microsoft the other four bits are reserved and should be left alone. A program reading 32 bit numbers from a FAT32 FAT should ignore the four high bits of the entry, and a program writing them should keep those bits as it found them.
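The 28 bit rule for FAT32 entries comes down to one mask. A small sketch in C, with illustrative function names. Treating every value from 0FFFFFF8h upward as a last-cluster marker goes slightly beyond the single marker value mentioned above and is my assumption about how tolerant a reader should be.

```c
#include <stdint.h>

#define FAT32_MASK 0x0FFFFFFFu

/* Cluster value held in a raw FAT32 entry; the high 4 bits are ignored. */
uint32_t fat32_value(uint32_t raw)
{
    return raw & FAT32_MASK;
}

/* New raw entry that stores `value` but keeps the old high 4 bits. */
uint32_t fat32_update(uint32_t old_raw, uint32_t value)
{
    return (old_raw & ~FAT32_MASK) | (value & FAT32_MASK);
}

/* Non-zero if the entry marks the last cluster of a file. */
int fat32_is_last(uint32_t raw)
{
    return fat32_value(raw) >= 0x0FFFFFF8u;
}
```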
The most important concept to understand about a FAT file system is that the FAT directly corresponds to the contents of the FAT partition. A FAT partition is divided into clusters. A cluster is just a certain number of sectors. All clusters on a FAT partition are the same size. The clusters start after the FATs in the partition (and, in FAT12 and FAT16, after the root directory). If you are reading cluster 10 on the partition you should go to location 10 in the FAT to find out whether or not that cluster has been allocated for use yet, or if it has been marked as a defective area of the disk. Think of the FAT as a low resolution snapshot of the rest of the FAT partition.
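Because each FAT location names the next cluster of the same file, reading a file means following a chain through the table. Here is a sketch in C for a FAT16 table already loaded into memory. The function name and the error convention are illustrative. The values FFF7h for a defective cluster and FFF8h and above for the last cluster are the conventional FAT16 markers rather than something taken from the text above.

```c
#include <stdint.h>
#include <stddef.h>

/* Follows the chain starting at cluster `first` through a FAT16 table
   of `entries` locations. Stores up to `max` cluster numbers in `out`
   and returns the chain length, or -1 if the chain runs into a free,
   defective or out-of-range location or loops back on itself. */
long fat16_chain(const uint16_t *fat, size_t entries, uint16_t first,
                 uint16_t *out, size_t max)
{
    size_t n = 0;
    uint16_t c = first;
    while (n < entries) {          /* a valid chain cannot be longer than the table */
        uint16_t next;
        if (c < 2 || c >= entries)
            return -1;             /* locations 0 and 1 are never file data */
        if (n < max)
            out[n] = c;
        n++;
        next = fat[c];
        if (next >= 0xFFF8)
            return (long)n;        /* last cluster of the file */
        if (next == 0xFFF7)
            return -1;             /* chain points at a defective cluster */
        c = next;
    }
    return -1;                     /* more links than locations: a loop */
}
```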
2.2 The BPB and The Boot Sector
The boot sector in a FAT type partition sits at the first sector in the partition. In the FAT32 variety the boot sector has a back-up copy somewhere in the first few sectors in the FAT partition, usually at sector 6, before the first FAT starts. The boot sector contains a table of information about the FAT partition such as its total size in sectors, the number of sectors per track on the disk, the number of heads, and the volume name. This table is called the BIOS Parameter Block, or BPB. The sector also contains code to load the start of the file IO.SYS. This code uses a crude method of finding IO.SYS: it looks for the first entry in the root directory and loads the first three sectors of that file. For this reason you can’t write a batch file to rename and copy IO.SYS and MSDOS.SYS to a back-up directory and then copy in different versions of those files in order to change the version of DOS. Aside from all the version conflicts that would come up, IO.SYS could get relocated in the root directory and then nothing would work since the boot sector loader program would load garbage. Perhaps this code isn’t technically part of a FAT file system but part of the bootable DOS partition instead. In any case you should know about it.
An OS on a FAT partition boots by first executing the FAT boot sector's code to load the rest of the OS. In DOS this is IO.SYS and MSDOS.SYS, which run COMMAND.COM. COMMAND is more or less the shell for DOS. The first three bytes of the boot sector contain a jump instruction to the code further on in the boot sector. The sector gets moved from the disk to memory by the code in the master boot record, or, in the case of a floppy disk, the BIOS reads it in automatically. A table, the BPB, follows these three bytes. The jmp instruction gets executed and then the boot sector loader program refers back to the table to find the root directory and load the rest of the OS. It calculates the root directory's location by adding the number of reserved sectors, the size of one FAT in sectors multiplied by the number of FATs, and the absolute address of the FAT partition. In FAT32 the root directory starts at a cluster defined in the BPB, so that cluster's offset has to be calculated and added to this address to get the location of the FAT32 root directory.
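For FAT12 and FAT16 the root directory calculation is a single sum. A minimal sketch in C, assuming the boot sector is already in a buffer and using the field offsets listed in Figure 2.1. The function name is illustrative.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* boot: the boot sector of a FAT12 or FAT16 partition.
   part_start: LBA of the partition from the partition table (0 on a floppy). */
uint32_t fat16_root_dir_lba(const uint8_t *boot, uint32_t part_start)
{
    uint32_t reserved = le16(boot + 0x0E);   /* br_reservedSectors */
    uint32_t num_fats = boot[0x10];          /* br_numFAT */
    uint32_t fat_size = le16(boot + 0x16);   /* br_numFATsectors */
    return part_start + reserved + num_fats * fat_size;
}
```

On a typical 1.44MB floppy (1 reserved sector, 2 FATs of 9 sectors each) this gives sector 19.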
The format for a FAT12 or FAT16 BPB is given in the table below.
br_OEMname           =3;
br_bytesPerSector    =$b;
br_sectPerCluster    =$d;
br_reservedSectors   =$e;
br_numFAT            =$10;
br_numRootDirEntries =$11;
br_numSectors        =$13;
br_mediaType         =$15;
br_numFATsectors     =$16;
br_sectorsPerTrack   =$18;
br_numHeads          =$1a;
br_numHiddenSectors  =$1c;
br_numSectorsHuge    =$20;
br_driveNum          =$24;
br_reserved          =$25;
br_signature         =$26;
br_volumeID          =$27;
br_volumeLabel       =$2b;
br_fileSysType       =$36;

Figure 2.1 First part of FAT12 or FAT16 boot sector (BPB)

These are the offsets in bytes from the start of the boot sector ($ is Pascal hex notation). Here is a run down of the important parts of the BPB.
You will notice a field for the volume label. All those utilities which change the disk label change this field. It's easy -- you can do it too. The field is 11 bytes wide and padded with spaces. Put the ASCII values of the letters you want to make up the label in there.
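As a sketch of how little is involved, here is the padding step in C. It assumes a FAT12 or FAT16 boot sector already read into a buffer and uses the br_volumeLabel offset from Figure 2.1. The function name is illustrative, and writing the sector back to disk is left to whatever disk library you use.

```c
#include <stdint.h>
#include <string.h>

#define BR_VOLUME_LABEL 0x2B   /* br_volumeLabel in Figure 2.1 */
#define LABEL_LEN       11

/* Store `label` in the boot sector buffer, truncated to 11 characters
   and padded on the right with spaces. */
void set_volume_label(uint8_t *boot, const char *label)
{
    size_t n = strlen(label);
    if (n > LABEL_LEN)
        n = LABEL_LEN;
    memset(boot + BR_VOLUME_LABEL, ' ', LABEL_LEN);
    memcpy(boot + BR_VOLUME_LABEL, label, n);
}
```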
The br_filesystype field also has an ASCII string. It may contain something like "FAT16" but there is no requirement. Some utility software uses the value of this field to determine what kind of FAT system it is dealing with but this isn't a good programming practice. There are other ways to figure out what variety of FAT a program is looking at which I'll discuss later.
Similarly, the br_OEMname field is really just an arbitrary string and shouldn't be used for anything other than information. Depending on the format program used it can have different values, but usually it is set to MSWIN4.1.
BR_bytespersector should be set to 512 most of the time but you shouldn't assume this. Some types of media can use 1024 or 2048 bytes per sector and Microsoft claims all its OS code will correctly handle these other values.
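One way to avoid assuming 512 is to read the field and check it before using it. A small sketch in C; the function name is illustrative, and accepting any power of two from 512 to 4096 is my assumption about the reasonable range.

```c
#include <stdint.h>

#define BR_BYTES_PER_SECTOR 0x0B   /* br_bytesPerSector in Figure 2.1 */

/* Returns the sector size recorded in the boot sector, or 0 if the
   value is not a power of two between 512 and 4096. */
uint16_t bpb_sector_size(const uint8_t *boot)
{
    uint16_t n = (uint16_t)(boot[BR_BYTES_PER_SECTOR] |
                            (boot[BR_BYTES_PER_SECTOR + 1] << 8));
    if (n < 512 || n > 4096 || (n & (n - 1)) != 0)
        return 0;
    return n;
}
```

A utility can then size its sector buffers from the returned value and refuse to touch a partition when the result is 0.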
BR_numsectorshuge was added to accommodate larger FAT partitions with more than 65535 sectors, the most the 16 bit br_numsectors field can hold. All FAT12 file systems use only the br_numsectors field, but the reverse is not true. A FAT12 partition can have no more than about 4096 clusters (2^12). You can have the clusters in a FAT12 partition use more than one sector, but when the partition gets larger than 16 megabytes (> 32768 sectors) you're using 16 sectors per cluster and FAT16 is obviously much more efficient. According to Microsoft you determine the type of FAT from the number of clusters, which you calculate from the number of sectors in the partition and the number of sectors per cluster. If you want to stay true to this standard you would have to use FAT16 whenever you had more than about 4096 clusters to allocate.
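The cluster count rule can be sketched in C as follows. The thresholds of 4085 and 65525 clusters and the 32 byte size of a directory entry are the figures I recall from Microsoft's published FAT specification rather than numbers given in this text, so check them against that document before relying on them. All names are illustrative.

```c
#include <stdint.h>

typedef enum { FAT12, FAT16, FAT32 } FatType;

/* All arguments are values read from the BPB. total_sectors is
   br_numSectors, or br_numSectorsHuge when the former is too small. */
FatType fat_type(uint32_t total_sectors, uint32_t reserved,
                 uint32_t num_fats, uint32_t sectors_per_fat,
                 uint32_t root_entries, uint32_t bytes_per_sector,
                 uint32_t sectors_per_cluster)
{
    /* Root directory size in sectors, rounded up (0 on FAT32). */
    uint32_t root_sectors =
        (root_entries * 32 + bytes_per_sector - 1) / bytes_per_sector;
    uint32_t data_sectors = total_sectors -
        (reserved + num_fats * sectors_per_fat + root_sectors);
    uint32_t clusters = data_sectors / sectors_per_cluster;

    if (clusters < 4085)
        return FAT12;
    if (clusters < 65525)
        return FAT16;
    return FAT32;
}
```

This never looks at the br_fileSysType string, which is the point: the type follows from the layout, not from a label.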
One of the entries in the above table gives the number of reserved sectors. The OS uses this to figure out where the FAT starts. The FAT starts right after the last reserved sector. To calculate the absolute address of the start of the FAT, add the address of the FAT partition found in the partition table and the number of reserved sectors. On a floppy just use the number of reserved sectors. Another entry in the table gives the number of FATs on the partition. Usually a FAT file system uses two FATs, one right after the other on the disk. You will also notice the sectors per FAT entry. To find the LBA of the first data cluster in the partition you add the start of the partition, the reserved sectors, the sectors per FAT times the number of FATs, and the total space taken up by the root directory. Clusters are numbered from 2, so to reach cluster N you then add (N - 2) times the number of sectors per cluster.
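The cluster address calculation can be sketched in C like this. The FatLayout type and function names are illustrative, and the sketch relies on the first data cluster being numbered 2.

```c
#include <stdint.h>

/* Values taken from the partition table and the BPB. */
typedef struct {
    uint32_t part_start;           /* LBA of the partition, 0 on a floppy */
    uint32_t reserved;             /* reserved sectors */
    uint32_t num_fats;             /* number of FATs */
    uint32_t sectors_per_fat;
    uint32_t root_sectors;         /* root directory size in sectors, 0 on FAT32 */
    uint32_t sectors_per_cluster;
} FatLayout;

/* LBA of the first sector of the data area (cluster 2). */
uint32_t first_data_lba(const FatLayout *f)
{
    return f->part_start + f->reserved +
           f->num_fats * f->sectors_per_fat + f->root_sectors;
}

/* LBA of the first sector of a cluster; cluster must be 2 or higher. */
uint32_t cluster_to_lba(const FatLayout *f, uint32_t cluster)
{
    return first_data_lba(f) + (cluster - 2) * f->sectors_per_cluster;
}
```

For a 1.44MB floppy with 1 reserved sector, two 9 sector FATs and a 14 sector root directory, cluster 2 starts at sector 33.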
This table provides enough information to figure out where the root directory starts, where the FAT is, where the first piece of regular file data starts, and where the last piece of data on the partition is. The exact method for calculating these things depends on the type of FAT. With FAT16 and FAT12, the root directory entries value is important because it indicates the size of the root directory. In FAT12 and FAT16 the root directory immediately follows the last FAT, and following the root directory is the rest of the file data. Knowing the size of the root directory is crucial. With FAT32 the root directory doesn’t occupy a fixed location on the disk except the first cluster of it. The table in the boot sector for FAT32 contains an entry for the first cluster of the root directory. Usually the root directory starts at cluster 2, the first data in the FAT partition after the FATs.
br32_OEMname           =3;
br32_bytesPerSector    =$b;
br32_sectPerCluster    =$d;
br32_reservedSectors   =$e;   { Normally 32 }
br32_numFAT            =$10;  { use 2, for compatibility }
br32_numRootDirEntries =$11;  { 0 in FAT32 }
br32_numSectors        =$13;
br32_mediaType         =$15;
br32_numFATsectors     =$16;
br32_sectorsPerTrack   =$18;
br32_numHeads          =$1a;
br32_numHiddenSectors  =$1c;  { 4 bytes }
br32_bigTotalSectors   =$20;  { 4 bytes }
br32_bigSectorsPerFAT  =$24;
br32_extflags          =$28;
br32_fs_version        =$2a;
br32_rootdirstrtclus   =$2c;  { 4 bytes, usually 2 for data recovery }
br32_fsinfosec         =$30;  { use 1 for data recovery }
br32_bkupbootsec       =$32;  { use 6, for data recovery software access }

Figure 2.2 First Part of FAT32 boot sector
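With these offsets the FAT32 root directory can be located from the boot sector alone. A sketch in C, assuming the boot sector is already in a buffer and that cluster numbering starts at 2. The function name is illustrative.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little-endian value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* boot: a FAT32 boot sector. part_start: LBA of the partition. */
uint32_t fat32_root_dir_lba(const uint8_t *boot, uint32_t part_start)
{
    uint32_t spc      = boot[0x0D];          /* br32_sectPerCluster */
    uint32_t reserved = le16(boot + 0x0E);   /* br32_reservedSectors */
    uint32_t num_fats = boot[0x10];          /* br32_numFAT */
    uint32_t fat_size = le32(boot + 0x24);   /* br32_bigSectorsPerFAT */
    uint32_t root     = le32(boot + 0x2C);   /* br32_rootdirstrtclus */

    /* The data area starts right after the last FAT. */
    uint32_t data_start = part_start + reserved + num_fats * fat_size;
    return data_start + (root - 2) * spc;
}
```

This gives only the first cluster of the root directory; the rest of it has to be found by following its chain in the FAT like any other file.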