SELF-MODIFYING CODE PROJECT 2

OBJECTIVE

To learn the basic techniques required to modify the code segment of a DOS EXE file and to write a program with a practical application that does so.

PROGRAM DESCRIPTION

The program DOSGUARD.COM is a DOS application which modifies the code segment of every COM and EXE file in the same directory. DOSGUARD adds code to each of these programs that requires the user to enter a password in order to continue execution of the program.

STRUCTURE OF EXE FILES

The EXE file format is much more complicated than the COM format. The big difference is that EXE files allow the program to specify how it wants its segments to be laid out in memory, allowing programs to exceed one 64K segment in size. Most EXEs will have separate code, data, and stack segments. All of this information is stored in the EXE header. Here's a brief rundown of what the header looks like (offsets and sizes are in bytes, in decimal):

Offset  Size  Field
   0     2    Signature. Will always be 'MZ'.
   2     2    Last Page Size. Number of bytes used on the last page of the file.
   4     2    Page Count. Number of 512-byte pages in the file.
   6     2    Relocation Table Entries. Number of items in the relocation pointer table.
   8     2    Header Size. Size of header in paragraphs, including the relocation pointer table.
  10     2    Minalloc.
  12     2    Maxalloc.
  14     2    Initial Stack Segment.
  16     2    Initial Stack Pointer.
  18     2    Checksum. (Usually ignored.)
  20     2    Initial Instruction Pointer.
  22     2    Initial Code Segment.
  24     2    Relocation Table Offset. Offset to the start of the relocation pointer table.
  26     2    Overlay Number. Primary executables (the ones we wish to modify) always have this set to zero.

Following these header fields is the relocation pointer table, with a variable amount of blank space between the fields and the start of the table. The relocation table is a table of pointers, each one an offset and a segment. These pointers are combined with the starting segment value calculated by DOS to point to a word in memory where the final segment address is written.
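As a rough illustration of the layout described above, the following Python sketch unpacks the 28 bytes of header fields and then reads the relocation pointer table. It is not part of DOSGUARD, which is written in assembly. The names parse_mz_header, read_relocation_table, and the dictionary keys are hypothetical and chosen only for this example.

```python
import struct

# Field order follows the header table; every field after the
# signature is a 16-bit little-endian word.
MZ_FIELDS = (
    "signature", "last_page_size", "page_count", "reloc_entries",
    "header_paragraphs", "minalloc", "maxalloc", "initial_ss", "initial_sp",
    "checksum", "initial_ip", "initial_cs", "reloc_table_offset",
    "overlay_number",
)
MZ_HEADER_FORMAT = "<2s13H"  # 2-byte signature + 13 words = 28 bytes


def parse_mz_header(data):
    """Unpack the first 28 bytes of an EXE image into a dict of fields."""
    if len(data) < struct.calcsize(MZ_HEADER_FORMAT):
        raise ValueError("file too short to hold an EXE header")
    header = dict(zip(MZ_FIELDS, struct.unpack_from(MZ_HEADER_FORMAT, data, 0)))
    if header["signature"] != b"MZ":
        raise ValueError("missing 'MZ' signature")
    return header


def read_relocation_table(data, header):
    """Return the relocation pointers as a list of (offset, segment) pairs."""
    start = header["reloc_table_offset"]
    return [
        struct.unpack_from("<HH", data, start + 4 * i)  # offset word, then segment word
        for i in range(header["reloc_entries"])
    ]
```

Given the raw bytes of an EXE file, parse_mz_header(data)["header_paragraphs"] * 16 gives the header size in bytes, which is also the file position where the program body starts.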
Essentially, the relocation pointer table is DOS's way to handle the dynamic placement of segments into physical memory. This isn't a problem with COM files because there is only one segment and the program isn't aware of anything else. Following the relocation pointer table is another variable amount of reserved space and finally the program body.

Successfully adding code to an EXE file requires careful manipulation of the EXE header and relocation pointer table.

BRIEF DESCRIPTION OF DOSGUARD

DOSGUARD is a small DOS utility that adds code to all DOS EXE and COM files within the same directory it inhabits. DOSGUARD will skip over Windows and OS/2 executables and will only modify those COM files which have enough available space. Also, it avoids modifying itself and files which it has already changed. Successfully modified files will prompt the user to enter a password before running the program.

DOSGUARD serves as a good example of the steps necessary to modify the code of a DOS EXE file. Also, it demonstrates one practical application of code modification.

BRIEF OVERVIEW OF DOSGUARD'S EXECUTION

DOSGUARD begins by finding and infecting all the COM files in the same directory. This document will ignore the details of COM file modification, as that subject is covered sufficiently in the documentation of DOSGUARD's little brother, COMGUARD. DOSGUARD borrows its COM file code directly from COMGUARD.

Next, DOSGUARD has to search for EXE files and determine which ones it can safely modify. First of all, it checks the first 2 bytes of the file to make sure that they are 'MZ', the signature which all EXE files have in common. After that, DOSGUARD ensures that the file hasn't already been infected. If the following equality holds, then the file has already been modified by DOSGUARD:

    (initial CS * 16) + 9Fh + size of EXE header in bytes == file size

9Fh is the length (in bytes) of the code that DOSGUARD adds to the end of each EXE file it infects.
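The equality above can be written out as a small check. This is only a sketch in Python of the arithmetic, not DOSGUARD's own assembly code. The function name already_modified and its parameter names are hypothetical. The value 9Fh is taken from the text.

```python
ADDED_CODE_SIZE = 0x9F  # bytes of code DOSGUARD appends, per the text


def already_modified(initial_cs, header_paragraphs, file_size,
                     added_size=ADDED_CODE_SIZE):
    """Return True when the entry point sits exactly added_size bytes
    before the end of the file, which is the condition given in the text."""
    header_bytes = header_paragraphs * 16  # header size is stored in paragraphs
    return initial_cs * 16 + added_size + header_bytes == file_size
```

For example, with a 32-byte header (2 paragraphs) and an initial CS of 20h, the check succeeds only for a file of 32 + 200h + 9Fh = 703 bytes.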
So, the initial CS which is stored in the EXE header, multiplied by 16 and combined with the initial IP (which is always zero in this case, so it is left out of the calculation), gives an entry point that is exactly 9Fh bytes from the end of the file. Those two figures plus the size of the EXE header (which is not counted when segment offsets are determined) equal the size of the file if it has already been infected by DOSGUARD.

Also, DOSGUARD only infects primary executables, so it checks to make sure the Overlay Number in the EXE header is zero.

DOSGUARD also must avoid non-DOS executables like those for Windows or OS/2. DOSGUARD does this by checking the offset to the relocation pointer table. If the offset is greater than 40h, then the EXE could possibly be for Windows or OS/2. The problem with this method is that it also causes DOSGUARD to skip some valid DOS executables.

Once a file has been deemed safe to modify, DOSGUARD copies its code to the end of the file and determines the starting CS and IP for this code. These values become the new initial CS and IP in the EXE header, so the new code will be executed before the main program. When the added code has finished executing, it jumps to the original starting CS and IP if the proper password was given, and the main program runs.

STEP-BY-STEP MODIFICATION OF AN EXE FILE

1. Check the relocation pointer table to make sure there is room.

   DOSGUARD has to add 2 entries to the relocation pointer table. Each of these pointers is a double word (4 bytes). Since the relocation pointer table is part of the EXE header and the header can't be a fraction of a paragraph in size, there is a chance that the header will have to be extended one paragraph in order to fit the extra 8 bytes. Extending the header requires reading in the entire file below the header and writing it back out one paragraph down. Also, the header will have to be modified appropriately (Last Page Size, Page Count, and Header Size will need to be updated).
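The room check in step 1 comes down to a little arithmetic on three header fields. The Python sketch below shows one way to do it, under the assumption that everything in the header past the last relocation entry is free space. The text does not say how DOSGUARD measures the available room, so treat this as illustrative only. The function names are hypothetical. The page_fields helper assumes the usual convention that Page Count includes a partial last page and that a Last Page Size of 0 means the last page is full.

```python
PARAGRAPH = 16   # bytes per paragraph
ENTRY_SIZE = 4   # each relocation pointer is offset word + segment word
PAGE = 512       # bytes per page


def header_growth_paragraphs(reloc_table_offset, reloc_entries,
                             header_paragraphs, new_entries=2):
    """Return how many paragraphs the header must grow by so that
    new_entries more pointers fit (0 when they already fit)."""
    needed = reloc_table_offset + (reloc_entries + new_entries) * ENTRY_SIZE
    available = header_paragraphs * PARAGRAPH
    shortfall = max(0, needed - available)
    return -(-shortfall // PARAGRAPH)  # ceiling division


def page_fields(file_size):
    """Return (last_page_size, page_count) for a file of file_size bytes."""
    return file_size % PAGE, (file_size + PAGE - 1) // PAGE
```

With a table at offset 28 holding one entry, a 2-paragraph header needs one more paragraph for the two new pointers, while a 3-paragraph header already has room.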
   In either case, the number of relocation table entries will need to be increased by two in the header.

2. Save original SS, SP, CS, and IP.

   These four values must be copied from the EXE header and stored within the code which will be added to the EXE file.

3. Adjust file length to paragraph boundary.

   In order to simplify the new code's starting IP, the file's length is extended to a paragraph boundary (a multiple of 16 bytes). This causes the new code's starting IP to always be zero and makes it easier to calculate the starting code segment.

4. Write code to the end of the file.

   Write the code we want to add to the end of the EXE file.

5. Adjust the EXE header and write it out to the file.

   Make modifications to the EXE header to reflect the changes we've made:

      initial CS = (file size before we added our code, after the padding in step 3) / 16 - (header size in paragraphs)
      initial IP = 0
      initial SS = same as the initial CS (all of our code operates in one segment)
      initial SP = size of the code we added + 100h
      recalculate Last Page Size and Page Count
      increase relocation table entries by 2

6. Modify relocation table.

   We'll be adding two 4-byte pointers to the end of the relocation table. The segment for both of these pointers will be the same as the initial CS of our code. The offsets will point to the words where the original SS and CS were saved in step 2. In DOSGUARD they correspond to the offsets "hosts" and "hostc+2". So the end result is two pointers which point to the locations within our code where the original SS and the original CS are stored. DOS then adjusts those saved values for the load address, just as it would have adjusted them in the header.

RESPONSIBILITIES OF INSERTED CODE

There are several items which the code module we added must take into consideration. First of all, when it's finished, the state of the registers must be exactly what the original program would expect. For instance, AX is set by DOS to indicate whether or not the Drive ID stored in the FCBs is valid. So, the value of AX must be preserved by our code. Also, the original program may expect other registers to be set to initial values of zero.
And of course, the segment registers need to be restored after our code's execution.

Another thing is that inserted code can't be dependent on absolute addresses for its data. Therefore, DOSGUARD accesses all data by its offset from the end of the file.

REFERENCES

Ludwig, Mark. The Giant Black Book of Computer Viruses, 2nd Edition.
Dettmann, Terry R. DOS Programmer's Reference.