Description
The goal of a machine code decompiler is to analyze executable files (such as .EXE or .DLL files on Windows, or ELF files on Unix-like systems) and create a high-level representation of the machine code they contain. In other words, the decompiler tries to reconstruct the source code from which the executable was originally compiled.
Since compilation is a non-reversible process (information such as comments and variable data types is irretrievably lost), decompilation can never completely recover the source code of a machine code executable. However, with some oracular (read "human") assistance, it can go a long way towards this goal. An oracle can provide function parameter types, the locations of otherwise unreachable code, and user-specified comments.
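To make the role of the oracle concrete, here is a minimal sketch of how the three kinds of hints listed above could be stored and consulted. The `OracleHints` class and the helper functions are hypothetical names invented for illustration; they are not taken from the decompiler's code.

```python
from dataclasses import dataclass, field


@dataclass
class OracleHints:
    """Hypothetical container for user-supplied ("oracular") information."""
    # Function address -> parameter types the user asserts, e.g. ["char *", "int"].
    param_types: dict[int, list[str]] = field(default_factory=dict)
    # Addresses the user says contain code, even though no jump or call reaches them.
    extra_entry_points: set[int] = field(default_factory=set)
    # Address -> comment to copy into the generated source.
    comments: dict[int, str] = field(default_factory=dict)


def apply_param_types(addr: int, inferred: list[str], hints: OracleHints) -> list[str]:
    """Prefer the user's parameter types; otherwise keep what analysis inferred."""
    user = hints.param_types.get(addr)
    return list(user) if user is not None else list(inferred)


def find_code_roots(call_targets: set[int], hints: OracleHints) -> set[int]:
    """Start points for code discovery: targets found by scanning, plus the oracle's."""
    return call_targets | hints.extra_entry_points
```

For example, if analysis infers `["word32", "word32"]` for the function at `0x401000` but the user has declared `["char *", "int"]`, `apply_param_types` returns the user's types. Letting the human override the analysis is the simplest policy; a real tool might instead check the hint against the inferred parameter count.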
The decompiler is designed to be processor- and platform-agnostic: you should be able to use it to decompile executables for any processor architecture, without being tied to a particular instruction set. Although only an x86 front end is currently implemented, nothing prevents you from implementing a 68K, SPARC, or VAX front end if you need one.
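One common way to achieve this kind of independence is to have the core talk only to an abstract front-end interface and to look up concrete front ends by name. The sketch below assumes that design; `Architecture`, `register`, `get_architecture`, and the toy `ToyX86` decoder are hypothetical names, and the toy decoder recognizes only the one-byte x86 opcodes `0x90` (`nop`) and `0xC3` (`ret`).

```python
from abc import ABC, abstractmethod


class Architecture(ABC):
    """Hypothetical front-end interface: one subclass per instruction set."""
    name: str

    @abstractmethod
    def decode(self, code: bytes, offset: int) -> tuple[str, int]:
        """Return (instruction text, instruction length in bytes) at offset."""


_REGISTRY: dict[str, type[Architecture]] = {}


def register(cls):
    """Class decorator that makes a front end available by its name."""
    _REGISTRY[cls.name] = cls
    return cls


def get_architecture(name: str) -> Architecture:
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"no front end for {name!r}") from None


@register
class ToyX86(Architecture):
    """Deliberately tiny: decodes only two one-byte x86 opcodes."""
    name = "x86"
    _ONE_BYTE = {0x90: "nop", 0xC3: "ret"}

    def decode(self, code, offset):
        op = code[offset]
        # Unknown bytes are emitted as raw data rather than guessed at.
        return self._ONE_BYTE.get(op, f"db 0x{op:02X}"), 1


def disassemble(arch: Architecture, code: bytes) -> list[str]:
    """Architecture-neutral driver: it never refers to x86 directly."""
    out, pos = [], 0
    while pos < len(code):
        text, length = arch.decode(code, pos)
        out.append(text)
        pos += length
    return out
```

`disassemble(get_architecture("x86"), bytes([0x90, 0x90, 0xC3]))` yields `["nop", "nop", "ret"]`. Under this design, a 68K front end would be a new `@register`-decorated subclass with `name = "m68k"`; the driver would not change.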
The decompiler can be run as a command-line tool. In that case it can be given either a plain executable file or a decompiler project file, which specifies not only the executable to decompile but also any oracular information that assists the decompilation. The decompiler also has a graphical front end, which lets an operator supply oracular information while examining the decompiled executable.
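Because the command-line tool accepts either kind of input, it has to tell them apart. The sketch below shows one way to do so by inspecting the first bytes of the file. The magic numbers are standard: ELF files start with `\x7fELF`, MS-DOS executables with `MZ`, and a PE image is an MZ file whose 32-bit little-endian `e_lfanew` field at offset `0x3C` points at the signature `PE\0\0`. Treating every other input as a project file is an assumption of this sketch; the actual tool may well decide by file extension instead.

```python
import struct


def classify_input(data: bytes) -> str:
    """Guess whether command-line input is an executable or a project file."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:2] == b"MZ":
        # PE images keep an MS-DOS header; e_lfanew (offset 0x3C) locates "PE\0\0".
        if len(data) >= 0x40:
            (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
            if data[e_lfanew:e_lfanew + 4] == b"PE\0\0":
                return "pe"
        return "msdos"
    # Assumption: anything that is not a recognized executable is a project file.
    return "project"
```

Checking for the PE signature before falling back to `"msdos"` matters, because every PE file also begins with `MZ`.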
The decompiler produces two outputs: a C source file containing the decompiled code, and a header file containing the reconstructed data types.
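As an illustration of what the header file might contain, the sketch below turns a recovered structure layout (a list of field offsets, C types, and sizes) into a C `struct` declaration, filling gaps with explicit padding arrays. The function name `emit_struct` and the `fieldXXXX`/`padXXXX` naming scheme are hypothetical, and the sketch assumes each field is naturally aligned so the C compiler adds no padding of its own.

```python
def emit_struct(name: str, fields: list[tuple[int, str, int]], size: int) -> str:
    """Render (offset, C type, byte size) fields as a C struct declaration."""
    lines = [f"struct {name} {{"]
    pos = 0
    for offset, ctype, width in sorted(fields):
        if offset < pos:
            raise ValueError(f"field at 0x{offset:X} overlaps the previous field")
        if offset > pos:
            # Bytes with no recovered type become an explicit padding array.
            lines.append(f"    unsigned char pad{pos:04X}[{offset - pos}];")
        lines.append(f"    {ctype} field{offset:04X};")
        pos = offset + width
    if size > pos:
        lines.append(f"    unsigned char pad{pos:04X}[{size - pos}];")
    lines.append("};")
    return "\n".join(lines)
```

For a 16-byte type with an `int32_t` at offset 0 and a `double` at offset 8, this emits a 4-byte `pad0004` array between the two fields, which keeps the `double` at the offset where the analysis found it.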
Status
The decompiler project is in a pre-alpha stage. As it stands, it can load MS-DOS and PE binary files, disassemble their contents, rewrite the disassembled instructions into intermediate code, and perform the analysis phase mentioned above. Work is currently focused on type analysis; code structuring is on the back burner, as it is considerably less complex than type recovery.
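The "rewrite into intermediate code, then analyze" steps can be sketched on a toy scale. Below, already-disassembled x86-style instructions (as tuples) are rewritten into simple assignment statements, and a constant-propagation pass stands in for the analysis phase. The IR classes, the tuple instruction format, and the choice of constant propagation are all assumptions made for illustration; flags and memory operands are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Union

Operand = Union[str, int]  # a register name or an immediate value


@dataclass(frozen=True)
class BinOp:
    op: str
    left: Operand
    right: Operand


@dataclass(frozen=True)
class Assign:
    dst: str
    src: Union[Operand, BinOp]


_SYMBOLS = {"add": "+", "sub": "-", "xor": "^"}


def rewrite(instr: tuple) -> Assign:
    """Rewrite one two-operand instruction, e.g. ("add", "eax", "ebx"), into IR."""
    mnemonic, dst, src = instr
    if mnemonic == "mov":
        return Assign(dst, src)
    if mnemonic in _SYMBOLS:
        # x86 two-operand form: dst = dst OP src
        return Assign(dst, BinOp(_SYMBOLS[mnemonic], dst, src))
    raise NotImplementedError(mnemonic)


_MASK = 0xFFFFFFFF  # results wrap around like 32-bit registers
_OPS = {
    "+": lambda a, b: (a + b) & _MASK,
    "-": lambda a, b: (a - b) & _MASK,
    "^": lambda a, b: a ^ b,
}


def propagate_constants(block: list[Assign]) -> dict[str, int]:
    """Track which registers hold a known constant at the end of a basic block."""
    known: dict[str, int] = {}

    def value(o: Operand) -> Optional[int]:
        return o if isinstance(o, int) else known.get(o)

    for stmt in block:
        if isinstance(stmt.src, BinOp):
            a, b = value(stmt.src.left), value(stmt.src.right)
            result = _OPS[stmt.src.op](a, b) if a is not None and b is not None else None
        else:
            result = value(stmt.src)
        if result is None:
            known.pop(stmt.dst, None)  # the register's value is no longer known
        else:
            known[stmt.dst] = result
    return known
```

Rewriting `mov eax, 5; mov ebx, 7; add eax, ebx` and running the pass leaves `eax` known to be 12. Working on a small, uniform IR rather than on raw instructions is what lets the analysis stay independent of the source instruction set.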