Asmex
Asmex is a viewer for the internals of .NET assembly files. While the world is not particularly short of .NET assembly viewers, Asmex has some unique features and the source might prove useful in various contexts. Asmex's features include:
- Extract resources from assemblies
- View raw metadata tables
- Open assemblies as files or as Global Assembly Cache entries
- View disassembly (by cheating and spawning ILDASM)
- View PE file structures
- Browse types, namespaces, method parameters, etc.
There is no particular reason to read the rest of this page; just download Asmex and see if it's any use to you. The rest of this page is taken from the article about Asmex on CodeProject.
Rationale
Asmex was an educational project; the idea was to make an application that involved knowledge of the very lowest possible level of .NET, yet also took advantage of the clean GUI model of WinForms. It is used in our company for training and debugging purposes.
In terms of low-level .NET, Asmex contains code to read raw metadata tables and suchlike. I was generally impressed by the efficient and ingenious way .NET metadata is stored.
The elegance (relative to MFC) of the WinForms model is demonstrated by fitting the heterogeneous data obtained by reflection and binary file parsing into a common tree format for display. Again, I was impressed by how much less work this was than the MFC equivalent. A generic object properties viewer (taken from another project) is also shoehorned into Asmex -- it uses .NET's interesting Attribute functionality to provide a properties list for each item in the tree.
Asmex was not intended to win prizes for canonically correct design, and that is why the data is held in classes derived from the GUI tree node. Sorry :)
This article will discuss the (hopefully) more reusable and interesting areas of the Asmex source code.
PE File Reader / .NET Metadata Reader
The FileViewer namespace contains the most useful part of Asmex. A very short and vague description of the structures dealt with in FileViewer follows. If you are an expert on Microsoft executable formats, you will want to skip it. Otherwise, you might find it too vague, in which case you can either look at Asmex's source or ask me about it. I love talking about file formats. Incidentally, if there is any demand for a brief overview of PE/.NET file formats and .NET type, resource and metadata concepts, I would love an excuse to write one.
Background -- PE Files
Almost every Windows executable, DLL or EXE, is a Portable Executable (PE) format file. Although there is little in the PE format that lends itself to .NET, in the current implementation of .NET all assemblies are contained in special PE format files, which have some traditional bits left out and quite a lot of new bits put in.
Very generally, a PE file consists of a PE header, which contains a list of Data Directory entries, and a number of Sections which are defined just after the PE header. Not all the Data Directories have meaning in a .NET file, and not many Sections are present either. Nevertheless, those that remain are still important -- in particular, the fifteenth Data Directory entry (index 14, the last one in use; the sixteenth is reserved) points to the start of the .NET information.
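To make the header walk concrete, here is a minimal sketch in Python (chosen for brevity; it is not Asmex's C# code). It assumes the whole file has been read into memory, and the function name `read_data_directories` is my own invention. The field offsets follow the published PE layout.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR in winnt.h


def read_data_directories(data: bytes) -> list:
    """Return the (rva, size) pairs of the PE Data Directory array."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The DOS header stores the file offset of the PE header at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20  # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", data, optional_header)
    if magic == 0x10B:      # PE32
        directories = optional_header + 96
    elif magic == 0x20B:    # PE32+
        directories = optional_header + 112
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    # NumberOfRvaAndSizes sits immediately before the directory array.
    (count,) = struct.unpack_from("<I", data, directories - 4)
    return [struct.unpack_from("<II", data, directories + 8 * i)
            for i in range(count)]
```

Given the bytes of a file, `read_data_directories(data)[CLR_DIRECTORY_INDEX]` yields the RVA and size of the .NET header; an RVA of zero there means the file is not a .NET assembly.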
Background -- .NET PE Files
The real starting point of a PE file, from the .NET point of view, is the COR20 Header, which tells the .NET runtime where to find the metadata. The COR20 header, like the PE header, specifies some Data Directories, as well as the entry point for the assembly. Most of these Data Directories point to things like fixup information which is not useful for examining the assembly, but one of them points to the start of the Metadata Streams.
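As a rough illustration of what the COR20 header holds, the following Python sketch unpacks its 72 bytes. The class and function names are hypothetical, and `offset` is assumed to be a file offset (that is, the RVA from the PE Data Directory already converted). The field order follows the IMAGE_COR20_HEADER structure.

```python
import struct
from typing import NamedTuple


class Cor20Header(NamedTuple):
    size: int                   # cb, normally 72
    major_runtime_version: int
    minor_runtime_version: int
    metadata: tuple             # (rva, size) of the metadata root
    flags: int
    entry_point_token: int
    directories: dict           # the remaining (rva, size) pairs by name


# Data Directories that follow the entry point, in file order.
_REMAINING_DIRECTORIES = (
    "Resources", "StrongNameSignature", "CodeManagerTable",
    "VTableFixups", "ExportAddressTableJumps", "ManagedNativeHeader",
)


def parse_cor20_header(data: bytes, offset: int) -> Cor20Header:
    cb, major, minor, md_rva, md_size, flags, entry_point = \
        struct.unpack_from("<IHHIIII", data, offset)
    directories = {}
    for i, name in enumerate(_REMAINING_DIRECTORIES):
        # Each entry is an (rva, size) pair; the fixed fields take 24 bytes.
        directories[name] = struct.unpack_from("<II", data, offset + 24 + 8 * i)
    return Cor20Header(cb, major, minor, (md_rva, md_size),
                       flags, entry_point, directories)
```

The `metadata` pair is the one that matters for browsing: it locates the metadata root, from which the streams are found.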
Background -- Metadata streams
.NET holds metadata in streams (usually five of them). Each of these streams has a different format:
- The #Blob Stream holds binary data, which includes method signatures; also strings in UCS-2.
- The #US Stream stores strings in UCS-2.
- The #GUID Stream contains a list of all the GUIDs that are used in the assembly, end-to-end. GUIDs are referenced by 1-based indexes, rather than offsets, just to make life harder.
- The #Strings Stream holds UTF-8 strings which contain the names of types, methods etc. that are in the code.
- The #~ Stream contains the Metadata Tables.
- The #- Stream is used instead of #~, under rare circumstances. I think it contains uncompressed tables.
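The streams are located through a small directory at the metadata root. The sketch below (Python, with hypothetical function names) reads that directory and also shows the 1-based GUID lookup mentioned above. It assumes `root` is the file offset of the metadata root, whose signature is the four bytes "BSJB".

```python
import struct
import uuid


def parse_stream_headers(data: bytes, root: int):
    """Return (version string, {stream name: (file offset, size)})."""
    signature, _major, _minor, _reserved, version_length = \
        struct.unpack_from("<IHHII", data, root)
    if signature != 0x424A5342:  # "BSJB"
        raise ValueError("metadata root signature not found")
    pos = root + 16
    version = data[pos:pos + version_length].split(b"\0", 1)[0].decode("utf-8")
    pos += version_length
    _flags, stream_count = struct.unpack_from("<HH", data, pos)
    pos += 4
    streams = {}
    for _ in range(stream_count):
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\0", pos)
        name = data[pos:end].decode("ascii")
        # The name, including its terminator, is padded to a multiple of 4.
        pos += (end - pos + 1 + 3) & ~3
        # Stream offsets are relative to the metadata root.
        streams[name] = (root + offset, size)
    return version, streams


def read_guid(data: bytes, guid_stream_offset: int, index: int):
    """Look up a GUID by its 1-based index; index 0 means 'no GUID'."""
    if index == 0:
        return None
    start = guid_stream_offset + (index - 1) * 16
    return uuid.UUID(bytes_le=bytes(data[start:start + 16]))
```

Looking up `"#~"` (or `"#-"`) in the returned dictionary gives the location of the metadata tables.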
Background -- Metadata tables
Metadata tables are just regions of data, lying end-to-end inside the file. There is a fixed, known number of tables, and each table has a fixed, known range of tables that its tokens (see below) can refer to. Tables do not actually contain things like strings, method signatures, etc.; rather, they contain either:
- Tokens that refer to a row in another table
- Numbers that refer to an entry in one of the streams
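Before any table row can be read, the header of the #~ stream has to be decoded, because it says which tables are present, how many rows each has, and how wide the indexes into the streams are. A minimal Python sketch, assuming `offset` is the file offset of the #~ stream (the function name is hypothetical):

```python
import struct


def parse_tables_header(data: bytes, offset: int):
    """Return (row counts by table number, stream index widths, offset of first table)."""
    (_reserved, _major, _minor, heap_sizes, _reserved2,
     valid, _sorted) = struct.unpack_from("<IBBBBQQ", data, offset)
    pos = offset + 24
    row_counts = {}
    for table in range(64):
        # Bit n of 'valid' is set when table n is present; one 4-byte
        # row count is stored for each present table, in table order.
        if (valid >> table) & 1:
            (count,) = struct.unpack_from("<I", data, pos)
            row_counts[table] = count
            pos += 4
    # A set bit in heap_sizes means indexes into that stream are 4 bytes wide.
    index_sizes = {
        "#Strings": 4 if heap_sizes & 0x01 else 2,
        "#GUID": 4 if heap_sizes & 0x02 else 2,
        "#Blob": 4 if heap_sizes & 0x04 else 2,
    }
    return row_counts, index_sizes, pos
```

The tables themselves start at the returned offset and follow one another with no padding, so the size of each row has to be computed from the row counts and index widths before later tables can be located.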
Background -- Types
There are two types which it is very important to understand when looking at .NET files at a binary level:
- RVAs -- These are a PE concept, and MOST pointers within a PE file are in this format. An RVA is the Relative Virtual Address of an item -- the address it will have relative to the base address at which the PE file is loaded, AFTER the file is loaded into memory. This is not the same as the offset within the PE file. Asmex converts between RVAs and actual file offsets in the ModHeaders.Rva2Offset method.
- Coded Tokens -- These are a .NET concept, and represent an entry in a metadata table. They are fairly complex and functions to interpret them are provided in the MDTables class. Generally, they specify both the table and the row within the table of a particular data item. Todo: a CodedToken class that supports all coded token operations.
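Both conversions are short once the idea is clear. The Python sketch below is my own illustration, not a port of ModHeaders.Rva2Offset or MDTables; the `Section` tuple and both function names are hypothetical, and only two of the coded token kinds defined in ECMA-335 are listed.

```python
from typing import NamedTuple, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA at which the section is mapped
    virtual_size: int
    raw_offset: int       # PointerToRawData: where its bytes sit in the file


def rva_to_offset(rva: int, sections: Sequence[Section]) -> int:
    """Translate an RVA to a file offset via the section that contains it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.raw_offset
    raise ValueError(f"RVA {rva:#x} is not inside any section")


# Low-order tag bits select the table; the order of names is significant.
CODED_TOKEN_KINDS = {
    "TypeDefOrRef": ("TypeDef", "TypeRef", "TypeSpec"),
    "ResolutionScope": ("Module", "ModuleRef", "AssemblyRef", "TypeRef"),
}


def decode_coded_token(kind: str, value: int):
    """Split a coded token into (table name, 1-based row number)."""
    tables = CODED_TOKEN_KINDS[kind]
    tag_bits = (len(tables) - 1).bit_length()  # bits needed for the tag
    tag = value & ((1 << tag_bits) - 1)
    if tag >= len(tables):
        raise ValueError(f"invalid tag {tag} for {kind}")
    return tables[tag], value >> tag_bits
```

For example, a TypeDefOrRef value of 0x0D has tag bits 01 and remaining bits 11, so it refers to row 3 of the TypeRef table.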
The Classes
Generally, each class in FileViewer represents some chunk of the information described above. Where possible, each class describes an actual physical range of bytes in the file, and is therefore inherited from Region, which is an abstract base class with 'start' and 'length' properties. Even though the information about where a given structure is physically located is not that useful in Asmex's treeview, we keep track of it in case we ever want to create a visual PE file examiner or a PE file emitter.
There are also some classes that do not represent a particular range of bytes, but encapsulate other information; these include the Metadata table related classes Table and TableCell, and also the classes related to PE import and relocation tables.
Each class takes a BinaryReader in its constructor. This reader is assumed to refer to the assembly file and to be 'wound' to the right offset. In some cases, it was necessary to adjust the reader's offset by hand, because some arithmetic is required in converting RVAs and so on.
These classes should serve as documentation for a wide range of PE and .NET structures. For comprehensive documentation, please see the Bibliography below.
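The Region idea translates to very little code. This is a Python approximation of the pattern rather than the actual FileViewer classes: a binary stream stands in for BinaryReader, and `DataDirectory` and `_read` are names I have made up for the example.

```python
import io
import struct


class Region:
    """Base class for anything that occupies a contiguous range of file bytes."""

    def __init__(self, reader: io.BufferedIOBase):
        # The reader is assumed to be positioned ('wound') at our first byte.
        self.start = reader.tell()
        self._read(reader)
        self.length = reader.tell() - self.start

    def _read(self, reader: io.BufferedIOBase) -> None:
        raise NotImplementedError


class DataDirectory(Region):
    """An (RVA, size) pair as found in the PE and COR20 headers."""

    def _read(self, reader: io.BufferedIOBase) -> None:
        self.rva, self.size = struct.unpack("<II", reader.read(8))
```

Because the base class records the position before and after the subclass reads itself, every structure knows its own extent without any extra bookkeeping.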
Reflection Tree
The reflection tree is a simple system for representing hierarchical data obtained from the PE file parser or by reflection. Each item is represented by a BaseNode-derived class, which holds a reference to a data object. Each node then uses its GenerateChildren method to populate the items below it, creating new data objects of various types as necessary. The logic for viewing .NET types by reflection is contained in these tree node classes. This logic is not very complicated and has been described in many places already, so there's no need to go through it here.
It is easy to add new data items to Asmex by deriving a new node class, and modifying the GenerateChildren method of another node so that your new node is sometimes generated. You can also override your node's GetMenu method to add context menu operations for that node type.
This design is not a work of genius, but it does the job of presenting the data in a unified way and generating nodes only on demand. In MFC it would probably have been necessary to build a large tree infrastructure and connect it to a CTreeCtrl by some horrible tangle of messages.
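The on-demand generation is the part worth showing. Here is a small Python analogue, assuming nothing about the real BaseNode beyond what is described above; `ClassNode` and the use of Python's `inspect` module in place of .NET reflection are my own substitutions.

```python
import inspect


class BaseNode:
    """A tree node that creates its children only when first asked for them."""

    def __init__(self, label, data=None):
        self.label = label
        self.data = data
        self._children = None  # not generated yet

    def generate_children(self):
        """Override to yield child nodes; leaf nodes yield nothing."""
        return []

    def get_menu(self):
        """Override to supply context menu entries for this node type."""
        return []

    @property
    def children(self):
        if self._children is None:
            self._children = list(self.generate_children())
        return self._children


class ClassNode(BaseNode):
    """Shows the functions of a class, discovered by reflection."""

    def generate_children(self):
        # getmembers returns (name, value) pairs sorted by name.
        for name, member in inspect.getmembers(self.data, inspect.isfunction):
            yield BaseNode(name, member)
```

Adding a new kind of item means writing one more subclass and yielding it from some existing node's `generate_children`, which is the extension route described above.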
Property Viewer
The ObjViewer namespace contains a few classes that define a generic property-viewer control. ObjViewer is a UserControl that presents a list of name-value pairs for any given object. Of course, the properties available on an object don't necessarily add up to a friendly view of the object, so you can use the ObjViewerAttribute attribute to modify how the properties of a target object are displayed -- for instance, to specify that a property be shown in hex or not shown at all.
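Python has no direct equivalent of .NET attributes, but dataclass field metadata gives a similar effect, so the following sketch uses that instead. It is only an analogue of the idea: the metadata keys `hex` and `hidden`, the `property_rows` function and the `SectionInfo` class are all invented for the example.

```python
from dataclasses import dataclass, field, fields


def property_rows(obj):
    """Build the (name, display text) pairs a property viewer would list."""
    rows = []
    for f in fields(obj):
        if f.metadata.get("hidden"):
            continue  # annotated as not to be shown at all
        value = getattr(obj, f.name)
        text = f"0x{value:X}" if f.metadata.get("hex") else str(value)
        rows.append((f.name, text))
    return rows


@dataclass
class SectionInfo:
    name: str
    virtual_address: int = field(metadata={"hex": True})
    raw_bytes: bytes = field(default=b"", metadata={"hidden": True})
```

With these annotations, `property_rows(SectionInfo(".text", 0x2000))` lists the name as-is, the address in hex, and omits the raw bytes, without the viewer knowing anything about `SectionInfo` in advance.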
GAC Browser
The GACPicker class allows the user to select an assembly from the Global Assembly Cache. It does this by looking at the filesystem representation of the GAC, since there appears to be no actual API in the current .NET environment.
Ridiculous Star-Wars Writing
The HintDlg class presents hints in preposterous Star-Wars style perspective scrolling text. It uses GraphicsPath.Warp to apply a pseudo-perspective transformation to the text. Annoying, but I felt it had to be done.
Bibliography
For PE/.NET file format information, I would suggest reading sections 21-24 of ECMA-335 Partition II, available all over the web. Inside Microsoft .NET IL Assembler is also a good book, despite the occasional inaccuracy.
If you want to go further and understand the actual CIL instructions in your assembly, Compiling for the .NET Common Language Runtime is an excellent book.
If you want to examine your binary files in comfort, may I humbly plug my own AXE program?