libMSL 1.0
How does the MSL (and Windows PE) loader work?
The MSL loader works on the same principles as the Windows EXE/DLL loader. If you want to know exactly how it works, this page is
for you :)
The first thing LoadLib needs to do is open the file and load the whole of it into memory. It loads the whole file because I was
too lazy to optimize this step. I am pretty sure that the Windows PE loader does not need to read the entire file into memory at this point.
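The reading step itself is ordinary file I/O. Below is a minimal C sketch of it; it is not libMSL's own code, and the function name read_whole_file is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a freshly malloc'd buffer.
   Returns NULL on failure; *out_size receives the byte count. */
static unsigned char *read_whole_file(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end and back. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* malloc(0) may return NULL, so always ask for at least 1 byte. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *out_size = (size_t)len;
    return buf;
}
```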
Then the MSL loader checks whether the supplied file really is an MSL file, by checking MSL_HEADER.Signature.
If the signature is valid, the loader continues. It now determines the ImageSize (the size of the image once loaded into memory) and the
ImageBase (the base address at which the image prefers to be placed).
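The signature check and the reading of the two header fields can be sketched as follows. The msl_header layout, its field order, and the MSL_SIGNATURE value are assumptions made for illustration, since this page does not document the real MSL_HEADER; fields are read as little-endian, as in PE.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL signature value: "MSL\0" read as a little-endian dword. */
#define MSL_SIGNATURE 0x004C534Du

/* HYPOTHETICAL header layout; the real MSL_HEADER may differ. */
typedef struct {
    uint32_t signature;
    uint32_t image_base;   /* preferred load address */
    uint32_t image_size;   /* size of the image once mapped */
    uint32_t entry_point;  /* RVA of DllMain */
    uint32_t num_sections;
} msl_header;

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 and fills *h if the buffer starts with a valid header, else -1. */
static int parse_header(const unsigned char *file, size_t size, msl_header *h)
{
    if (size < 5 * 4)
        return -1;                 /* too short to hold a header */
    h->signature = rd32(file);
    if (h->signature != MSL_SIGNATURE)
        return -1;                 /* not an MSL file */
    h->image_base   = rd32(file + 4);
    h->image_size   = rd32(file + 8);
    h->entry_point  = rd32(file + 12);
    h->num_sections = rd32(file + 16);
    return 0;
}
```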
The loader now calls the VirtualAlloc function to commit the memory range from ImageBase to ImageBase+ImageSize. Sometimes the
requested memory block is already occupied. This is not a big problem for the loader: it simply calls VirtualAlloc again to
commit a block of ImageSize bytes, this time without specifying ImageBase, so the system chooses the address. When this happens, the image
ends up at a different location than the one it was built for, and a relocation will be required.
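A sketch of the allocate-or-fall-back step. The Windows branch uses the real VirtualAlloc call with MEM_RESERVE | MEM_COMMIT; the non-Windows branch swaps in POSIX mmap, whose address argument (without MAP_FIXED) is only a hint, so it falls back to another address on its own. The name alloc_image and the relocated flag are hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Allocate image_size bytes, preferably at image_base.
   On success, *relocated is 1 if the block landed elsewhere. */
static void *alloc_image(uintptr_t image_base, size_t image_size, int *relocated)
{
    void *p;
#ifdef _WIN32
    /* First try: exactly at the preferred base. */
    p = VirtualAlloc((void *)image_base, image_size,
                     MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (p == NULL) /* range occupied: let the system pick the address */
        p = VirtualAlloc(NULL, image_size, MEM_RESERVE | MEM_COMMIT,
                         PAGE_EXECUTE_READWRITE);
#else
    /* POSIX stand-in: the address is a hint, not a requirement. */
    p = mmap((void *)image_base, image_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        p = NULL;
#endif
    if (p != NULL)
        *relocated = ((uintptr_t)p != image_base);
    return p;
}
```

If *relocated comes back as 1, the loader has to apply relocations later. Note that VirtualAlloc rounds a reservation address down to the allocation granularity (usually 64 KB), which is one reason image bases are normally 64 KB aligned.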
Now it is time to copy the actual data from the MSL file into memory. The data is divided into so-called sections, exactly as it is
in PE files.
Every section has its own MSL_SECTION_HEADER, which tells the loader where the section data is stored in the file, how large
the section is, and where to place the section in memory. It is important to understand that the layout of an executable
differs between the file on disk and the image mapped into memory. This is done for performance reasons; in PE files, for example, sections are aligned to FileAlignment (typically 512 bytes) on disk but to SectionAlignment (by default the page size, 4 KB on x86) in memory. If you want to know more about why this
happens, look at a PE reference.
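Copying the sections can be sketched like this. The msl_section fields are hypothetical stand-ins for whatever MSL_SECTION_HEADER really contains (file position, size, and in-memory position, as described above). Zero-filling the part of a section that has no file data mirrors what PE loaders do for uninitialized data; whether MSL does exactly the same is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL section header; field names are illustrative only. */
typedef struct {
    uint32_t virtual_address; /* RVA where the section goes in memory */
    uint32_t virtual_size;    /* size of the section in memory */
    uint32_t raw_offset;      /* where the section data starts in the file */
    uint32_t raw_size;        /* how many bytes the file stores */
} msl_section;

/* Copy each section from its file position to its in-memory position and
   zero the rest of the section. Returns 0 on success, or -1 if a section
   points outside the file or outside the image. */
static int map_sections(unsigned char *image, size_t image_size,
                        const unsigned char *file, size_t file_size,
                        const msl_section *sec, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const msl_section *s = &sec[i];
        size_t copy = s->raw_size < s->virtual_size ? s->raw_size : s->virtual_size;

        /* Bounds checks written to avoid unsigned overflow. */
        if ((size_t)s->raw_offset > file_size || copy > file_size - s->raw_offset)
            return -1;
        if ((size_t)s->virtual_address > image_size ||
            s->virtual_size > image_size - s->virtual_address)
            return -1;

        memcpy(image + s->virtual_address, file + s->raw_offset, copy);
        memset(image + s->virtual_address + copy, 0, s->virtual_size - copy);
    }
    return 0;
}
```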
After every section has been placed at its position, the import handling routine is called. This routine fills the import
table with the addresses of the functions imported from other DLL files. This process is called dynamic linking (hence the name DLL,
short for Dynamic-Link Library).
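The binding loop might look like the sketch below. The import-record layout (msl_import) is hypothetical, and to keep the example portable it resolves names from a small in-memory export table instead of calling LoadLibraryA and GetProcAddress, which a real Windows loader would use at that point. DLL names are compared case-sensitively here, unlike on Windows.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL import record: one imported function per entry. */
typedef struct {
    const char *dll_name;  /* e.g. "kernel32.dll" */
    const char *func_name; /* e.g. "ExitProcess" */
    uint32_t    slot_rva;  /* RVA of the import-table slot to fill */
} msl_import;

/* Portable stand-in for LoadLibraryA + GetProcAddress. */
typedef struct {
    const char *dll_name;
    const char *func_name;
    void       *address;
} export_entry;

static void *find_export(const export_entry *tab, size_t n,
                         const char *dll, const char *func)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].dll_name, dll) == 0 &&
            strcmp(tab[i].func_name, func) == 0)
            return tab[i].address;
    return NULL;
}

/* Write each resolved address into its slot. Slots are host-pointer sized
   here; a 32-bit image would use 4-byte slots. Returns the number of
   imports that could not be resolved (0 means success). */
static size_t bind_imports(unsigned char *image,
                           const msl_import *imp, size_t count,
                           const export_entry *tab, size_t ntab)
{
    size_t missing = 0;
    for (size_t i = 0; i < count; i++) {
        void *addr = find_export(tab, ntab, imp[i].dll_name, imp[i].func_name);
        if (addr == NULL) {
            missing++;
            continue;
        }
        /* memcpy avoids misaligned pointer stores into the image. */
        memcpy(image + imp[i].slot_rva, &addr, sizeof addr);
    }
    return missing;
}
```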
Now it is time to handle relocations. They need to be handled only if the image was not placed at its ImageBase. The relocation
handling routine repairs all absolute addresses in the image, which are invalid at this point because the image is not at its
requested ImageBase. Here is an example.
Take a single line of assembler code:
mov eax, dword [$0400100]
Suppose this code was supposed to be loaded at ImageBase=$400000, but that memory location was already occupied, so the
loader placed the image at $1000000. The instruction is now invalid: it tries to read data from an address that does not belong
to the image. This can cause an access violation (the better case), or the code can read whatever data happens to be there and work
with it as if it were correct (with unforeseeable consequences).
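The relocation handling routine now adds the difference between the actual and the preferred base ($1000000 - $400000 = $C00000) to the embedded address, patching the line to: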
mov eax, dword [$1000100]
Now the code works as intended again.
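The patch is plain arithmetic: delta = $1000000 - $400000 = $C00000, and $400100 + $C00000 = $1000100. The sketch below applies such a delta to a list of fixup locations. The flat list of RVAs is a simplification (PE stores fixups in per-page blocks with a type field), and MSL's actual relocation format is not described on this page, so treat the layout as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load/store, independent of host byte order. */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void wr32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Add (actual_base - preferred_base) to every 32-bit absolute address
   whose RVA is listed in fixups[] (HYPOTHETICAL flat format). */
static void apply_relocations(unsigned char *image,
                              uint32_t preferred_base, uint32_t actual_base,
                              const uint32_t *fixups, size_t count)
{
    /* Unsigned arithmetic wraps mod 2^32, so a lower base also works. */
    uint32_t delta = actual_base - preferred_base;
    if (delta == 0)
        return; /* loaded at ImageBase: nothing to patch */
    for (size_t i = 0; i < count; i++)
        wr32(image + fixups[i], rd32(image + fixups[i]) + delta);
}
```

For the example above, mov eax, dword [$0400100] assembles (in 32-bit code) to the opcode A1 followed by the 4-byte little-endian address 00 01 40 00, so a single fixup at offset 1 covers it. After apply_relocations with the bases $400000 and $1000000, those bytes read 00 01 00 01, i.e. $1000100.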
Now the file is finally ready to run. MSL_HEADER.EntryPoint points to the location in memory (an RVA, to be exact, i.e. an offset relative to the image base)
that contains the DllMain function. DllMain is called with its base address and
DLL_PROCESS_ATTACH as parameters. If it returns 1, the file is loaded and ready for
GetProcAddr calls (or whatever else you need).
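Finally, a sketch of the entry-point step. The msl_dllmain prototype follows the text (base address plus DLL_PROCESS_ATTACH) and is hypothetical; a real Win32 DllMain is declared as BOOL WINAPI DllMain(HINSTANCE, DWORD, LPVOID). Calling it also requires the image memory to be executable, which a real loader ensures when it sets the section protections.

```c
#include <stdint.h>

#define DLL_PROCESS_ATTACH 1 /* same value as in <windows.h> */

/* HYPOTHETICAL simplified prototype: (base address, reason). */
typedef int (*msl_dllmain)(void *image_base, uint32_t reason);

/* EntryPoint is an RVA, so the absolute address is base + RVA. */
static uintptr_t entry_address(uintptr_t actual_base, uint32_t entry_rva)
{
    return actual_base + entry_rva;
}

/* Call DllMain; the image counts as loaded only if it returns 1.
   Converting an integer to a function pointer is implementation-defined
   in ISO C, but it is how loaders on Windows and POSIX reach the code. */
static int run_dllmain(uintptr_t actual_base, uint32_t entry_rva)
{
    msl_dllmain entry = (msl_dllmain)entry_address(actual_base, entry_rva);
    return entry((void *)actual_base, DLL_PROCESS_ATTACH) == 1;
}
```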