This article covers a subject that may seem trivial: DLL injection. For some reason, most tutorials on the web give the topic only brief coverage, mostly limited to invoking the LoadLibraryA/W Windows API function in the address space of another process. While this is not bad at all, it is the least flexible solution, because all the logic MUST be hardcoded in the DLL we want to inject. Alternatively, we may incorporate all the configuration management (loading config files, parsing them, etc.) into our DLL. This is better, but it still fills the DLL with code that is only going to run once.
Let us try another approach. We are going to write a loader (an executable that will inject our DLL into another process) and a small DLL, which will be injected. For simplicity, the loader will also create the target process. Being a Linux user, I used Flat Assembler and mingw32 for this task, but you may adjust the code for whatever environment you prefer.
A short remark for nerds before we start. The code in this article does not contain any safety checks (e.g. checking the value returned by a specific function) unless they are needed as an example. If you decide to try this code, you will be doing so at your own risk.
So, let the fun begin.
Creation of target process
Let's assume that the loader has already passed the phase of loading and parsing configuration files and is ready to start the actual job.
Windows provides us with all the tools we need to start a process. There is more than one way of doing that, but let us take the simplest one and use the CreateProcess API function. Its declaration looks quite frightening, but we'll make it as easy as possible:
BOOL WINAPI CreateProcess(
__in_opt LPCTSTR lpApplicationName,
__inout_opt LPTSTR lpCommandLine,
__in_opt LPSECURITY_ATTRIBUTES lpProcessAttributes,
__in_opt LPSECURITY_ATTRIBUTES lpThreadAttributes,
__in BOOL bInheritHandles,
__in DWORD dwCreationFlags,
__in_opt LPVOID lpEnvironment,
__in_opt LPCTSTR lpCurrentDirectory,
__in LPSTARTUPINFO lpStartupInfo,
__out LPPROCESS_INFORMATION lpProcessInformation
);
We only have to specify half of the parameters when calling this function and may set the rest to NULL. This function has two variants, CreateProcessA and CreateProcessW, the ASCII and Unicode versions respectively. We are going to stick with ASCII all the way, so our code would look like this (because "CreateProcess" is a macro rather than a function name, we explicitly specify the A version, as the macro expands to the W version whenever UNICODE is defined):
CreateProcessA(nameOfTheFile, NULL, NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &startupInfo, &processInformation);
Don't forget to zero the startupInfo structure and set its cb field to (DWORD)sizeof(STARTUPINFO) before the call, otherwise the call is likely to fail.
If the function succeeds, we get all the information about the process (handles and IDs) in the processInformation structure, which has the following prototype:
typedef struct _PROCESS_INFORMATION {
HANDLE hProcess; //Handle to the process
HANDLE hThread; //Handle to the main thread of the process
DWORD dwProcessId; //ID of the new process
DWORD dwThreadId; //ID of the main thread of the process
} PROCESS_INFORMATION, *LPPROCESS_INFORMATION;
By now, the process has been created, but it is suspended: it has not started executing and will not until we call ResumeThread(processInformation.hThread) (the function takes the thread handle, not the thread ID), telling the operating system to resume the main thread of the process. That call is going to be the last action performed by our loader.
Now for the code that we are going to inject into the target process. One may call it shellcode, but it has nothing to do with viral payloads or any other malicious intent (unless someone would say that breaking into the address space of another process is malicious by definition). Theoretically, it may be written in any language, as long as it can be made position independent and compiled into native instructions (x86 instructions in our case), but I prefer to do such things in Assembly language.
It is always a good idea to think about what your code is intended to do before writing a single line of it; in this case it is a golden rule. The code needs to be small, preferably fast, and stable, as it is a bit of a headache to debug once it has been injected.
There are two basic tasks that you would want to assign to this code:
- Load our DLL
- Call the initialization procedure exported by our DLL
and one unavoidable condition - it has to be a function declared as ThreadProc callback, due to the fact that we are going to use the CreateRemoteThread function in order to launch it. The prototype of a ThreadProc callback function looks like this:
DWORD WINAPI ThreadProc( __in LPVOID lpParameter);
This means that it has to return a value of type DWORD (a 32-bit unsigned integer). It accepts one parameter, which may either be an actual value (cast to the LPVOID type) or a pointer to an array of parameters. One more thing about this function (last but not least!): it is an stdcall function, as the WINAPI macro is defined as __stdcall. This means that our function has to clean up the stack before it returns. In our case it is quite easy: simply use ret 0x04 (assuming that the size of LPVOID is 4 bytes).
Another important thing to mention: you will obviously need to know how many bytes your function occupies in order to allocate enough memory in the address space of the target process and copy your code there. In addition to one block of executable memory for the function, you will also need to allocate one block for data, that is, the configuration settings to be passed to the injected DLL. The address of that data block is easy to pass as the argument of our ThreadProc.
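One way to picture the data block is as a plain C structure whose address is handed to the thread function. The sketch below is portable C rather than Windows code, and everything in it is an assumption made for illustration: `inject_config`, its fields and `thread_proc_model` are hypothetical names, and `void *` and `uint32_t` stand in for LPVOID and DWORD.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the data block. The real layout is whatever the
   loader and the DLL agree on. It holds no pointers, because an address
   that is valid in the loader means nothing inside the target process. */
typedef struct inject_config {
    uint32_t verbosity;
    char     log_path[64];
} inject_config;

/* Stand-in for a ThreadProc: one pointer-sized argument in, one 32-bit
   value out. It only reads the block and returns a value derived from it,
   so that the caller can see the block was actually received. */
uint32_t thread_proc_model(void *param)
{
    const inject_config *cfg = (const inject_config *)param;
    return cfg->verbosity + (uint32_t)strlen(cfg->log_path);
}
```

Keeping the block flat (values and inline character arrays only) means it can be copied into another address space byte for byte without any fix-ups.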
The skeleton of the function would look like this:
lancet:
push ebp
mov ebp, esp
sub esp, as_much_space_as_you_need_for_variables
;function body
mov esp, ebp
pop ebp
ret 0x04 ;stdcall: remove the single 4-byte parameter
lancet_size = $-lancet
The last line gives us the exact size of the function in bytes. The following is the source file template:
format MS COFF ;as we are going to link this file with our loader
public lancet as '_lancet'
section '.text' readable executable
;our function goes here
;followed by data
loadLibraryA db 'LoadLibraryA',0
init db 'name_of_the_initialization_function',0
ourDll db 'name_of_our_dll',0
kernel32 db 'kernel32.dll',0
lancet_size = $-lancet
public lsize as '_lancet_size'
section '.data' readable writeable
lsize dd lancet_size
So, what are we going to insert into the "function body"? First of all, since our code has no idea where in memory it is once it has been injected, we should save our "base address" and calculate all offsets relative to that address. This is done in a simple manner: we call the next address and pop the return address into our local variable.
call @f
@@:
pop dword [ebp-4]
sub dword [ebp-4], @b-lancet
That's it. Now the variable at [ebp-4] contains our "base address". Each time we want to call another function or access our data (the strings with names, remember?) we should do the following:
mov ebx, [ebp-4]
add ebx, ourDll-lancet
push ebx ;the only argument: pointer to the DLL name
mov ebx, [ebp-8] ;assume that we stored the address of LoadLibraryA at [ebp-8]
call ebx
The code above is equivalent to LoadLibraryA("name_of_our_dll").
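The arithmetic behind `add ebx, ourDll-lancet` can be modelled in portable C. This is only an illustration, not injectable code: the `blob` array and the `OFF_*` constants are made-up stand-ins for the assembled lancet block and for the distances `label - lancet`, and `blob_string` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the assembled block: four placeholder "code" bytes
   followed by the data strings, as in the source file template. */
const unsigned char blob[] =
    "CODE"
    "LoadLibraryA\0"
    "kernel32.dll";

enum {
    OFF_LOADLIBRARY = 4,        /* models loadLibraryA - lancet */
    OFF_KERNEL32    = 4 + 13    /* models kernel32 - lancet     */
};

/* base is wherever the block ended up in memory. The offsets are fixed
   at assembly time and stay the same wherever the block is copied. */
const char *blob_string(const unsigned char *base, size_t offset)
{
    return (const char *)(base + offset);
}
```

Copying `blob` into any other buffer with `memcpy` and calling `blob_string` with the new base yields the same strings, which is the property the injected code relies on after it has been written into the target process.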
Now about the execution itself. Although we now know where we are, we have no idea what the address of LoadLibraryA is. There are at least two ways to get that address nicely. The first has been described in my "Stealth Import of Windows API" article. The second is also interesting: the PEB. Yes, we are going to access the Process Environment Block, find the LDR_MODULE structure that refers to KERNEL32.DLL and get its base address (which is also a handle to the library). Some may say that this way is not reliable, not stable and even dangerous, but I will say that statements like these are not serious. We are not going to change anything in those structures. We are only going to parse them.
How do we find the PEB? This is quite simple. In a 32-bit process, a pointer to it is stored at [FS:0x30]. Once we have it, we are on our way to the PEB_LDR_DATA structure, a pointer to which is stored at PEB+0x0C. In order to parse the PEB_LDR_DATA structure, we should declare the following in our Assembly code:
struc list_entry
{
.flink dd ? ;pointer to next list_entry structure
.blink dd ? ;pointer to previous list_entry structure
}
struc peb_ldr_data
{
.length dd ?
.initialized db ?
rb 3 ;padding, so that .ssHandle lands at offset 0x08
.ssHandle dd ?
.inLoadOrderModuleList list_entry ;we are going to use this list
.inMemoryOrderModuleList list_entry
.inInitOrderModuleList list_entry
}
struc ldr_module
{
.inLoadOrderModuleList list_entry ;pointers to previous and next modules in list
.inMemoryOrderModuleList list_entry
.inInitOrderModuleList list_entry
.baseAddress dd ? ;This is what we need! (offset 0x18)
.entryPoint dd ?
.sizeOfImage dd ?
.fullDllName unicode_string ;full path to the module file
.baseDllName unicode_string ;name of the module file
.flags dd ?
.loadCount dw ?
.tlsIndex dw ?
.hashTableEntry list_entry
.timeDateStamp dd ?
}
I leave the implementation of the module list parsing function up to you. Just keep in mind that the strings you are going to check are represented by the UNICODE_STRING structure (described in the article referenced above). Another thing to remember is that it is better to implement a case-insensitive string comparison function.
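The list walk and the case-insensitive name check can be prototyped in portable C before being hand-translated to Assembly. The sketch below is a model under stated assumptions, not the real Windows definitions: all `*_model` types are hypothetical simplifications (a module entry here has a single list link and only two other fields), and the comparison handles ASCII letters only, which is enough for a name like "kernel32.dll".

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for LIST_ENTRY, UNICODE_STRING and a module entry. */
typedef struct list_entry_model {
    struct list_entry_model *flink;
    struct list_entry_model *blink;
} list_entry_model;

typedef struct ustring_model {
    uint16_t        length;      /* in BYTES, not characters; no terminator */
    uint16_t        max_length;
    const uint16_t *buffer;
} ustring_model;

typedef struct module_model {
    list_entry_model links;      /* first member: a link pointer is also an entry pointer */
    void            *base_address;
    ustring_model    base_dll_name;
} module_model;

static uint16_t lower_ascii(uint16_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint16_t)(c + ('a' - 'A')) : c;
}

/* Case-insensitive comparison of a counted UTF-16 string with an ASCII name. */
int ustring_equals_ascii(const ustring_model *s, const char *ascii)
{
    size_t n = s->length / sizeof(uint16_t);
    size_t i;
    for (i = 0; i < n; i++) {
        if (ascii[i] == '\0' ||
            lower_ascii(s->buffer[i]) != lower_ascii((unsigned char)ascii[i]))
            return 0;
    }
    return ascii[n] == '\0';     /* both strings must end at the same place */
}

/* Walk the circular list. head is the sentinel that lives in the loader
   data structure and is not a module entry itself. */
void *find_module_base(const list_entry_model *head, const char *name)
{
    const list_entry_model *e;
    for (e = head->flink; e != head; e = e->flink) {
        const module_model *m = (const module_model *)e;
        if (ustring_equals_ascii(&m->base_dll_name, name))
            return m->base_address;
    }
    return NULL;
}
```

Stopping when the walk returns to the sentinel matters: the list is circular, so a loop that waits for a NULL pointer would never terminate.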
Once you find the LDR_MODULE whose baseDllName is "kernel32.dll", you have its handle (it is simply the value of the baseAddress field). You may use the _get_proc_address function from the same article (mentioned above) in order to get the address of the LoadLibraryA function. Having that address, you are ready to load your DLL (do the actual injection). Personal suggestion: do not put lots of code into the DllMain function.
LoadLibraryA returns a handle to the newly loaded DLL, which you can use in order to locate your initialization function (remember that it has to be exported by your DLL and should preferably use the stdcall convention). After you _get_proc_address of your initialization function, call it and pass the address of the data block as a parameter (it was passed to our lancet function as a parameter on the stack):
push dword [ebp+8] ;parameter passed to lancet is here
call dword [ebp-12] ;assume that you stored the address of the initialization function at [ebp-12]
That's it. Your code may now return. The DLL has been injected and initialized.
Somehow, we have skipped the exciting process of injecting our lancet code itself. Don't worry, I have not forgotten about it.
As I have mentioned above, we have to allocate two blocks - for code and data. This can be done by calling the VirtualAllocEx function, which allows memory allocations in the address space of another process.
LPVOID WINAPI VirtualAllocEx(
__in HANDLE hProcess,
__in_opt LPVOID lpAddress,
__in SIZE_T dwSize,
__in DWORD flAllocationType,
__in DWORD flProtect
);
Use MEM_COMMIT as flAllocationType, and PAGE_EXECUTE_READWRITE and PAGE_READWRITE as flProtect for the code and data blocks respectively. This function returns the address of the allocated block in the address space of the specified process, or NULL on failure.
The WriteProcessMemory API function is used to copy your code and data into the address space of the target process.
BOOL WINAPI WriteProcessMemory(
__in HANDLE hProcess,
__in LPVOID lpBaseAddress,
__in LPCVOID lpBuffer,
__in SIZE_T nSize,
__out SIZE_T *lpNumberOfBytesWritten
);
Once you have copied both the data and the code, you will want to call your thread function. A straightforward way to run a function that resides in the memory of another process is to call the CreateRemoteThread API.
HANDLE WINAPI CreateRemoteThread(
__in HANDLE hProcess, //the handle to our process
__in LPSECURITY_ATTRIBUTES lpThreadAttributes, //may be NULL
__in SIZE_T dwStackSize, //may be 0
__in LPTHREAD_START_ROUTINE lpStartAddress, //the address of our code block
__in LPVOID lpParameter, //the address of our data block
__in DWORD dwCreationFlags, //may be 0
__out LPDWORD lpThreadId //may be NULL
);
CreateRemoteThread returns a handle to the remote thread, which, in turn, may be passed to the WaitForSingleObject API function, so that we are notified when the thread returns.