Beginner's Guide to x86 Assembly and Debugging Apps
This guide will explain exactly what is necessary to begin cheat creation for generally any online computer game, including both fields to study, and tools to use.
Before this tutorial begins, it should be noted:
- I'll make great use of footnotes to fill in anything the reader may not understand.
- I'm going to assume the general audience is very technologically inept, especially pertaining to the aforementioned fields.
- This tutorial concerns mostly Windows games - there's not much of a market for cheating on other platforms.
Fields of Study
When it comes to cheating in games, you will hear that you must know either assembly, C++, or both, while in fact, neither is strictly required. However, if you're going to work alone every step of the way, in almost every scenario, knowledge of Intel-syntax assembly will be necessary.
Assembly is considered the bottom of the barrel of programming languages - it's considered as low-level as you can go with a programming language. But, as every executable ultimately consists of machine instructions that can be displayed as assembly, this is also why it is considered very powerful when attempting to learn what is done in a specific executable. For example, if one program encrypts certain types of files, and you need to learn how the encryption algorithm is done, then you would disassemble the program. From there, assuming you know assembly, you may be capable of understanding what the program does (More importantly, what that algorithm is, which would allow you to write a decryption algorithm).
Assembly uses hexadecimal numbers, so it should be understood the number system is organized as follows:
0=0, 1=1, 2=2, 3=3, 4=4, 5=5, 6=6, 7=7, 8=8, 9=9, A=10, B=11, C=12, D=13, E=14, F=15
(The above maps each digit of base 16, the hexadecimal system, to base 10, the standard decimal system)
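To make the digit table concrete, here is a minimal Python sketch that converts a hexadecimal string to decimal by hand, one digit at a time. The function name hex_to_decimal is made up for this illustration; Python's built-in int(text, 16) does the same job.

```python
def hex_to_decimal(text: str) -> int:
    """Convert a hexadecimal string such as '7C90FFDD' to its decimal value."""
    digits = "0123456789ABCDEF"   # position in this string = value of the digit
    value = 0
    for ch in text.upper():
        # Each new digit shifts the previous value up by one base-16 place.
        value = value * 16 + digits.index(ch)
    return value


if __name__ == "__main__":
    print(hex_to_decimal("F"))    # 15
    print(hex_to_decimal("10"))   # 16
    print(hex_to_decimal("FF"))   # 255
```

Addresses shown by a debugger, such as 01009000, are read the same way.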
Firstly, assembly is entirely about data manipulation (In general, that's all programming is - manipulating data, and getting the hardware to do what you want). Put simply, usually three things are being modified:
- The stack
- The registers
- The memory of a program
Now, to explain the above:
- The stack is a large stack of numbers, manipulated for handing off parameters to functions, saving the registers, and storing other miscellaneous data.
- Registers are typically used for completing varying operations (Comparing data, arithmetic functions, logical operations, etc; these types of registers are dubbed «general purpose registers»). Usually, they'll store certain types of numbers/addresses, from as small as 8 bits, all the way up to 32 bits (It's possible to go higher than 32 bits, but most users won't encounter situations where that will be necessary to know). Flags are single bits that record the outcome of the last operation (e.g.: The overflow flag, or OF, is set to 1, from 0, if the result of a signed arithmetic operation does not fit in the destination; adding 1 to the 8-bit value 7F, the largest positive signed byte, sets OF to 1).
- Varying data in the program's memory is constantly being modified. As the stack and registers can handle only so much data at once, in many cases it's more efficient to leave some data in the program itself (Though it should be noted, changes made this way exist only in memory; meaning, if you were to modify the program to display a random popup every 15 minutes while it was running, then once the program was exited and re-opened later, the popup would no longer appear).
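As a rough illustration of how a flag records the outcome of an operation, the following Python sketch models an 8-bit addition and works out the overflow and carry flags. The function name add8 and the choice of 8-bit operands are assumptions made for this example only; a real CPU does this in hardware.

```python
def add8(a: int, b: int):
    """Add two 8-bit values; return (result, overflow_flag, carry_flag)."""
    a &= 0xFF
    b &= 0xFF
    raw = a + b
    result = raw & 0xFF                 # only 8 bits fit in the destination
    carry = 1 if raw > 0xFF else 0      # unsigned result did not fit
    # Signed overflow: both inputs have a sign bit that differs from the result's.
    overflow = 1 if ((a ^ result) & (b ^ result) & 0x80) else 0
    return result, overflow, carry


if __name__ == "__main__":
    print(add8(0x7F, 1))   # (128, 1, 0): 127 + 1 does not fit in a signed byte
    print(add8(0xFF, 1))   # (0, 0, 1): unsigned wrap-around sets carry instead
    print(add8(5, 3))      # (8, 0, 0)
```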
Modifying the stack is done through a number of ways, the most common being using PUSH and POP instructions.
In assembly, each line is an instruction, which takes anywhere from no parameters at all up to three.
The PUSH instruction accepts one parameter, which is added to the top of the stack. For example:
PUSH 5
The above would push the value 5 onto the stack, making 5 the new top entry of the stack.
Now, it should be mentioned, usually the stack base pointer (Another type of register, which will be explained further later on) is pushed onto the stack, to act as a reference point for modifying the stack. Therefore, in the beginning of most functions/programs, you'll find the following line:
PUSH EBP
This simply places the saved value of EBP on top of the stack.
From there, I can push my data onto the stack. Or, I can save one of my registers by using PUSH (POP does the reverse, removing the top value of the stack and placing it in a register):
PUSH EAX
(NOTE: EAX is an example of a 32-bit register - a full list of available registers and what each one is used for will be covered later).
Assuming the value of EAX was 7C90FFDD, the top of the stack will now hold 7C90FFDD.
That covers standard modification of the stack - we'll cover more later, such as how functions access certain portions of the stack for parameters being handed off, etc.
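To see PUSH and POP in action without a debugger, here is a small Python model of the stack. It is only a sketch: the Stack class, the demo function and the EBP value are invented for illustration, and the real stack lives in memory and grows toward lower addresses.

```python
class Stack:
    """Toy model of the x86 stack: the last value pushed is the first popped."""

    def __init__(self):
        self.items = []

    def push(self, value: int) -> None:
        self.items.append(value & 0xFFFFFFFF)   # each entry is 32 bits wide

    def pop(self) -> int:
        return self.items.pop()

    def top(self) -> int:
        return self.items[-1]


def demo():
    # Example register values; 0007FFF0 is a made-up value for EBP.
    registers = {"EAX": 0x7C90FFDD, "EBP": 0x0007FFF0}
    stack = Stack()
    stack.push(registers["EBP"])     # PUSH EBP
    stack.push(5)                    # PUSH 5
    stack.push(registers["EAX"])     # PUSH EAX  (save the register)
    registers["EAX"] = 0             # the register gets overwritten by other work
    registers["EAX"] = stack.pop()   # POP EAX   (restore the saved value)
    return registers, stack


if __name__ == "__main__":
    regs, stack = demo()
    print(format(regs["EAX"], "08X"))               # 7C90FFDD
    print([format(v, "08X") for v in stack.items])  # ['0007FFF0', '00000005']
```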
There are many varying types of registers, but to explain the bare basics, we'll start with the general purpose registers. It's necessary to note, the following are all prefixed with the letter E to represent that they are extended (32-bit) registers. Therefore, the 16-bit register for EAX is AX:
- EAX - Accumulator Register
- EBX - Base Register
- ECX - Counter Register (Used for looping)
- EDX - Data Register (Used in multiplication and division)
- ESI - Source Index (Used in memory operations)
- EDI - Destination Index (Used in memory operations)
The above registers can also be accessed in smaller portions by their 16-bit and 8-bit equivalents; for EAX, the 16-bit register is AX, and the 8-bit registers are AH (the high byte of AX) and AL (the low byte of AX) - likewise, for (E)BX the 8-bit registers are BH and BL, etc. (ESI and EDI have the 16-bit forms SI and DI, but no high/low byte forms in 32-bit code). When referencing pointers, it may be important to keep in mind the different registers.
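The relationship between a 32-bit register and its smaller portions is just bit masking. A minimal Python sketch, where split_register is a name made up for this example:

```python
def split_register(eax: int) -> dict:
    """Return the sub-registers that overlap a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF            # low 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


if __name__ == "__main__":
    for name, value in split_register(0x7C90FFDD).items():
        print(name, format(value, "X"))
    # EAX 7C90FFDD, AX FFDD, AH FF, AL DD
```

Writing to AL or AH changes the matching bits of AX and EAX as well, since they are all the same storage.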
Modifying registers is essential for loading data from/to the stack or from/to data in the program memory. The most used instruction for loading data into a register is the MOV instruction.
To load what's stored at the address 01009000 into register EAX:
MOV EAX, DWORD PTR DS:[01009000]
One new thing was introduced on top of the MOV instruction and the EAX register: DWORD PTR DS:[01009000]
DWORD is a 32-bit value. PTR stands for «pointer», and together with the square brackets it means that the data at address 01009000 is being loaded, not the number 01009000. DS stands for «data segment», meaning the loaded value is from the .data section.
To expand, there are six «segment registers» in total (CS, DS, SS, ES, FS and GS); the four classic ones point to the segments in the executable:
- CS - Code Segment (References anything in the .code section)
- DS - Data Segment (References anything in the .data section)
- SS - Stack Segment (References the stack)
- ES - Extra Segment (Rarely used)
There are also three pointer registers (One of them, EBP, was already referenced earlier):
- EBP - Base Pointer (Points to the base of the current function's portion of the stack)
- ESP - Stack Pointer (Points to the top of the stack; PUSH and POP move it)
- EIP - Instruction Pointer (Points to the address of the next instruction)
Now, apart from the MOV instruction, there is also the LEA instruction. The LEA instruction (Load Effective Address) calculates an address and places the address itself in a register, without reading the memory at that address. It's used in preparing the loading of pointers into registers, allowing even math operations to be used (NOTE: Whereas MOV can also store data into memory, LEA is limited to only modifying registers).
The syntax is the same as MOV:
LEA EAX, DWORD PTR SS:[EBP-4]
Note the use of the stack being referenced - [EBP-4] means to take the base pointer and go 4 bytes (one 32-bit stack entry) below it in memory, which is the line directly above it in the debugger's stack window. After this instruction, EAX holds that address, not the value stored there.
A better example of LEA would be:
LEA EAX, [EAX+EBX*4+256]
Note the use of multiplication via the asterisk, and even addition between registers.
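The arithmetic LEA performs can be written out in a few lines of Python. This is a sketch only; effective_address is a hypothetical helper name, and the 256 is treated here as a decimal number, which is an assumption (a debugger listing would normally show it in hexadecimal).

```python
def effective_address(base: int, index: int, scale: int, displacement: int) -> int:
    """Compute base + index*scale + displacement, wrapped to 32 bits.

    x86 only allows a scale of 1, 2, 4 or 8.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


if __name__ == "__main__":
    eax, ebx = 0x1000, 3
    # LEA EAX, [EAX+EBX*4+256]
    eax = effective_address(eax, ebx, 4, 256)
    print(format(eax, "X"))   # 110C
```

No memory is read at any point, which is the difference from a MOV with the same bracket expression.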
Now, onto the easy math operations:
- ADD destination, source - Adds the «source» to the «destination», leaving the result in the «destination»
- SUB destination, source - Subtracts the «source» from the «destination», leaving the result in the «destination»
- SAL destination, source - Shifts the bits of the destination to the left «source» times; each shift doubles the value (e.g.: 15 shifted once to the left would turn into 30).
- SAR destination, source - Shifts the bits of the destination to the right «source» times, keeping the sign; each shift halves the value, rounding down (e.g.: 15 shifted once to the right would turn into 7).
- INC destination - Increment the destination (Add one to the given value)
- DEC destination - Decrement the destination (Subtract one from the given value)
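The operations listed above can be mimicked in Python as long as every result is trimmed back to 32 bits. The helper names below are invented for this sketch and the flags are ignored entirely.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a signed number."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def add(dest: int, src: int) -> int:
    return (dest + src) & MASK32


def sub(dest: int, src: int) -> int:
    return (dest - src) & MASK32


def sal(dest: int, count: int) -> int:
    # The CPU only uses the low 5 bits of the shift count.
    return (dest << (count & 31)) & MASK32


def sar(dest: int, count: int) -> int:
    # Arithmetic shift right: the sign bit is copied in from the left.
    return (to_signed(dest) >> (count & 31)) & MASK32


if __name__ == "__main__":
    print(sal(15, 1))                       # 30
    print(sar(15, 1))                       # 7
    print(format(sub(0, 1), "X"))           # FFFFFFFF (wraps around, i.e. -1)
    print(format(sar(0xFFFFFFF1, 1), "X"))  # FFFFFFF8 (-15 shifted right is -8)
```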
The final important pieces in the basics of assembly are conditional statements (If condition then statement, if not condition then statement, etc) and looping.
For comparing data, the CMP instruction is used:
CMP EAX, 1
Now, the result of the comparison has to be acted on somehow, and that is done with different types of conditional jumps. Depending on whether EAX is greater than (Or equal to), less than (Or equal to), or equal to (Or not) the number 1, a jump to a specific address is made. If the condition isn't met, nothing is done, and execution continues with the next instruction.
CMP EAX, 1
- jge - Jump if they're greater or equal ; Signed comparison: the values are treated as possibly negative numbers
- jg - Jump if they're greater than ; Signed
- jle - Jump if they're less or equal ; Signed
- jl - Jump if they're less ; Signed
- je - Jump if they're equal ; Works the same for signed and unsigned numbers
- jne - Jump if they're not equal ; Works the same for signed and unsigned numbers
- jae - Jump if they're above/greater than or equal ; Unsigned comparison: the values are treated as never negative
- ja - Jump if they're above/greater than ; Unsigned
- jbe - Jump if they're below/less than or equal ; Unsigned
- jb - Jump if they're below/less than ; Unsigned
The other operation for comparing two numbers is the TEST instruction, which performs an AND, but rather than storing the result, it only sets the flags, so the next instruction can check if the result of the AND was zero or non-zero.
- JZ - Jump if the result was zero
- JNZ - Jump if the result was not zero
Assume EAX is 00000001
TEST EAX, 1
Since the value of EAX is 1 and the test value is 1, the result of the AND is 1 (non-zero), so a JZ placed after the TEST will not jump, while a JNZ will.
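The difference between the greater/less jumps and the above/below jumps only shows up once the top bit of a value is set. The Python sketch below models the conditions being checked; the function names are invented for this example and only return whether the jump would be taken.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def jg_taken(a: int, b: int) -> bool:
    """CMP a, b followed by JG: signed 'greater than'."""
    return to_signed32(a) > to_signed32(b)


def ja_taken(a: int, b: int) -> bool:
    """CMP a, b followed by JA: unsigned 'above'."""
    return (a & MASK32) > (b & MASK32)


def jz_taken_after_test(a: int, b: int) -> bool:
    """TEST a, b followed by JZ: taken only when (a AND b) is zero."""
    return (a & b & MASK32) == 0


if __name__ == "__main__":
    eax = 0xFFFFFFFF                     # -1 when signed, 4294967295 when unsigned
    print(jg_taken(eax, 1))              # False: -1 is not greater than 1
    print(ja_taken(eax, 1))              # True: 4294967295 is above 1
    print(jz_taken_after_test(1, 1))     # False: 1 AND 1 is 1, so JZ does not jump
    print(jz_taken_after_test(2, 1))     # True: 2 AND 1 is 0
```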
Now, these tactics can also be used to repeat steps, for example:
0100739D MOV EAX,0
010073A2 CMP EAX,5
010073A5 JE 010073B1
010073AB INC EAX
010073AC JMP 010073A2
010073B1 RETN
The EAX register is set to zero, then EAX is compared to 5 - if EAX has the value 5, it jumps to the RETN instruction, to exit the function. Otherwise, execution continues, and INC EAX is executed to add 1 to EAX, after which the JMP goes back to the CMP. This repeats until eventually EAX is 5, and the JE jumps to the RETN.
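To watch the loop run step by step, here is a tiny Python interpreter for just these instructions. It is a sketch under stated assumptions: the final JMP is taken to go back to the CMP at 010073A2, a RETN is assumed to sit at 010073B1 (the target of the JE), and only the zero flag set by CMP is modelled.

```python
PROGRAM = [
    (0x0100739D, "MOV", ("EAX", 0)),
    (0x010073A2, "CMP", ("EAX", 5)),
    (0x010073A5, "JE", (0x010073B1,)),
    (0x010073AB, "INC", ("EAX",)),
    (0x010073AC, "JMP", (0x010073A2,)),   # assumed: jumps back to the CMP
    (0x010073B1, "RETN", ()),             # assumed: the exit the JE targets
]


def run(program):
    """Execute the listing; return (registers, number of instructions executed)."""
    index_of = {addr: i for i, (addr, _, _) in enumerate(program)}
    regs = {"EAX": 0}
    zero_flag = False
    executed = 0
    i = 0
    while True:
        _, op, args = program[i]
        executed += 1
        i += 1                               # default: fall through to the next line
        if op == "MOV":
            regs[args[0]] = args[1]
        elif op == "CMP":
            zero_flag = regs[args[0]] == args[1]
        elif op == "JE":
            if zero_flag:
                i = index_of[args[0]]
        elif op == "INC":
            regs[args[0]] = (regs[args[0]] + 1) & 0xFFFFFFFF
        elif op == "JMP":
            i = index_of[args[0]]
        elif op == "RETN":
            return regs, executed


if __name__ == "__main__":
    regs, executed = run(PROGRAM)
    print(regs["EAX"], executed)   # 5 24
```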
And that's the basics of assembly.
At this point, your skills of assembly can be put to the test. I recommend using the following applications:
- OllyDBG
- IDA Pro
Understanding how to use debuggers is key to the creation of game cheats. Once you have mastered the basics of understanding what is being done in an executable, through a debugger, you'll be ready to start understanding how cheats can be made on poorly protected games (Protected meaning games with no anti-cheat, no anti-debugger techniques, etc). After the segment on using a debugger, the next step is working around the protection mechanisms put in place to prevent debugging, and at the core of it all, cheating.
The above is a picture of OllyDBG loaded with Notepad. If you notice the «C» button with the cyan background, between the «H» and «/» buttons, that's the «CPU» section. And to explain what is in the CPU section:
- 1) This is the disassembled output - anything look familiar?
- 2) This is the registers window - what is loaded into each register will be updated in this window
- 3) This is the current stack of the program
- 4) This is the «dump» of the program: the raw bytes in memory, shown in hexadecimal. You'll notice the ASCII column resembles what the program may look like if you were to open it in a text editor.
A debugger allows you to manipulate how the executable is run - you can modify the registers by double clicking on the value to the right of each register. You can modify the stack by right clicking in the stack window and PUSH/POP'ing values, or right clicking on a specific value and selecting «Edit» or «Modify». At this point, you can watch as Notepad is initialized by stepping (Executing instructions one at a time) through it (Select the «Debug» menu --> «Step into» or «Step over»).
There are many other features of this particular debugger:
- You can view the sections of the program by clicking on the cyan «M» (Memory) button, which will bring up a list of all the varying sections (Some that haven't been explained yet, such as the .text and .rsrc sections).
- The status of each window can be viewed by clicking on the cyan «W» button.
- Open file handles can be seen by clicking on the cyan «H» button.
- The threads window can be seen by clicking the cyan «T» button.
- The last window of importance would be the software breakpoints window.
This next part of debugging is done using the version of Notepad released with Windows XP (Home/Professional). If you're using a new version of Windows, such as Windows 7, or even a newer release, where Notepad was either removed or dramatically changed, then you may just have to read through, following without physically using OllyDBG.
Now that you understand the importance of the varying explained windows, you can start debugging. Launch OLLYDBG.EXE, and if you receive a popup relating to «PSAPI.DLL» being outdated, I recommend selecting the «No» option. Click on the «File» menu, then «Open», and enter «%systemroot%\notepad.exe» in the «File name:» text area. Click the «Open» button.
To test out using the debugger, I recommend you do a «Step Into» or «Step Over» by navigating to the «Debug» menu (Or simply press F7 for «Step Into» and F8 for «Step Over»). If you notice, the stack window changed - now the value «70» is on top of the stack. If you step into/over again, a new value is on the stack now.
Now, you can test out setting a breakpoint. You can manually jump to the address that is about to be called by the «CALL 01007568» instruction, by pressing the box directly to the left of the «L» cyan button, and entering the address. Or, if the grey background is highlighted over the CALL instruction (As it should be, if you've stepped into/over twice), you can simply press enter.
If you did either of the two suggested, you should end up at something that looks similar to this:
If so, then you've followed this guidance correctly (If not, you can reload the instance of Notepad by clicking the gray box with two arrows pointing to the left, which is the box to the right of the «Open» box). Now, you can set a breakpoint by right clicking on the aforementioned instruction, navigating to «Breakpoint» and selecting «Toggle» (Or press your F2 key while the gray line is over the aforementioned instruction). You can step into/over again, or attempt to execute the program by clicking the blue play button, fourth to the right of the first «Open» button (The F9 key can also be pressed to accomplish the same). Execution will pause at this instruction due to the breakpoint - if you attempt to execute again, Notepad will be running, and the CPU window will no longer be up-to-date, due to the startup being completely done.
From here, you can try pausing the execution by clicking on the «pause» button, directly to the right of the play button, which will land you at a «RETN» line, below a «SYSENTER» instruction. Setting a breakpoint on a call expected to be used can cause the program to pause in the CPU section again, giving you direct control over the flow of the program. For example, if you go to the «ExitProcess» function (Click the button directly left of the cyan «L» button and type in «ExitProcess») and set a breakpoint here, then when you run the program and attempt to exit, the window will disappear, but execution will pause at this function. This is an example of one way you can gain control over a program.
Another commonly checked area is the strings in an executable. Right click on the disassembled area, select «Search For» then select «All Referenced Text Strings». If you scroll down toward the end of the newly opened list, in the References window (Which can be opened by the «R» cyan button, for future reference), you may see something such as:
Text strings referenced in notepad:.text, item 248
In Notepad, this is a list of functions that are being imported. Some executables will list other strings of interest. For example, if you're attempting to modify the attributes of a weapon in a loaded game, the weapon name may be listed in the strings window. You can check where that string is referenced (If it's referenced in a MOV/LEA or a PUSH, odds are, it's being used as a parameter for a function), set a breakpoint, then run the game again. Then the first time the name of that weapon is used as a parameter is where the executable will be paused, which may lead you to functions you will be interested in.
One more instruction not mentioned in the assembly portion is «NOP» or «no-operation». NOP is encoded as the single byte 90, which is the same encoding as «XCHG EAX, EAX» (an exchange that changes nothing) - debuggers display that byte as NOP. If you want to remove a line in a program while debugging it, you have to «NOP» it out - replace the bytes that line takes with nothing but NOPs, until the line is full. If you want to replace a line that takes up 4 bytes, and the replacement is only 2 bytes, you'll have to use NOP instructions to fill up the remaining space.
Lastly, to modify an instruction in OllyDBG, double click on the instruction in the CPU window, and replace it to your heart's desire.
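The padding rule can be shown on a plain byte buffer. This Python sketch works on a bytes object rather than a live program, and patch_with_nops is a name made up for the example; the sample bytes are the standard encodings of MOV EAX,0 / CMP EAX,5 / RETN.

```python
NOP = 0x90   # the one-byte no-operation instruction


def patch_with_nops(code: bytes, offset: int, old_length: int,
                    replacement: bytes = b"") -> bytes:
    """Overwrite old_length bytes at offset, padding the remainder with NOPs.

    The patched code keeps its original size, so the instructions that
    follow stay at the same addresses.
    """
    if len(replacement) > old_length:
        raise ValueError("replacement is longer than the instruction it replaces")
    if offset < 0 or offset + old_length > len(code):
        raise ValueError("patch does not fit inside the buffer")
    padded = replacement + bytes([NOP]) * (old_length - len(replacement))
    return code[:offset] + padded + code[offset + old_length:]


if __name__ == "__main__":
    code = bytes.fromhex("B800000000 83F805 C3")   # MOV EAX,0 / CMP EAX,5 / RETN
    # Replace the 3-byte CMP with the 1-byte INC EAX (40) plus two NOPs.
    patched = patch_with_nops(code, 5, 3, bytes.fromhex("40"))
    print(patched.hex().upper())                   # B8000000004090 90C3 without the space
```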
Debugging is a very tricky game, filled with a fair bit of guessing and checking. Gradually, as you become more comfortable with your debugging environment, you'll become better, and eventually, you'll be very comfortable in navigating through executables.
IDA is an extremely powerful tool for analyzing executables, and with that power comes complexity. I recommend you start using it as your experience grows, but there is too much to be said about its capabilities to cover them all in this single guide.
Assuming you find yourself fully capable of working with executables, the next segment in the guide is going to cover protection schemes used to prevent debugging.
One very commonly used call to detect debuggers is the «IsDebuggerPresent» call. For example:
If a debugger is not being run on the program calling IsDebuggerPresent, 0 is the value that ends up being given back (Or «returned») - otherwise, a value not equal to 0 is returned.
To bypass checks of these sort, navigate to the «Plugins» window, then «OllyAdvanced», and select «Anti-Debug 2». Check the «IsDebuggerPresent» box, and hit «Ok».
But, of course, there are many other anti-debugger features: many executables are made to exploit bugs in OllyDBG to make it crash if the program is loaded. I recommend opening the OllyAdvanced window and checking the three following bugfixes:
Kill %s%s bug (full fix in string-routine)
Kill NumOfRva Bug
Kill little Analysis-Crash-Bug
Of course, these aren't the only things that are used to protect an application. One common scenario you may run into is when you see this popup dialog:
«Quick statistical test of module «notepad-» reports that its code section is either compressed, encrypted, or contains large amount of embedded data. Results of code analysis can be very unreliable or simply wrong. Do you want to continue analysis?»
This usually means the executable was packed, as a means to prevent debugging and analysis of what's in the executable. There are a number of methods used to unpack executables, but in most cases, if you let the program run, it will unpack itself entirely into memory, allowing you to pause the program, then navigate your way through, using previously mentioned tactics.
Once you identify what an executable is packed with (PEiD can identify the packer), I would recommend Googling tutorials on how to unpack that specific packer. For example:
- unpack [packer] tutorial
Replace [packer] with the packer PEiD identified.
Of course, what if the executable you're working with isn't packed or protected in any way at all, but it still won't run in OllyDBG? Another possible scenario is that the application was made with the .NET framework, which OllyDBG is not capable of working with. If that's the case, I recommend downloading the free .NET decompiler tool, «Reflector».
The next segment is a short one, covering resources. Under Windows, there's a method of adding pictures, sound, executables, and all other types of files to an executable, by adding them to the .rsrc section of an executable. Sometimes, some protection schemes will consist of adding the original program as a resource to a new program, then having the new program load the original from the .rsrc area. The great news about resources, is that they're viewable by anyone.
If you open an executable in CFF Explorer, and select the option on the left toolbar «Resource Editor», a full list of everything attached to the executable is returned. At the least, there is usually an «Icons» folder. But, with CFF Explorer, if you right click on any of the files or folders, it is shown you can remove, replace, add, and even save resources, essentially extracting them from the executable.
And that's all there is to using resources.
The final portion of the guide is working against anti-cheat mechanisms, where many common methods are discussed, such as the commonly known «DLL injection».
To start, many anti-cheat engines are known for being very aggressive. They hide themselves from the process list, keep track of any newly made processes, etc.
Usually, to gain control over an executable, a call to the WinAPI function «OpenProcess» is made. OpenProcess is almost always hooked to prevent touching of a process. However, even if OpenProcess were not hooked, the process list table has to be repaired so that you can find the proper process ID, so OpenProcess would know which process to open.
It's considered very complex attempting to write a bypass for such aggressive anti-cheat systems. Rather, if you're dealing with an anti-cheat such as GameGuard, DLL injection is used to make changes to the executable before GameGuard loads.
However, it isn't always that easy. GameGuard will constantly check the .code section so you cannot modify that section. Rather, DLL injection involves allocating space, then adding code to that new area, which GameGuard will not check. Usually, this new area will make changes in the .data section - if the HP of a character is stored in a particular spot in the .data section, then one cheat may modify the HP, setting it to the maximum possible value every millisecond, to imitate the «god mode» cheat.
Yet, some games will do another check - they'll check if there is execution occurring outside the .code section by checking the last called function. Others may attempt to do entire checks on the executable in memory, etc.
But, if that's all there is to it, how do you know where HP is stored? Or how do you know what is possible with that anti-cheat in place? As stated earlier, debugging is a game of guess and check. One great place to check is the community - every now and then, some communities may list the offset where sensitive data, such as the HP of a character, can be found.
 - Hooking a function involves intercepting data from a hooked function, and usually acting upon that action. There are a few varying methods of hooks, but usually, you'll find more aggressive anti-cheat systems overwrite the first few bytes of a function to intercept any potentially dangerous calls to the game being protected.
 - There are 8 bits in a byte - bytes are a measurement of data. In the .code section of a program, there's a certain number of bytes for each instruction. To see what opcodes do what, check out ProView's x86 Disassembler.
 - The .code section is where the program's instructions are stored. In a standard portable executable (P.E.; The most common type of available executable for Windows NT 5.X), there's a .data section (For all data - pictures, video, text, variables, etc), a .code section, and others.
 - Instructions are commands to be executed by the CPU of a workstation. As you study assembly more, you'll learn more about the varying type of available instructions. It should be noted, instructions are also called «operation codes», or «opcodes».
 - Windows NT 5.X is the kernel version behind Windows 2000 (5.0) and Windows XP (5.1); the X represents the exact version number.
 - In math, variables are numbers represented by characters. In programming, variables are types of data represented by a space in memory
 - In math, there are varying types of numbers: Integers, natural numbers, rational, irrational, etc. In programming, there are varying data types, for strings of text, integers, binary data (e.g.: Usually used for pictures, video, encrypted data, etc), ...
 - This does not include programs made under the .NET framework, for a new section is added (CLR), which is for matters outside the extent to which this guide reaches.
 - In math, functions are formulas used to manipulate numbers as needed. Usually, you'll plugin a number or two (Or more) to represent variables to be used in the function. In math, the numbers being handed off as variables are referred to as «parameters». In programming, it's the same, though you're not limited to just numbers for the parameters, and not all functions need to be given parameters. For example, there's a function, exit, used to usually shutdown a user-mode program. Exit, in C/C++ takes one parameter in the syntax of, «exit ( int )», where int stands for «integer».
 - C and C++ are two other relatively low-level programming languages, which are usually used in the creation of game cheats, as the syntax is considered easier to understand, and the compilers used for C(++) are known to often create more optimized programs/libraries than hand-written assembly code.
 - C++ is an extension of C, carrying many features (Primarily, it is Object-Oriented) making it more commonly used than C in most current-day projects.
 - If you use an object-oriented language, you'll learn later what it is, and the importance of it. For now, it's not necessary.
 - Compilers are a tool used with linkers to create programs; generally, it is the compiler's responsibility to handle code generation and optimization. The result output of a compiler is an object, which from there, is handled by a linker to create a program.
 - Optimized code is usually written to perform fastest on the CPU, by taking up fewer cycles (Via either using fewer instructions, or using instructions that use fewer cycles). Usually, optimized code may end up taking more space on a hard drive than unoptimized code (More optimized programs will usually use more instructions that use fewer cycles to complete a simple task).
 - Libraries are used to add functionality to programs - for example, there may be a codec library for playing an MP3 file, which is used by a media player for communicating with MP3 files. Without the executable, the library file is just data; without the library, the executable will fail to play the MP3 file.
 - Linkers, to be put simply, take objects and library files, and «link» them to create a single executable.
 - CPU cycles are a measurement of a computer's speed - for example, a 2.0GHz CPU is capable of completing 2 billion clock cycles per second (Coming out to 2 clock cycles per nanosecond).
 - Logical operations are used for further data manipulation. The AND operator (Represented by the ampersand symbol) will check that two pieces of data are true (If they both are true, the return value is true - otherwise, the return value is false). Then the OR operator (Represented by a pipe symbol) states that if at least one of the two pieces of data is true, the return value is true (Otherwise, it is false). The XOR operator (Represented usually by a caret symbol) checks that two pieces of data are different (If they are both true or both false, the return value is false - otherwise, if one is true and the other false, the return value is true).
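The three operators can be tried directly in Python, which happens to use the same ampersand, pipe and caret symbols. A minimal sketch, with truth_table being a name made up for the example:

```python
def truth_table():
    """Return rows of (a, b, a AND b, a OR b, a XOR b) for every combination."""
    rows = []
    for a in (False, True):
        for b in (False, True):
            rows.append((a, b, a & b, a | b, a ^ b))
    return rows


if __name__ == "__main__":
    for row in truth_table():
        print(row)
    # On whole numbers the same operators work on each bit separately:
    print(bin(0b1100 & 0b1010))   # 0b1000
    print(bin(0b1100 | 0b1010))   # 0b1110
    print(bin(0b1100 ^ 0b1010))   # 0b110
```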
 - Assembly refers to locations in an application by addresses, which advance by however many bytes each instruction, or each piece of data, takes up; for example:
The following instruction takes up 2 bytes:
MOV EAX, EAX
If that instruction is stored at 010070D8, then the next instruction would be stored at 010070DA. It should also be noted, addresses are usually 32 bits (If you're in such a rare situation where you're working with a 64-bit program, then addresses will go up to 64 bits).
Addresses are also referred to as «offsets».
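The address arithmetic is simple enough to check by hand or with a couple of lines of Python. The helper name next_address is made up for this sketch, and the instruction length has to be supplied by you (a debugger shows it as the number of bytes on the line).

```python
def next_address(address_hex: str, instruction_length: int) -> str:
    """Return the address of the instruction that follows, as 8 hex digits."""
    address = int(address_hex, 16)
    return format((address + instruction_length) & 0xFFFFFFFF, "08X")


if __name__ == "__main__":
    print(next_address("010070D8", 2))   # 010070DA
    print(next_address("0100739D", 5))   # 010073A2
```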
 - Looping is a method of repeating a certain number of instructions for a specific amount of runs - this can range from zero loops to infinite (Infinite usually implies the loop will continue until the program shuts down).
 - The RETN instruction is used to return execution to the main code. For example, if a function is called using the CALL instruction, RETN returns execution to the instruction directly after that CALL.
 - The CALL instruction is for calling functions in the executable. For example, if you want to call the exit function, which accepts one parameter, it would be called like so:
CALL Exit ; Assume Exit stands for the location of the Exit address
But, it should be noted, parameters must be pushed onto the stack in reverse order before the CALL. Therefore, if you're calling a function, «Divide», that takes two parameters, the first being the dividend, the second being the divisor, and you're attempting to divide 100 by 10, then you would PUSH 10 first, then PUSH 100, and finally CALL Divide.
 - Platforms generally include other operating systems, apart from all versions of Windows, counting from Windows 2000 and later (XP, Vista, 7, etc). Examples of some platforms would be Mac OS X, all distributions of Linux (e.g. Debian, ArchLinux, Gentoo, Red Hat/CentOS), all distributions of UNIX (e.g. FreeBSD, OpenBSD, [open]Solaris, NetBSD), etc.
 - Low-level languages are, in simple terms, very closely related to the computer's hardware. In the case of assembly, it's considered to be as low as one can go on a Windows platform.
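The reverse ordering makes sense once you remember that the last value pushed ends up on top of the stack. Here is a small Python sketch of the idea; push_order, call_with_pushed_args and divide are hypothetical names used only for this illustration, and the return address that a real CALL also pushes is left out.

```python
def push_order(args):
    """Return the parameters in the order they would be pushed (last one first)."""
    return list(reversed(args))


def call_with_pushed_args(function, args):
    """Push the parameters in reverse, then let the callee read them top-down."""
    stack = []
    for value in push_order(args):
        stack.append(value)                  # PUSH value
    # The first parameter is now on top, so the callee reads them in natural order.
    received = [stack.pop() for _ in range(len(args))]
    return function(*received)


def divide(dividend, divisor):
    return dividend // divisor


if __name__ == "__main__":
    print(push_order([100, 10]))                     # [10, 100]
    print(call_with_pushed_args(divide, [100, 10]))  # 10
```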
 - An algorithm is a certain number of steps used to process data. It may be used to encrypt/decrypt data, to organize data in files (e.g.: In a file with a list of random words, an algorithm could be used to identify words longer than 10 characters, then place them into a special location for later access), etc.
 - Disassembly is the process of «un-assembling» a program. For now, all that needs to be known, is that disassembly is always possible when dealing with programs (Executables, libraries [For Windows, files ending with the .DLL extension are one type of library], etc).
 - This just means any basic math function - adding, subtracting, multiplying, etc.
 - Pointers are a reference to data in memory (Pointers hold offsets to data)
 - When a file is opened, a unique identifier must be marked, so when you decide to read or write from that specific file, the machine knows which opened file you're talking about. This identifier is known as a «file handle». Handles are also used for communication with hardware and kernel drivers.
 - Kernel drivers are an interface to the kernel.
 - The kernel is the last component to bridge hardware to software. Whenever software needs to manipulate the hardware, the kernel is involved.
 - Threads are a method of executing multiple instructions at the same time. For example, if you are playing a computer game that has to keep track of multiple users playing at once, including yourself, all the active windows it has open, etc, then you need threads - otherwise, only one thing will be done at a time (You move, then one other player, then one other, etc).
 - Software breakpoints are a method of stopping execution once a certain instruction is reached. For example, if I put a software breakpoint at the beginning of the call to the Exit function, as soon as the program being debugged attempted to call Exit, the debugger would pause program execution at that point.
 - Allocating space means reserving parts of memory for a program, to be used for a particular purpose.