Reverse Engineering Software with Assembly Language


WHY REVERSE ENGINEER?

So you have some program and you don't know exactly what it does. Maybe the code is proprietary, so you can't read the source and figure it out, or maybe you want to inspect a piece of malicious software. Here is why you might want to do some RE:

  • To achieve interoperability: make some system work with software or hardware you already have

  • To figure out how it works

  • Keygen/Cracks

  • Exploit Development

  • Proprietary File Formats

Assumptions

Before proceeding, it is important that the reader have some knowledge of the following concepts:

  1. Endianness

  2. Data structures

  3. Hexadecimal notation

  4. Intel Architecture

Assembly Language

A very low-level programming language that compilers generate and that translates directly to machine code. It offers more control but less abstraction, and writing it by hand requires a lot of typing.

The goal is not to write programs in assembly (though I have written some basic bootloaders in assembly; you can check out the project here) but to be able to read and understand disassembled code from a piece of software you want to reverse engineer.

The Stack Data Structure

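  • LIFO (last in, first out) data structure. On x86 it grows toward lower addresses, and ESP always points at the most recently pushed value.

push ebp          ; save the caller's base pointer
mov ebp, esp      ; start a new stack frame
sub esp, 0x08     ; reserve 8 bytes for local variables
mov eax, 45
mov ebx, 43
add eax, ebx      ; eax = 45 + 43 = 88
call sym_add      ; pushes the return address, then jumps to sym_add
mov esp, ebp      ; release the locals
pop ebp           ; restore the caller's base pointer
ret               ; pop the return address into EIP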
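
To make the LIFO behaviour concrete, here is a minimal Python sketch of how push and pop move the stack pointer. It is a toy model, not real x86: the class name ToyStack and the starting address 0x1000 are made up for illustration, and every slot is assumed to be 32 bits wide.

```python
class ToyStack:
    """Toy model of the x86 stack: 32-bit slots, growing toward lower addresses."""

    def __init__(self, top=0x1000):
        self.esp = top   # stack pointer; the starting address is arbitrary
        self.mem = {}    # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                            # push first moves ESP down 4 bytes
        self.mem[self.esp] = value & 0xFFFFFFFF  # then stores the value at ESP

    def pop(self):
        value = self.mem.pop(self.esp)  # pop reads the value at ESP
        self.esp += 4                   # then moves ESP back up 4 bytes
        return value


stack = ToyStack()
stack.push(45)
stack.push(43)
print(hex(stack.esp))  # 0xff8 (two pushes moved ESP down by 8)
print(stack.pop())     # 43, the last value pushed comes out first
print(stack.pop())     # 45
```

Note how the address in ESP gets smaller as the stack gets "taller". That is why a prologue reserves space for locals with sub esp, N rather than add.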

The Heap

Associated with dynamic memory allocation (for example malloc and free in C)

The BSS Section

Contains all uninitialized global and static variables

The Text Section

Contains the actual executable instructions (code)

The Registers

General Purpose

  1. EAX: return values
  2. EBX: base register for memory access
  3. ECX: loop counter
  4. EDX: data register for I/O

Segment Registers

Named with two-letter abbreviations

  1. CS: code segment
  2. DS: data segment
  3. ES, FS, GS: extra segments for far addressing (video memory, etc.)
  4. SS: stack segment, usually the same as DS

Indexes and Pointers

  1. EDI: destination index register for array ops
  2. ESI: source index register for array ops
  3. EBP: base pointer, bottom of the stack frame
  4. ESP: stack pointer, top of the stack frame
  5. EIP: instruction pointer, points to the next instruction to be executed

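The E prefix denotes the 32-bit registers. The 16-bit names drop the E (AX, SP), the 8-bit names refer to the low and high bytes of the 16-bit general-purpose registers (AL, AH), and on 64-bit the prefix is R instead of E (RAX, RSP). Backward compatibility is maintained: the smaller names still work and refer to part of the larger register.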
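
The naming scheme is easier to see with actual bits. Below is a small Python sketch that slices one 64-bit value into the views the smaller register names give you. The function name sub_registers and the sample value are made up for illustration.

```python
def sub_registers(rax):
    """Return the smaller register views of one 64-bit RAX value."""
    return {
        "rax": rax & 0xFFFFFFFFFFFFFFFF,  # full 64 bits
        "eax": rax & 0xFFFFFFFF,          # low 32 bits
        "ax": rax & 0xFFFF,               # low 16 bits
        "ah": (rax >> 8) & 0xFF,          # bits 8 to 15 (high byte of AX)
        "al": rax & 0xFF,                 # low 8 bits
    }


for name, value in sub_registers(0x1122334455667788).items():
    print(name, hex(value))
```

So when a disassembly touches AL right after a call, it is really reading the low byte of whatever the function returned in EAX.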

Flags Register

A single 32-bit register (EFLAGS) made up of one-bit values

  1. ZF: Zero Flag, set to 1 if the result of the previous operation is 0
  2. SF: Sign Flag, set to 1 if the result of the previous operation is negative
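
As a rough illustration of these two flags, the Python sketch below derives ZF and SF from a 32-bit result. The helper name flags_after is made up, and "negative" is taken to mean that the top bit (bit 31) of the result is set.

```python
def flags_after(result):
    """Return (ZF, SF) for a result truncated to 32 bits."""
    result &= 0xFFFFFFFF          # registers are 32 bits wide, so wrap around
    zf = 1 if result == 0 else 0  # ZF: the result is exactly zero
    sf = (result >> 31) & 1       # SF: copy of the top (sign) bit
    return zf, sf


print(flags_after(5 - 5))  # (1, 0)
print(flags_after(3 - 5))  # (0, 1)
print(flags_after(7))      # (0, 0)
```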

Calling Conventions

CDECL

Arguments are passed on the stack in right-to-left order.

Return values are passed in EAX.

The calling function cleans the stack.

This allows for variadic functions, since the caller knows the number of arguments.

STDCALL (AKA WINAPI)

Arguments are passed on the stack in right-to-left order.

Return values are passed in EAX.

The called function cleans the stack.
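
The only difference between cdecl and stdcall is who removes the arguments from the stack. The Python sketch below tracks ESP through one call under each convention. It is a simplified model: simulate_call is a made-up helper, all arguments are assumed to be 4 bytes, and 0x1000 is an arbitrary starting ESP.

```python
def simulate_call(args, convention, esp=0x1000):
    """Track ESP through one call; returns (push order, ESP after ret, final ESP)."""
    pushed = []
    for arg in reversed(args):      # both conventions push right to left
        esp -= 4
        pushed.append(arg)
    esp -= 4                        # CALL pushes the return address

    if convention == "stdcall":
        esp += 4 + 4 * len(args)    # callee: ret N pops return address and args
        esp_after_ret = esp
    elif convention == "cdecl":
        esp += 4                    # callee: plain ret pops only the return address
        esp_after_ret = esp
        esp += 4 * len(args)        # caller: add esp, N removes the arguments
    else:
        raise ValueError("unknown convention: " + convention)
    return pushed, esp_after_ret, esp


print(simulate_call([1, 2, 3], "cdecl"))    # ([3, 2, 1], 4084, 4096)
print(simulate_call([1, 2, 3], "stdcall"))  # ([3, 2, 1], 4096, 4096)
```

In a disassembly this shows up as an add esp, N right after the call for cdecl, versus a ret N at the end of the callee for stdcall.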

FASTCALL

The first 2 or 3 arguments (32-bit or smaller) are passed directly in registers, the most commonly used being EDX, EAX, and ECX.

Who cleans the stack depends on the compiler. With Microsoft's __fastcall (first two arguments in ECX and EDX), the called function does.

THISCALL (C++)

Used only for non-static, non-variadic member functions.

The pointer to the class object (this) is passed in ECX, and the return value is passed in EAX.

The called function cleans the stack.

OPERAND TYPES

  • Immediates: 0x3f

  • Registers: EAX, ..., ECX (the values themselves)

  • Memory addresses: [0x80542a], [eax]

  • Offsets in bytes: [eax + 0x4]

  • SIBs (scale-index-base), which are offsets by multiplication and addition: [eax * 4 + ecx], [eax * 2 + ecx]
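
The bracketed operands all boil down to one address calculation: base + index * scale + displacement. Here is a minimal Python sketch of that arithmetic. The function name effective_address and its parameter names are made up, and the register values in the examples are arbitrary.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute a 32-bit address as base + index * scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF  # wrap to 32 bits


# [eax + 0x4] with eax = 0x1000
print(hex(effective_address(base=0x1000, disp=0x4)))          # 0x1004

# [eax * 4 + ecx] with eax = 3, ecx = 0x2000
print(hex(effective_address(base=0x2000, index=3, scale=4)))  # 0x200c
```

A scale of 4 with a changing index register is a strong hint that the code is walking an array of 4-byte elements.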

OPS

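  • mov destination, source: copies source into destination. Operands can be registers, memory, or immediates in most combinations, but the destination cannot be an immediate and the two cannot both be memory.

  • add, sub: addition and subtraction; the result is stored in the destination.

  • cmp: compare. Subtracts source from destination, discards the result, and sets the flags. If ZF is 1, destination and source are equal.

  • test: performs a bitwise AND of source and destination, discards the result, and sets ZF and SF depending on the result.

  • jcc/jmp: conditional and unconditional jumps. jz/jnz jump if ZF is set or clear (the last result was zero or nonzero); ja/jae jump if above or above-or-equal; jb/jbe/jnb jump if below, below-or-equal, or not below. The above/below jumps treat the operands as unsigned.

  • push/pop: take one operand and operate on the stack.

  • Bitwise ops: and, or, xor, not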
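
To see how cmp and the conditional jumps fit together, here is a small Python sketch. It is a simplified model with made-up helper names (cmp_flags, jump_taken). It also tracks the carry flag CF, which is not covered in the flags list above but is what the unsigned above/below jumps actually check.

```python
MASK = 0xFFFFFFFF


def cmp_flags(dst, src):
    """Model cmp dst, src: compute dst - src, keep only the flags."""
    dst &= MASK
    src &= MASK
    result = (dst - src) & MASK
    return {
        "ZF": int(result == 0),  # operands were equal
        "SF": result >> 31,      # top bit of the result
        "CF": int(dst < src),    # unsigned borrow: dst is below src
    }


def jump_taken(mnemonic, flags):
    """Return True if the given conditional jump would be taken."""
    conditions = {
        "jz": flags["ZF"] == 1,
        "jnz": flags["ZF"] == 0,
        "ja": flags["CF"] == 0 and flags["ZF"] == 0,
        "jae": flags["CF"] == 0,
        "jnb": flags["CF"] == 0,  # same condition as jae
        "jb": flags["CF"] == 1,
        "jbe": flags["CF"] == 1 or flags["ZF"] == 1,
    }
    return conditions[mnemonic]


flags = cmp_flags(3, 5)
print(flags)                    # {'ZF': 0, 'SF': 1, 'CF': 1}
print(jump_taken("jb", flags))  # True, because 3 is below 5
print(jump_taken("ja", flags))  # False
```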

Recognizing Programming Constructs

Function Prologue and Epilogue

push ebp
mov ebp, esp
sub esp, N

...
mov esp,ebp
pop ebp
ret
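
The point of the prologue and epilogue is that ESP and EBP come out exactly as they went in. The Python sketch below walks through the same five instructions as plain arithmetic. It is a toy model: run_frame is a made-up helper and the starting register values are arbitrary.

```python
def run_frame(esp, ebp, local_bytes):
    """Run a prologue and epilogue; return (esp, ebp) inside and after the frame."""
    stack = {}

    # Prologue
    esp -= 4
    stack[esp] = ebp       # push ebp: save the caller's frame base
    ebp = esp              # mov ebp, esp: start the new frame
    esp -= local_bytes     # sub esp, N: reserve room for locals
    inside = (esp, ebp)

    # Epilogue
    esp = ebp              # mov esp, ebp: throw away the locals
    ebp = stack.pop(esp)   # pop ebp: restore the caller's frame base
    esp += 4
    return inside, (esp, ebp)


inside, after = run_frame(esp=0x1000, ebp=0x2000, local_bytes=8)
print([hex(v) for v in inside])  # ['0xff4', '0xffc']
print([hex(v) for v in after])   # ['0x1000', '0x2000']
```

Inside the frame, locals live below EBP (at negative offsets) and, for stack-based conventions, arguments live above it, which is why disassemblies are full of operands like [ebp - 4] and [ebp + 8].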

About CALL and RET

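Both have an implicit operation on the stack.

CALL pushes the return address (the address of the instruction after the call, which is the current value of EIP) onto the stack and jumps to the target. RET pops that address from the stack back into EIP.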
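
Here is a rough Python sketch of that implicit stack traffic. The helper run_call_and_ret and all addresses are made up for illustration; the call instruction is assumed to be 5 bytes long, which is the size of a near call with a 32-bit relative target.

```python
def run_call_and_ret(call_addr, target, call_len=5, esp=0x1000):
    """Return (eip, esp) right after CALL and again right after RET."""
    stack = {}

    # CALL target
    return_addr = call_addr + call_len  # address of the instruction after the call
    esp -= 4
    stack[esp] = return_addr            # implicit push of the return address
    eip = target                        # jump to the callee
    in_callee = (eip, esp)

    # RET
    eip = stack.pop(esp)                # implicit pop into EIP
    esp += 4
    return in_callee, (eip, esp)


in_callee, after_ret = run_call_and_ret(call_addr=0x401000, target=0x401200)
print([hex(v) for v in in_callee])  # ['0x401200', '0xffc']
print([hex(v) for v in after_ret])  # ['0x401005', '0x1000']
```

This is also why overwriting the saved return address on the stack redirects execution: RET blindly trusts whatever value it pops.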

LOOPS

ECX is usually the loop counter. Look for conditional jumps based on the loop counter that branch back to an earlier address. Loops are easier to spot in control-flow graphs.
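
As an illustration, a counted loop often ends with a decrement of the counter followed by a jump back to the top while the counter is not zero (dec ecx, then jnz). The Python sketch below models that shape. The function name count_down_loop is made up, and real compilers emit many variations on this pattern.

```python
def count_down_loop(n):
    """Model: mov ecx, n / top: <body> / dec ecx / jnz top."""
    if n < 1:
        raise ValueError("this sketch assumes n >= 1")
    ecx = n
    iterations = 0
    while True:
        iterations += 1               # the loop body runs here
        ecx = (ecx - 1) & 0xFFFFFFFF  # dec ecx
        zf = int(ecx == 0)            # dec sets ZF when the counter hits zero
        if zf == 1:                   # jnz top is not taken, so fall through
            break
    return iterations


print(count_down_loop(5))  # 5
```

Because the check comes after the body, this shape always runs the body at least once, like a do-while loop in C.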

SWITCH STATEMENTS

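Often compiled to a jump table: an array of dwords holding little-endian memory addresses, reached with an indirect jmp whose offset is controlled by the switch value.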
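
One common way a compiler lowers a switch is a table of 4-byte little-endian addresses indexed by the case value. The Python sketch below builds such a table and looks up a case the way an indirect jump would. The helper names and all addresses are made up for illustration.

```python
import struct


def build_jump_table(targets):
    """Pack each target address as a little-endian dword."""
    return b"".join(struct.pack("<I", target) for target in targets)


def switch_target(table, case, default):
    """Return the address an indirect jump through the table would reach."""
    count = len(table) // 4
    if not 0 <= case < count:  # bounds check, usually a cmp plus ja
        return default
    (addr,) = struct.unpack_from("<I", table, case * 4)  # like [table + case * 4]
    return addr


table = build_jump_table([0x401000, 0x401020, 0x401040])
print(table.hex())  # 001040002010400040104000
print(hex(switch_target(table, 1, default=0x4010FF)))  # 0x401020
print(hex(switch_target(table, 7, default=0x4010FF)))  # 0x4010ff
```

The hex dump shows why endianness matters here: the address 0x401000 sits in memory as the bytes 00 10 40 00, so a run of such dwords in a data section is a good sign you are looking at a jump table.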

Winding Up

Hopefully now you are familiar with some basic assembly language syntax and are now able to identify common programming structures like loops, switches, functions etc from some disassembled binary code.

In the future we will look at how to get this assembly code from compiled binaries as well as how to do some practical reverse engineering.