Saturday, February 19 2005 @ 04:15 PM CET Contributed by: zeph
Tuesday, July 15, 2003, 2:17:48 PM
A little guide for wannabe crackers/reversers.
Hello, wannabe cracker/reverser.
if you are reading this text, i assume you have zero knowledge of reversing/cracking.
i dont guarantee you'll become a reverser after reading it.
btw, in reversing there is no shortcut to becoming a good reverser.
but this guide is meant to be a quickstart for you if you really want to be one.
you must be ready to take a lot of pain, and great pleasure too (after you find the true path).
there are two terms in this lore, reversing and cracking, and they are not the same.
not all crackers are reversers but all reversers are crackers.
First of all, i want to warn you before you start reading this guide. IMHO, it is not
a good idea to dive straight into reversing without knowing any programming language,
especially a higher level language like C, C++, Pascal or even BASIC. Most of our
targets are not written in assembly, so the best programming language to learn before
reversing is C. The C programming language is only slightly higher level than the
computer's core language, assembly language (human-readable machine language).
Fortunately, C is a small programming language and not too hard to learn.
I guess it takes just 21 days :P. Go learn C and then come back again. :)
I am not an expert in this subject myself. To be honest, i am just slightly
above beginner level. So, we could call this guide 'from a beginner to beginners'.
That is why i understand the problems you meet in the early days of reversing.
I hope readers will find this text easy to understand.
The basis of reversing, or as some people call it 'reverse code engineering', is the
investigation of compiled binary computer code, or machine code.
in the early computer era, computer programs were written by 'hand'. no compilers
existed in those days. programmers had to learn & know the internal language of any computer
they wanted to program. the process was very painful and very buggy.
so they thought up the compiler to translate human language to computer language.
today, most computer programs are compiled or at least assembled. thus, the
reverser's task is to study the compiled or assembled program (both are binary)
by disassembling the binary code with a disassembler.
the binary is made of 0 and 1 digits and may look like 100100100101010010101010010100001100111001
its very hard to read, so hexadecimal numbers are used for easier reading.
hexa numbers are based on 16 digits, 0,1,2,3,4,....,9,A,B,C,D,E,F
machine code is normally shown in hexa, because its more readable than binary,
and below are the mnemonics for some intel processor opcodes.
CMP DWORD PTR SS:[EBP+C],110 ; 81h is the opcode for mnemonic cmp (compare) in this form
JNZ 00002A ; 75h is the opcode for mnemonic jnz (jump if not zero)
PUSH 0C8 ; 68h is the opcode for mnemonic push (push)
* opcodes are computer instructions in binary digits (bits).
** xxh - h means hexadecimal, sometimes we use the 0x prefix for hexa numbers
assembly language is the computer's native language, but the computer still cant understand
it directly because it is just mnemonic code, readable by humans. thats why
code in assembly lang. must be assembled to binary opcodes, or machine code.
in this guide, i wont tell you how to convert binary to decimal or to hexa or vice versa
manually. you can simply ask your maths teacher or just google to find the tutes.
that is because we all have calc.exe to do the job. but knowledge is knowledge
and knowledge is power. you should learn how to convert between number bases later.
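if you dont have calc.exe at hand, the same base conversions can be tried in a few lines of Python. this is just a small sketch using Python's built-in functions; the value 200 (0xC8, the operand of the PUSH shown earlier) is only picked as an example.

```python
# Convert one value between decimal, hexadecimal and binary
# using only Python built-ins.
value = 200                        # decimal

as_hex = format(value, "X")        # hexadecimal digits, upper case
as_bin = format(value, "08b")      # binary digits, padded to 8 bits (one byte)

# int() with an explicit base converts text back into a number.
from_hex = int("C8", 16)
from_bin = int("11001000", 2)

print(value, as_hex, as_bin, from_hex, from_bin)
```

each hexa digit maps to exactly four binary digits (C = 1100, 8 = 1000), which is the reason hexa is so handy for reading binary.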
2. The power of Assembly
in cracking or reversing, we dont reverse the raw machine code but the machine code
disassembled into assembly language. so, just forget about this course if you dont want to learn
it or are too lazy to learn a new programming language.
since machine code is made of binary digits (bits), we should take a lesson in how bits are
grouped. bits are not handled one by one, they are aligned into fixed-size units.
These units are bits, nibbles, bytes, words, and dwords.
bit | a digit, 0 or 1 | 0 or 1 |
nibble | four bits, | 1111, 1001 |
byte | eight bits, | 11111111 |
word | sixteen bits | 1111111111111111 |
dword | thirty-two bits | 11111111111111111111111111111111 |
most of all, bits are grouped into bytes, i.e. 8 bits. why?
i'll show you something
hexadecimal | binary | decimal
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
A 1010 10
B 1011 11
C 1100 12
D 1101 13
E 1110 14
F 1111 15
from the table above you can see a nibble can represent 16 numbers, from 0h to Fh,
or from 0 to 15 in decimal. meditate by yourself on why hexadecimal is
chosen instead of decimal to REPRESENT binary numbers.
by the way, a byte has double a nibble's bits, 11111111 == 0xFF == 255
from 0 to 0xFF ..... 0 to 11111111 ..... 0 to 255.
that makes 256 combinations including 0, so a byte is enough to represent
an ascii character in binary.
now run your calc.exe in scientific mode.. try to play with the number bases.
so, 8 bit (byte) numbers are 0 -> 0xFF, 16 bit (word) numbers are 0 -> 0xFFFF
and 32 bit (dword) numbers are 0 -> 0xFFFFFFFF.
(2^32) - 1 == 0xFFFFFFFF
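the ranges above all follow one rule: n bits can hold values from 0 up to (2^n) - 1. a tiny Python sketch to check it for each unit (the helper name max_unsigned is made up for this example):

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in `bits` binary digits: 2**bits - 1."""
    return (1 << bits) - 1

# Print the range for each unit from the table of bit groups.
for name, bits in (("nibble", 4), ("byte", 8), ("word", 16), ("dword", 32)):
    top = max_unsigned(bits)
    print(f"{name:6} {bits:2} bits  0 -> 0x{top:X} ({top})")
```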
i assume you use an intel compatible processor, and that it is a 32 bit processor.
the 32 bit technology in today's pentium 4 comes from the ancient 386 processor.
but do you know what the 32 bit thingy actually means for the processor?
and did you ever think about how a computer/calculator does its calculations?
ok, now i'll give a little explanation for both questions above. and dont worry,
i wont go deeply into the processor architecture, just the simple things that
we have to understand to be a reverser or at least, an asm coder.
every processor ever built has REGISTERs to do arithmetic or
bitwise operations (AND, OR, XOR, etc).
our lovely intel x86 (pentium 4 is probably 786 :P) has several registers
for its operations. these registers hold number values sized according to
the architecture, whether 8 bits, 16 bits, or 32 bits.
as we already know, intel x86 is a 32 bit processor, so its registers
hold any value from 0 -> 0xFFFFFFFF
the 80x86 processor has 8 general purpose registers, and these registers
are the ones used most while reversing.
they are EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP.
Accumulator [E]AX (AH/AL) Multiply, divide, I/O, fast arithmetic.
Base [E]BX (BH/BL) Pointer to base address (data segment).
Count [E]CX (CH/CL) Count for loops, repeats, and shifts.
Data [E]DX (DH/DL) Multiply, divide, and I/O.
Source Index [E]SI Source string and index pointer.
Destination Index [E]DI Destination string and index pointer.
Base Pointer [E]BP Pointer to stack base address.
Stack Pointer [E]SP Pointer to top of stack.
where [E] means extended, for the 32-bit register. Read below carefully.
EAX is a 32 bit register (a 'dword', it can hold values up to 0xFFFFFFFF)
The lower part of EAX is AX, a 16 bit register
(the 16 bit registers are inherited from the 8086/80286, the old 16 bit processor architecture)
AX can be decomposed into its higher and lower parts, AH and AL (these are 8 bit registers)
here is the example:
if the value of EAX == 0xCF392DB2
then AX == 0x2DB2, AH == 0x2D, and AL == 0xB2.
but there is something weird about the higher part of EAX, 0xCF39: it cant be accessed
directly. no worry, u will learn how to access it soon, after discovering bitwise operations.
* - the example above also applies to the EBX, ECX and EDX registers
** - the parts of eax depend on each other: if you modify AX to
0x4F2A, the new value of EAX == 0xCF394F2A, and vice versa.
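to see how the parts of a register overlap, here is a small Python sketch that treats EAX as a plain 32 bit number and cuts it up with masks and shifts. it assumes EAX holds the 32 bit value 0xCF392DB2; the function names split_eax and write_ax are made up for this example, they are not real cpu instructions.

```python
def split_eax(eax: int) -> dict:
    """Show the sub-registers that overlap a 32-bit EAX value."""
    eax &= 0xFFFFFFFF                  # keep only 32 bits
    ax = eax & 0xFFFF                  # low 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,        # high byte of AX
        "AL": ax & 0xFF,               # low byte of AX
        "HIGH16": eax >> 16,           # upper half, which has no register name
    }

def write_ax(eax: int, new_ax: int) -> int:
    """Writing AX replaces only the low 16 bits of EAX."""
    return (eax & 0xFFFF0000) | (new_ax & 0xFFFF)

parts = split_eax(0xCF392DB2)
print({name: hex(value) for name, value in parts.items()})
print(hex(write_ax(0xCF392DB2, 0x4F2A)))
```

the shift in HIGH16 is the kind of bitwise trick needed to reach the upper half of EAX, since it has no name of its own.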
The other registers are:
Flags [E]<Flags> Processor flags.
Instruction Pointer [E]IP Memory location of current instruction.
Code Segment CS Segment containing program code.
Data Segment DS Segment containing program data.
Stack Segment SS Segment for stack operations.
Extra Segment ES Extra program data segment.
Extra Segment FS Extra program data segment (386+).
Extra Segment GS Extra program data segment (386+).
Control Registers CR(0-3) Paging, caching, and protection (386+).
Debug Registers DR(0-7) Data and instruction breakpoints (386+).
Test Registers TR(3-7) Testing the TLB and cache (386+).
Global Descriptor GDTR Address and limit of GDT (286+).
Local Descriptor LDTR Address, limit, and selector of LDT (286+).
Interrupt Descriptor IDTR Address and limit of IDT (286+).
Task Register TR Address, limit, selector, and attributes of
btw, not all of the registers above are useful for us crackers/reversers.
the registers we should care about are the flags, EIP, DS, and SS for
ok newbie reversers, the actual short asm class has just started.
Basic ASM opcodes (commands).
Before i start to explain these, find yourself a disassembler like
w32dasm or the built-in disassembler in ollydbg. i recommend ollydbg..
that way you can get a feel for the code that i'll explain after this.
A. MOV - move instruction
This command is prolly the most used one in a computer program. It simply moves
a value to a register or memory. since the MOV command doesnt wipe the source, we
should call it copy tho ;P.
MOV destination, source
btw, the destination can be any register or memory address.
the source can be an immediate value, a register, or a memory address
(but source and destination cant both be memory addresses).
NOTE: the memory address is usually held in a register.
before: EAX == 0x000FFF0F, ECX == 0x00000000
MOV EAX, 0x00965695
MOV ECX, EAX
after: EAX == 0x00965695, ECX == 0x00965695
first, the immediate value 0x965695 is moved to EAX.
after that, the value in EAX is copied to ECX.
thats why both registers end up the same. its easy to understand.
before: EAX == 0x00000000, ECX == 0x000011A0, EBP == 0x0012FCDC
value in memory for SS:[EBP] == 0x00FF00FF
hmmm... this kind of operation needs a proper explanation since it is
very² important. while processing data, a cpu (processor) needs
a memory module to store values. memory has a limited size, and
memory must be indexed by address so it can keep a lot of different values/data.
the win32 platform uses 32 bit virtual memory addresses.
so, the address, or as we'll call it from now on, the offset, can be 0x0 to 0xFFFFFFFF
MOV EAX,DWORD PTR SS:[EBP]
now EBP == 0x0012FCDC. with this command, 0x0012FCDC is treated as a memory
offset, and the value stored at that offset is 0x00FF00FF. so the value moved
to EAX is 0x00FF00FF, not 0x0012FCDC, since that is just the memory address, or pointer to
the value. btw, DWORD means the value at offset 0x0012FCDC is read as a DWORD (32 bits).
the size could also be something else, like BYTE or WORD.
MOV DWORD PTR SS:[EBP], ECX
this command is the reverse of the command above. the value of ECX (0x000011A0) is
moved to offset 0x0012FCDC. thus, SS:[EBP] == 0x000011A0.
after: EAX == 0x00FF00FF, ECX == 0x000011A0, EBP == 0x0012FCDC
value in memory for SS:[EBP] == 0x000011A0
note: in softice or ollydbg (with the commandbar plugin), u can simply
run 'd ebp' to check the value at that address. d = dump.
if the command looks like this
MOV EAX,DWORD PTR SS:[EBP+8]
simply type 'd ebp+8'
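the two memory MOVs can be replayed in Python to make the difference between 'the address' and 'the value at the address' clearer. this is only a toy model: the registers are a dictionary, and memory is a dictionary that maps an address to the dword stored there (a real cpu works on bytes, this sketch ignores that).

```python
# Toy model: registers and memory as dictionaries (an assumption for this sketch).
regs = {"EAX": 0x00000000, "ECX": 0x000011A0, "EBP": 0x0012FCDC}
memory = {0x0012FCDC: 0x00FF00FF}      # the dword stored at SS:[EBP]

# MOV EAX, DWORD PTR SS:[EBP]
# EBP is used as an address; EAX receives the value stored there.
regs["EAX"] = memory[regs["EBP"]]

# MOV DWORD PTR SS:[EBP], ECX
# The other direction: the value of ECX is written to the address held in EBP.
memory[regs["EBP"]] = regs["ECX"]

print({name: hex(value) for name, value in regs.items()})
print(hex(memory[0x0012FCDC]))
```

note that EBP itself never changes, only the memory it points to does.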
B. CMP, JMP, JNZ, JE,... - compare and jump
If you are familiar with any high level programming language like c, basic, or pascal,
you must already be familiar with this statement
if (a == b)
can you imagine how the cpu does the comparison? simply, it's just
a dirty trick.
* a - b = result. if the result = 0, aren't the values the same?
in asm, and of course while reversing, it looks like this...
CMP EAX, ECX
JE is 'jump if equal'. sometimes people use JZ (jump if zero), both are the same.
zero means equal.. read above *
on the other hand, JNZ means jump if not zero, or not equal..
btw, after a CMP instruction there are a lot of possible jump commands like JGE
(jump if greater or equal), JA (jump if above), JB (jump if below), etc.
Every jump that depends on the result of a cmp is a conditional jump. The only
jump that is not conditional is the jmp command, which happens
regardless of any value or result.
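the 'dirty trick' behind CMP followed by JE/JNZ can be sketched in Python. the sketch below models only the zero flag (a real CMP sets several other flags too), and the function names cmp_zero_flag and check are made up for this example.

```python
def cmp_zero_flag(a: int, b: int) -> bool:
    """Model of CMP a, b: compute a - b, throw the result away, keep the zero flag."""
    result = (a - b) & 0xFFFFFFFF      # 32-bit subtraction
    return result == 0                 # zero flag is set when the result is 0

def check(eax: int, ecx: int) -> str:
    zf = cmp_zero_flag(eax, ecx)       # CMP EAX, ECX
    if zf:                             # JE / JZ would jump here
        return "equal"
    return "not equal"                 # JNZ / JNE would jump here

print(check(0x1234, 0x1234))
print(check(0x1234, 0x4321))
```

in a serial check, the 'equal' branch is typically the good-serial path, which is why these jumps are so interesting to crackers.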
in this text, i wont explain much about assembly programming, because
most basic assembly books have about 500 or 1000 pages, or even more..
its a very long read. you have to find a book.
if you want to know the intel 80x86 architecture more deeply, i suggest you read
the book 'The Art of Assembly Language' (AoA) by Randall Hyde. The useful chapters are:
Volume One: Chapter 2.5 and 2.6, Chapter 3 and Chapter 4
Volume Two: All Chapters.
After that, Iczelion's win32asm tutorials will turn you into a win32asm coder.
In case you want to read them :P.
More interesting, AMD will release AMD64 very soon. i hope it wont be hard
for us reversers/asm coders to migrate to the new platform ;-).
3. The Reversing.
- Win32 API -
These days, the most popular and most used operating system is the micro$oft windows
platform, including win98, win2k, and lately winxp. all of those os's have essentially the
same programming interface: they all use the win32 API (Application Programming
Interface). in the old DOS days, assembly programmers used to call
interrupts directly to communicate with the system hardware. But since the win32 API
was introduced, application programmers no longer have to know much about the hardware,
they just have to know the collection of API functions used in their program. for example, to display
a warning messagebox, theres an API function for that purpose, which the program calls.
in the c programming language, the API function is called like this.
MessageBox(hwnd, "Your input is wrong!", "Warning!", MB_OK);
the parameters are defined in the Win32 API documentation as:
HWND hWnd, // handle of owner window
LPCTSTR lpText, // address of text in message box
LPCTSTR lpCaption, // address of title of message box
UINT uType // style of message box
the MessageBox function is exported by USER32.dll and can be imported by any
win32 application. besides that function, there are thousands of win32 API
functions implemented in kernel32.dll, GDI32.dll, etc.
DLL (Dynamic Link Library)
a .dll file is a collection of ready made functions which can be used by
programmers. in c, if we write a function for some routine, we can call it
many times, and the same goes for the functions in .dll files. the difference is, a function written in
c is included in the .exe itself, but the functions in a .dll are not: they live in
another file that windows loads when the program needs them. Thats why it is called a
Dynamic Link Library.
i think the overview of the win32 api above is enough. we will discuss more about
how to manipulate the win32 api in reversing later.
- PE Files Format -
if you're from the *nix/linux world, the elf format for executable files may be
familiar to you. but we are in windows now, and it uses the PE format.
i will not teach you all about it, i just want to give a simple explanation.
there are many good tutorials about PE by Matt Pietrek, Mammon, LUEVELSMEYER,
in windows, executable file names usually have the *.exe form. the code in the files
is segmented. if you have read a book on assembly in DOS, you know an *.exe must also be
segmented unless it is a *.com. the segmentation is needed because of the
computer architecture (x86). a segment, in easy words, is a 'memory block'.
in windows, the segmentation is slightly different from dos.
because windows has a memory management system, every application level program
is loaded into its very own virtual memory space. to keep it simple,
the structure of a PE file (executables, .DLL, .OCX, .SYS, .CPL and .SCR files)
stored on disk is almost the same as after windows loads it into memory.
to get the big picture, get yourself a Hex Editor (i prefer Hex Workshop) and
OllyDbg. Open any PE file (*.exe) in both tools.
in ollyDbg, press Alt-M for the memory map.
Look at address 0x00400000. OllyDbg will tell us it is the address of the PE Headers.
the size of the PE Headers in memory is usually 0x1000 bytes, or 4096 bytes.
after that, at 0x00401000, is the code segment. the size of the code segment depends
on the program.
further on, there are several segments for imports/exports, data, and resources.
if you're lucky, you'll find strange segments, especially if the file is
packed/protected. For example: zip self-extractors or upx packed files. In windows,
segments are called sections.
After that, go to the hex editor. please check by yourself whether the first 4096 bytes
are the PE Headers section. btw, 4096 bytes is too big for the PE Headers data, thats why
you'll find many 0x0 bytes.
for more, please try to open other executable files in OllyDbg. But dont be surprised:
most .exe files are loaded into virtual memory at base address 0x00400000.
This is the default base address for executables; DLLs usually use other base addresses.
in the hex editor, you will not find 0x400000. The file begins at address (or you could call it offset)
0x0. because of the way windows manages memory, unfortunately we must learn about
the Relative Virtual Address (RVA): the address minus the base address. fortunately it is very easy to understand.
for example, we want to make a patch at offset 0x401290. as we saw before,
that offset doesnt exist in the hex editor.
to make the patch, we subtract the base address (0x401290 - 0x400000) and go to offset 0x1290 in the hex editor.
please take note, the example above finds the right physical offset
on disk only if the size of the PE Headers on disk is 0x1000 bytes.
sometimes, optimized executables take just 0x400 bytes for the PE Headers on disk,
while the virtual size in virtual memory is still 0x1000.
u can check the size of the headers using a pe editor like procdump, lordPE, etc. Also
OllyDbg can show you this information.
using a pe editor you find, let's say, that the pe header's size is 0x400. so the code section
starts at physical address (raw) 0x400 / virtual address 0x1000.
lets do the maths again.
0x1000 - 0x400 = 0xC00
the base address is still 0x400000,
and the virtual offset that we want to patch is 0x404320 (assuming it is inside the code section).
to convert 0x404320 to the raw offset on disk,
0x4320 - 0xC00 = 0x3720
ok.. thats enough... a non standard size of PE Headers makes a little mess.. just because
0x1000 - 0x_____ != 0.
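the maths above fits in one small Python function. this is a sketch under the same assumptions as the examples: base address 0x400000, a code section that starts at virtual address 0x1000, and an address that really lies inside that section. the function name and its parameters are made up here; a real tool reads these numbers from the section table in the PE headers.

```python
def va_to_file_offset(va: int,
                      image_base: int = 0x400000,
                      section_rva: int = 0x1000,
                      section_raw: int = 0x400) -> int:
    """Convert a virtual address to an offset in the file on disk.

    Assumes `va` is inside the section that starts at `section_rva` in
    memory and at `section_raw` in the file.
    """
    rva = va - image_base                      # strip the base address
    return rva - section_rva + section_raw     # shift from memory layout to disk layout

# Headers of 0x400 bytes on disk: 0x404320 -> 0x3720
print(hex(va_to_file_offset(0x404320)))
# Headers of 0x1000 bytes on disk: 0x401290 -> 0x1290
print(hex(va_to_file_offset(0x401290, section_raw=0x1000)))
```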
The reversing itself.
For us newbie reversers, the most interesting things to reverse are protected
shareware programs. But for advanced reversers, there are a lot more advanced things like
packing/protecting and unpacking/unprotecting PE files (win32 .exe files), adding
and modifying functions in an application, stealing code, coding reversing tools,
Since most targets for newbie crackers/reversers are shareware programs protected by
a nag screen or serial number, getting rid of the protection and getting the registered
version of the shareware is a great victory for newbies.
before that, every cracker/reverser must use at least one tool to attack the targets.
the must-have tool is called a debugger. the general purpose of any debugger is to pause
or break the program at any point in its sequence of execution. the problem here is, the
program code is long, very long. its hard to find the exact location
where to break while the program is being executed. so we need a good disassembler to help
us find it. an average disassembler, i.e. W32dasm, is good enough for us newbies.
but theres also an excellent disassembler called IDA.
btw, in win32 reversing we're lucky, because we can use win32 api calls
as breakpoints.
for example..
when a shareware program requests a name and serial number from the user, it
checks them to make sure the serial is correct and then turns on the registered version
of the shareware. otherwise, it displays a message (commonly a messagebox
or a dialogbox) to tell us the serial we've entered is wrong. from here, we can use
the messagebox or dialogbox API function as a breakpoint to find out from which location
the messagebox was called.
The Cracking Tools.
You'll find many cracking tools coded by crackers/coders/reversers out there
from around the world. But we'll use just a few of them, especially in the beginning.
The very basic tools are a debugger and a disassembler. And if necessary, an unpacker
for packed/protected programs.
First of all, you have to know there are two kinds of debugger, ring-0 level
(system level) and ring-3 level (application level). For now, we wont touch the
low level ones as we dont need them yet.
Although there are many ring-0/ring-3 debuggers on the black market (you know what i mean), i just
recommend OllyDbg because its very simple and easy to use, yet also powerful.
I think i dont have to teach you how to use it because you can learn it by yourself in
minutes. By the way, u can read several tutes about ollydbg from my website by
googling for "zephyrous ollydbg tutorial", or any tutes by other writers.
The Biw-reversing site has a very good tutorial collection.
The most popular disassemblers are w32dasm and IDA (for advanced reversers). The usage of
w32dasm is pretty straightforward and we can use it for dead listing analysis.
Every debugger is actually a disassembler too, but the two tools have different functions.
Which one you use depends on the way you do your code analysis.
After you become better at reversing, you'll find IDA powerful and very
useful in specific tasks.
Generally speaking, all protections (nag screen, serial) are easy to remove
unless the code itself has been scrambled into wild binary code and cannot be patched
blindly. When the program is executed, the internal descrambling engine extracts the
true binary code so the program runs well. So, the unpacker's job is to extract the
right code for patching and analysis.
How to Crack?
This question is too general, because there are too many techniques that can be used.
Its up to us crackers to find the best method for each different problem. The most
used technique is 'guessing', but we don't do it blindly (smart guessing), just like in
any puzzle game. We have to guess because we dont know where the protection is. So, we
have to learn about the win32 API, as discussed before.
For example (again), a shareware program asks for a serial number. To receive the serial
number from the user, the program provides an 'Edit Box'. After the user has typed some
text into the edit box, the program copies the text to computer memory to calculate whether
the serial is right or wrong. Then, some programs warn that the serial is wrong with
a 'Dialog Box' or 'Message Box'.
The quoted items (Edit Box, Dialog Box, Message Box) are parts of the Win32 API. To read text
from an Edit Box, the programmer calls GetDlgItemTextA or GetWindowTextA. To display a
dialog box or message box, DialogBoxParamA or MessageBoxA is called. From here, we
use these APIs as breakpoints in the debugger, so the debugger pauses the program execution
when the breakpoints are hit. After that, we can continue the program execution
line by line using the F8 key (in ollydbg) for live tracing. If we're lucky, the code for the serial
calculation will be found after pressing F8 several times. But most times, its not easy
to find the exact location of the serial calculation.
The way the computer stores data, especially a name, in memory.
For serial calculations, some algos compute the serial from the user's name.
for example, the name 'The Reverser' in memory becomes
54 68 65 20 52 65 76 65 72 73 65 72 (in hexadecimal)
where T = 0x54, h = 0x68, e = 0x65, ..., r = 0x72
A computer naturally can only store numbers. So it uses numbers from 0 - 255, or
0x0 - 0xFF, to represent characters; these are called ASCII codes (standard ASCII covers 0 - 127).
An ASCII character takes a byte. Your homework: get yourself an ASCII code table for full reference.
To calculate the serial, these ASCII values are usually used, one byte per character.
So sometimes it is more efficient to use the lower parts of the registers, i.e. AL, AH, CL, CH.
As usual, maths operations like add, subtract, multiply, divide,
modulus, etc, including bitwise operations, are used throughout the algorithms.
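to get a feel for it, here is a Python sketch that turns the name 'The Reverser' into its ASCII bytes and then runs a tiny made-up serial routine over them. the routine (toy_serial, with its constant 0x1234) is purely hypothetical, invented for this example; it is not the algorithm of any real program, it just shows the usual ingredients: one byte per character, an add, a mask, and a bitwise xor.

```python
name = "The Reverser"

# Each character is stored as one byte holding its ASCII code.
codes = name.encode("ascii")
hex_dump = " ".join(f"{b:02X}" for b in codes)
print(hex_dump)

def toy_serial(user_name: str) -> int:
    """Hypothetical scheme: add up the ASCII codes in a 16-bit sum, then XOR a constant."""
    total = 0
    for byte in user_name.encode("ascii"):
        total = (total + byte) & 0xFFFF    # stay inside 16 bits, like a word register
    return total ^ 0x1234                  # made-up constant for this example

print(hex(toy_serial(name)))
```

a real target will use its own mix of operations, but when tracing it in the debugger you will see the same pattern: a loop that loads one character at a time into a small register and folds it into a running value.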
I think that is enough for now. I wont go further because there are a lot of tutorials
about cracking specific targets and/or using specific techniques.
a good start is to find the tutorial collections at krobar's collection site,
biw-reversing, anticrack.de, and google at its best. maybe i'll write more tutorials
based on this little guide, but not general ones. i will go into specifics like patching,
serial fishing, keygenning, unpacking, etc.
Happy Reversing and Have Fun!
Greetings and respects:
Ancient_One, Kwai_Lo, Bengaly, snaker, Detten, chainie, cluesurf, Zerobyte,
BiW-Reversing team, Bor0, X-Lock, fuss, cik siti, iesha, and all reversers out there.
Big thanx to Mr_Geek for fixing errors in this tute :)