Saturday, September 08 2007 @ 09:48 AM CEST Contributed by: ColdT Views: 21127
This essay consists of a CRC tutorial and a way of how to reverse it. Many
Coders/Reversers don't know exactly how CRC works and almost no one knows how to
reverse it, while this knowledge could be very usefull.
First the tutorial will
learn you how to calculate CRC in general, you can use it as data/code
protection. Second, the reverse part will learn you (mainly) how to reverse
CRC-32, you can use this to break certain CRC protections in programs or over
programs (like anti-virus).
There seem to be utilities who can 'correct' CRCs
for you, but I doubt they also explain what they're doing.
I'd like to warn you, since there is quite some math used in this essay. This
wont harm anyone, and will be well understood by the avarage Reverser or Coder.
Why? Well. If you dont know why math is used in CRC, I suggest that you click
that button with a X at the top-right of this screen. So I assume the reader has knowledge of binair arithmetic.
Part 1: CRC Tutorial, what it is and how to calculate it
Cyclic Redundancy Code or CRC
We all know CRC. Even if you don't recall, you will when you think of those
annoying messages RAR, ZIP and other compressors give you when the file is
corrupted due to bad connections or those !@#$% floppies. The CRC is a value computed over a piece of data, for example for each file at the
time of compression. When the archiver is unpacking that file, it will read the
CRC and check it with the newly computed CRC of the uncompressed file. When
they match, there is a good chance that the files are identical. With CRC-32,
there is a chance of 1/2^32 of the check failing to recognize a change in data.
A lot of people think CRC is short for Cyclic Redundancy Check. If indeed CRC
is short for Cyclic Redundancy Check then a lot of people use the term incorrect.
If it was you could not say 'the CRC of the program is 12345678'. People are also
always saying a certain program has a CRC check, not a Cyclic Redundancy Check
check. Conclusion: CRC stands for Cyclic Redundancy Code and NOT for Cyclic
How is the calculation done? Well, the main idea is to see the file as one
large string of bits divided by some number, which will leave you with a
remainder, the CRC! You always have a remainder (can also be zero) which is at
most one bit less then the divisor (else it still has a divisor in it).
(9/3=3 remainder=0 ; (9+2)/3=3 remainder=2)
Only here dividing with bits is done a little different. Dividing is repeatedly
substracting (x times) a number (divisor) from a number you want to divide, which
will leave you with the remainder. If you want the original number back you
multiply with the divisor or (idem) add x times the divisor with itself and
afterwards adding the remainder.
CRC computation uses a special way of substracting and adding, i.e. a
new 'arithmetic'. While computing the carry for each bit calculation is
Lets look at 2 examples, number 1 is a normal substraction, 2&3 are special.
In (1), the second column from the right would evaluate to 0-1=-1, therefore
a bit is 'borrowed' from the bit next to it, which will give you this
substraction (10+0)-1=1. (this is like normal 'by-paper' decimal substraction)
The special case (2&3) 1+1 would normally have as answer 10, where the '1' is
the carry which 'transports' the value to the next bit computation. This value
is forgotten. The special case 0-1 would normally have as answer '-1', which
would have impact on the bit next to it (see example 1). This value is also
forgotten. If you know something about programming this looks like, or better,
it IS the XOR operation.
Now look at an example of a divide:
The quotient of a division is not important, and not efficient to remember,
because that would be only a couple of bits less than the bitstring where you
wanted to calculate the CRC from. What IS important is the remainder! That's
the thing that says something important over about the original file. That's
basicly the CRC!
Going over to the real CRC computation
To perform a CRC calculation we need to choose a divisor, we call it the
'poly' from now on. The width W of a poly is the position of the highest bit,
so the width of poly 1001 is 3, and not 4. Note that the highest bit is always
one, when you have chosen the width of the poly you only have to choose a value
for the lower W bits.
If we want to calculate the CRC over a bitstring, we want to make sure all
the bits are processed. Therefore we need to add W zero bits to the end of the
bitstring. In the case of example 3, we could say the bitstring was 1111.
Look at a little bigger example:
There are 2 important things to state here:
1.Only when the highest bit is one in the bitstring we XOR it with the poly,
otherwise we only 'shift' the bitstring one bit to the left.
2.The effect of XORring is, that it's XORed with the lower W bits, because the
highest bit always gives zero.
Going over to a Table-Driven Algorithm
You all should understand that an algorithm based on bitwise calculation will
be very slow and inefficient. It would be far more efficient if you could
calculate it on a per-byte basis. But then we can only accept poly's with a
width of a multiple of 8 bits (that's a byte ;). Lets visualize it in a example
poly with a width of 32 (W=32):
3 2 1 0 byte
Pop! this is the poly, 4*8 bits
This is a register you use to store the temporary result of the CRC, I call
it the CRC register or just register from now on. You are shifting bits from
the bitstring in at the right side, and bits out at the left side. When the bit
just shifted out at the left side is one, the whole register is XORred by the
lower W bits of the poly (in this case 32). In fact, we are doing exactly the
same thing as the divisions above.
What if (as I said) we would shift in & out a whole group of bits at once.
Look at an example of 8 bit CRC with 4 bits at once shifted in & out:
The register just before the shift : 10110100
Then 4 bits (at the top) are shifted out at the left side while shifting 4 new
bits in at the right side. In this example 1011 is shifted out and 1101 (new)
is shifted in.
Then the situation is this:
8 bits currently CRC/Register : 01001101
4 top bits just shifted out : 1011
We use this poly : 101011100, width W=8
Now we calculate just as usual the new value of the register.
1011 01001101 the topbits and the register
1010 11100 + (*1) Poly is XORred on position 3 of top bits (coz there is a one)
0001 10101101 result of XORring
Now we still have a one on bit position 0 of topbits:
0001 10101101 previous result
1 01011100+ (*2) Poly is XORred on position 0 of top bits (coz there is a one)
0000 11110001 result of second XORring
Now there are all zero's in the topbits, so we dont have to XOR with the poly
anymore for this sequence of topbits.
The same value in the register you get if you first XOR (*1) with (*2) and the
result with the register. This is because of the standard XOR property:
(a XOR b) XOR c = a XOR (b XOR c)
1010 11100 poly on position 3 of top bits
1 01011100+ poly XORred on position 0 of top bits
1011 10111100 (*3) result of XORring
The result (*3) is XORred with the register
1011 01001101+ the top bits and the register
You see? The same result! Now (*3) is important, because with the top bits 1010
is always the value (*3)=10111100 (only the lower W=8 bits) bound (under the
stated conditions, of course) This means you can precompute the XOR values for
each combination of top bits. Note that top bits always become zero after one
iteration, this must be because the combination of XORring leads to it.
Now we come back to figure 1. For each value of the top byte (8 bits) just
shifted out, we can precompute a value. In this case it would be a table
consisting of 256 (2^8) entries of double words (32bit). (the CRC-32 table is
in the appendix)
In pseudo-language our algoritm now is this:
While (byte string is not exhausted)
Top = top_byte of register ;
Register = Register shifted 8 bits left ORred with a new byte from string ;
Register = Register XORred by value from precomputedTable at position Top ;
The direct Table Algorithm
The algorithm proposed above can be optimized. The bytes from the byte string
don't need to travel through the whole register before they are used. With
this new algorithm we can directly XOR a byte from a byte string with the byte
shifted out of the register. The result points to a value in the precomputed
table which will be XORred with the register.
I don't know exactly why this gives the same result (it has to do with a XOR
property), but it has the Big advantage you don't have to append zero
bytes/bits to your byte string. (if you know why, pleaz tell me :)
To make things more complicated there is a 'reflected' version of this
algorithm. A Reflected value/register is that it's bits are swapped around
it's centre. For example 0111011001 is the reflection of 1001101110.
They came up with this because of the UART (chip that performs serial IO),
which sends each byte with the least significant bit (bit 0) first and the most
significant bit (bit 7) last, this is the reverse of the normal situation.
Instead then of reflecting each byte before processing, every else is
reflected. An advantage is that it gives more compact code in the
implementation. So, in calculating the table, bits are shifted to the right and
the poly is reflected. In calculating the CRC the register is shifted to the
right and (of course) the reflected table is used.
byte string (or file) -->---+
| 1. In the table each entry is reflected
byte 3 2 1 0 V 2. The initial register is reflected
+---+---+---+---+ | 3. The bytes from the byte string aren't
| | | | |>---XOR reflected, because all the rest is.
| | | | | | Precomputed table
: : : : :
Some implementations in Assembly
To get everything settled here's the complete CRC-32 standard:
Name : "CRC-32"
Width : 32
Poly : 04C11DB7
Initial value : FFFFFFFF
Reflected : True
XOR out with : FFFFFFFF
As a bonus for you curious people, here's the CRC-16 standard: :)
Name : "CRC-16"
Width : 16
Poly : 8005
Initial value : 0000
Reflected : True
XOR out with : 0000
'XOR out with' is the value that is XORred with the final value of the register
before getting (as answer) the final CRC.
There are also 'reversed' CRC poly's but they are not relevant for this
tutorial. Look at my references if you want to know more about that.
For the assembly implementation I use 32 bit code in 16 bit mode of DOS...
so you will see some mixing of 32 bit and 16 bit code... it is easy to convert
it to complete 32 bit code. Note that the assembly part is fully tested to be
working correctly, the Java or C code is derived from that.
Ok. Here is the assembly implementation for computing the CRC-32 table:
xor ebx, ebx ;ebx=0, because it will be used whole as pointer
xor eax, eax ;eax=0 for new entry
mov al, bl ;lowest 8 bits of ebx are copied into lowest 8 bits of eax
xor cx, cx
test eax, 1
shr eax, 1
xor eax, poly
shr eax, 1
test cx, 8
mov dword ptr[ebx*4 + crctable], eax
test bx, 256
Notes: - crctable is an array of 256 dwords
- eax is shifted to the right because the CRC-32 uses reflected Algorithm
- also therefore the lowest 8 bits are processed...
Notes: - ds:si points to the buffer where the bytes to process are
- cx contains the number of bytes to process
- eax contains current CRC
- crctable is the table computed with the code above
- the initial value of the CRC is in the case of CRC-32: FFFFFFFF
- after complete calculation the CRC is XORred with: FFFFFFFF
which is the same as NOTting.
In Java or C it is like this:
for (int cx=0; cx>=8;
So now we landed at the end of the first part: The CRC tutorial
If you want to make a little deeper dive in CRC I suggest reading the document
I did, you will find the URL at the end of this document.
Ok. On to the most interresting part of this document: Reversing CRC!
Part 2: Reversing CRC
When I was thinking of a way to reverse it... I got stuck several times. I
tried to 'deactivate' the CRC by thinking of such an sequence of bytes that it
then shouldn't matter anymore what bytes you would place behind it. I couldn't
do it... Then I realized it could NEVER work that way, because CRC algorithm is
build in such a way it wouldn't matter which _bit_ you would change, the
complete CRC _always_ (well always... almost) changes drasticly. Try that
yourself (with some simple CRC programs)... :)
I realized I only could 'correct' the CRC _after_ the bytes I wanted to
change. So I could make such a sequence of bytes, that would 'transform' the
CRC into whatever I wanted!
Lets visualize the idea:
Bunch of bytes: 01234567890123456789012345678901234567890123456789012
You want to change from ^ this byte to ^ this one.
Thats position 9 to 26.
We also need 4 extra bytes (until position 30 ^) for the sequence of bytes which
will change the CRC back to its original value after the patched bytes.
When you are calculating the CRC-32 it goes fine until the byte on position 9,
in the patched bunch of bytes the CRC radically changes from that point on.
Even when pass position 26, from where the bytes are not changed, you never get
the original CRC back. NOT! When you read the rest of this essay you know how.
In short you have do this when patching a certain bunch of bytes while
maintainting the CRC:
1. Calculate the CRC until position 9, and save this value.
2. Continue calculating until position 27 and 4 extra bytes, save the resulting value.
3. Use the value of 1 for calculating the CRC of the 'new' bytes and the extra
4 bytes (this should be 27-9+4=22 bytes) and save the resulting value.
4. Now we have the 'new' CRC value, but we want the CRC to be the 'old' CRC
value. We use the reverse algorithm to compute the 4 extra bytes.
We can to point 1 to 3, below you learn to do point 4.
I thought, to make it more easy for you, first to calculate the reverse of
CRC-16. Ok. We are on a certain point after the patched code where you want to
change the CRC back to its original. We know the original CRC (calculated before
patching the data) and the current CRC register. We want to calculate the
2-bytestring which changes the current CRC register to the original CRC.
First we calculate 'normally' the CRC with the unknown 2 bytes naming them X
and Y, for the register I take a1 a0 , the only non-variable is zero (00). :)
Look again at our latest CRC algorithm, figure 3, to understand better what im
Ok, here we go:
Take a 2-bytestring 'X Y'. Bytes are processed from the left side.
Take for register a1 a0.
For a XOR operation I write '+' (as in the CRC tutorial)
Processing first byte, X:
a0+X this is the calculated topbyte (1)
b1 b0 sequence in table where the topbyte points at
00 a1 to right shifted register
00+b1 a1+b0 previous 2 lines XORred with eachother
Now the new register is: (b1) (a1+b0)
Processing second byte, Y:
(a1+b0)+Y this is the calculated topbyte (2)
c1 c0 sequence in table where the topbyte points at
00 b1 to right shifted register
00+c1 b1+c0 previous 2 lines XORred with eachother
Now the final register is: (c1) (b1+c0)
I'll show it a little different way:
a0 + X =(1) points to b1 b0 in table
a1 + b0 + Y =(2) points to c1 c0 in table
b1 + c0=d0 new low byte of register
c1=d1 new high byte of register
Wow! Let this info work out on you for a while... :)
Don't be afraid, a real value example is coming soon.
What if you wanted the register to be some d1 d0 (the original CRC) and you
know the value of the register before the transformation (so a1 a0)... what 2
bytes or what X and Y would you have to fed through the CRC calculation?
Ok. We will begin working from the back to the front. d0 must be b1+c0 and
d1 must be c1... But how-the-hell, I hear you say, can you know the value of
byte b1 and c0??? ShallI remember you about the Table? You can just lookup
the value of the word C0 C1 in the Table because you know C1. Therefore you
need to make a 'lookup' routine. If you found the value, be sure to remember
the index to the value because that's the way to find the unknown topbytes e.g.
So now you found c1 c0, how to get b1 b0? If b1+c0=d0 then b1=d0+c0! Now you
use the lookup routine to lookup the b1 b0 value too. Now we know everything
to calculate X & Y ! Cool huh?
a1+b0+Y=(2) so Y=a1+b0+(2)
a0+X=(1) so X=a0+(1)
Non-variable example for CRC-16
Lets look at an example with real values:
-register before: (a1=)DE (a0=)AD
-wanted register: (d1=)12 (d0=)34
Look up the entry beginning with 12 in the CRC-16 table in the appendix.
-This is entry 38h with value 12C0. Try to find another entry beginning with 12.
You can't find another because we calculated each entry for each possible value
of the topbyte and that's 256 values, remember!
Now we know (2)= 38, c1= 12 and c0= C0, so b1= C0+34=F4, now look up the entry
of B1 beginning with F4.
-This is entry 4Fh with value F441.
Now we know (1)= 4F, b1= F4 and b0= 41. Now all needed values are known, to
compute X and Y we do:
X=a0+(1) =AD+4F =E2
Conclusion: to change the CRC-16 register from DEAD to 1234 we need the bytes
E2 A7 (in that order).
You see, to reverse CRC you have to 'calculate' your way back, and remember the
values along the way. When you are programming the lookup table in assembly,
remember that intel saves values backwards in Little-Endian format.
Now you probably understand how to reverse CRC-16.... now CRC-32
Now we had CRC-16, CRC-32 is just as easy (or as difficult). You now work with
4 bytes instead of 2. Keep looking and comparing this with the 16bit version
Take a 4-bytestring X Y Z W , bytes are taken from the LEFT side
Take for register a3 a2 a1 a0
Note that a3 is the most significant byte and a0 the least.
Processing first byte, X:
a0+X this is the calculated topbyte (1)
b3 b2 b1 b0 sequence in table where the topbyte points at
00 a3 a2 a1 to right shifted register
00+b3 a3+b2 a2+b1 a1+b0 previous 2 lines XORred with eachother
Now the new register is: (b3) (a3+b2) (a2+b1) (a1+b0)
Processing second byte, Y:
(a1+b0)+Y this is the calculated topbyte (2)
c3 c2 c1 c0 sequence in table where the topbyte points at
00 b3 a3+b2 a2+b1 to right shifted register
00+c3 b3+c2 a3+b2+c1 a2+b1+c0 previous 2 lines XORred with eachother
Now the new register is: (c3) (b3+c2) (a3+b2+c1) (a2+b1+c0)
Processing third byte, Z:
(a2+b1+c0)+Z this is the calculated topbyte (3)
d3 d2 d1 d0 sequence in table where the topbyte points at
00 c3 b3+c2 a3+b2+c1 to right shifted register
00+d3 c3+d2 b3+c2+d1 a3+b2+c1+d0 previous 2 lines XORred with eachother
Now the new register is: (d3) (c3+d2) (b3+c2+d1) (a3+b2+c1+d0)
Processing fourth byte, W:
(a3+b2+c1+d0)+W this is the calculated topbyte (4)
e3 e2 e1 e0 sequence in table where the topbyte points at
00 d3 c3+d2 b3+c2+d1 to right shifted register
00+e3 d3+e2 c3+d2+e1 b3+c2+d1+e0 previous 2 lines XORred with eachother
Now the final register is: (e3) (d3+e2) (c3+d2+e1) (b3+c2+d1+e0)
I'll show it a little different way:
a0 + X =(1) points to b3 b2 b1 b0 in table
a1 + b0 + Y =(2) points to c3 c2 c1 c0 in table
a2 + b1 + c0 + Z =(3) points to d3 d2 d1 d0 in table
a3 + b2 + c1 + d0 + W =(4) points to e4 e3 e2 e1 in table
b3 + c2 + d1 + e0 =f0
c3 + d2 + e1 =f1
d3 + e2 =f2
(1) (2) (3) (4)
This is reversed in the same way as the 16bit version. I shall give an example
with real values. For the table values use the CRC-32 table in the appendix.
Take for CRC register before, a3 a2 a1 a0 -> AB CD EF 66
Take for CRC register after, f3 f2 f1 f0 -> 56 33 14 78 (wanted value)
Here we go:
First byte of entries entry value
e3=f3 =56 -> 35h=(4) 56B3C423 for e3 e2 e1 e0
d3=f2+e2 =33+B3 =E6 -> 4Fh=(3) E6635C01 for d3 d2 d1 d0
c3=f1+e1+d2 =14+C4+63 =B3 -> F8h=(2) B3667A2E for c3 c2 c1 c0
b3=f0+e0+d1+c2=78+23+5C+66=61 -> DEh=(1) 616BFFD3 for b3 b2 b1 b0
Now we have all needed values, then
X=(1)+ a0= DE+66=B8
Y=(2)+ b0+a1= F8+D3+EF=C4
Z=(3)+ c0+b1+a2= 4F+2E+FF+CD=53
Conclusion: to change the CRC-32 register from ABCDEF66 to 56331478 we need
this sequence of bytes: B8 C4 53 8E
The reverse Algorithm for CRC-32
If you look at the by-hand computation of the sequence of bytes needed to
change the CRC register from a3 a2 a1 a0 to f3 f2 f1 f0 its difficult to
transform this into a nice compact algorithm.
Look at an extended version of the final computation:
It is just the same as figure 4, only some values/bytes exchanged. This view
will help us to get a compact algorithm. What if we take a buffer of 8 bytes
that is, for every line you see in figure 5 one byte is reserved. Bytes 0 to
3 are filled with a0 to a3, bytes 4 to 7 are filled with f0 to f3. As before,
we take the last byte e3 which is equal to f3 and lookup the complete value in
the CRC table. Then we XOR this value (e3 e2 e1 e0) on position 4 (as in figure
5). Then we automatically know what the value of d3 is, because we already
XORred f3 f2 f1 f0 with e3 e2 e1 e0, and f2+e2=d3. Because we now already know
what the value of (4) is (the entry number), we can directly XOR the value into
position 3. Now we know d3 use this to lookup the value of d3 d2 d1 d0 and XOR
this on one position earlier, that is position 3 (look at the figure!). XOR the
found entry number (3) for the value on position 2. We now know c3 because we
have the value f1+e1+d2=c3 on position 5.
We go on doing this until we XORred b3 b2 b1 b0 on position 1. Et voila!
Bytes 0 to 3 of the buffer now contains the needed bytes X to W!
Summarized is here the algorithm:
1. Of the 8 byte buffer, fill position 0 to 3 with a0 to a3 (the start value of
the CRC register), and position 4 to 7 with f0 to f3 (wanted end value of CRC
2. Take the byte from position 7 and use it to lookup the complete value.
3. XOR this value (dword) on position 4
4. XOR the entry number (byte) on position 3
5. Repeat step 2 & 3 three more times while decreasing the positions each time
Implementation of the Reverse Algorithm
Now its time for some code. Below are the implementation of the reverse
algorithm for CRC-32 in Assembly (it is not difficult to do this for other
languages and/or CRC standards). Note that in assembly (on PC's) dwords are
written to and read from memory in reverse order.
crcBefore dd (?)
wantedCrc dd (?)
buffer db 8 dup (?)
mov eax, dword ptr[crcBefore] ;/*
mov dword ptr[buffer], eax
mov eax, dword ptr[wantedCrc] ; Step 1
mov dword ptr[buffer+4], eax ;*/
mov di, 4
mov al, byte ptr[buffer+di+3] ;/*
call GetTableEntry ; Step 2 */
xor dword ptr[buffer+di], eax ; Step 3
xor byte ptr[buffer+di-1], bl ; Step 4
dec di ;/*
jnz computeReverseLoop ; Step 5 */
-Registers eax, di bx are used
Implementation of GetTableEntry
crctable dd 256 dup (?) ;should be defined globally somewhere & initialized of course
mov bx, offset crctable-1
add bx, 4 ;points to (crctable-1)+k*4 (k:1..256)
cmp [bx], al ;must always find the value somewhere
sub bx, 3
mov eax, [bx]
sub bx, offset crctable
shr bx, 2
On return eax contains a table entry, bx contains the entry number.
Well... your reached the end of this essay. If you now think: wow, all those
programs which are protected by CRC can say 'bye, bye'. Nope. It is very easy
to make an anti-anti-CRC code. To make a succesfull CRCreverse you have to
know exactly from what part of the code the CRC is calculated and what CRC
algorithm is used. A simple countermeasure is using 2 different CRC algorithms,
or combination with another dataprotection algorithm.
Anywayz... I hope all this stuff was interesting and that you enjoyed reading
it as I enjoyed writing it.
Fnx go out to the beta-testers Douby/DREAD and Knotty Dread for the good comments on my work which made it even better!
For a sample CRC-32 correcting patcher program visit my webpages:
http://surf.to/anarchriz -> Programming -> Projects
(it's still a preview but will give you a proof of my idea)
For more info on DREAD visit http://dread99.cjb.net
If you still have questions you can mail me at firstname.lastname@example.org,
or try the channels #DreaD, #Win32asm, #C.I.A and #Cracking4Newbies (in that
order) on EFnet (on IRC).
CYA ALL! - Anarchriz
"The system makes its morons, then despises them for their ineptitude, and
rewards its 'gifted few' for their rarity." - Colin Ward
> A painless guide to CRC error detection algorithm
(I bet this 'painless guide' is more painfull then my 'short' one ;)
> I also used a random source of a CRC-32 algorithm to understand the algorithm
> Link to crc calculation progs... hmmm search for 'CRC.ZIP' or 'CRC.EXE' or something
alike at ftpsearch (http://ftpsearch.lycos.com?form=advanced)
Copyright (c) 1998,1999 by Anarchriz
(this is REALLY the last line :)