It was the first time I had to deal with virtualized code, so my solution is far from being the best. Surely there were much quicker ways, but mine did get the job done. This write-up is essentially meant for beginners in the domain of obfuscated code reverse engineering.

Type of challenge

Keykoolol challenge description
FCSC 2020 — Keykoolol challenge rules

This is a keygen type of challenge. You have to download a binary that takes inputs and, much like licensed software, verifies those inputs against each other. The goal is to create a keygen: a script that generates valid inputs to feed to the license verification algorithm. There are two inputs: a username and a serial.

Bash
./keykoolol
[+] Username: toto
[+] Serial:   tutu
[!] Incorrect serial.

In these types of challenges, I would advise you to manually fuzz inputs. By sending special characters and strings with an invalid length, you might get interesting error messages. The first step is to understand what the software expects as inputs.

ELF analysis

The file is a 16KB ELF file:

Bash
keykoolol: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
           dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
           for GNU/Linux 3.2.0, stripped

The sections reveal more interesting things:

Bash
[14] .text    PROGBITS  0x730   0x1ce2  AX
[16] .rodata  PROGBITS  0x2420  0x4c0   A
[24] .bss     NOBITS    0x3020  0x2868  WA

So .text contains our code, but .rodata and .bss are quite large. 1216 bytes for .rodata and 10KB for .bss? Something smells fishy. As a reminder, .bss is meant for uninitialized global variables — it often contains stuff like session encryption keys and runtime data.

After offset 0xC0 in .rodata, the data seems like gibberish — but is it random? Let's analyze its entropy with binwalk -E:

Entropy analysis of rodata section
rodata entropy at ~0.894 — far from random

The entropy is around 0.894 — far not enough to qualify as random data. The NULL byte is very recurring, the pattern 0006 appears four times, and 0c00 appears twice. For comparison, truly random data looks like this:

Maximum entropy reference
Random data entropy at ~0.97 — noticeably different

Binary disassembly

The main function:

Main function in IDA
Main function: call to check_serial followed by conditional jump
Conditional jump after validation
Without online check, replacing JZ (0x74) with JNZ (0x75) would bypass the validation

Now to the main part — the check_serial function:

check_serial function graph in IDA
The check_serial function — a virtual machine interpreter

The function takes RO_DATA_ARRAY as argument, which is copied (0x400 bytes) into the .bss:

RO_DATA_ARRAY reference
The suspicious rodata segment being passed to the validation function

The main part is a loop over all values of this array, taken as dwords, checking the least significant byte — a 256-case switch structure:

Opcode dispatcher
Opcode dispatcher — each byte value triggers different code execution
Dispatcher zoomed in

You are looking at a virtual processor, interpreting a custom bytecode, also known as Intermediate Representation. The original code has been split into basic blocks and translated into another higher-level code.

Virtual machine structure
Virtual machine structure with dispatcher and handlers

The conditional jumps used by all the code:

Conditional jump handlers
Four conditional branches shared across all virtual instructions

IR to x86 translation for conditional jumps:

  • 0x15: jump if greater (must be lower)
  • 0x14: jump if shorter (must be higher or equal)
  • 0x0A: jump if not zero (must be equal)
  • 0x09: jump if zero (must be different)

The code is also self-modifying — hence the copy to the writable .bss. There are three XOR deobfuscation stages. After deobfuscating all the IR, the macro steps are:

  1. Verify serial charset
  2. Hash username
  3. Decode new instructions with C1D2E3F4
  4. Decode hex serial
  5. Decode new instructions with A1B2C3D4
  6. Encrypt 32 rounds of AES
  7. Decode new instructions with AABBCCDD
  8. Loop over serial and verify value byte per byte

Hashing the username

The hashing function in IR:

IR
0x0e 3 0d 658 -> (username[i]+j)*0D
0x13 3 25 0e7 -> (username[i]+j)*0D ^ 25
0x11 3 ff 64b -> ((username[i]+j)*0D ^ 25) % FF => res

Reimplemented in Python:

Python
def derived_key(tkey):
    s = tkey
    for k in range(0, 5):
        s += ''.join(chr(((ord(s[i+(k*16)])*0x03)^0xFF)&0xFF) for i in range(0,16))
    return s

def transient_key(username):
    temp = ('00'*16).decode('hex')
    for i in range(0, len(username)):
        temp2 = ''.join(chr((((ord(username[i])+j)*0x0D)^0x25)%0xFF) for j in range(0, 16))
        d = deque(temp2)
        d.rotate(i)
        temp = ''.join(chr(ord(temp[k])^ord(list(d)[k])) for k in range(0, 16))
    return temp

With a given username, you obtain the same hash using derived_key(transient_key(username)).

Crypto — where the fun starts

Summary so far:

  • The serial must be a 0x80 bytes long hex-encoded string
  • The username is hashed into 0x60 bytes
  • The last two 16-byte lines of the serial are encryption keys
Stack showing hash and serial keys
Hash result (boxed) with appended key lines (highlighted in yellow)

The code encrypts the serial using AES-NI instruction aesdec for several rounds, then checks byte by byte that the result equals the hash:

AES-NI instruction block
AES-NI single round encryption using xmm registers (128-bit AES)

The custom block chaining with circular shifts:

Custom AES block chaining diagram
One round of custom block encryption with circular shifts

And the inverse for decryption:

Decryption block diagram
Decryption order matters: l6 depends on l1, l3 depends on l4

The keygen function:

Python
def generate_serial(username, key):
    h = hash_username(username)
    serial = decrypt(h, key, rounds=32)
    return serial
Bash
cat input | ./keykoolol
[+] Username: [+] Serial:   [>] Valid serial!
[>] Now connect to the remote server and generate serials
for the given usernames.

If you do it, be smarter than me

The alternatives that would have spared me a lot of time:

  • Symbolic execution (using angr, miasm or similar)
  • Translating the whole intermediate representation through automation

I didn't try those methods, but people who used them definitely had quicker results.

References

Stay classy netsecurios.


FCSC 2020 — Keykoolol — 500 points


← Back to articles

Need a security audit or tailored cybersecurity support?

Explore our services →