21 July 2012: Dalvik Bytecode Obfuscation on Android
Android Bytecode Obfuscation
Patrick Schulz 2012
team@dexlabs.org
-- "Like on x86 - 20 years ago"
1. Motivation
While working on our disassembler for dexter, we gained deep insight into
the Dalvik Virtual Machine (DVM), which executes the Dalvik bytecode. To get
a better intuition about the problems we would run into when implementing our
disassembler, we tested many existing reverse engineering tools, but did not
find a disassembler that fulfilled all our requirements (full code coverage,
basic block view, ...). So we started designing our own. To get a good and
stable disassembler, we experimented a lot with the existing ones and found
some problems that are well known, and already solved, on the x86 architecture.
In the following we describe one of these flaws. It can be turned into an
obfuscation technique called junk byte injection, which we have implemented
as a proof-of-concept (PoC) obfuscator.
2. Obfuscation Technique
The obfuscation technique presented here, junk byte injection, has been well
known on the x86 architecture for years.
The first thing to decide when implementing a disassembler is the
underlying algorithm for finding new instructions. The best-known one is
recursive traversal, but because the Dalvik bytecode format is simple, some
tools use linear sweep. That works fine for apps that have not been
obfuscated. Linear sweep is also used within the DVM as part of the bytecode
verifier, which checks all applications before they can be installed.
The main problem with linear sweep is that the algorithm cannot decide
whether a particular byte is part of an instruction or just dead code or data.
The problem shows up with overlapping instructions. In that case one of the
overlapping instructions is likely to be invisible to the algorithm and,
consequently, to the analyst as well.
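The difference between the two algorithms can be shown on a made-up instruction set. The following sketch uses a hypothetical toy ISA, not Dalvik. It has 1-byte opcodes, and the 3-byte LOAD is the "long" instruction. A jump skips a single junk byte, but that byte's immediate operand swallows the real code that follows. Linear sweep trusts every length and never sees the real code. Recursive traversal follows the jump and finds it. The ISA, the function names and the sample bytes are all illustrative assumptions.

```python
# Hypothetical toy ISA (not Dalvik): opcode -> (mnemonic, total length in bytes).
ISA = {
    0x01: ("JMP", 2),   # unconditional jump, signed 1-byte offset from the next instruction
    0x02: ("NOP", 1),
    0x03: ("LOAD", 3),  # 2-byte immediate: a junk 0x03 swallows the two following bytes
    0x04: ("RET", 1),
}


def linear_sweep(code: bytes) -> list[tuple[int, str]]:
    """Decode instructions back to back, trusting every instruction length."""
    out, pc = [], 0
    while pc < len(code):
        name, length = ISA[code[pc]]
        out.append((pc, name))
        pc += length
    return out


def recursive_traversal(code: bytes) -> list[tuple[int, str]]:
    """Decode only what control flow can reach, starting at offset 0."""
    seen: dict[int, str] = {}
    work = [0]
    while work:
        pc = work.pop()
        if pc in seen or pc >= len(code):
            continue
        name, length = ISA[code[pc]]
        seen[pc] = name
        if name == "JMP":
            rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
            work.append(pc + length + rel)   # jump target only, no fall-through
        elif name != "RET":
            work.append(pc + length)         # fall through to the next instruction
    return sorted(seen.items())


# JMP +1 skips the junk byte 0x03 at offset 2; the real code is NOP, RET.
code = bytes([0x01, 0x01, 0x03, 0x02, 0x04])
print(linear_sweep(code))         # [(0, 'JMP'), (2, 'LOAD')]
print(recursive_traversal(code))  # [(0, 'JMP'), (3, 'NOP'), (4, 'RET')]
```

The junk LOAD at offset 2 and the real NOP at offset 3 overlap. A tool that can show only one instruction per byte has to hide one of them.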
On Android we can use instructions of variable length to hide the
instructions that follow them. For this PoC we chose the
fill-array-data-payload instruction.
Format of the fill-array-data-payload:
Name           Format            Description
ident          ushort = 0x0300   identifying pseudo-opcode
element_width  ushort            number of bytes in each element
size           uint              number of elements in the table
data           ubyte[]           data values
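How far this pseudo-instruction reaches follows from its header. The Dalvik bytecode documentation gives its total size as (size * element_width + 1) / 2 + 4 code units. The 4 are the header units for ident, element_width and size. The sketch below parses an 8-byte header and applies that formula. The function name is our own. The sample header bytes are copied from the dexdump listing in section 4.1.

```python
import struct


def payload_length_units(header: bytes) -> int:
    """Total size of a fill-array-data-payload in 16-bit code units.

    header: the first 8 bytes of the payload (ident, element_width, size),
    little-endian as stored in a dex file.
    """
    ident, element_width, size = struct.unpack_from("<HHI", header)
    if ident != 0x0300:
        raise ValueError(f"not a fill-array-data-payload: ident={ident:#06x}")
    data_bytes = element_width * size
    # 4 header units plus the data, rounded up to whole code units
    return 4 + (data_bytes + 1) // 2


# Header from the dexdump output in section 4.1: 0003 0100 c600 0000
hdr = bytes.fromhex("00030100c6000000")
print(payload_length_units(hdr))  # 103
```

The header declares 198 one-byte elements, so the payload spans 103 code units. That matches the "array-data (103 units)" line dexdump prints.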
This is a so-called pseudo-instruction that cannot be executed; it only holds
data for a regular instruction. The Dalvik documentation
(http://source.android.com/tech/dalvik/dalvik-bytecode.html) contains the following comment:
There are several "pseudo-instructions" that are used to hold variable-length data payloads, which are referred to by regular instructions (for example, fill-array-data). Such instructions must never be encountered during the normal flow of execution. In addition, the instructions must be located on even-numbered bytecode offsets (that is, 4-byte aligned). In order to meet this requirement, dex generation tools must emit an extra nop instruction as a spacer if such an instruction would otherwise be unaligned. Finally, though not required, it is expected that most tools will choose to emit these instructions at the ends of methods, since otherwise it would likely be the case that additional instructions would be needed to branch around them.
The goal of this obfuscation technique is to combine junk byte injection with
such instructions in order to hide as much of the original bytecode as possible.
3. Approach
We use the variable-length fill-array-data-payload instruction to hide the
original bytecode. We place it at the very beginning of a method and set its
element_width and size fields so that the instruction overlaps all subsequent
instructions of the method, as you can see here:
To ensure that this instruction is never executed, we place an (unconditional)
branch in front of it that points to the first instruction of the original bytecode.
This also ensures that we do not alter the behavior of the method.
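The layout just described can be sketched as a byte transformation on a method's instruction array. The sketch makes several assumptions. It uses the one-unit goto (opcode 0x28, format 10t) as the unconditional branch. It sets element_width to 1, so that size is simply the byte length of the original code. It skips everything else a real obfuscator has to handle: updating insns_size, shifting try-block addresses, and the alignment rule quoted above. The payload lands at the odd offset 0001; the PoC's payload in section 4.1 also sits at an odd offset, 0005. `wrap_with_goto` is a hypothetical helper, not the PoC obfuscator's code.

```python
import struct


def wrap_with_goto(original: bytes) -> bytes:
    """Hide `original` (a method's insns) behind an unconditional goto and a
    fill-array-data-payload header whose data covers all original bytes."""
    if len(original) % 2:
        raise ValueError("Dalvik code consists of 16-bit code units")
    header_units = 4  # ident, element_width, size (uint)
    # goto +5 (format 10t: AA|op): skip the goto itself (1 unit) plus the header
    goto = struct.pack("<Bb", 0x28, 1 + header_units)
    # payload header: ident 0x0300, element_width 1, size = byte length of the code
    payload_header = struct.pack("<HHI", 0x0300, 1, len(original))
    return goto + payload_header + original


wrapped = wrap_with_goto(bytes.fromhex("0e00"))  # original body: return-void
print(wrapped.hex())  # 280500030100020000000e00
```

A linear sweep of the result stops after goto and the payload. The branch target, 5 code units (10 bytes) in, is exactly where the untouched original code begins.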
When a disassembler using linear sweep processes this method, it sees the
branch instruction and the fill-array-data-payload instruction, but not the
overlapped ones. Therefore the original bytecode is not shown (as instructions)
to the analyst.
Against disassemblers using recursive traversal this alone does not work. Such a
disassembler sees the branch instruction and follows the control flow to its
target, which is the first instruction of the original bytecode.
To cover these disassemblers as well, we can use a conditional branch. The
algorithm then also follows the fall-through path and thus discovers our
fill-array-data-payload instruction. In this case we have to make sure that the
checked condition is always true (an opaque predicate).
A further improvement is to insert an additional fill-array-data
instruction pointing to our fill-array-data-payload instruction. This simulates a
use of our overlapping instruction. It is necessary to also trick androguard,
which otherwise does not disassemble our fill-array-data-payload instruction.
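Adding the opaque predicate and the fake fill-array-data produces a nine-unit prologue. The sketch below reproduces the bytes dexdump prints in section 4.1: 3200 0900, 2600 0300 0000, 0003 0100 c600 0000. It assumes the original exec() body is 99 code units long. That figure follows from dexdump's insns size of 108 minus the 9 prologue units. `build_prologue` is a hypothetical helper, and the register v0 is simply the one the PoC output shows.

```python
import struct


def build_prologue(original_units: int) -> bytes:
    """Prologue in the shape dexdump shows in section 4.1 (a sketch):

    0000: if-eq v0, v0, +9          opaque predicate, always taken
    0002: fill-array-data v0, +3    fake user of the payload
    0005: fill-array-data-payload   data swallows the original code
    0009: original code ...
    """
    prologue_units = 2 + 3 + 4  # if-eq (22t) + fill-array-data (31t) + payload header
    # if-eq (format 22t): B|A|op with A = B = v0, then a signed 16-bit unit offset
    if_eq = struct.pack("<BBh", 0x32, 0x00, prologue_units)
    # fill-array-data (format 31t): AA|op, then a signed 32-bit offset (0002 -> 0005)
    fill = struct.pack("<BBi", 0x26, 0x00, 3)
    # payload header: ident 0x0300, element_width 1, size = original code length in bytes
    payload = struct.pack("<HHI", 0x0300, 1, 2 * original_units)
    return if_eq + fill + payload


print(build_prologue(99).hex())  # 3200090026000300000000030100c6000000
```

Because if-eq v0, v0 always jumps to 0009, runtime behavior is unchanged. A recursive traversal, however, also takes the fall-through edge into fill-array-data, and that instruction in turn points at the payload.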
We hope you got the idea. Now let's see how common reverse engineering tools
will handle this...
4. Evaluation
We also wanted to know how effective this obfuscation technique is. We created a
small application (attached as a crackme) and applied the technique to it. For
this we implemented an obfuscator that automatically alters all methods within
the dex file. The resulting obfuscated app was then analyzed with the following
common reverse engineering tools:
- dexdump (part of the Android SDK)
- androguard (hg version 350:18359d716dc2)
- baksmali (git version 7d37656282f7b1c3d145a0666ad94f4cd491ff8d)
- radare2 (0.9 @ linux-little-x86)
- dedexer (ddx1.22.jar)
- ded (ded-0.7.1)
- dex2jar (hg version 448:ce5c7384ee89) + jd-gui 0.3.3
In the following we show the output of each tool for the method
org.dexlabs.poc.dexdropper.DropActivity.exec()
For some tools this was not possible, because analyzing the modified version led to
crashes without any usable output.
4.1 Evaluation - dexdump
dexdump -d DexDropper.apk
#2 : (in Lorg/dexlabs/poc/dexdropper/DropActivity;)
name : 'exec'
type : '(Ljava/lang/String;)Ljava/lang/String;'
access : 0x0002 (PRIVATE)
code -
registers : 9
ins : 2
outs : 3
insns size : 108 16-bit code units
035afc: |[035afc] org.dexlabs.poc.dexdropper.DropActivity.exec:(Ljava/lang/String;)Ljava/lang/String;
035b0c: 3200 0900 |0000: if-eq v0, v0, 0009 // +0009
035b10: 2600 0300 0000 |0002: fill-array-data v0, 00000005 // +00000003
035b16: 0003 0100 c600 0000 2205 c301 7010 ... |0005: array-data (103 units)
catches : 1
0x0009 - 0x0053
Ljava/lang/IllegalArgumentException; -> 0x0054
Ljava/lang/IllegalAccessException; -> 0x0058
Ljava/lang/reflect/InvocationTargetException; -> 0x005c
Ljava/lang/InstantiationException; -> 0x0060
Ljava/lang/NoSuchMethodException; -> 0x0064
Ljava/io/IOException; -> 0x0068
positions :
...
As you can see, the original bytecode is not visible to the analyst. Only the
instructions injected by our obfuscator are shown.
4.2 Evaluation - androguard
androguard/androlyze.py -i DexDropper.apk -m exec
########## Method Information
Lorg/dexlabs/poc/dexdropper/DropActivity;->exec(Ljava/lang/String;)Ljava/lang/String; [access_flags=private]
########## Params
- local registers: v0...v7
- v8:java.lang.String
- return:java.lang.String
####################
***************************************************************************
0 0x0 if-eq v0, v0, +9
1 0x4 fill-array-data v0, +3 (0x7)
2 0xa fill-array-data-payload \x22\x05\xc3\x01\x70\x10\xe4\x0b\x05\x00\x6e\x10\x3e\x0c\x07\x00\x0c\x06\x6e\x10\x58\x00\x06\x00\x0c\x06\x6e\x20\xea\x0b\x65\x00\x0c\x05\x1a\x06\xa7\x00\x6e\x20\xeb\x0b\x65\x00\x0c\x05\x6e\x10\xef\x0b\x05\x00\x0c\x05\x12\x06\x71\x30\x99\x0b\x58\x06\x0c\x04\x1a\x05\x69\x06\x6e\x10\x3f\x0c\x07\x00\x0c\x06\x6e\x30\x98\x0b\x54\x06\x0c\x03\x12\x05\x23\x55\x0d\x02\x6e\x20\xaf\x0b\x53\x00\x0c\x01\x1a\x05\x45\x08\x12\x06\x23\x66\x0d\x02\x6e\x30\xb0\x0b\x53\x06\x0c\x00\x12\x05\x23\x55\x0e\x02\x6e\x20\xf7\x0b\x51\x00\x0c\x05\x12\x06\x23\x66\x0e\x02\x6e\x30\xf8\x0b\x50\x06\x0c\x05\x1f\x05\xc2\x01\x11\x05\x0d\x02\x1a\x05\x9b\x01\x28\xfc\x0d\x02\x1a\x05\x9b\x01\x28\xf8\x0d\x02\x1a\x05\x9b\x01\x28\xf4\x0d\x02\x1a\x05\x9b\x01\x28\xf0\x0d\x02\x1a\x05\x9b\x01\x28\xec\x0d\x02\x1a\x05\x9b\x01\x28\xe8
***************************************************************************
Here, too, our obfuscation technique is effective against
androguard. You can also see that the original bytecode is stored within the
fill-array-data-payload instruction.
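A small decoder is enough to read the hidden code back. It only needs the handful of opcodes that occur at the start of androguard's payload dump: move-result-object 0x0c, new-instance 0x22, invoke-virtual 0x6e and invoke-direct 0x70. This sketch is not a full Dalvik disassembler, and `decode`, `OPS` and `SIZE` are hypothetical names. Its output agrees with the radare2 listing in section 4.4 (new-instance v5, class+451, where 451 = 0x1c3) and with androguard's later output in the update below.

```python
import struct

# Tiny subset of the Dalvik instruction set: opcode -> (mnemonic, format).
OPS = {
    0x0C: ("move-result-object", "11x"),
    0x22: ("new-instance", "21c"),
    0x6E: ("invoke-virtual", "35c"),
    0x70: ("invoke-direct", "35c"),
}
SIZE = {"11x": 1, "21c": 2, "35c": 3}  # instruction size in code units


def decode(data: bytes) -> list[str]:
    units = struct.unpack(f"<{len(data) // 2}H", data)
    out, pc = [], 0
    while pc < len(units):
        op, hi = units[pc] & 0xFF, units[pc] >> 8
        name, fmt = OPS[op]  # KeyError for opcodes outside this subset
        if fmt == "11x":     # AA|op
            out.append(f"{name} v{hi}")
        elif fmt == "21c":   # AA|op BBBB
            out.append(f"{name} v{hi}, type@{units[pc + 1]:04x}")
        else:                # 35c: A|G|op BBBB F|E|D|C
            count, regs_word = hi >> 4, units[pc + 2]
            regs = [(regs_word >> (4 * i)) & 0xF for i in range(min(count, 4))]
            if count == 5:
                regs.append(hi & 0xF)  # fifth register G
            args = ", ".join(f"v{r}" for r in regs)
            out.append(f"{name} {{{args}}}, method@{units[pc + 1]:04x}")
        pc += SIZE[fmt]
    return out


# First 18 bytes of the fill-array-data-payload data printed by androguard
hidden = bytes.fromhex("2205c3017010e40b05006e103e0c07000c06")
for line in decode(hidden):
    print(line)
# new-instance v5, type@01c3
# invoke-direct {v5}, method@0be4
# invoke-virtual {v7}, method@0c3e
# move-result-object v6
```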
We also ran androsim to compare the original with the obfuscated app.
androguard/androsim.py -i DexDropper.apk DexDropper_orig.apk
Elements:
IDENTICAL: 1
SIMILAR: 1
NEW: 715
DELETED: 0
SKIPPED: 0
--> methods: 0.269642% of similarities
As a result, androsim reports that after obfuscation the app has almost no
similarity to the original one (about 0.27% method similarity). This is of
course very bad once malware starts using obfuscation techniques.
4.3 Evaluation - baksmali
java -jar baksmali-1.3.3.jar -o output classes.dex
Stdout:Error occured while disassembling class Lorg.dexlabs.poc.dexdropper.DropActivity; - skipping class
java.lang.RuntimeException: Invalid code offset 83 for the try block end address
at org.jf.baksmali.Adaptors.MethodDefinition.addTries(MethodDefinition.java:478)
at org.jf.baksmali.Adaptors.MethodDefinition.writeTo(MethodDefinition.java:132)
at org.jf.baksmali.Adaptors.ClassDefinition.writeMethods(ClassDefinition.java:338)
at org.jf.baksmali.Adaptors.ClassDefinition.writeTo(ClassDefinition.java:116)
at org.jf.baksmali.baksmali.disassembleDexFile(baksmali.java:205)
at org.jf.baksmali.main.main(main.java:297)
baksmali has serious problems disassembling our obfuscated dex file. This
may be of interest to app developers who want to protect their apps from being
repackaged with malicious code, since many malware authors use
baksmali/smali to inject their code.
File output:
.class public Lorg/dexlabs/poc/dexdropper/DropActivity;
.super Landroid/app/Activity;
.source "DropActivity.java"
<----->
.method private download()Ljava/lang/String;
.registers 16
.annotation system Ldalvik/annotation/Throws;
value = {
Ljava/net/MalformedURLException;,
Ljava/io/IOException;
}
.end annotation
if-eq v0, v0, :cond_9
fill-array-data v0, :array_5
:array_5
.array-data 0x1
0x12t
0xet
0x13t
<----->
#rest of the array data
<----->
0xct
0x0t
.end array-data
.prologue
:cond_9
.line 130
.line 132
.local v7, msg:[B
<---->
#meta information
<---->
.line 157
.local v2, bytesOut:[B
.line 158
.line 130
.end method
.method private exec(Ljava/lang/String;)Ljava/lang/String;
.registers 9
.parameter "dexfile"
For baksmali we present the output for two methods. For the download()
method (the first one) we see a pattern similar to that of the other tools. For
the exec method, however, we do not get any usable output at all, because
errors occur while parsing the dex file, as the console output shows.
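The offset in baksmali's error message matches the try range dexdump prints for exec() in section 4.1: its end, 0x0053, is 83. Here is one plausible reading, which is an assumption we have not checked against baksmali's source. Try-block boundaries must coincide with instruction starts found by the disassembler. A linear sweep of the obfuscated method, however, only finds if-eq (2 units), fill-array-data (3 units) and the 103-unit payload. The sketch lists those boundaries. `instruction_starts` and `SWEEP_SIZES` are hypothetical names.

```python
# Instruction sizes (in code units) a linear sweep finds in exec(), per section 4.1:
# if-eq (2), fill-array-data (3), fill-array-data-payload (103)
SWEEP_SIZES = [2, 3, 103]


def instruction_starts(sizes: list[int]) -> set[int]:
    """Offsets at which a linear sweep sees an instruction begin."""
    starts, pc = set(), 0
    for size in sizes:
        starts.add(pc)
        pc += size
    starts.add(pc)  # end of code, also valid as an exclusive end address
    return starts


starts = instruction_starts(SWEEP_SIZES)  # {0, 2, 5, 108}
try_start, try_end = 0x0009, 0x0053       # first try range listed by dexdump
print(try_start in starts, try_end in starts, try_end)  # False False 83
```

Under this assumption, every original try range ends inside the payload, where no instruction starts.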
4.4 Evaluation - radare2
radare2 -a dalvik classes.dex -s 0x00035b0c
[0x00035b0c]> pd 20
,=< 0x00035b0c 32000900 if-eq v0, v0, 9
| 0x00035b10 260003000000 fill-array-data v0, 50331648
| 0x00035b16 0003 nop
| 0x00035b18 0100 move v0, v0
| 0x00035b1a c600 add-float/2addr v0, v0
| 0x00035b1c 0000 nop
\-> 0x00035b1e 2205c301 new-instance v5, class+451
0x00035b22 7010e40b0500 invoke-direct {v5}, 0xd904
0x00035b28 6e103e0c0700 invoke-virtual {v7}, sym.method.244.getApplicationContextodsosByText0
0x00035b2e 0c06 move-result-object v6
0x00035b30 6e1058000600 invoke-virtual {v6}, sym.method.19.getFilesDir
0x00035b36 0c06 move-result-object v6
0x00035b38 6e20ea0b6500 invoke-virtual {v5, v6}, 0xd934
0x00035b3e 0c05 move-result-object v5
0x00035b40 1a06a700 const-string v6, str.temp
0x00035b44 6e20eb0b6500 invoke-virtual {v5, v6}, 0xd93c
0x00035b4a 0c05 move-result-object v5
radare2 doesn't fall for this obfuscation technique. The reason is very simple:
radare2 doesn't implement this pseudo-instruction ;) Instead it decodes the
payload header as ordinary instructions (nop, move, add-float/2addr, nop).
But even if it were implemented, we could still manually seek to address
0x00035b1e and restart disassembling from there. This way the original
bytecode can be recovered.
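The address to seek to follows from the branch. Dalvik branch offsets count 16-bit code units, while file addresses count bytes. The if-eq at the start of the method's instructions (0x00035b0c) jumps +9 units, which is 18 bytes. `hidden_code_address` is a hypothetical helper. The same factor of two seems to explain the 0x12 passed to set_code_idx in the androguard update below, whose listing uses byte offsets; that reading is inferred from its output, not from androguard's documentation.

```python
def hidden_code_address(insns_addr: int, branch_offset_units: int) -> int:
    """File address of the first hidden instruction.

    The branch sits at the very start of the method's insns, so its target is
    insns_addr plus the branch offset converted from code units to bytes.
    """
    return insns_addr + 2 * branch_offset_units


# insns start and if-eq offset (+9) as printed by dexdump and radare2 above
print(hex(hidden_code_address(0x00035B0C, 9)))  # 0x35b1e
```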
4.5 Evaluation - dedexer
java -jar dedexer/ddx1.22.jar -d output classes.dex
Stdout:Processing android/annotation/SuppressLint
Processing android/annotation/TargetApi
Processing android/support/v4/accessibilityservice/AccessibilityServiceInfoCompat$AccessibilityServiceInfoStubImpl
Processing android/support/v4/accessibilityservice/AccessibilityServiceInfoCompat$AccessibilityServiceInfoIcsImpl
Processing android/support/v4/app/ActivityCompatHoneycomb
Processing android/support/v4/app/BackStackRecord$Op
Processing android/support/v4/app/BackStackRecord
Unknown instruction 0x7C at offset 000324DA
dedexer also doesn't implement all instructions, so it crashes while trying
to disassemble the inner values of our fill-array-data instruction. In the end
dedexer disassembled only a small part of our example app.
.method public getCanRetrieveWindowContent(Landroid/accessibilityservice/AccessibilityServiceInfo;)Z
.limit registers 3
; this: v1 (Landroid/support/v4/accessibilityservice/AccessibilityServiceInfoCompat$AccessibilityServiceInfoIcsImpl;)
; parameter[0] : v2 (Landroid/accessibilityservice/AccessibilityServiceInfo;)
if-eq v0,v0,l27282
fill-array-data v0,l2727a
l2727a: data-array
0x71 ; #0
0x10 ; #1
0x0E ; #2
0x01 ; #3
0x02 ; #4
0x00 ; #5
0x0A ; #6
0x00 ; #7
0x0F ; #8
0x00 ; #9
end data-array
.end method
For the first classes, which were "successfully" disassembled, we see that
our obfuscation technique still works and the original bytecode is
hidden.
4.6 Evaluation - ded
ded-0.7.1 -j jasminclasses-2.4.0.jar -d output classes.dex
"ded" starts processing our dex file but gets stuck in an endless loop,
resulting in 100% CPU load. We got no output from "ded".
4.7 Evaluation - dex2jar + jd-gui
dex2jar-0.0.9.9/d2j-dex2jar.sh classes.dex
jd-gui
private String exec(String paramString)
{
throw new RuntimeException("Generated by Dex2jar, and Some Exception Caught :java.lang.NullPointerException\n\tat com.googlecode.dex2jar.ir.ts.ExceptionHandlerCurrectTransformer.transform(ExceptionHandlerCurrectTransformer.java:66)\n\tat com.googlecode.dex2jar.v3.V3MethodAdapter.visitEnd(V3MethodAdapter.java:214)\n\tat com.googlecode.dex2jar.v3.V3ClassAdapter$2.visitEnd(V3ClassAdapter.java:261)\n\tat com.googlecode.dex2jar.reader.DexFileReader.acceptMethod(DexFileReader.java:702)\n\tat com.googlecode.dex2jar.reader.DexFileReader.acceptClass(DexFileReader.java:446)\n\tat com.googlecode.dex2jar.reader.DexFileReader.accept(DexFileReader.java:333)\n\tat com.googlecode.dex2jar.v3.Dex2jar.doTranslate(Dex2jar.java:82)\n\tat com.googlecode.dex2jar.v3.Dex2jar.to(Dex2jar.java:219)\n\tat com.googlecode.dex2jar.v3.Dex2jar.to(Dex2jar.java:210)\n\tat com.googlecode.dex2jar.tools.Dex2jarCmd.doCommandLine(Dex2jarCmd.java:108)\n\tat com.googlecode.dex2jar.tools.BaseCmd.doMain(BaseCmd.java:118)\n\tat com.googlecode.dex2jar.tools.Dex2jarCmd.main(Dex2jarCmd.java:34)\n");
}
dex2jar is a converter that translates Dalvik bytecode into Java bytecode.
From that, jd-gui tries to generate meaningful Java code. In this case,
however, we just get a "throw new RuntimeException" statement. Only for some
methods that don't use try/catch blocks did we get some "meaningful" output:
public FragmentTransaction setTransitionStyle(int paramInt)
{
if (this != this)
{
this[0] = 89;
this[1] = 1;
this[2] = 63;
this[3] = 0;
this[4] = 17;
this[5] = 0;
}
}
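The jd-gui output reveals both parts of the trick. `if (this != this)` is the inverted opaque predicate if-eq v0, v0, with v0 holding `this`. The constants written into `this[...]` are the payload bytes. Read back as little-endian code units, they decode to an instance-field store followed by return-object. That is consistent with a setter that stores its int argument and returns `this`. Which field index 0x3f refers to cannot be told from this output. The decoder handles only the two opcodes needed here, iput (0x59) and return-object (0x11), and `decode_setter` is a hypothetical name.

```python
import struct

values = [89, 1, 63, 0, 17, 0]  # the this[i] = ... constants shown by jd-gui above
units = struct.unpack("<3H", bytes(values))


def decode_setter(units: tuple[int, ...]) -> list[str]:
    out, pc = [], 0
    while pc < len(units):
        op, hi = units[pc] & 0xFF, units[pc] >> 8
        if op == 0x59:    # iput, format 22c: B|A|op CCCC
            a, b = hi & 0xF, hi >> 4
            out.append(f"iput v{a}, v{b}, field@{units[pc + 1]:04x}")
            pc += 2
        elif op == 0x11:  # return-object, format 11x: AA|op
            out.append(f"return-object v{hi}")
            pc += 1
        else:
            raise ValueError(f"opcode {op:#04x} not handled in this sketch")
    return out


print(decode_setter(units))  # ['iput v1, v0, field@003f', 'return-object v0']
```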
5. Conclusion
All tested reverse engineering tools have serious problems parsing our
obfuscated app. We saw misleading output as well as crashing tools. Only
interactive tools were able to handle the obfuscation, but then the analyst
has to do the work by hand, which is not always an option.
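A disassembler could at least flag the situation automatically instead of leaving it to the analyst. Suppose it follows both edges of every conditional branch and records each decoded item's start and size in code units. Any two ranges that intersect then mark overlapping instructions. This sketch is our own illustration, not dexter's implementation. The sample data is the exec() layout from section 4.1 plus the first two hidden instructions, with new-instance at 9 (2 units) and invoke-direct at 11 (3 units).

```python
def overlaps(insns: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Return pairs of decoded items whose [start, start + size) ranges overlap.

    insns: (start, size, name) in code units, e.g. collected by a recursive
    traversal that follows both edges of every conditional branch.
    """
    items = sorted(insns)
    found = []
    for i, (s1, n1, name1) in enumerate(items):
        for s2, _n2, name2 in items[i + 1:]:
            if s2 >= s1 + n1:
                break  # sorted by start: no later item can overlap item i
            found.append((name1, name2))
    return found


sample = [
    (0, 2, "if-eq"),
    (2, 3, "fill-array-data"),
    (5, 103, "fill-array-data-payload"),
    (9, 2, "new-instance"),
    (11, 3, "invoke-direct"),
]
print(overlaps(sample))
# [('fill-array-data-payload', 'new-instance'), ('fill-array-data-payload', 'invoke-direct')]
```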
The point is that most disassemblers cannot handle overlapping instructions,
and in console output it is hard to display them in a meaningful way. So for
dexter we chose a graphical representation, which you can see in the
following picture:
Please find attached our crackme, to which we have applied this obfuscation
technique, so you can test your reverse engineering tools and improve them.
UPDATE:
IDA Pro 6.3 seems to handle this fine, as you can see in the picture. Thanks to cryptax for the screenshot:
UPDATE:
androguard has also fixed this issue very quickly. You have to use the interactive shell to access the new feature "set_code_idx":
androguard/androlyze.py -i new.apk -s
a, d, x = AnalyzeAPK('DexDropper.apk')
d.CLASS_Lorg_dexlabs_poc_dexdropper_DropActivity.METHOD_exec.set_code_idx(0x12)
d.CLASS_Lorg_dexlabs_poc_dexdropper_DropActivity.METHOD_exec.pretty_show()
########## Method Information
Lorg/dexlabs/poc/dexdropper/DropActivity;->exec(Ljava/lang/String;)Ljava/lang/String; [access_flags=private]
########## Params
- local registers: v0...v7
- v8:java.lang.String
- return:java.lang.String
####################
***************************************************************************
exec-BB@0x0 :
0 (00000000) new-instance v5, Ljava/lang/StringBuilder;
1 (00000004) invoke-direct v5, Ljava/lang/StringBuilder;-><init>()V
2 (0000000a) invoke-virtual v7, Lorg/dexlabs/poc/dexdropper/DropActivity;->getApplicationContext()Landroid/content/Context;
3 (00000010) move-result-object v6 [ exec-BB@0x12 ]
exec-BB@0x12 :
4 (00000012) invoke-virtual v6, Landroid/content/Context;->getFilesDir()Ljava/io/File;
5 (00000018) move-result-object v6
6 (0000001a) invoke-virtual v5, v6, Ljava/lang/StringBuilder;->append(Ljava/lang/Object;)Ljava/lang/StringBuilder;
7 (00000020) move-result-object v5
8 (00000022) const-string v6, '/temp'
9 (00000026) invoke-virtual v5, v6, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
10 (0000002c) move-result-object v5
11 (0000002e) invoke-virtual v5, Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
12 (00000034) move-result-object v5
13 (00000036) const/4 v6, #+0
....
crackme-obfuscator.apk