21 July 2012: Dalvik Bytecode Obfuscation on Android
                        Android Bytecode Obfuscation
                             Patrick Schulz 2012

-- "Like on x86 - 20 years ago"

1. Motivation
  While working on the disassembler for dexter we gained deep insight
  into the Dalvik Virtual Machine (DVM), which executes Dalvik bytecode.
  To get a better feeling for the problems we would run into, we tested
  many existing reverse engineering tools, but found no disassembler that
  fulfilled all our requirements (full code coverage, basic block view,
  ...), so we started designing our own. While experimenting with the
  existing tools we found some problems that are well known, and long
  solved, on the x86 architecture.

  In the following we describe one of these flaws, which can be turned
  into an obfuscation technique called junk byte injection. We implemented
  it as a proof-of-concept obfuscator.

2. Obfuscation Technique
  The presented obfuscation technique is called junk byte injection and
  has been well known on the x86 architecture for years.
  The first thing to decide when implementing a disassembler is the
  underlying algorithm for finding new instructions. The most famous one is
  recursive traversal, but because the Dalvik bytecode format is so simple,
  some tools use linear sweep, which is fine for apps that have not been
  obfuscated. Linear sweep is also used within the DVM as part of the
  bytecode verifier, which checks all applications before they can be
  installed.

  The main problem with linear sweep is that the algorithm cannot decide
  whether a particular byte is part of an instruction or just dead code or
  data. The problem shows up with overlapping instructions: in this case
  it is likely that one of the instructions is not visible to the
  algorithm and, as a consequence, not to the analyst either.
  On Android we can make use of instructions with variable length to hide
  the instructions that follow them. For this PoC we chose the
  fill-array-data-payload instruction.
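
  To make the difference between the two algorithms concrete, here is a
  minimal Python sketch (it is not part of dexter or of the PoC). It models
  only a handful of real Dalvik opcodes, and the four-unit toy method is
  made up for illustration: a goto jumps over a single junk code unit whose
  low byte happens to be the opcode of a three-unit instruction.

```python
# Toy model: a method body is a list of 16-bit code units, and the low byte
# of an instruction's first unit is its opcode. Only a few real Dalvik
# opcodes are modelled; anything else raises KeyError.
LENGTH = {0x00: 1, 0x0F: 1, 0x12: 1, 0x26: 3, 0x28: 1}  # in code units
NAME = {0x00: "nop", 0x0F: "return", 0x12: "const/4",
        0x26: "fill-array-data", 0x28: "goto"}


def signed8(value):
    """Interpret an 8-bit value as a signed branch offset."""
    return value - 0x100 if value & 0x80 else value


def decode(units, pc):
    """Return (name, length, successor offsets) for the instruction at pc."""
    opcode = units[pc] & 0xFF
    name, length = NAME[opcode], LENGTH[opcode]
    if name == "goto":
        successors = [pc + signed8(units[pc] >> 8)]
    elif name == "return":
        successors = []
    else:
        successors = [pc + length]
    return name, length, successors


def linear_sweep(units):
    """Decode one instruction after another, ignoring control flow."""
    pc, found = 0, []
    while pc < len(units):
        name, length, _ = decode(units, pc)
        found.append((pc, name))
        pc += length
    return found


def recursive_traversal(units):
    """Decode only what is reachable by following control flow from 0."""
    worklist, seen = [0], {}
    while worklist:
        pc = worklist.pop()
        if pc in seen or not 0 <= pc < len(units):
            continue
        name, _, successors = decode(units, pc)
        seen[pc] = name
        worklist.extend(successors)
    return sorted(seen.items())


# goto +2 | junk unit (opcode byte 0x26) | const/4 v0, #1 | return v0
METHOD = [0x0228, 0x0026, 0x1012, 0x000F]

print(linear_sweep(METHOD))         # [(0, 'goto'), (1, 'fill-array-data')]
print(recursive_traversal(METHOD))  # [(0, 'goto'), (2, 'const/4'), (3, 'return')]
```

  Linear sweep decodes the junk unit as a three-unit instruction and so
  swallows the two real instructions behind it, while recursive traversal
  follows the goto and never looks at the junk.
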
Format of the fill-array-data-payload:
Name                Format                     Description
ident               ushort = 0x0300            identifying pseudo-opcode
element_width       ushort                     number of bytes in each element
size                uint                       number of elements in the table
data                ubyte[]                    data values
  This is a so-called pseudo-instruction that cannot be executed. It just
  holds data for a regular instruction. The Dalvik documentation contains
  the following comment:
 There are several "pseudo-instructions" that are used to hold variable-length data payloads, which are referred to by regular instructions (for example, fill-array-data). Such instructions must never be encountered during the normal flow of execution. In addition, the instructions must be located on even-numbered bytecode offsets (that is, 4-byte aligned). In order to meet this requirement, dex generation tools must emit an extra nop instruction as a spacer if such an instruction would otherwise be unaligned. Finally, though not required, it is expected that most tools will choose to emit these instructions at the ends of methods, since otherwise it would likely be the case that additional instructions would be needed to branch around them. 
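
  The header alone determines how many code units a payload occupies.
  According to the Dalvik bytecode documentation, the total is
  (size * element_width + 1) / 2 + 4 code units. The following Python
  sketch applies that formula; the function names are our own, and the
  header bytes are the ones that appear in the dexdump listing in section
  4.1.

```python
import struct

PAYLOAD_IDENT = 0x0300  # identifying pseudo-opcode of fill-array-data-payload


def parse_payload_header(raw):
    """Return (element_width, size) from the first 8 bytes of a payload."""
    ident, element_width, size = struct.unpack_from("<HHI", raw, 0)
    if ident != PAYLOAD_IDENT:
        raise ValueError("not a fill-array-data-payload")
    return element_width, size


def payload_code_units(element_width, size):
    """Total length of the payload in 16-bit code units, header included."""
    return (size * element_width + 1) // 2 + 4


# Header bytes as printed by dexdump: 0003 0100 c600 0000
HEADER = bytes.fromhex("0003 0100 c600 0000")

width, size = parse_payload_header(HEADER)
print(width, size)                      # 1 198
print(payload_code_units(width, size))  # 103
```

  The result matches the "array-data (103 units)" that dexdump reports for
  this payload.
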
  The goal of this obfuscation technique is to combine junk byte injection
  with such pseudo-instructions in order to hide as much of the original
  bytecode as possible.

3. Approach
  We use the fill-array-data-payload instruction, which has a variable
  length, to hide the original bytecode. We place it at the very beginning
  of a method and fill the element_width and size operands with values
  such that the instruction overlaps all following instructions of the
  method.

  [Figure: a fill-array-data-payload at the start of a method overlapping
  the original bytecode]

  To ensure that this instruction is never executed, we place an
  (unconditional) branch in front of it that points to the first
  instruction of the original bytecode. This also ensures that we don't
  alter the behavior of the method.

  When a disassembler using linear sweep processes this method, it sees
  the branch instruction and the fill-array-data-payload instruction, but
  not the overlapped ones. The original bytecode is therefore not shown
  (as instructions) to the analyst.

  For disassemblers using recursive traversal this won't work: the
  disassembler sees the branch instruction and follows the control flow to
  the next instruction, which is the first one of the original bytecode.
  To cover these disassemblers as well, we can use a conditional branch,
  which lets the algorithm also discover our fill-array-data-payload
  instruction. Here we have to make sure that the checked condition is
  always true (an opaque predicate).

  A further improvement is to insert an additional fill-array-data
  instruction that points to our fill-array-data-payload instruction. This
  simulates a use of the overlapping instruction and is necessary to trick
  androguard, which otherwise won't disassemble our
  fill-array-data-payload instruction.

  We hope you got the idea. Now let's see how common reverse engineering
  tools handle this...

4. Evaluation
  We also wanted to know how effective this obfuscation technique is. We
  created a small application (attached as a crackme) and implemented an
  obfuscator that automatically alters all methods within the dex file.
  The resulting obfuscated app was then analyzed with common reverse
  engineering tools:

  - dexdump (part of the Android SDK)
  - androguard (hg version 350:18359d716dc2)
  - baksmali (git version 7d37656282f7b1c3d145a0666ad94f4cd491ff8d)
  - radare2 (0.9 @ linux-little-x86)
  - dedexer (ddx1.22.jar)
  - ded (ded-0.7.1)
  - dex2jar (hg version 448:ce5c7384ee89) + jd-gui 0.3.3

  In the following we show the output of each tool for the method
  org.dexlabs.poc.dexdropper.DropActivity.exec(). For some tools this was
  not possible because analyzing the modified version led to crashes
  without any usable output.

4.1 Evaluation - dexdump
dexdump -d DexDropper.apk
    #2              : (in Lorg/dexlabs/poc/dexdropper/DropActivity;)
      name          : 'exec'
      type          : '(Ljava/lang/String;)Ljava/lang/String;'
      access        : 0x0002 (PRIVATE)
      code          -
      registers     : 9
      ins           : 2
      outs          : 3
      insns size    : 108 16-bit code units
035afc:                                        |[035afc] org.dexlabs.poc.dexdropper.DropActivity.exec:(Ljava/lang/String;)Ljava/lang/String;
035b0c: 3200 0900                              |0000: if-eq v0, v0, 0009 // +0009
035b10: 2600 0300 0000                         |0002: fill-array-data v0, 00000005 // +00000003
035b16: 0003 0100 c600 0000 2205 c301 7010 ... |0005: array-data (103 units)
      catches       : 1
        0x0009 - 0x0053
          Ljava/lang/IllegalArgumentException; -> 0x0054
          Ljava/lang/IllegalAccessException; -> 0x0058
          Ljava/lang/reflect/InvocationTargetException; -> 0x005c
          Ljava/lang/InstantiationException; -> 0x0060
          Ljava/lang/NoSuchMethodException; -> 0x0064
          Ljava/io/IOException; -> 0x0068
      positions  :
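
  The nine injected code units at the start of this listing can be
  reproduced with a few lines of Python. This is only a sketch and not the
  PoC obfuscator itself: build_prefix is a hypothetical helper, and the
  choice of element_width = 1 with size equal to the byte length of the
  original code is read off the dumped header (0x0001, 0xc6 = 198 bytes =
  99 code units, which is 108 minus the 9 injected units).

```python
import struct

IF_EQ = 0x32            # if-eq vA, vB, +CCCC          (2 code units)
FILL_ARRAY_DATA = 0x26  # fill-array-data vAA, +BBBBBBBB (3 code units)
PAYLOAD_IDENT = 0x0300  # fill-array-data-payload header (4 code units)
PREFIX_UNITS = 2 + 3 + 4


def build_prefix(original_units):
    """Bytes to put in front of a method body of `original_units` code units.

    Hypothetical helper: the payload "data" is the original bytecode itself,
    so the payload header must directly precede it.
    """
    # Opaque predicate: v0 == v0 always holds, so execution always jumps
    # over the injected units to the original code at offset 9.
    if_eq = struct.pack("<BBh", IF_EQ, 0x00, PREFIX_UNITS)
    # Fake use of the payload; it sits 3 units after this instruction.
    fill = struct.pack("<BBi", FILL_ARRAY_DATA, 0x00, 3)
    # element_width = 1 byte, size = length of the original code in bytes.
    header = struct.pack("<HHI", PAYLOAD_IDENT, 1, original_units * 2)
    return if_eq + fill + header


print(build_prefix(99).hex())  # 3200090026000300000000030100c6000000
```

  A real obfuscator has more to do than this. The catches table above
  starts at 0x0009, which suggests that try/catch ranges, and presumably
  debug information as well, have to be shifted by the length of the
  prefix.
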
  As the listing shows, the original bytecode is not visible to the
  analyst. Only the instructions injected by our obfuscator are shown.

4.2 Evaluation - androguard
androguard/androlyze.py -i DexDropper.apk -m exec
########## Method Information
Lorg/dexlabs/poc/dexdropper/DropActivity;->exec(Ljava/lang/String;)Ljava/lang/String; [access_flags=private]
########## Params
- local registers: v0...v7
- v8:java.lang.String
- return:java.lang.String
0 0x0 if-eq v0, v0, +9
1 0x4 fill-array-data v0, +3 (0x7)
2 0xa fill-array-data-payload \x22\x05\xc3\x01\x70\x10\xe4\x0b\x05\x00\x6e\x10\x3e\x0c\x07\x00\x0c\x06\x6e\x10\x58\x00\x06\x00\x0c\x06\x6e\x20\xea\x0b\x65\x00\x0c\x05\x1a\x06\xa7\x00\x6e\x20\xeb\x0b\x65\x00\x0c\x05\x6e\x10\xef\x0b\x05\x00\x0c\x05\x12\x06\x71\x30\x99\x0b\x58\x06\x0c\x04\x1a\x05\x69\x06\x6e\x10\x3f\x0c\x07\x00\x0c\x06\x6e\x30\x98\x0b\x54\x06\x0c\x03\x12\x05\x23\x55\x0d\x02\x6e\x20\xaf\x0b\x53\x00\x0c\x01\x1a\x05\x45\x08\x12\x06\x23\x66\x0d\x02\x6e\x30\xb0\x0b\x53\x06\x0c\x00\x12\x05\x23\x55\x0e\x02\x6e\x20\xf7\x0b\x51\x00\x0c\x05\x12\x06\x23\x66\x0e\x02\x6e\x30\xf8\x0b\x50\x06\x0c\x05\x1f\x05\xc2\x01\x11\x05\x0d\x02\x1a\x05\x9b\x01\x28\xfc\x0d\x02\x1a\x05\x9b\x01\x28\xf8\x0d\x02\x1a\x05\x9b\x01\x28\xf4\x0d\x02\x1a\x05\x9b\x01\x28\xf0\x0d\x02\x1a\x05\x9b\x01\x28\xec\x0d\x02\x1a\x05\x9b\x01\x28\xe8
  Here too, our obfuscation technique is effective against androguard. You
  can also see that the original bytecode is stored inside the
  fill-array-data-payload instruction. We also tried androsim and compared
  the original with the obfuscated app.
androguard/androsim.py -i DexDropper.apk DexDropper_orig.apk
	 NEW:		715
	--> methods: 0.269642% of similarities
  androsim reports that after obfuscation the app has almost no similarity
  to the original one (about 0.27% at the method level). This is very bad
  news once malware starts using obfuscation techniques.

4.3 Evaluation - baksmali
java -jar baksmali-1.3.3.jar -o output classes.dex
Error occured while disassembling class Lorg.dexlabs.poc.dexdropper.DropActivity; - skipping class
java.lang.RuntimeException: Invalid code offset 83 for the try block end address
	at org.jf.baksmali.Adaptors.MethodDefinition.addTries(MethodDefinition.java:478)
	at org.jf.baksmali.Adaptors.MethodDefinition.writeTo(MethodDefinition.java:132)
	at org.jf.baksmali.Adaptors.ClassDefinition.writeMethods(ClassDefinition.java:338)
	at org.jf.baksmali.Adaptors.ClassDefinition.writeTo(ClassDefinition.java:116)
	at org.jf.baksmali.baksmali.disassembleDexFile(baksmali.java:205)
	at org.jf.baksmali.main.main(main.java:297)
  baksmali has serious problems disassembling our obfuscated dex file.
  This may be of interest to app developers who want to protect their apps
  from being repackaged with malicious code, since many malware authors
  use baksmali/smali to inject their code. File output:
.class public Lorg/dexlabs/poc/dexdropper/DropActivity;
.super Landroid/app/Activity;
.source "DropActivity.java"


.method private download()Ljava/lang/String;
    .registers 16
    .annotation system Ldalvik/annotation/Throws;
        value = {
    .end annotation

    if-eq v0, v0, :cond_9

    fill-array-data v0, :array_5

    .array-data 0x1
#rest of the array data
    .end array-data
    .line 130
    .line 132
    .local v7, msg:[B
#meta information
    .line 157
    .local v2, bytesOut:[B
    .line 158
    .line 130
.end method

.method private exec(Ljava/lang/String;)Ljava/lang/String;
    .registers 9
    .parameter "dexfile"
  For baksmali we present the output for two methods. For the download()
  method (the first one) we recognize a pattern similar to that of the
  other tools. For the exec() method we don't get any usable output at
  all, because errors occur while parsing the dex file, as the console
  output shows.

4.4 Evaluation - radare2
radare2 -a dalvik classes.dex -s 0x00035b0c
[0x00035b0c]> pd 20
   ,=< 0x00035b0c     32000900         if-eq v0, v0, 9
   |   0x00035b10     260003000000     fill-array-data v0, 50331648
   |   0x00035b16     0003             nop
   |   0x00035b18     0100             move v0, v0
   |   0x00035b1a     c600             add-float/2addr v0, v0
   |   0x00035b1c     0000             nop
   \-> 0x00035b1e     2205c301         new-instance v5, class+451
       0x00035b22     7010e40b0500     invoke-direct {v5}, 0xd904
       0x00035b28     6e103e0c0700     invoke-virtual {v7}, sym.method.244.getApplicationContextodsosByText0
       0x00035b2e     0c06             move-result-object v6
       0x00035b30     6e1058000600     invoke-virtual {v6}, sym.method.19.getFilesDir
       0x00035b36     0c06             move-result-object v6
       0x00035b38     6e20ea0b6500     invoke-virtual {v5, v6}, 0xd934
       0x00035b3e     0c05             move-result-object v5
       0x00035b40     1a06a700         const-string v6, str.temp
       0x00035b44     6e20eb0b6500     invoke-virtual {v5, v6}, 0xd93c
       0x00035b4a     0c05             move-result-object v5
  radare2 doesn't fall for this obfuscation technique. The reason is very
  simple: radare2 doesn't implement the fill-array-data-payload
  pseudo-instruction ;) But even if it did, we could still manually seek
  to address 0x00035b1e and restart disassembly from there, which makes it
  possible to recover the original bytecode.

4.5 Evaluation - dedexer
java -jar dedexer/ddx1.22.jar -d output classes.dex
Processing android/annotation/SuppressLint
Processing android/annotation/TargetApi
Processing android/support/v4/accessibilityservice/AccessibilityServiceInfoCompat$AccessibilityServiceInfoStubImpl
Processing android/support/v4/accessibilityservice/AccessibilityServiceInfoCompat$AccessibilityServiceInfoIcsImpl
Processing android/support/v4/app/ActivityCompatHoneycomb
Processing android/support/v4/app/BackStackRecord$Op
Processing android/support/v4/app/BackStackRecord
Unknown instruction 0x7C at offset 000324DA
  dedexer does not implement all instructions either, so it crashes while
  trying to disassemble the inner values of our fill-array-data
  instruction. In the end dedexer disassembled only a small part of our
  example app.
.method public getCanRetrieveWindowContent(Landroid/accessibilityservice/AccessibilityServiceInfo;)Z
.limit registers 3
; this: v1 (Landroid/support/v4/accessibilityservice/AccessibilityServiceInfoCompat$AccessibilityServiceInfoIcsImpl;)
; parameter[0] : v2 (Landroid/accessibilityservice/AccessibilityServiceInfo;)
	if-eq	v0,v0,l27282
	fill-array-data	v0,l2727a
l2727a:	data-array
		0x71	; #0
		0x10	; #1
		0x0E	; #2
		0x01	; #3
		0x02	; #4
		0x00	; #5
		0x0A	; #6
		0x00	; #7
		0x0F	; #8
		0x00	; #9
	end data-array
.end method
  For the first classes, which were "successfully" disassembled, we see
  that our obfuscation technique still works and the original bytecode is
  hidden.

4.6 Evaluation - ded
ded-0.7.1 -j jasminclasses-2.4.0.jar -d output classes.dex
  "ded" starts processing our dex file but gets stuck in an endless loop
  with 100% CPU load. We got no output from "ded".

4.7 Evaluation - dex2jar + jd-gui
dex2jar- classes.dex
  private String exec(String paramString)
    throw new RuntimeException("Generated by Dex2jar, and Some Exception Caught :java.lang.NullPointerException\n\tat com.googlecode.dex2jar.ir.ts.ExceptionHandlerCurrectTransformer.transform(ExceptionHandlerCurrectTransformer.java:66)\n\tat com.googlecode.dex2jar.v3.V3MethodAdapter.visitEnd(V3MethodAdapter.java:214)\n\tat com.googlecode.dex2jar.v3.V3ClassAdapter$2.visitEnd(V3ClassAdapter.java:261)\n\tat com.googlecode.dex2jar.reader.DexFileReader.acceptMethod(DexFileReader.java:702)\n\tat com.googlecode.dex2jar.reader.DexFileReader.acceptClass(DexFileReader.java:446)\n\tat com.googlecode.dex2jar.reader.DexFileReader.accept(DexFileReader.java:333)\n\tat com.googlecode.dex2jar.v3.Dex2jar.doTranslate(Dex2jar.java:82)\n\tat com.googlecode.dex2jar.v3.Dex2jar.to(Dex2jar.java:219)\n\tat com.googlecode.dex2jar.v3.Dex2jar.to(Dex2jar.java:210)\n\tat com.googlecode.dex2jar.tools.Dex2jarCmd.doCommandLine(Dex2jarCmd.java:108)\n\tat com.googlecode.dex2jar.tools.BaseCmd.doMain(BaseCmd.java:118)\n\tat com.googlecode.dex2jar.tools.Dex2jarCmd.main(Dex2jarCmd.java:34)\n");
  dex2jar is a converter that translates Dalvik bytecode into Java
  bytecode, from which jd-gui tries to generate meaningful Java code. In
  this case we just get a "throw new RuntimeException" call. Only for some
  methods that don't use try/catch blocks did we get "meaningful" output:
  public FragmentTransaction setTransitionStyle(int paramInt)
    if (this != this)
      this[0] = 89;
      this[1] = 1;
      this[2] = 63;
      this[3] = 0;
      this[4] = 17;
      this[5] = 0;
5. Conclusion
  All tested reverse engineering tools have serious problems parsing our
  obfuscated app. We saw misleading output as well as crashing tools. Only
  interactive tools were able to handle the obfuscation, but then the
  analyst has to do the work by hand, which is sometimes not an option.
  The point is that most disassemblers cannot handle overlapping
  instructions, and for console output it is hard to display this
  situation in a meaningful way. For dexter we therefore chose a graphical
  representation, which you can see in the following picture:

  Please find attached our crackme, to which we have applied this
  obfuscation technique, so you can test your reverse engineering tools
  and improve them.

  UPDATE: IDA Pro 6.3 seems to handle this fine, as you can see in the
  picture. Thanks to cryptax for the screenshot:

  UPDATE: androguard also fixed this issue very quickly. You have to use
  the interactive shell in order to use the new feature "set_code_idx":
androguard/androlyze.py -i new.apk -s
a, d, x = AnalyzeAPK('DexDropper.apk')
########## Method Information
Lorg/dexlabs/poc/dexdropper/DropActivity;->exec(Ljava/lang/String;)Ljava/lang/String; [access_flags=private]
########## Params
- local registers: v0...v7
- v8:java.lang.String
- return:java.lang.String
exec-BB@0x0 : 
	0  (00000000) new-instance         v5, Ljava/lang/StringBuilder;
	1  (00000004) invoke-direct        v5, Ljava/lang/StringBuilder;-><init>()V
	2  (0000000a) invoke-virtual       v7, Lorg/dexlabs/poc/dexdropper/DropActivity;->getApplicationContext()Landroid/content/Context;
	3  (00000010) move-result-object   v6 [ exec-BB@0x12 ]

exec-BB@0x12 : 
	4  (00000012) invoke-virtual       v6, Landroid/content/Context;->getFilesDir()Ljava/io/File;
	5  (00000018) move-result-object   v6
	6  (0000001a) invoke-virtual       v5, v6, Ljava/lang/StringBuilder;->append(Ljava/lang/Object;)Ljava/lang/StringBuilder;
	7  (00000020) move-result-object   v5
	8  (00000022) const-string         v6, '/temp'
	9  (00000026) invoke-virtual       v5, v6, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
	10 (0000002c) move-result-object   v5
	11 (0000002e) invoke-virtual       v5, Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
	12 (00000034) move-result-object   v5
	13 (00000036) const/4              v6, #+0