
2017-01-26: examining golang binaries and upx compression

Even though C has been a favourite language of mine for a number of years, lately I've been somewhat interested in whatever alternatives might actually be available.

Ideally, I wanted something for situations that call for performance, without the need to write an entire strings library or pull in the Boost libraries and friends. A popular option that comes to mind is Rust. That said, after glancing at the rather unique methodology it demands, I decided that, at least for the time being, I would prefer a language that more closely matches the C-style syntax I am familiar with.

As an aside, since I have a bit of experience as a browser developer, perhaps I'll look into Rust another time if I ever want to play around with the Servo engine code. I'll hold off on that until some future date, I think.

Now, one of the more widely used languages these days is Google's latest and greatest, the golang programming language. A number of larger players in the industry (even outside of Google!) attest to its utility and speed, most notably Amazon. So I felt it was worth taking a bit of time to look into it.

To get a feel for the syntax, I hastily wrote a small hwmon temperature program (you can find it here). It grabs data from the relevant Linux sysfs directories, as laid out in the 4.0-4.8 series of kernels. After running `go build` to compile it, I found that it produced binaries that were rather on the big side, at least in my opinion. The result was roughly 2.5MB for a small 260-line program that merely reads from files and prints the numerical values to stdout.
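For anyone who hasn't poked at hwmon: each sensor exposes files such as temp1_input under /sys/class/hwmon/hwmonN/. Each file holds a temperature in millidegrees Celsius as plain text. Below is a minimal sketch of that kind of reader. It is not the program linked above. The glob pattern and the helper name parseMilliCelsius are my own assumptions, and it assumes Go 1.16 or newer for os.ReadFile.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseMilliCelsius (hypothetical helper) converts the text of a hwmon
// temp*_input file, an integer in millidegrees Celsius such as "42000\n",
// into degrees Celsius.
func parseMilliCelsius(raw string) (float64, error) {
	v, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(v) / 1000.0, nil
}

func main() {
	// Assumed sysfs layout: /sys/class/hwmon/hwmon*/temp*_input
	paths, err := filepath.Glob("/sys/class/hwmon/hwmon*/temp*_input")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // skip sensors that cannot be read
		}
		c, err := parseMilliCelsius(string(data))
		if err != nil {
			continue // skip files that don't hold a plain integer
		}
		fmt.Printf("%s: %.1f°C\n", p, c)
	}
}
```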

I began my investigation with the `strings` command-line utility, to see whether any readable symbols could be pulled out of the binary. It ended up printing about half a megabyte of subroutine names from the various imported packages.
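By default, GNU `strings` prints runs of at least four printable characters. A rough Go equivalent looks something like the sketch below. It only treats printable ASCII plus tab as text, which simplifies what the real tool does. The helper name printableRuns is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// printableRuns (hypothetical helper) returns every run of at least minLen
// printable ASCII characters (0x20-0x7e, plus tab) found in data.
func printableRuns(data []byte, minLen int) []string {
	var runs []string
	start := -1 // index where the current printable run began, or -1
	for i, b := range data {
		if b == '\t' || (b >= 0x20 && b <= 0x7e) {
			if start < 0 {
				start = i
			}
			continue
		}
		if start >= 0 && i-start >= minLen {
			runs = append(runs, string(data[start:i]))
		}
		start = -1
	}
	// Flush a run that reaches the end of the data.
	if start >= 0 && len(data)-start >= minLen {
		runs = append(runs, string(data[start:]))
	}
	return runs
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: gostrings <binary>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()
	for _, s := range printableRuns(data, 4) {
		fmt.Fprintln(w, s)
	}
}
```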

I even saw a couple of hardcoded file paths embedded in the debug portion of the file, specifically this:


Ah, nothing better than binaries that point at script files. Best of all, a Python script. Nothing could possibly go wrong. Admittedly, this only comes into play during debugging, but still...
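As far as I can tell, the Go linker records that path in an ELF section named .debug_gdb_scripts, which GDB reads in order to auto-load the Python helper. The GDB manual describes each entry in that section as a one-byte kind followed by a NUL-terminated string, where kind 1 is a Python file name. The sketch below dumps those entries using the standard debug/elf package. Treat the section name as an assumption: depending on the toolchain version and flags (for example -ldflags=-w), it may be named differently or be absent. The helper parseGDBScripts is hypothetical.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

// gdbScriptEntry is one record from a .debug_gdb_scripts section.
type gdbScriptEntry struct {
	Kind byte   // 1 = Python file name, per the GDB manual
	Text string // file name or inline script text
}

// parseGDBScripts (hypothetical helper) splits section data into
// (kind byte, NUL-terminated string) records, skipping zero padding.
func parseGDBScripts(data []byte) []gdbScriptEntry {
	var out []gdbScriptEntry
	for len(data) > 0 {
		kind := data[0]
		data = data[1:]
		if kind == 0 {
			continue // padding byte, not a real entry
		}
		end := bytes.IndexByte(data, 0)
		if end < 0 {
			end = len(data) // tolerate a missing final terminator
		}
		out = append(out, gdbScriptEntry{Kind: kind, Text: string(data[:end])})
		if end == len(data) {
			break
		}
		data = data[end+1:]
	}
	return out
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: gdbscripts <elf-binary>")
		os.Exit(2)
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	sec := f.Section(".debug_gdb_scripts") // assumed section name
	if sec == nil {
		fmt.Println("no .debug_gdb_scripts section")
		return
	}
	data, err := sec.Data()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range parseGDBScripts(data) {
		fmt.Printf("kind %d: %s\n", e.Kind, e.Text)
	}
}
```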

Switching over to objdump, I went ahead and disassembled the binary. It was actually fun to look at. Here is a peek:

mov 0x18(%rsp),%rbp
cmp %rbp,%rax
jge 4029d1 <type..hash.[3]interface {}+0x71>
mov %rax,0x20(%rsp)
mov 0x30(%rsp),%rbx
cmp $0x0,%rbx
je 4029db <type..hash.[3]interface {}+0x7b>
mov %rax,%rbp
shl $0x4,%rbp
add %rbp,%rbx
mov %rbx,(%rsp)
mov %rcx,0x38(%rsp)
mov %rcx,0x8(%rsp)
callq 403100 <runtime.nilinterhash>

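That's part of the hash function the compiler generated for the [3]interface {} type. It walks the array one 16-byte interface value at a time and calls runtime.nilinterhash on each element. The code is pleasantly well symbolled, which should make x86-64 debugging easier, and it is much easier to read than old 16/32-bit x86 code. Still, it does sort of confirm that the binaries are packed with a great deal of generated and runtime code.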
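For context, Go needs hash functions like this for types that are used as map keys. Arrays of comparable types are themselves comparable, so a map keyed on a fixed-size interface array is legal. Code like the sketch below is the sort of thing that would lead the compiler to emit something along the lines of type..hash.[3]interface {}. The exact symbol name, and whether a dedicated function is generated at all, is an assumption that varies between Go versions. The alias name key is hypothetical.

```go
package main

import "fmt"

// key is an alias (not a new named type) for [3]interface{}, so the map
// below is keyed on exactly the type seen in the disassembly.
type key = [3]interface{}

func main() {
	counts := map[key]int{}
	counts[key{1, "cpu", 2.5}]++
	counts[key{1, "cpu", 2.5}]++ // same dynamic types and values: same entry
	counts[key{2, "gpu", nil}]++

	fmt.Println(len(counts), counts[key{1, "cpu", 2.5}]) // prints "2 2"

	// If an element's dynamic type is not comparable (e.g. a slice),
	// hashing it panics at run time instead of failing to compile.
	defer func() { fmt.Println("recovered:", recover() != nil) }()
	counts[key{[]int{1}, nil, nil}]++
}
```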

Anyway, with this in mind, a quick search online suggests that I am not alone in my examination of golang and its beefier binaries; see, for example, here, here and here.

I had surmised this from the volume of symbols in the binary, and StackOverflow confirmed it: golang binaries are statically linked, so the runtime and every imported package end up inside the executable. That explains all of the above.
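One rough way to see that static linking at work is to group the binary's symbol table by package. The sketch below uses the Symbols method from debug/elf. The helper packageOf is hypothetical and purely heuristic: it keeps everything up to the first dot after the last slash. As a result, compiler-generated names such as type..hash.* land in a "type" bucket. The sketch needs an unstripped binary, because -s removes the symbol table.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"sort"
	"strings"
)

// packageOf (hypothetical, heuristic) guesses the Go package of a symbol
// such as "runtime.nilinterhash" or "github.com/user/repo/pkg.Func" by
// returning everything before the first '.' that follows the last '/'.
func packageOf(sym string) string {
	slash := strings.LastIndex(sym, "/") // -1 when there is no slash
	dot := strings.Index(sym[slash+1:], ".")
	if dot < 0 {
		return sym
	}
	return sym[:slash+1+dot]
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: symcount <elf-binary>")
		os.Exit(2)
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	syms, err := f.Symbols()
	if err != nil {
		// Stripped binaries report elf.ErrNoSymbols here.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	counts := map[string]int{}
	for _, s := range syms {
		counts[packageOf(s.Name)]++
	}
	pkgs := make([]string, 0, len(counts))
	for p := range counts {
		pkgs = append(pkgs, p)
	}
	// Most symbols first; ties broken by name for stable output.
	sort.Slice(pkgs, func(i, j int) bool {
		if counts[pkgs[i]] != counts[pkgs[j]] {
			return counts[pkgs[i]] > counts[pkgs[j]]
		}
		return pkgs[i] < pkgs[j]
	})
	for _, p := range pkgs {
		fmt.Printf("%6d %s\n", counts[p], p)
	}
}
```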

I also noticed that David Crawshaw, a member of the Go team, is aware of the current situation and has actively worked on it in the newer 1.7 and 1.8 versions of golang.

Improvements on that front are always nice, especially the pruning of unused methods and the more compact reflection metadata produced during compilation. Still, it does make me wonder how many golang binaries are floating around out there with plaintext or easy-to-reverse-engineer content like the aforementioned.

Back on point, most machines likely have no issue with sizable applications on their hard drives, but I figure it couldn't hurt to check if some easy method exists to improve this.

With a little searching around online, I came across two particular linker flags.

Filippo Valsorda mentions that `go build` can strip symbols by passing these flags through to the linker:

go build -ldflags="-s -w"

That shrank the binary from 2.5MB to 1.7MB, so an improvement of sorts. As noted in the blog post, -s omits the symbol table and debug information and -w omits the DWARF debugging data. I can confirm that it's mostly the debug sections being stripped, which is probably a good thing but not that amazing overall.
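To see where those savings come from, it helps to list the ELF sections and their on-disk sizes before and after stripping. Below is a small sketch using debug/elf. The classification in isDebugOrSymbolSection is my own assumption about which sections -s -w affect, not a list taken from the linker. It counts anything named .debug_* or .zdebug_*, plus .symtab and .strtab.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// isDebugOrSymbolSection (hypothetical helper) reports whether a section
// holds debug info or the symbol table, i.e. what we assume
// -ldflags="-s -w" drops.
func isDebugOrSymbolSection(name string) bool {
	return strings.HasPrefix(name, ".debug_") ||
		strings.HasPrefix(name, ".zdebug_") ||
		name == ".symtab" || name == ".strtab"
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: sections <elf-binary>")
		os.Exit(2)
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	var debugBytes, total uint64
	for _, s := range f.Sections {
		if s.Type == elf.SHT_NOBITS {
			continue // .bss and friends take no space in the file
		}
		total += s.FileSize
		if isDebugOrSymbolSection(s.Name) {
			debugBytes += s.FileSize
		}
		fmt.Printf("%-24s %10d\n", s.Name, s.FileSize)
	}
	fmt.Printf("debug/symbol sections: %d of %d bytes\n", debugBytes, total)
}
```

Running it on both the plain and the stripped build should show which sections disappear.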

Filippo then goes on to use UPX, which is a pretty neat binary packer, but it does mean you have to agree to their slightly restrictive license terms. Yet it does reduce my particular binary to a size of 531kB, which means I can now technically drag it around on a floppy; a fairly staggering achievement for 2017!

UPX compresses with UCL, an LZ77-family lossless algorithm (LZMA is also available as an option). UCL's decompressor is small and fast, which suits a packer whose output has to decompress itself on the end user's machine every time it runs. Ah, but before I decompress anything, I think it's worth taking a glance at the substitution coder table for fun, which you can see below:

!!!! ""## #$$$ %%&& &''' (()) )*** ++,, ,--- ..// /000 1122 2333 4455 5666 7788 8999 ::;; ;<<< ==>> >??? @@AA ABBB CCDD DEEE FFGG GHHH IIJJ JKKK LLMM MNNN OOPP PQQQ RRSS STTT UUVV VWWW XXYY YZZZ [[\\ \]]] ^^__ _``` aabb bccc ddee efff gghh hiii jjkk klll mmnn nooo 9999 9999 9999 9999 9999 9999 9999 9999 9999 9999 NNNN NNNN NNNN NNNN NNNN NNNN NNNN NNNN NNNN NNNN

I tend to find these tables interesting and nifty to look at, since they give a concrete idea of what "lossless" compression means; basically a kind of fancy hash array. Also notice how the table is terminated by ten groups of 9999 and ten groups of NNNN characters.

Going back to the binary itself, it was good to see that the symbols had more or less been dealt with, aside from the obvious watermark footer embedded in the file:

$Info: This file is packed with the UPX executable packer http://upx.sf.net $ $Id: UPX 3.91 Copyright (C) 1996-2013 the UPX Team. All Rights Reserved. $
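That footer makes packed files easy to spot. UPX also embeds the four-byte magic "UPX!" in its packing headers, so a crude check just searches for either marker. This is only a heuristic sketch: anyone can strip or alter these strings, and an unpacked file could contain them by coincidence. The helper looksUPXPacked is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// looksUPXPacked (hypothetical, heuristic) reports whether data contains
// UPX's "UPX!" magic or the "packed with the UPX executable packer" notice.
func looksUPXPacked(data []byte) bool {
	return bytes.Contains(data, []byte("UPX!")) ||
		bytes.Contains(data, []byte("packed with the UPX executable packer"))
}

func main() {
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: UPX packed? %v\n", path, looksUPXPacked(data))
	}
}
```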

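After compression, objdump had difficulty. That makes sense: for all practical purposes the file is now a compressed archive with a small decompression stub, and the original machine code only appears in memory once the program runs.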

Well, that concludes my brief exploration of golang binaries and an overview of UPX. I might consider doing another investigation once golang 1.8 arrives and is more widely available.