Obfuscation and obfuscators
I've been thinking about obfuscators lately ;). There is a widely discussed problem: .NET assemblies have a very open format and are very easy to reverse-engineer using disassemblers. People are concerned about protecting their intellectual property and would like to know what can be done about it. One of the answers is obfuscators.
Metadata stores the names, the layouts and pretty much all the logistics inside the assembly. Metadata is what makes your assembly easy to reverse-engineer: it makes all your type, method, field, property and event names visible to whoever wants to look at them. IL, or Intermediate Language, is a simple object-oriented bytecode-like language, which contributes a lot to its readability. Together with metadata, IL makes it really easy for people with little training to read your managed code and understand what is going on in it. Obfuscators attack the problem by changing the contents of the metadata and/or the IL to make them less readable. They do it by mangling names, by encrypting strings and resources, and by changing the flow of the IL.
The most important and reliable approach, in my opinion, is mangling the names. Names are pretty much the only piece of information you can change without affecting anything else. It is important that the obfuscator change the names in a way that cannot be reversed: the more information it discards, the better. For example, an obfuscator that simply replaces each letter with the next one in the alphabet is no good, because no information is discarded and the original names can be restored. A better obfuscator would use a cryptographic hash function (such as SHA1 or MD5) to derive the new names from the old ones, but that is still not the best approach: not all information is discarded, and because identifiers are short and predictable, an attacker can often recover them by hashing a dictionary of likely names and comparing the results. The best obfuscator throws the old names away completely and replaces them with new ones. For example, “MyFileReader” would be replaced with “A”, “FileRead” with “B”, “fileName” with “C” and so forth. This way the names cannot be restored. More advanced tools go one step further: they analyze member visibility and try to “conserve” name usage. For example, int foo(int i), string bar(string s) and int bar(string s) would be replaced with “int a(int a)”, “string a(string a)” and “int a(string a)” respectively. Note that the last replacement, which clashes with the second one, also makes this code un-consumable by IL->C# or IL->VB decompilers, because those languages don’t allow overloads that differ only in return type.
I don’t believe much in encrypting strings or resources (at least not until we have a good DRM story with hardware support). The simple truth is that whatever can be executed can also be copied and read. Whenever somebody comes up with an obfuscator that encrypts strings only to decrypt them at run time, they should expect a friendly anonymous competitor to come up with a “de-obfuscator” tool that removes the encryption and exposes everything again. The same goes for changing the IL flow.
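To make the name-mangling argument about discarding information concrete, here is a minimal Python sketch that compares the three naming schemes on plain strings. It does not touch real assemblies, and the function names and the “_” plus 12 hex digits format for hashed names are hypothetical choices for illustration. The letter shift is undone simply by shifting back, the hash-based names fall to a small dictionary attack, and only the sequential renamer produces new names that contain nothing derived from the old ones.

```python
import hashlib
from itertools import count
from typing import Iterable, Iterator, Optional


def shift_letters(name: str, k: int) -> str:
    """Replace each ASCII letter with the one k places later in the alphabet.

    With k=1 this is the "next letter" scheme. It discards nothing, so
    shifting by -1 restores the original name exactly.
    """
    def rot(ch: str) -> str:
        for base in ("a", "A"):
            if base <= ch <= chr(ord(base) + 25):
                return chr((ord(ch) - ord(base) + k) % 26 + ord(base))
        return ch  # digits, '_' and other characters are left alone
    return "".join(rot(ch) for ch in name)


def hash_name(name: str) -> str:
    """Hypothetical hash-based scheme: derive the new name from SHA-1.

    The '_' prefix keeps the result a valid identifier even when the
    hex digest starts with a digit.
    """
    return "_" + hashlib.sha1(name.encode("utf-8")).hexdigest()[:12]


def dictionary_attack(hashed: str, guesses: Iterable[str]) -> Optional[str]:
    """Recover a hashed name by hashing likely identifiers and comparing."""
    for guess in guesses:
        if hash_name(guess) == hashed:
            return guess
    return None


def sequential_names() -> Iterator[str]:
    """Yield A, B, ..., Z, AA, AB, ... (spreadsheet-column style)."""
    for i in count(1):
        n, s = i, ""
        while n:
            n, r = divmod(n - 1, 26)
            s = chr(ord("A") + r) + s
        yield s


def rename_all(names: Iterable[str]) -> dict:
    """Assign fresh names in order of appearance.

    The new name depends only on position, not on the old name; once the
    mapping is thrown away (or kept private), the originals cannot be
    derived from the output.
    """
    fresh = sequential_names()
    return {old: next(fresh) for old in names}


if __name__ == "__main__":
    names = ["MyFileReader", "FileRead", "fileName"]

    shifted = [shift_letters(n, 1) for n in names]
    print(shifted)                                  # looks scrambled...
    print([shift_letters(n, -1) for n in shifted])  # ...but reverses instantly

    h = hash_name("fileName")
    print(dictionary_attack(h, ["path", "buffer", "fileName", "count"]))

    print(rename_all(names))  # {'MyFileReader': 'A', 'FileRead': 'B', 'fileName': 'C'}
```

The dictionary attack works because real identifiers come from a small, guessable vocabulary; a hash only helps when the input space is too large to enumerate, which is rarely true for names like “fileName”.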
In most .NET code the IL comes in optimized form, and changing its flow not only affects size and performance but also (remember the point about discarding information?) does not remove any information! As soon as somebody writes a standalone IL optimizer (which is not much harder than writing a good .NET language compiler), running it on the “obfuscated” IL will get us back to optimized and easily readable IL. So my point is that IL flow obfuscation has little value compared to simple name obfuscation.
There are two major kinds of obfuscation tools: those that work directly on the assembly binary, and those that disassemble the assembly into IL using ildasm.exe and then reassemble it using ilasm.exe after they have done their thing. I like the former better because it is more universal; for example, the ildasm/ilasm approach will probably not work too well on mixed-mode assemblies (where managed and unmanaged code live in the same executable).
As a case study and a fun weekend project, I wrote my own simple obfuscator that operates directly on assemblies and replaces the names with mangled strings of the same length as the original names. I call it a “pseudo zero impact” obfuscator, because you basically get your old assembly back: the only thing that changes is the names, and everything else (length, IL, etc.) stays the same. Simple as it is, this type of obfuscation works really well; I found myself unable to read and understand the code of a random application that I had processed with my tool.
Unless you have a “pseudo zero impact” tool ;-)), you should expect your obfuscated assembly to change its size. Some tools will shrink it because the names get shorter. If your names are already very short, the assembly will probably grow. The reason for this is string interning in the metadata. Imagine your application using Console.WriteLine and having its own class with a member called WriteLine.
That assembly’s metadata will have just one “WriteLine” string in it. An obfuscator has to keep the original “WriteLine” so that you can continue using Console, and it has to create another string entry for the obfuscated name of your own WriteLine.
Another thing that will change in an obfuscated assembly is its signature. If your assembly has a strong name or a publisher signature, it will become invalid after obfuscation and you will have to re-sign the assembly (for example, the sn.exe tool has the -R option for re-signing).
Those were some thoughts I had on the topic.
Update: there is an excellent how-to guide on obfuscators at http://www.howtoselectguides.com/dotnet/obfuscators/
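The string interning effect on assembly size, described earlier with the Console.WriteLine example, can be put into numbers with a toy model. The Python sketch below is not a metadata reader: it assumes that each distinct name occupies its UTF-8 length plus one terminating zero byte in the #Strings heap, and it ignores everything else in the file. The application with a class W and its own WriteLine method is hypothetical. Renaming your own WriteLine splits the shared entry in two, so the heap grows even though the new names are tiny, while long names still shrink.

```python
from typing import Dict, Iterable, List


def strings_heap_size(names: Iterable[str]) -> int:
    """Simplified model of the metadata #Strings heap.

    Assumption: every distinct name is stored once (interned) as UTF-8 plus
    a terminating zero byte. The heap's leading empty entry and alignment
    padding are ignored.
    """
    return sum(len(n.encode("utf-8")) + 1 for n in set(names))


def obfuscated_names(own: List[str], external: List[str],
                     mapping: Dict[str, str]) -> List[str]:
    """Rename only names defined in our own assembly.

    Names that refer to other assemblies (such as Console.WriteLine) must
    keep their spelling, or the references would no longer resolve.
    """
    return [mapping[n] for n in own] + list(external)


if __name__ == "__main__":
    # Hypothetical app: calls Console.WriteLine and defines class W with
    # its own method named WriteLine.
    external = ["Console", "WriteLine"]
    own = ["W", "WriteLine"]

    before = strings_heap_size(own + external)  # "WriteLine" stored once
    after = strings_heap_size(
        obfuscated_names(own, external, {"W": "a", "WriteLine": "b"}))
    print(before, after)  # 20 22

    # With long, unshared names the same renaming shrinks the heap instead.
    print(strings_heap_size(["MyFileReader", "ReadAllRecords"]),
          strings_heap_size(["a", "b"]))  # 28 4
```

Under this model, whether obfuscation grows or shrinks the heap depends on how long your names were and how many of them were shared with names from other assemblies.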
(c) Ivan Medvedev 2003-2007