Can you “stop” a C program from being reverse engineered? [duplicate]

I’m sure that many of the middle to high level languages can be reverse engineered. But if a C program can be reverse-engineered, and turned back into editable source code, how do I discourage such a thing?

If someone can observe the program in operation, they can reverse-engineer it. Even if you could somehow prevent the user from accessing the internals of the program, they can perform black-box reverse engineering to build up a model of how the program reacts to various inputs.

It is impossible to prevent reverse engineering of any program that runs on the user's own computer. But you can take steps that make it harder…

Languages such as Java and the .NET languages are comparatively easy to decompile back into readable source, because they are compiled to byte code: the source is translated into a set of instructions for a run-time engine or ‘virtual machine’, which interprets or JIT-compiles them. Byte code retains much of the program's structure, which makes decompilation an achievable task.

Because C# source code is so easy to recover from any .NET library, several tricks are commonly used to make it harder. One approach is to rewrite the code so that all class and method names are meaningless and look alike. The C# source is still 100% recoverable, but a human will usually find it hard to work out what the code is doing.

The next step is to ship an encrypted version of the library, with wrapper code that decrypts the byte code in memory. This raises the bar, but nothing stops a determined person from stepping through the decryption process in an assembly-level debugger until the .NET byte code has been decrypted, and then dumping memory to get at the byte code.

C has a huge advantage in protecting the source code, in that the source is compiled into machine code. This is not foolproof, though. To let developers step through their code, compilers have options that prevent optimisations from reorganising the code and, worse still, often add extra data (debug symbols) to the executable that gives context to the code and would be usable for decompilation.

Where C code is non-trivial, compiled with optimisations turned on and stripped of all debugging data, trying to recreate understandable source code is beyond the effort/reward for most people.

That said, I learned how to program in assembler as a child by single-stepping through ZX Spectrum tape loader code that was designed to make it harder for people to copy the program or alter it to give themselves infinite lives.

Spending three days in a long summer holiday stepping through the code that was transcoding the set of bytes stored on the tape into the modified tape loading code that actually loaded the game seemed like a good use of my time back then, and started me out on a long career as a software developer as an adult.

Any program can be reverse engineered, for the very simple reason that anything a computer can execute has to be in a format that a person who understands machine code can read (with the proper tools, of course).

The real question is, how easy is it to reverse engineer? C is trivial to decompile; I’ve seen some surprisingly clean C code generated from decompilation tools. But because C doesn’t have the same rich metadata as managed languages, you lose a lot of useful information, such as names of variables and data types, that help out with the task of reverse engineering. (Then again, most managed code systems have an “obfuscator” tool available that mangles the metadata to make such tasks more difficult on the reverse engineer. So the difference may not necessarily be all that big.)

Many anti-debugging techniques can be used to make reverse-engineering a hell of a task, but it’s not impossible.

Malware authors use them regularly, and many are applicable to any native machine code, including that compiled from C.

A simple example that was really popular in the early years was inserting thousands of breakpoints to make running to a particular code point tedious, as you have to remove all those breakpoints first.

Other methods are polymorphism, self-modifying code and run-time decryption/encryption or cache-based hacks to conceal the execution paths taken as long as possible.

Additionally, you can use traditional obfuscation techniques such as creating decoy code, opaque value construction, extra jumps and pseudo-conditions to make understanding code harder.

This works for any language, not just managed .NET, Java or PHP where reverse-engineering is normally trivial, but also for C/C++.

Malware authors provide the best examples, e.g. Rombertik as analyzed by Cisco.

Hardening against reverse engineering is usually an economic problem:

how easy is it to add protections,
how much does it hinder maintenance,
how does it affect resource use and performance (e.g. memory, CPU use),
how compatible is it (does it work anytime, anywhere; might it trigger an anti-virus),
how hard is it to break them.

This ends in the question: Is it worth it?

Are the savings higher than the expenses?
Are the costs of breaking the protection higher than the potential gains from reverse engineering?

There are obfuscators available for popular languages to automate adding protection measures to your programs, but their quality varies.

You can reverse engineer any program. The result will not be the original source code with comments, descriptive variable names, structure and so on, but its functionality will be the same. You cannot prevent this; you can only make it more difficult, because at the end of the day the program is a set of instructions that the CPU understands and that you can therefore interpret too. The best way, IMHO, to discourage reverse engineering and pirate copies is to update your program frequently with new functionality; you then reach a point where reverse engineering it is too much of a hassle compared with buying it or writing it yourself.

Don’t let anyone else access the program. Run it on a secure* server you control, and let users interact with it only by sending input over the network and receiving the output back.

See http://en.wikipedia.org/wiki/Software_as_a_service

The downside is that you now have to manage the server.

*Securing a server is far from trivial. For truly high security, hire a professional.

As the other answers say, if you can see the binary you can reverse engineer the function of the program, although possibly with difficulty. The cutting edge is therefore to prevent the user from getting at the binary.

Old arcade systems did this by storing the program or critical parameters in battery-backed RAM linked to tamper switches. Open the case and the program evaporates. Modern iPhones take a comparable approach: a dedicated secure coprocessor (Apple's Secure Enclave, similar in spirit to ARM “TrustZone”) boots its own microkernel from a boot ROM and has privileged hardware access; this is used for the fingerprint sensor and some of the payment functions.

(Of course, this means the phone may be factory-compromised with no way for the user to detect it or recover.)

I have myself reported a couple of bugs in manufacturer firmware that I found through ARM disassembly, and have considered how much work it might be to write a decompilation assistant program.

Simple, at least in the U.S. You just put an anti-reverse-engineering clause in the program’s EULA, and threaten to sue the pants off of anyone who cracks it (this is no joke; the DMCA actually lets you do so).

To make C almost impossible to reverse engineer? It is possible: write your C such that the entire logic depends on the input AT RUNTIME (though now you have to protect your input, which holds the secret to the logic of your C program). Reading the C source code by itself then really tells you nothing.

How? One way: Using lots of function pointers. So reading:

a->function1();
c->function2();

it is impossible to guess what function1() does, as it is called through a pointer that is assigned only at runtime, depending on the input.

If you create lots of basic functions like the above, and then construct your logic dynamically from the arrangement of the input tokens, it is possible to implement any logic.

In assembly language, the closest analogue is “return-oriented programming”, where you can construct almost any logic by chaining carefully selected addresses of short instruction sequences (“gadgets”) that already exist in the original program:

http://en.wikipedia.org/wiki/Return-oriented_programming

In a way, what I am proposing is something like an “interpreter”: e.g. a JavaScript interpreter, itself written in C, can behave in many different ways depending on the JavaScript program it is given as input. So there is no point in reversing the interpreter itself, as the JavaScript input program holds the actual logic.

Update:

Even assuming you have the input, reverse engineering it dynamically can be very difficult as well. If you use software breakpoints (a 0xcc byte patched into the code), the program can easily detect them by checksumming its own code area, plus many randomly selected small blocks, and comparing the results with the checksums of the original uninstrumented code. If you use hardware breakpoints, timing analysis between randomly selected pieces of code can detect that someone is trying to analyse the code. And if you use tooling that needs no software breakpoints, such as the Bochs emulator, Intel Pin tools or Intel branch tracing, and that can deceive the clock well enough to bypass timing analysis, then it can be done. But the trace you have to analyse may run to many gigabytes of data. Not easy.

But to do the two things above, it is no longer just C: you have to take the binary output and instrument it with checksum-checking code, plus timing-analysis code between arbitrary pairs of points, all inserted at random locations as well.

A real-world example of a binary that is difficult to analyse (it has taken years, and big anti-virus companies are still trying to understand it): Stuxnet. And this binary (not sure?) does not connect to the internet; it runs standalone.


Filed under: softwareengineering - @ 18:01

Tags: c++, security