I am developing a program on several platforms and in several languages, but I don't want anybody to discover the computer on which the program was developed. Is there any way that someone (especially the government!) can discover that?

I am not developing viruses or anything harmful, but there is a revolution in my country and I'll be killed if someone discovers who the programmer is (you can guess where I am living!), so please tell me whether there are ways to discover that, even if you don't want to tell me how.

What about compiling inside a virtual machine?
– maligree, Nov 23 '11 at 22:40

@maligree Possibly. It will certainly help, but I wouldn't expect it to be a perfect magic bullet. I would check the output very carefully for anything that could link back.
– ewanm89, Nov 24 '11 at 1:19

Can you develop your program in a scripting language? Whether Perl or Javascript, none of the scripted languages is very susceptible to accidental hidden channels. (Intentional hidden channels, OTOH...)
– MSalters, Nov 24 '11 at 15:50

3 Answers
3

Source code consists of a bunch of text files. The contents of a text file are exactly what a text editor shows, so you can control that "visually". Beware of revision control systems such as CVS or Subversion: they can automatically replace some specific tags in source code (like "$Id$") with an identifying string which may contain the current date and time, your login name, and other information -- that feature is considered to be good for traceability, but I understand that you would not like it in your specific case.
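As a rough illustration of checking for such tags, here is a minimal Python sketch that scans source text for keyword tags of the "$Id$" kind. The keyword list and the function name `find_rcs_keywords` are my own choices for the example and are not exhaustive; treat it as a starting point rather than a guarantee.

```python
import re

# Keywords commonly expanded by RCS/CVS/Subversion.
# Illustrative list only; your tool may be configured with others.
RCS_KEYWORDS = (
    "Id", "Author", "Date", "Header", "Revision", "Source", "Log",
    "Locker", "Name", "RCSfile", "State",
    "LastChangedBy", "LastChangedDate", "LastChangedRevision",
    "HeadURL", "URL", "Rev",
)

# Matches both the bare form ($Id$) and the expanded form ($Id: ... $).
_KEYWORD_RE = re.compile(r"\$(" + "|".join(RCS_KEYWORDS) + r")(?::[^$\n]*)?\$")


def find_rcs_keywords(text):
    """Return (line_number, matched_tag) pairs for every keyword tag found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _KEYWORD_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

An empty result only means none of the listed keywords were found; it says nothing about other identifying text in the file.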

Compiled code is quite something else. Some compilers may automatically add identifying strings, like what revision control software does, as "comment" fields in the executable structure. This need not even be a deliberate spying device: traceability is really a good idea in the general case; no need to imagine a government bribing compiler developers into adding such things in compilers just to be able to spy on programmers. Also, executable formats often include some "blanks" -- unused parts added for alignment reasons -- which the compiler might not have bothered to fill with zeros, instead just writing out whatever was in RAM at that place. This has occurred with an older version of lcc-win32, which was thus writing out random excerpts of the RAM contents which could contain confidential information (I think this has been fixed for lcc-win32, but it could happen with other toolsets).

Other file formats can also embed (and thus leak) some information. For instance, PNG images can include "comments" (which do not change the visual aspect of the picture in any way). GIMP, an image manipulation program, uses the comment field to state that it was involved in the image processing; any tool could also add some information which, in your view, would be less benign.
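To make the PNG case concrete, here is a small Python sketch that walks the chunk structure of a PNG file (8-byte signature, then chunks of length, type, payload, CRC) and reports the textual chunks tEXt, zTXt and iTXt. The function name `list_png_text_chunks` is made up for this example, and the sketch does not verify CRCs or decompress zTXt payloads.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNK_TYPES = {b"tEXt", b"zTXt", b"iTXt"}


def list_png_text_chunks(data):
    """Return (chunk_type, raw_payload) for each textual chunk in PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        if ctype in TEXT_CHUNK_TYPES:
            found.append((ctype.decode("ascii"), payload))
        pos += 8 + length + 4  # header + payload + CRC
        if ctype == b"IEND":
            break
    return found
```

Other chunk types can carry metadata too (tIME, for instance, records a modification time), so an empty list here is not proof that the image is clean.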

Many potential leaks can be detected visually, by looking at the files as if they were text. But this does not cover the possibility of one of your tools being deliberately bugged so that it includes incriminating evidence in its output (such tracing information would be encrypted so as to "look random" except to whoever knows where to look).

Unfortunately for the state of the world at large, "a revolution in my country" is not a very precise indication. There are armed insurrections or similar unrest in quite a few countries just now, including, but not limited to, Afghanistan, Yemen, Syria, Somalia, parts of Libya, Colombia, Sudan, and Southern Sahara; and things are not completely clear in Egypt, Iraq or Iran, just to include the few I can think of from memory.

Microsoft's compilers like to go further and start signing the applications by default (usually with self-signed keys).
– ewanm89, Nov 24 '11 at 1:16

Worse than that: Microsoft compilers (the DDK does this especially) embed absolute file system references to pdb (debug database) files into compiled C/C++ code. Search your compiled exe for c:\Users\username\... if you've used that path. Not good if your user name happens to be your full name...
– user2213, Nov 24 '11 at 12:26

There's a whole discipline devoted to extracting information out of computer files and systems: computer forensics.

Computer files are relatively easy to make anonymous. Relative to network traffic, that is: files are just a bunch of bits, whereas network traffic carries a lot of information through timing. On an absolute scale, anonymity of nontrivial files is often not so easy to achieve.

The first thing to make sure of is that there is no identifying information in your source files, except in comments. Comments are mostly safe, though they do influence line numbers that can appear in debugging information (e.g. __LINE__ in C). In particular, as Tom notes, make sure your source files don't have RCS tags or the like. Use very generic names for functions, variables, classes, source files, etc., as well, as many of these make their way into the executable.

Your compiler can be identified with a good success rate by a sufficiently motivated examiner, as different compilers compile and optimize commonplace code in different ways. So make sure to use a very common compiler. The same applies to any libraries your program is linked against.

Then, make some basic tests. Compile the same program on several machines with the same compiler version, and make sure the results are bit-for-bit identical. The test will be more conclusive if the machines are as different as possible apart from the compiler and libraries (different OS version, different user name, different language settings, …).
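The bit-for-bit comparison can be sketched in Python as follows. The helper names `sha256_of` and `first_difference` are invented for the example; any hashing or diffing tool you trust would do the same job.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def first_difference(path_a, path_b):
    """Return the byte offset of the first difference, or None if identical."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a strict prefix of the other.
        return min(len(a), len(b))
    return None
```

If the digests differ, the offset from `first_difference` tells you where to start looking in a hex viewer; the bytes around it often turn out to be a timestamp or a path.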

Under Unix systems (such as Linux, or Cygwin under Windows), run the strings command on your binaries to find printable substrings. This is just a basic sanity check; it won't find all potentially incriminating information by any means. For example, by default it won't find strings encoded in multibyte character sets such as UTF-16 (used by Windows and Java).
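If you want a check that also covers the UTF-16 case, a rough Python equivalent could look like the sketch below. The function name `printable_strings` is an assumption of mine, and it only handles printable ASCII characters stored either as single bytes or as little-endian UTF-16; other encodings would still slip through.

```python
import re


def printable_strings(data, min_len=4):
    """Return printable ASCII runs found as single-byte and as UTF-16LE text."""
    # Runs of printable ASCII bytes, like the default behaviour of strings.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters, each followed by a zero byte (UTF-16 little-endian).
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found
```

Reading the binary with `open(path, "rb").read()` and searching the result for your user name, host name, or home directory is a reasonable first pass.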

If at all possible, try to separate your program between an innocent-looking program that won't raise any flags and a small text file containing the sensitive data. Distribute the two through different channels. Even better, arrange to make your program a small text file; ideally, use a popular interpreted language and only distribute the program source. Don't distribute your own working source, though: distribute a sanitized source with no comments, sensitive variable names, etc.

When you distribute your program, you might decide to use an archive format such as zip or tar. Note that these archives store the date of the file, and some archive formats (e.g. tar) can store a user name.
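For tar specifically, Python's standard tarfile module lets you blank those fields before writing. This is only a sketch: the function names `scrub`, `build_scrubbed_tar` and `list_metadata` are made up here, and the choice of zero for the timestamp and IDs is an assumption about what counts as neutral.

```python
import io
import tarfile


def scrub(info):
    """tarfile filter: blank out owner fields and timestamp on a member."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    return info


def build_scrubbed_tar(files):
    """Build an in-memory tar from {archive_name: bytes} with scrubbed metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = scrub(tarfile.TarInfo(name))
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def list_metadata(tar_bytes):
    """Return (name, user name, uid, mtime) for every member of a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        return [(m.name, m.uname, m.uid, m.mtime) for m in tar.getmembers()]
```

The same `scrub` function can be passed as `filter=scrub` to `TarFile.add` when archiving files from disk. Running `list_metadata` on an archive produced by any other tool is a quick way to see what it recorded about you.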

While there is no such thing as complete anonymity, I think it is possible to achieve good enough anonymity for a one-man programming effort. Your biggest worry will be distribution. It is a lot harder to remain anonymous when you're moving bytes around, and harder still once you start interacting with people.
Using Tor helps, but it's not a magic bullet, especially against interceptors with government resources.

(In particular, this question may be traceable to you! Particularly if you live in a country with a government filter on all Internet accesses, which can correlate your Internet activity with the public information about your interactions with Stack Exchange.)

I don't know of any way for someone to trace back what computer you wrote the program on or what program you compiled it on.

But, if you want to be safer, one possibility might be to buy access to computing resources on some other system (e.g., on Amazon EC2, or on a virtual private hosting service), and compile your program there. I would also "strip" the binary, to remove all debugging symbols, just in case. If you're able to obtain access in a way that can't be traced back to you, then even if there is some crazy way to identify the computer it was compiled on, that won't be directly linkable back to your identity. Perhaps folks here can suggest a way to get remote shell access (or to run a virtual machine) on some machine, without revealing your identity.

I'm not convinced that working on the cloud is a good idea, as it introduces the quite realistic possibility of correlating user activity with program distribution. Distributing the program would presumably be as incriminating as writing it.
– Gilles, Nov 24 '11 at 16:05