Is there an easy way to remove comments from a C/C++ source file without doing any preprocessing? (I.e., I think you can use gcc -E, but this will expand macros.) I just want the source code with comments stripped; nothing else should be changed.

EDIT:

Preference towards an existing tool. I don't want to have to write this myself with regexes; I foresee too many surprises in the code.

@Neil: Sorry, but no. A parser deals with the structure of statements. From the viewpoint of the language, a comment is a single token that does not participate in any larger structure. It's no different from a space character (in fact, in phase three of translation, each comment is to be replaced by a single space character). As for building the preprocessor into the compiler, the explanation is much simpler: the preprocessor often produces very large output, so communicating it to the compiler efficiently improves compilation speed a lot.
– Jerry Coffin, Mar 6 '10 at 23:23

@Neil: Perhaps that's best -- you seem to be just repeating the same assertion, with no supporting evidence. You haven't even once pointed to what semantic analysis you think is needed to parse comments correctly, just repeated that it is (which the standard not only doesn't require, but doesn't really even allow). You substitute trigraphs, splice lines, then break the source into tokens and sequences of white space (including comments). If you try to take more semantics into account than that, you're doing it wrong...
– Jerry Coffin, Mar 7 '10 at 0:28

It depends on how perverse your comments are. I have a program, scc, to strip C and C++ comments. I also have a test file for it, and I tried GCC (4.2.1 on Mac OS X) with the options in the currently selected answer; GCC doesn't seem to do a perfect job on some of the horribly butchered comments in the test case.

NB: This isn't a real-life problem - people don't write such ghastly code.

Consider this subset (36 of 135 lines total) of the test case:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.
/\
\/ This is not a C++/C99 comment!
This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.
/\
\* This is not a C or C++ comment!
This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.
This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

The output from 'scc' is:

The regular C comment number 1 has finished.
/\
\/ This is not a C++/C99 comment!
This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.
/\
\* This is not a C or C++ comment!
This is followed by regular C comment number 2.
The regular C comment number 2 has finished.
This is followed by regular C comment number 3.

The output from 'scc -C' (which recognizes double-slash comments) is:

The regular C comment number 1 has finished.
/\
\/ This is not a C++/C99 comment!
This is followed by C++/C99 comment number 3.
The C++/C99 comment number 3 has finished.
/\
\* This is not a C or C++ comment!
This is followed by regular C comment number 2.
The regular C comment number 2 has finished.
This is followed by regular C comment number 3.

The source for SCC is about 270 lines of code plus two supporting library files (one that I use in almost all my programs, and one that I use in filter programs). Contact me if you need it (see my profile).
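For readers who want to see the general shape of such a filter, here is a minimal sketch in C. It is not scc: the names `skip_splices` and `strip_comments` are hypothetical, and it rests on stated assumptions. The whole input is in memory, each block comment is replaced by one space, and trigraphs and C++11 raw string literals are not handled. The key point is that a backslash-newline pair may sit between the two characters that open or close a comment, so those pairs have to be looked through when matching `/*`, `*/` and `//`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen, for callers sizing dst */

/* Skip any backslash-newline pairs starting at s[i]; return the index of
 * the next character that survives line splicing. */
static size_t skip_splices(const char *s, size_t i)
{
    while (s[i] == '\\' && s[i + 1] == '\n')
        i += 2;
    return i;
}

/* Hypothetical helper (not scc): copy src to dst with comments removed.
 * dst must have room for strlen(src) + 1 bytes. If cxx is non-zero,
 * double-slash comments are recognized too. Returns the output length. */
size_t strip_comments(const char *src, char *dst, int cxx)
{
    size_t i = 0, o = 0;
    assert(src != NULL && dst != NULL);

    while (src[i] != '\0') {
        char c = src[i];

        if (c == '/') {
            size_t j = skip_splices(src, i + 1);
            if (src[j] == '*') {
                /* Block comment: ends at '*' followed, after any
                 * splices, by '/'. An unterminated comment runs to EOF. */
                j++;
                while (src[j] != '\0') {
                    if (src[j] == '*') {
                        size_t k = skip_splices(src, j + 1);
                        if (src[k] == '/') {
                            j = k + 1;
                            break;
                        }
                    }
                    j++;
                }
                dst[o++] = ' ';   /* one space stands in for the comment */
                i = j;
                continue;
            }
            if (cxx && src[j] == '/') {
                /* Line comment: ends at the first newline that is not
                 * preceded by a backslash; that newline is kept. */
                j++;
                while (src[j] != '\0' && src[j] != '\n')
                    j += (src[j] == '\\' && src[j + 1] == '\n') ? 2 : 1;
                i = j;
                continue;
            }
            dst[o++] = src[i++];  /* a lone slash, e.g. division */
            continue;
        }

        if (c == '"' || c == '\'') {
            /* String or character literal: copy verbatim so comment
             * markers inside it are left alone. Stop at an unescaped
             * newline so a stray quote cannot swallow the file. */
            dst[o++] = src[i++];
            while (src[i] != '\0' && src[i] != c && src[i] != '\n') {
                if (src[i] == '\\' && src[i + 1] != '\0')
                    dst[o++] = src[i++];
                dst[o++] = src[i++];
            }
            if (src[i] == c)
                dst[o++] = src[i++];
            continue;
        }

        dst[o++] = src[i++];
    }
    dst[o] = '\0';
    return o;
}
```

A caller would allocate `strlen(src) + 1` bytes for `dst`; the output can never be longer than the input because every comment is at least two characters long. Passing `cxx = 0` leaves double-slash sequences untouched, in the spirit of the difference between the two outputs shown above.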

Believe me, Jonathan, they do. I cleaned up the code and there were 2000 lines of code that had been commented out. I couldn't believe a human being could write such messy code.
– Halil Kaskavalci, Sep 25 '12 at 20:35

Could you publish this program and give the link here please? (if it is libre/free software)
– Totor, Mar 13 '13 at 16:29

StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In - First Out) filter or accept arguments on the command line.

I had this problem as well. I found this tool (Cpp-Decomment), which worked for me. However, it does not handle a comment line that continues onto the next line via a trailing backslash. E.g.:

// this is my comment \
comment continues ...

In this case, I couldn't find a way to handle it in the program, so I just searched for the lines it had missed and fixed them manually. There may be an option for this, or you could change the program's source to handle it.
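If you want to locate such lines before running a tool that misses them, a small check along these lines might help. This is only a sketch: `line_comment_continues` is a hypothetical name, it looks at one line at a time, and it assumes the line does not start inside a block comment or a multi-line string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if `line` contains a double-slash comment
 * (outside string and character literals) and ends with a backslash, i.e.
 * the comment continues onto the next line. Return 0 otherwise. */
int line_comment_continues(const char *line)
{
    size_t n, i;
    char quote = 0;   /* the open quote character, or 0 if none */

    assert(line != NULL);
    n = strlen(line);

    /* Ignore the trailing line terminator, if any. */
    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r'))
        n--;
    if (n == 0 || line[n - 1] != '\\')
        return 0;

    for (i = 0; i + 1 < n; i++) {
        char c = line[i];
        if (quote) {
            if (c == '\\')
                i++;            /* skip the escaped character */
            else if (c == quote)
                quote = 0;      /* literal closed */
        } else if (c == '"' || c == '\'') {
            quote = c;
        } else if (c == '/' && line[i + 1] == '/') {
            return 1;
        }
    }
    return 0;
}
```

Feeding each line of a file to this function (for instance from an `fgets` loop) gives a list of places to inspect by hand.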