I made a proof of concept that automatically parses, translates and
imports C header files in D using DStep. DMD is linked against
DStep and does not start a new process to do the translation.

While this is a relatively common request, I don't think such stuff
belongs in the compiler. It creates extra mandatory dependencies
while providing little advantage over doing this as part of a
build system.
So far I am perfectly satisfied with invoking dstep from a
Makefile.

I agree that this stuff doesn't belong in the compiler; however, Makefiles suck
(they are not even portable) and build systems should be avoided whenever a
more integrated solution exists.
So how about a library solution instead, which doesn't require a compiler
change:
----
import parse_c_header_importer;
mixin(parse_c_header(import("foo.h")));
void main () { foo();}
----
There are several options:
A)
mixin(parse_c_header(import("foo.h"))); => defines D symbols for everything
in foo.h (excluding things included by it)
B)
mixin(parse_c_header(import("foo.h"),recursive)); => same, but recursively
(probably not very useful, but could be useful if we instead used SWIG .i
interface files)
C)
The least wasteful:
void foo();
int bar();
mixin(parse_c_header(import("foo.h"),foo,bar));
=> only defines symbols provided
(I've proposed this syntax in an earlier thread)
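
To make the mechanics of option A concrete, here is a minimal sketch of how a CTFE function and a string mixin fit together. Everything in it is an assumption for illustration: `parseCHeader` is a hypothetical toy that only understands one-line prototypes, and the inline `header` string stands in for `import("foo.h")`, which would need the `-J` switch. It is nowhere near a real header importer.

```d
import std.stdio : writeln;

// Hypothetical toy translator, evaluated at compile time (CTFE).
// It assumes every non-empty line is a plain one-line C prototype.
string parseCHeader(string src)
{
    import std.array : replace;
    import std.string : splitLines, strip;

    string result = "extern (C) {\n";
    foreach (line; src.splitLines)
    {
        auto decl = line.strip;
        if (decl.length == 0)
            continue;
        // C spells an empty parameter list as (void); D spells it ().
        result ~= decl.replace("(void)", "()") ~ "\n";
    }
    return result ~ "}\n";
}

// Stand-in for import("foo.h"); both functions live in the C library.
enum header = "int abs(int);\nint rand(void);\n";

// The generated declarations become ordinary D symbols.
mixin(parseCHeader(header));

void main()
{
    writeln(abs(-5)); // prints 5
}
```

The mixin only ever sees a string, so the whole difficulty lies in producing that string correctly from arbitrary C input.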

While semantically a great idea, technically I don't think CTFE is up to
implementing a C front end yet.

This is the right path. We don't need the full front end, do we?

What's a non-full C front end? If it's not a real C front end it's gonna
break with certain macros etc. Not good.
I see no point in re-implementing a C front end when we can simply use an
existing one to do the job (e.g. clang). This would also allow us to parse C++
just as well.

Macros are processed before parsing? No need for a full C front end to
handle macros.

Re-implementing makes sense when you only need a very limited part of the
front end. Here we don't need to parse function bodies, for instance, and we
can skip most of the semantic analysis (if not all?).

You'd still need to parse C files recursively (textual inclusion...),
handle different C function calling conventions and different C standards,
and you'd need a way to forward different C compiler options to dmd (include
paths to standard / custom libraries). Eventually people will want to
parse C++ as well anyway. That can be a lot of work, whereas using
existing tools takes much less effort and is less error prone.

I'm talking about semantic analysis and you answer with parsing; I'm
not sure this is going to lead anywhere. Do you understand that a
parser is actually quite a small part of a front end?

This is the right path. We don't need the full front end, do we?

Yeah, you do need the full front end. It's pretty amazing how the simplest of
.h files seem determined to exercise every last, dark corner of the language.
If the converter doesn't accept the full language, you're just going to get a
dump truck unloading on it.

When you do have a complete front end you can choose how to handle the
language constructs the tool cannot (yet) translate. I.e. just skip it,
insert a comment or similar.
If you don't have a full front end and encounter something that you
cannot translate, you will most likely get weird behavior.
--
/Jacob Carlborg

Yes, but for the C family of languages we already have a
compiler as a library, namely Clang.

Agreed.
I also confess that my anti-C bias has softened a bit with clang.
It does not sort out all C and C++ issues with regard to safety,
but it helps bring Pascal-like safety to C when integrated
with proper tooling.
Unfortunately, when using C and C++, not all compilers are like
clang, and it is not always easy to convince people to add extra
tooling (lint and friends).
--
Paulo

My understanding is that we only want to convert declarations to
D. Can you give me an example of such a corner case that would
require the full front end?

One example:
--------------------------------
//**************************Header**********************\\
int x;
--------------------------------
Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is
that one line splice or two? You have to get the hairy details right. I've seen
similar nonsense with trigraphs. I've seen metaprogramming tricks with token
pasting. You can't dismiss this stuff.
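
As a small illustration of the splice question, here is a sketch of C translation phase 2 in D, under stated simplifications: `spliceLines` is a hypothetical helper, and it ignores trigraphs (phase 1) and `\r\n` line endings. Phase 2 deletes each backslash that is immediately followed by a newline, so in the header above only the final backslash takes part, and `int x;` ends up inside the comment.

```d
import std.array : replace;
import std.stdio : write;

// Sketch of C translation phase 2: delete every backslash that is
// immediately followed by a newline. Trigraphs and \r\n are not handled.
string spliceLines(string src)
{
    return src.replace("\\\n", "");
}

void main()
{
    // A shortened version of the header from the post, as a D string literal.
    string header = "//****Header****\\\\\nint x;\n";

    // The second backslash and the newline vanish; the first backslash
    // stays, and the declaration is now part of the // comment.
    write(spliceLines(header));
}
```

A converter that tokenizes line by line without this step would emit a declaration for `x` that a C compiler never sees.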

I've seen C code where the "header" file has function bodies in them.
Though about trigraphs... I've to admit I've never actually seen *real*
C code that uses trigraphs, but yeah, needing to account for them can
significantly complicate your code.
But as for preprocessor-specific stuff, couldn't we just pipe it through
a standalone C preprocessor and be done with it? It can't be *that*
hard, right?
T
--
Bare foot: (n.) A device for locating thumb tacks on the floor.

Though about trigraphs... I've to admit I've never actually seen
*real* C code that uses trigraphs, but yeah, needing to account for
them can significantly complicate your code.

Building a correct C front end is known technology; doing a
half-baked job isn't going to impress people.

IOW either you don't do it at all, or you have to go all the way and
implement a fully-functional C frontend?
If so, libclang is starting to sound rather attractive...

But as for preprocessor-specific stuff, couldn't we just pipe it
through a standalone C preprocessor and be done with it? It can't be
*that* hard, right?

You could, but then you are left with failing to recognize:
#define FOO 3
and converting it to:
enum FOO = 3;

Hmm. We *could* pre-preprocess the code to do this conversion first to
pick out these #define's, then suppress the #define's we understand from
the input to the C preprocessor. Something like this:
    bool isSimpleValue(const(char)[] s) {
        // basically, return true if s is something compilable
        // when put on the right side of "enum x = ...".
    }
    auto pipe = spawnCPreprocessor();
    string[string] manifestConstants;
    foreach (line; inputFile.byLine()) {
        // Anchor at end of line so the whole value is captured.
        if (auto m = match(line, `^\s*#define\s+(\w+)\s+(.*?)\s*$`))
        {
            if (isSimpleValue(m.captures[2])) {
                // byLine reuses its buffer, so copy what we keep.
                manifestConstants[m.captures[1].idup] =
                    m.captures[2].idup;
                // Suppress the #define's that we picked out
                continue;
            }
            // whatever we don't understand, hand over to
            // the C preprocessor
        }
        pipe.writeln(line);
    }
Basically, whatever #define's we can understand, we handle, and anything
else we let the C preprocessor deal with. By suppressing the #define's
we've picked out, we force the C preprocessor to leave any reference to
them as unexpanded identifiers, so that later on we can just generate
the enums and the resulting code will match up correctly.
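
A self-contained variant of this idea, as a rough sketch only: `splitDefines` and `SplitHeader` are hypothetical names, "simple" is narrowed to an object-like macro whose value is a single decimal integer literal, and the lines that would be piped to the C preprocessor are just collected in a string.

```d
import std.regex : matchFirst, regex;
import std.stdio : write;
import std.string : splitLines;

// Generated D enums plus the lines left over for the C preprocessor.
struct SplitHeader
{
    string enums;
    string passthrough;
}

SplitHeader splitDefines(string header)
{
    // Assumption: only "#define NAME <decimal integer>" counts as simple.
    auto simpleDefine = regex(`^\s*#\s*define\s+(\w+)\s+(-?\d+)\s*$`);
    SplitHeader result;
    foreach (line; header.splitLines)
    {
        auto m = matchFirst(line, simpleDefine);
        if (!m.empty)
            result.enums ~= "enum " ~ m[1] ~ " = " ~ m[2] ~ ";\n";
        else
            result.passthrough ~= line ~ "\n";
    }
    return result;
}

void main()
{
    auto r = splitDefines(
        "#define FOO 3\n#define SQ(x) ((x)*(x))\nint a[FOO];\n");
    write(r.enums);       // the D side: enum FOO = 3;
    write(r.passthrough); // still needs a real preprocessor
}
```

Because the function-like macro `SQ` is passed through untouched while `FOO` is suppressed, the reference to `FOO` in `int a[FOO];` stays unexpanded and lines up with the generated enum.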
T
--
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com

IOW either you don't do it at all, or you have to go all the way and
implement a fully-functional C frontend?
If so, libclang is starting to sound rather attractive...

That's what I'm saying.

You will just end up needing to build a full C preprocessor. Just use an
existing one, that is libclang.
--
/Jacob Carlborg

My understanding is that we only want to convert declarations to D. Can you
give me an example of such a corner case that would require the full front end?

One example:
--------------------------------
//**************************Header**********************\\
int x;
--------------------------------
Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is
that one line splice or two? You have to get the hairy details right. I've
seen similar nonsense with trigraphs. I've seen metaprogramming tricks with
token pasting. You can't dismiss this stuff.

This does not require semantic analysis.

Semantic analysis for C is trivial. The real problems are the phases of
translation and the preprocessor.

When you do have a complete front end you can choose how to handle the language
constructs the tool cannot (yet) translate. I.e. just skip it, insert a comment
or similar.

Yes, but the front end itself must be complete. Otherwise, it's not really
practical when you're dealing with things like the preprocessor, because a
non-compliant front end won't even know it has gone off the rails.
There are other issues when dealing with C .h files:
1. there may be various #define's necessary to compile it that would normally
be supplied on the command line to the C compiler
2. there are various behavior switches (see the PR for DMD that wants to set
the signedness of char types)
3. rather few .h files seem to be standard-compliant C. They always rely on
various compiler extensions
These problems are not insurmountable, they just are non-trivial and need to be
handled for a successful .h file importer.

Yes, I agree with all the above. That's why I'm using libclang. What I'm
saying is that when I use a library like libclang I can choose quite
freely what to convert and what not to convert. For example, DStep doesn't
handle the preprocessor at all. But since libclang does, it can parse any
header file anyway. What happens is just that the preprocessor
declarations won't be translated and won't end up in the translated file.
--
/Jacob Carlborg

While this is a relatively common request, I don't think such stuff belongs
in the compiler. It creates extra mandatory dependencies while providing
little advantage over doing this as part of a build system.

I started to think a bit about this. One might need to specify various
options to translate the header file, such as include paths and
similar. That might be quite problematic to do in a pragma, or via DMD
command line options.
--
/Jacob Carlborg

This sounds pretty cool, and the suggestion from Timothee also
makes a lot of sense.
Is there any way we can rig this to behave as if it were a CTFE
invocation? It could be treated like an intrinsic up to the
point where we have powerful-enough CTFE to replace it. I'm
still not sure if Walter would be OK with this, but I figured I'd
mention it, since it could give us something really nice without
having to wait for CTFE to get good.