I've been playing around with an idea for an application framework/library/language for cross-platform unmanaged development. One thing I thought about was a simple way to handle references, includes, etc. without having to use an IDE (in Visual Studio you have to right click and add a reference), write makefiles, write special header files, or anything like that. I also want to force the folder structure to match the namespaces, by making namespaces implicit in the folder structure.

What I've come up with is this:

There are three base namespaces: app, lib, and std. Your application's root namespace is app, any third-party libraries are under lib, and the standard library is in std. There is also an application root directory, which is just a regular folder. Inside it are three folders: res, app, and lib. app is where your application source code goes, lib is where all third-party static libraries go, and res is for any resources your program needs (icons, images, readme, etc.). All dynamic libraries come from a global folder on the machine (e.g. /usr/lib on Linux).

Now, the namespace your code is in comes from the folder names. Any file directly under the app folder is in the namespace "app". If you have a folder under that named bob, code under that folder is in the namespace "app.bob". Anything in the application lib folder or the global lib folder is in the namespace lib.filename (where filename is the name of the file, minus the extension). You can also put folders under the lib folder, but they cannot share a name with a file. If libraries in the application and global lib folders have the same file name, the application library takes precedence.
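A minimal sketch of that path-to-namespace rule (the function name and the `.src` extension are hypothetical; only the app root is handled here, but the lib rule would work the same way):

```python
import os

def namespace_for(source_path, app_root):
    """Derive a source file's namespace from its folder path, per the
    proposed rule that the folder structure *is* the namespace."""
    rel = os.path.relpath(os.path.dirname(source_path), app_root)
    if rel == ".":
        return "app"  # files directly under the app folder
    return "app." + rel.replace(os.sep, ".")

# A file at <root>/app/bob/random.src lives in namespace "app.bob":
print(namespace_for("project/app/bob/random.src", "project/app"))  # app.bob
print(namespace_for("project/app/main.src", "project/app"))        # app
```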

Within the app folder, to define an assembly, you place a file called "assembly" in that folder. This is a YAML file that defines information about the assembly, such as whether it's a library or an executable, file exclusions, the entry point for the assembly, compiler flags, etc. It can contain build groups like release or debug, or whatever you want. Consequently, a namespace is all within one assembly, and an assembly contains a namespace and its child namespaces, unless a folder within a child namespace also contains an assembly file.
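As a sketch only: the post doesn't pin down a schema, so every key below is a hypothetical illustration of the kind of information an "assembly" file might carry.

```yaml
# Hypothetical assembly file -- key names are illustrative, not a spec.
type: executable          # or: static-library / dynamic-library
entry: app.main           # entry point for the assembly
exclude:                  # file exclusions
  - scratch/
groups:                   # build groups
  release:
    flags: [-O2]
  debug:
    flags: [-g, -DDEBUG]
```

For a beginner's executable, this could collapse to the single line `type: executable`.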

So you tell your compiler to build the application root, and it goes through each folder and, provided there are no exclusions, compiles all the code files. When your compiler reads a file within the assembly "app.alice" and one of the files in that assembly contains "include app.bob" (or just says int32 a = app.bob.random()), it knows it has to link that assembly against app.bob. No right click, add reference; no setting linker options. It just works.
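A rough sketch of that inference step, assuming the `include app.bob` syntax from the post. The regex and function names are mine; a real compiler would also pick up qualified uses like `app.bob.random()` while parsing, which this sketch ignores:

```python
import re

# Matches lines of the form "include app.bob" (syntax from the post).
INCLUDE_RE = re.compile(r"^\s*include\s+([A-Za-z_][\w.]*)", re.MULTILINE)

def referenced_namespaces(source_text):
    """Collect the namespaces a source file explicitly includes; the
    build tool would map each back to an assembly and link it."""
    return sorted(set(INCLUDE_RE.findall(source_text)))

src = """
include app.bob
include lib.curl

int32 a = app.bob.random()
"""
print(referenced_namespaces(src))  # ['app.bob', 'lib.curl']
```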

So thoughts, problems, anything I'm missing?

Honesty replaced by greed, they gave us the reason to fight and bleed / They try to torch our faith and hope, spit at our presence and detest our goals

First of all, nice idea (especially the YAML part). Keep thinking about it.

Secondly: while I'm a major proponent of applying YAML everywhere configuration is needed, I have to point out your "assembly file" is yet another type of makefile. You're not getting rid of makefiles, you're just introducing a new type with more automatic inference than the traditional "make" and which relies on a more rigid folder structure.

The rigid folder structure in itself could be a disadvantage rather than an advantage. Many languages allow you to decouple directories from namespaces, and projects may rely on that in the organisation of their code.

Jplus wrote:Secondly: while I'm a major proponent of applying YAML everywhere configuration is needed, I have to point out your "assembly file" is yet another type of makefile. You're not getting rid of makefiles, you're just introducing a new type with more automatic inference than the traditional "make" and which relies on a more rigid folder structure.

I don't disagree, but if I could get rid of it, I would. However, there has to be some way to provide information about the assembly, there is just no getting away from that. You have to have some way to tell the compiler whether you want it to be a static library, dynamic library, or executable. I guess what I'm really trying to accomplish is to make it so the programmer doesn't have to worry about the dependencies, and only has to worry about the specifics (and for beginners, it will contain one line that says "type: executable" no matter how many files you have). YAML provides a clean, human and IDE readable/writable format that can handle all the assembly configuration.

Jplus wrote:The rigid folder structure in itself could be a disadvantage rather than an advantage. Many languages allow you to decouple directories from namespaces, and projects may rely on that in the organisation of their code.

It does reduce flexibility, but that's kind of the point. To give some background, I was working on a project recently where folders contained multiple namespaces. Just browsing the code to figure out how it was organized was frustrating. Most of my experience with C# has the folder structure matching the namespace, and going from that to anything else is just a pain. By enforcing the relationship between folder structure and namespace, no matter who writes the code, you know how it's organized. I guess it's the same reason why Java forces each (public) class to be in a separate file.

Also, I get annoyed when I type "using System.XML.Serialization" and then have to switch to my mouse just so I can add the reference to System.XML. This way, I don't have to even think about it, because there is only one possible assembly where that code can reside.

I'm not missing them, I just see them as band-aids that make it easy to work with a complicated system. The idea is to create a system that enforces consistency and is as simple as humanly possible. I don't want to make something that works for existing languages; I want to create my own language that is not necessarily backwards compatible with anything that exists today.


Thesh wrote:I don't want to make something that works for existing languages; I want to create my own language that is not necessarily backwards compatible with anything that exists today.

Hm. Do you mean existing programming languages or existing build specification languages? Because I didn't intend to suggest that you should adopt the latter, I just wanted to point out that build tools that can figure out most of the dependencies for you already exist. If you mean programming languages, I find that a bit worrying...

Jplus wrote:If you mean programming languages, I find that a bit worrying...

Yes, I do mean new programming languages. Trying to maintain backwards compatibility means you are stuck with the mistakes of the past. C++ couldn't deviate from C, so it ended up with both structs and classes, even though structs could have been removed completely. VB.Net attempted to make it easy to migrate from VB6, and that left it with multiple ways to define the same type (Integer means Int32, among others). C# was made to be easy to pick up if you come from C++ or Java, resulting in some of the same problems (e.g. int vs Int32). And both C# and Java include the new keyword for classes, which seems to serve absolutely no purpose.

The fact of the matter is that I want to make a new framework for unmanaged code that enforces consistency and provides a large standard library that works for cross-platform development; forcing developers to use a folder structure like I described means that no existing code can be migrated to it. Even if it could be, existing code specifies its namespaces in the source, which my scheme makes redundant. So what is the point of making it work for existing languages in the first place? Now, you could potentially modify C++ to fit my framework, but porting existing code would still be a ton of work.

There are things I want to do with the language as well to make for cross platform compatibility with low level optimizations, such as this:

Alright, so actually you're designing a new build tool and a new programming language, and they're supposed to be used in conjunction. I wasn't aware of the programming language part yet.

Did you hear about languages like Modula and Oberon? And did you know that Python, among others, conflates directories and modules in a way very similar to what you envision? Python is not as low-level as you want, but I thought I should mention it.

So, if your idea of a bad legacy feature that came to C++ from C is the existence of both the struct and class keywords, I would advise studying programming languages more before hoping to write a good one yourself. (There are much, much worse things that came to C++ via C backwards compatibility)

I can give some critiques on your original plan. In large applications, the ideal situation is that you can work out what each module depends upon. This is useful because you want to only get the subset of your total collection of libraries that you actually need.

Another issue is that splitting things into these three -- OS provided, 3rd party installed with operating system, and internal -- well, it betrays a pretty UNIX-esque mindset. Similarly, no local dynamic library support?

Another big issue is being able to compile for the state that existed in the past, instead of the present. So with a different set of library bindings than your system presumes. And, sometimes, doing two compiles at once with different "3rd party" libraries. There is also compiling without being root, and wanting to install 3rd party libraries.

Resources become interesting, because either they are just files that get "magically compiled in", or they require a rich programming language themselves to describe how they become data (which is how many low level languages handle it -- it becomes vendor-specific magic), or they easily tie you to specific formats. Resource compilation doesn't use additional tools for kicks.

Note that release vs debug is an example of a situation where you want the same source tree to build two completely different things. Which means your YAML files need to generate different compiler options based on what arguments from "higher up the chain" are being passed down.

I don't see much in the way of detail splitting interface from implementation. I presume you are going with some kind of module based system? (where the source code details its interface, as opposed to C/C++ textual substitution?)

Another problem lies with huge codebases, where a namespace ends up being rather crowded. Being able to (say) split your interface from your implementation would be nice, but under your system that requires that your implementation be in a different namespace.

---

I'd want to back up the tool chain a moment, and think about a build process.

The ideal build process, to me, consists of you sitting down at a virgin machine, and saying "I want to build foo".

You go to your source code repository, and you query for foo. You want to be able to ask foo "what else do you need?", and be able to recursively get everything that foo needs.

It would be difficult under your system of organization to make foo's needs be more fine grained than the level of "entire namespaces". Foo doesn't need all of boost -- foo needs a particular class from boost, and everything required to work with that class. I suppose in your system, you'd require boost::shared_ptr, which would be its own namespace? Or you'd have to solve the dependency problem (what files do you depend on) using a completely orthogonal system. Which would be a shame.

In my ideal build chain, working out what a given module requires should require a relatively quick parsing of the module's files (which could be cached), not a full compilation. And from what a given module requires, you can work out what other modules it requires.

There are a few kinds of dependency in my prior experience with something like the above. There are interface dependencies and build dependencies: you can depend on the other project being fully built first, or you could just require access to its interface. (I.e., you might use something it defines in its interface for interoperability, but never actually call its functions.) There are also library and dynamic dependencies, which behave differently. (And, for development and other reasons, dynamic dependency must work below the OS level: you need to be able to test your dynamic libraries without trashing your system. Dynamic libraries when you are building a large project let multiple executables share the same binary code, and allow delay loading of code, and allow you to have modular things like tools that you load at run time, possibly produced by 3rd parties.)

If there isn't an explicit separation between modules and namespaces, the above doesn't work as well. Organization-wise, it seems much nicer for (say) the tools of the boost library to be in namespace boost, instead of them being all in sub-namespaces of boost. At the same time, the modules (in C++, header files) are distinct, and wanting one of them doesn't mean you want all of them. I suppose forcing your modules to each be in their own namespace is an option, but be careful when you make decisions like this, it might be overly awkward.

One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Creating a whole new programming language that has no backward compatibility, and therefore cannot work with billions of dollars and decades of development of previous code, sounds way totally easier than right clicking and hitting Add Reference.

Yakk wrote:Another issue is that splitting things into these three -- OS provided, 3rd party installed with operating system, and internal -- well, it betrays a pretty UNIX-esque mindset.

Yes. I missed this before... IMO, "All dynamic libraries come from a global folder on the machine (e.g. /usr/lib on Linux)" is a terrible idea.

For instance, in our setup, we have a lot of programs and libraries installed to network paths... and not in the standard /usr/bin, /usr/lib locations. There are also various utilities that install to /opt and such. And that's on a Unix system, and without even getting into debating the merits of whether individual users should be able to install stuff locally (somewhere under ~) on their own volition.

Being able to tell the compiler "here's a list of directories to look for libraries" is fine; that's what everyone does now. Enforcing that everything comes from a single directory is, IMO, unworkable and DOA.

Note that release vs debug is an example of a situation where you want the same source tree to build two completely different things.

To play devil's advocate for a second, is it necessary to have separate release vs debug builds? How much that makes sense depends on your language and the environment it's running in. It might be a lot (you're aiming to be a super-high-performance C-alike) or not very much at all (you're aiming at a higher-level language with a VM or something like that).

I read an article about how NASA doesn't do separate builds. Their reason? They don't want to be debugging and testing something that isn't what's actually going to be run. While their efficiency/correctness tradeoffs are a bit harsher than most, and while I still have debug and release builds for the C++ stuff I work on, I think there's something to be said for that attitude if you are able to afford it. Something like Java for instance, which doesn't really do static optimizations, can.

Another problem lies with huge codebases, where a namespace ends up being rather crowded. Being able to (say) split your interface from your implementation would be nice, but under your system that requires that your implementation be in a different namespace.

I don't view this as a downside, or at least not much. I sort of think the implementation should be in a separate namespace anyway, so that no one accidentally (or accidentally-on-purpose) uses such details.


Yakk wrote:So, if your idea of a bad legacy feature that came to C++ from C is the existence of both the struct and class keywords, I would advise studying programming languages more before hoping to write a good one yourself. (There are much, much worse things that came to C++ via C backwards compatibility)

Sure, I was just trying to give an example of complete redundancy. And yes, I do have to study languages more. This is all in my head right now, nothing's on paper, and I'm not even sure if I'll pursue it yet. I wouldn't be able to build something like this on my own, either.

Yakk wrote:Another issue is that splitting things into these three -- OS provided, 3rd party installed with operating system, and internal -- well, it betrays a pretty UNIX-esque mindset. Similarly, no local dynamic library support?

My intention was to make it configurable, but yes, I will need to allow more than just one path for the dynamic libraries.

Yakk wrote:Another big issue is being able to compile for the state that existed in the past, instead of the present. So with a different set of library bindings than your system presumes. And, sometimes, doing two compiles at once with different "3rd party" libraries.

One thing that I was planning on adding is virtual paths that you can set in the build config file. If you have multiple sets of libraries (or code) that you want to build against, you put them in another folder and specify different virtual paths for each build.

Yakk wrote:Resources become interesting, because either they are just files that get "magically compiled in", or they require a rich programming language themselves to describe how they become data (which is how many low level languages handle it -- it becomes vendor-specific magic), or they easily tie you to specific formats. Resource compilation doesn't use additional tools for kicks.

I haven't even considered how to actually implement the resources yet, that's far from my mind right now.

Yakk wrote:Note that release vs debug is an example of a situation where you want the same source tree to build two completely different things. Which means your YAML files need to generate different compiler options based on what arguments from "higher up the chain" are being passed down.

Yes, there are already plans for groups of builds, so you can have different builds for release, debug, enterprise, embedded, etc.

Yakk wrote:I don't see much in the way of detail splitting interface from implementation. I presume you are going with some kind of module based system? (where the source code details its interface, as opposed to C/C++ textual substitution?)

Correct, it will be module based.

Yakk wrote:Another problem lies with huge codebases, where a namespace ends up being rather crowded. Being able to (say) split your interface from your implementation would be nice, but under your system that requires that your implementation be in a different namespace.

That's kind of the point. I want to discourage crowded namespaces, files with hundreds of functions/classes, folders with hundreds of files. For me, it makes it easier to work with if you use a lot of small hierarchical namespaces instead of having giant namespaces.

You go to your source code repository, and you query for foo. You want to be able to ask foo "what else do you need?", and be able to recursively get everything that foo needs.

It would be difficult under your system of organization to make foo's needs be more fine grained than the level of "entire namespaces". Foo doesn't need all of boost -- foo needs a particular class from boost, and everything required to work with that class. I suppose in your system, you'd require boost::shared_ptr, which would be its own namespace? Or you'd have to solve the dependency problem (what files do you depend on) using a completely orthogonal system. Which would be a shame.

In my ideal build chain, working out what a given module requires should require a relatively quick parsing of the module's files (which could be cached), not a full compilation. And from what a given module requires, you can work out what other modules it requires.

I'm not sure what the problem is here. I think my system is extremely simple to code the build tools for (this is one of the main reasons for namespaces = folders). The build tools know exactly which folders they need to look in based on what namespaces the code uses, and it would be extremely easy to traverse the code and create a tree containing the locations of functions/classes in your files if you want to cache it.

Yakk wrote:Dynamic libraries when you are building a large project let multiple executables share the same binary code, and allow delay loading of code, and allow you to have modular things like tools that you load at run time, possibly produced by 3rd parties.

There is nothing in my system that prevents this. The assembly files define whether libraries are static or dynamic (or you just don't include an assembly file and it belongs to the assembly defined in the closest ancestor directory with an assembly definition).


Hierarchical structures can work, but forcing them isn't a good idea. Modern organization of files/data/folders tends to be search and keyword based, rather than folder based. The folder system ... doesn't scale well.

If you bound the number of entries per level to X, then you require Log_X(N) levels to distinguish N libraries, assuming a fully populated tree. By the time you hit a few 100 or 1000, every library you use will be many steps deep. And that gets kludgy. Who the hell remembers that it is Boost.SystemTools.FileSystem.FileObjects.RandomAccess that they want?
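Yakk's Log_X(N) point, worked numerically; the branching factor of 30 entries per level is an arbitrary assumption:

```python
import math

def levels_needed(n_libraries, entries_per_level):
    """Depth of a fully populated tree that distinguishes N libraries
    with at most X entries per level: ceil(log_X(N))."""
    return math.ceil(math.log(n_libraries, entries_per_level))

print(levels_needed(1000, 30))    # 3 levels deep
print(levels_needed(100000, 30))  # 4 levels deep
```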

Users, in practice, will probably end up emulating a relatively flat hierarchy. You'll have VENDOR.ModuleName.

I am not certain how I'd apply a keyword-based system to something as specific as an include path. I could see the keyword-based system being used to find what you want to depend on, and then a GUID (or a unique name of some kind) being saved to the file. But you'll note that this would mean your filesystem wouldn't be the way you find modules...


Yakk wrote:If you bound the number of entries per level to X, then you require Log_X(N) levels to distinguish N libraries, assuming a fully populated tree. By the time you hit a few 100 or 1000, every library you use will be many steps deep. And that gets kludgy. Who the hell remembers that it is Boost.SystemTools.FileSystem.FileObjects.RandomAccess that they want?

Have you ever worked with .NET or Java? Sure, it's a bitch to remember everything without code completion, but it isn't that ridiculous. Also, that would really be Boost.FileSystem...
