The branch remove-ifdefs removes all #ifdefs for features. Where the original code had a #define for a feature and matching #ifdef/#endif blocks, there is now either just the code the #ifdef surrounded (if the feature was enabled), or nothing at all (if the feature was disabled and its code was removed).

There's also a branch remove-xash that removes all Xash specific code.

I ran a tool that counts lines of code on it, here are the results.

For the original codebase with no changes:

Language       files    blank  comment    code
C++               51     5437     4046   50560
C/C++ Header      33      964      378    5533
Total             84     6401     4424   56093

For the cleaned up version in the remove-xash branch:

Language       files    blank  comment    code
C++               51     4512     3272   35487
C/C++ Header      33      937      347    4045
Total             84     5449     3619   39532

So about 16500 lines of code in the regular version of VHLT are dead code, never used at all. Some of it is Xash specific, but it's not that much.

Some features were disabled, so their code was removed. Here they are:

ZHLT_DETAIL: some kind of old func_detail variant, was already obsolete

ZHLT_PROGRESSFILE: never implemented beyond command line argument handling, so didn't work at all

ZHLT_NSBOB: the only thing i found was the definition, no code appears to exist for it

ZHLT_HIDDENSOUNDTEXTURE: would allow you to mark faces as hidden by setting zhlt_hidden on an entity. You can do this by ending a texture name with _HIDDEN, so i guess it was obsolete

HLBSP_SUBDIVIDE_INMID: seems to be intended to reduce the number of faces, but contributes to AllocBlock:Full errors so it was disabled

The cmdlib.h header is where all of these definitions were; it used to be 712 lines and is now 172 lines. There are 2 definitions left in place because they depend on platform specific functionality. One is game_text's UTF8 conversion support, which relies on a Windows API function. It's not that hard to replace it with a cross platform alternative.

The other is Ripent's -pause parameter which was implemented only on Windows for some reason. This may have to do with the fact that it's implemented using atexit, so it may not work on Linux due to differences in how console resources are managed during program shutdown. Reworking Ripent's code to avoid use of exit should solve this problem.

I don't see any more #define statements used to control features anywhere so i guess it's all cleaned up now.

To make this process easier i used some tools to speed things up. First i used sunifdef to process all files and remove definitions one by one. I wrote a batch file that does this and also commits all changes to Git automatically, so i could just do removeifdefs.bat <definition name>. You can find the batch file in the remove-ifdefs branch in src.

Note that to remove disabled sections you must modify the batch file to pass -Udefinition instead of -Ddefinition or it will turn on the feature.

All in all this took about an hour and a half to do.

My reason for doing this is that the ZHLT/VHLT source code has never been readable; you have to read past the definitions and note which ones are active. More recent versions of Visual Studio do a lot of work for you, but it's still hard. For example, the file wadpath.cpp is 92 lines now, but was 174 lines before. The old version was nearly twice as long, containing code that isn't even used.

wadinclude.cpp is even worse. It used to be 212 lines; now it's 5 lines and there's no actual code left in it. This is because there are 3 or more different versions of the same feature (wad inclusion) in the source code. Various long files are much easier to read now that they've been cleaned up.

I hope to use this to rebuild the codebase in C# so that the tools can be integrated more easily, and perhaps deal with some issues that exist in the current implementation. I don't know whether i'll have time to do it or not, but in any case, this cleaned up version is available to anyone to check out. I will not be making a build of this since it's identical to the original V34 build, if you want one you'll have to make it yourself.

Each tool is a library: the MapCompiler library provides a compiler, and the MapCompilerFrontEnd is the command line interface to it. You could invoke the compiler programmatically with this. You could potentially generate BSP files dynamically this way, though it would still take time to compile everything.

I need to rework the Tokenizer class first so it can read .map files properly since apparently QuArK embeds texture info in comments and stuff.

I'm looking at the code that handles .map file loading and there's special code for QuArK style texture coordinates. However, according to its official documentation this is only used when you use the "Quark etp" format. The Valve 220 format uses the correct values.

I'm bringing this up because the code used to handle this has a pretty substantial presence in the map loading and CSG stages. The script tokenizer has to handle it specially and CSG has to convert texture coordinates differently for this format, which means the map data structures need to store 2 different formats.

This creates a provider that invokes an initializer function on the tool. In this case the hull data used by CSG is overridden to use your own settings, but more options will be available for each tool.

This approach also allows you to create custom tools that can be inserted between other tools. As long as your tool returns the correct data type for the next tool this will work just fine. You could use this to add tools like analyzers or something that can output the data returned by a tool. For instance, if RAD were to return lighting data that isn't in the BSP file as its result you could output it to a file.

As far as command line usage goes the library i'm using doesn't support the older -option value syntax, only -ovalue or --option=value. Since you'd need a new interface to use this compiler properly anyway i don't see this as a problem. With this compiler you only need to invoke the frontend once to run all the tools you want, so having different syntax can actually prevent misuse by erroring out on the old syntax.

This also lets me use better names for options. Whereas the old tools would often have -no<whatever> arguments, this one just has --<whatever>, for instance -noresetlog becomes --resetlog.

I'm trying to avoid doing any file I/O in the compiler itself, so any files that need loading will be loaded either before the compiler is created or in the initializer for the tool that needs the data. If it's output data then it will need to be handled as a post-tool operation, possibly a custom tool.

This also means that any data that is being output to a file should be available to you programmatically when invoking the compiler as a library. This can make it much easier to access some data, like point file output.

As far as log output goes, i'm currently using the standard Serilog message template, modified to include the name of the tool that's logging the data:

[14:54:40 INF] [MapCompilerFrontEnd] test

The format is:

[time log_level] [tool_name] message newline exception

Since this drastically alters the output in log files i'll probably add an option to remove the extra information and just log what the old one does.

It's possible to configure this to match the original's settings, and then it should be a pretty simple task to dispatch work and await completion. However, from what i've found, thread priority should not be changed, since this class has a monitoring system to help manage how threads are used. It may not even be necessary to change the priority if it's able to see other processes, so that will require some closer inspection once there's some work being done.

There are a lot of things in this design that make it faster. It only has to parse the command line once, data is passed between tools directly instead of having to be written out to files and parsed back in, and there are no fixed size buffers, so it's easier to remove stuff without having to touch and move a bunch of lists.

Also, memory allocation in C# is much faster than in C++: http://codebetter.com/stevehebert/2006/03/03/raw-memory-allocation-speed-c-vs-c

The articles and discussions i've found on this are pretty old and Core is known to be much faster and more efficient so it may actually be the best tool for the job.

If i can use hardware acceleration for maths-intensive stuff like VIS and RAD it'll go even faster. You can then compile maps on your GPU.

I'd also consider any slowdowns to be worth the cost if it means having tools that you can actually understand the inner workings of and modify as needed. Very few people know how to make adjustments to these tools.

I've always been interested in getting the tools to work with GPUs, though an obvious lack of knowledge has held me back.

Anyway, what parts could be run in parallel, and what potential gains could we be talking about? I ran benchmarks on various CPUs some time ago and noticed that increasing the core count past 4 offered no tangible decrease in compile time (4 - 10 sec).

The tools are already running as much as possible on worker threads. The advantage of using GPU calculations is that the GPU has cores made for mathematical calculations and can handle the spreading of work by itself since it already does that to optimize graphics rendering.

I don't know how much difference there could be in terms of performance since i can't find anything on that, so we'll have to wait and see how it works out. I should be able to get the CSG threading part done soon, then i can do some tests there to see how it compares. Unfortunately, since CSG usually finishes so quickly it probably won't make much difference in performance; it may even be slower due to the overhead of creating GPU resources. RAD is really where compiles take the longest, so that's where it can come in handy.

I'm not sure why having more cores wouldn't have an effect, it could be that each core's frequency is lower so it doesn't have as much effect. Perhaps the OS is under-reporting the processor count so it might not be taking advantage of all cores, but you should see that in the compiler settings output.

It's also possible that the worker thread code isn't optimized for a large number of threads and is locking too much to take full advantage of what it gets, i'm not sure.

I can do a benchmark on a 12 thread Ryzen 5 2600 - can also test against an overclock and normal clock. RAD is definitely going to be one that benefits the most from multithreading. It's totally radical, man

Sorry, Solokiller, that article is useless. It's common in high-performance C++ projects to use custom allocators, and the cost of deallocation wasn't measured. A more complicated allocation can make the deallocation cheaper and vice versa, since a common goal of a memory allocation scheme is to avoid memory fragmentation, and that requires a lot of work. C# is of course garbage collected, and the cost of cleaning up and reorganizing needs to be taken into account.

Most implementations of the standard C++ operator new are designed to be thread-safe and conservative - so there's likely a big overhead from locking every time you call new - which is something a custom allocator can minimize by being greedy and preallocating large amounts of memory.

The costs of dealing with dynamic memory are paid for at different points in the lifetime of the program: with (non-garbage-collected (yes, C++ can be garbage collected according to the standard, no idea why anyone would want that though)) C++ it's very predictable - you only pay when you allocate and deallocate; with C# you leave a lot of the work to the garbage collector. There have been a few interesting CppCon talks on allocators.

Making something like this in C# is fine of course, use whatever you prefer. Ease of development is a good reason to pick one language over the other and C++ can be painful. As you know design usually matters much, much more than choice of language when it comes to performance. I'm excited to see the results of your work!

I've been converting CSG's code and i've noticed some areas where performance can improve by not having to write stuff to files. CSG normally writes plane data to 2 files: the BSP file and a separate .pln file. BSP loads the .pln file, or if it doesn't exist, converts the data from the BSP file.

Since this data is now shared directly it eliminates the file I/O overhead and the need to support conversion from the BSP file.

I've been keeping track of code i've found that could be optimized in the original version, here's what i've got so far:

The aforementioned use of a reader-writer lock to ensure thread-safe access to the planes array eliminates potential garbage reads

A fair amount of code leaks memory by not freeing the faces list created by MakeBrushPlanes. Using a std::unique_ptr that frees the memory eliminates the leak, using the function FreeFaceList as the custom deleter function. Though i've never heard of CSG running out of memory, in a shared process this does matter a lot more, especially once RAD has started. This isn't a problem in my version of the tools because it's garbage collected

The faces list could be converted to a std::vector<bface_t> to automatically manage the memory and to allow faster iteration by storing the faces by value. Note that this will increase the cost of sorting the sides (the entire face needs to be copied, instead of just changing its position in the list), but that cost may be offset by the speed increase resulting from all faces being located in the same memory range, and thus more likely to be in the CPU cache. Sorting is only done once per brush per hull (4 hulls) at the most (disabling clipping can skip the whole process, some contents types don't create faces at all either), but iteration is done many times, and additional faces are added during the brush expansion process

Origin brush processing and zhlt_minsmaxs handling both use the CreateBrush function to get the bounding box from a brush. However CreateBrush does a lot more work than just calculating the bounding box, work that can be avoided entirely thus speeding up the parsing of the .map file. If properly reworked the number of planes generated can also be reduced by not letting these two features add planes to the array used by the actual brush creation operation. This can reduce the chances of hitting the internal and/or engine limit on the number of planes, since the planes that contain origin and zhlt_minsmaxs brushes are not often going to be used by other brushes

Merging CSG and BSP will cut down on work done to save and load the data generated by CSG, speeding up both tools and eliminating a possible data corruption issue if the plane data is modified between CSG and BSP execution

As is the case with the engine, there's code that uses a varying number of (-)99999 to denote maximum/minimum values, which can cause problems when the engine limit is increased and larger maps are possible. For instance the function isInvalidHull uses 99999 as its limit, so brushes that are partially or completely beyond this limit can end up marked as invalid. Using <climits>'s constants INT_MIN and INT_MAX appropriately should solve the problem

I'm posting this separately for visibility: does anyone have a large map that takes a bit (~5 or more minutes) to fully compile using a full run of all tools? I'm going to run a performance analysis tool to find the code that's taking the longest so i can see if anything can be optimized. Ideally with default textures only so i don't need to reference other wad files.

It's easier to write everything in the same language so the code can be shared. The engine's data structures for BSP data can also be used by the compiler, so i can increase limits very easily. It's also much easier to provide debug info when the compiler breaks, and it's easier to maintain this codebase since it's designed and written by one person in one go, without a bunch of #ifdefs and varying code styles.

Also, C is very poorly suited to writing tools and games these days. A fair amount of code in these tools does nothing more than free memory, something that even C++ could do automatically through deterministic destructors.