This comment has been minimized.

You don't say what the actual problem is, but I presume your problem is that the order of tags changes because the order of files in the list lookup_includes() returns changes as the inode changes. If you want reproducible builds, instead of relying on application internals and globbing orders you should list the files explicitly instead of using globs.

This change will also fail to de-dup paths that are linked to the same file, but hopefully that will be rare.

If we can live with that, performance wise this should be better (stated without benchmarking of course :) because it no longer stats the filesystem to make a hash key.

This comment has been minimized.

@elextr I do not understand 'This change will also fail to de-dup paths that are linked to the same file'
I thought, the old code did not do that either, but just hashed it into the same bucket.

Yeah, it looks like that, which seems absurd. My guess would be that the goal was to do what @elextr expects, that is using the inode itself as a key, which indeed would have the nice property of avoid re-computing the same file even under a different path, but as is it actually leads to never match any entry (as g_direct_equal() on a different string pointer will always deem those different).
So your changes are better than the previous behavior, as at least identical paths will match.

However, this function seems a bit absurd. I don't see why it doesn't simply builds the list directly, because traversing a GHashTable is not very fast, and IIUC it wouldn't change anything. So we could probably change it to just do that, which would be simpler and fix everyone's concerns.

Also, relying on the order of traversal of a GHashTable seems fragile, I don't think it is guaranteed to be stable across GLib versions, architectures or whatnot.

This comment has been minimized.

@b4n, no the current code works just fine, the hash key is the inode, which is an int stored in the pointer, so g_direct_equal() is correct for the current system. But it has to be changed to g_str_equal() if the key becomes the filename string.

Originally a list was used but then it was de-duped, so likely thats why it got changed to a hash but that change is copied from Anjuta, so who knows.

Not sure why duplicates are a problem, TM should handle multiple definitions of the same tagname? @b4n TM-spurtese needed.

@bmwiedemann did you try running cpp yourself? Or you could just cat the files together if you don't want recursive inclusion, thats all Geany is doing internally.

This comment has been minimized.

Not it's not, it's hashed as an inode, but the key is a filename, so the equal function will compare the string pointers.

ahhh, ok. So a) it doesn't work now because its wrong, and b) comparing strings won't actually work either if both paths resolve to the same inode, the hash table will see the hash collide but the strings being different and add it again.

So using inode as the hash is just a slow way of making a hash of a file path. In fact Anjuta seems to have made a right hash of the whole thing [pun intended]. Maybe thats why it switched to sqlite for its tags later.

Well, if we used a hash table as a mean of deduping and a list as a mean of well, building the list, it would not be a problem.

Yeah, since 2.40 g_hash_table_insert() returns if it inserted or not, so the filename could be added to the list on the fly. But Glib 2.40 didn't exist back when this was originally written.

This comment has been minimized.

So using inode as the hash is just a slow way of making a hash of a file path.

Well, the inode gets a goodish hash, but the equal check makes it worse than a string hash and comparison on the file path yeah.
If really the inode thing was useful, it should be used as the key directly rather than a bad convoluted hash that's guaranteed to have a worse stat() call count anyway.

Yeah, since 2.40 g_hash_table_insert() returns if it inserted or not, so the filename could be added to the list on the fly. But Glib 2.40 didn't exist back when this was originally written.

This comment has been minimized.

The code already checks if the hash table contains the element before inserting it, so it should be OK no matter which version is used.

Except that it will always fail to find it due to the bug, so it gets added anyway.

If really the inode thing was useful, it should be used as the key directly rather than a bad convoluted hash that's guaranteed to have a worse stat() call count anyway.

Well, as I said above, everything gets added because a hash collision won't stop different paths being added, and due to the bug all paths are seen as different at the moment anyway, so duplicates so far havn't caused any "trouble" (the word from the original de-dup comment). @b4n do you know what "trouble" they might cause?

So clearly the inode stuff is not useful, and we should simply use g_str_hash() or totally forget de-duping since so far the fact that its failing hasn't caused problems (except perhaps some tags files are silently bigger than they need to be, but nobody has noticed and complained).

This comment has been minimized.

But it doesn't matter, its a hash not the key, they can collide, it just costs time if they do. Nothing should break. Also what do you get for "inode" for files served via Samba?

But the fact that the key comparisons are wrong in the current code means it just creates a list of all the files using a g_hash in pseudo random order, then copies that list to a g_list, no de-duping, nothing, nada, useless waste of time.

So unless somebody can identify the "trouble" having multiple copies of the same file will cause (apart from some space and time) then I think we should just expand the globs straight into the g_list and thats it.

This comment has been minimized.

but having tools themselves more deterministic is still a plus, because it avoids having to patch all callers.

Sure, but its a workaround until Geany is changed.

Also in many cases the callers should care which order files are included, for context sensitive languages like C/C++ knowing that an identifier is a type name is important to correct parsing. So you want to control order so that the declaration of the type name is before the use. Essentially you want things to be seen in the order that the compiler will see them.

So the files need to be explicitly specified in the correct order.

That means Geany should keep the order the same as the command line, except that where a command line item is a glob, it is expanded to all matching items.

Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.