Luaengine Crashfix

What bug does this patch fix: Crashes in the old Lua Engine caused by dereferencing free'd memory.
Detailed Explanation: The old Lua Engine (Luabridge too actually but this patch will not address any issue related to that) stores references to Object and derived types (Unit/GameObject/Player/Creature) but when these Objects are despawned and deleted by ArcEmu, the Lua Engine still holds these references which now point to free'd memory. The solution me and dfighter chose was to make the Lua Engine store guid/mapId/instanceId instead of Object* so when the Object is deleted, NULL will be returned (or "nil" in the Lua slang). I copypasted ArcLuna<T> and added ArcSole<T> described as "As opposed to ArcLuna, this is meant to bring some light to this Lua Engine plagued by crashes since r1". When getting the Object* from guid/mapId/instanceId it first tries to get MapMgr* from mapId with sInstanceMgr.GetMapMgr(mapId). If NULL is returned, it tries with sInstanceMgr.GetInstanceByIds(mapId, instanceId). If NULL is returned again it means the map doesn't exist anymore and NULL will be returned (or "nil"). Once the MapMgr reference is retrieved, MapMgr::_GetObject(guid) is called and the result is casted to the template type T. __gc metamethod is set to nil instead of the old "let's delete the pushed Object" behaviour of ArcLuna because it's not a business of the Lua Engine to delete Objects once they have no more references in Lua Script; they will be deleted by a call to Despawn(), when the MapCell is unloaded or by some other function in arcemu-world. This was probably causing crashes in ArcLuna, too (and probably still causes them, let's wait for a crash report just to be more sure and to fix 1 thing at time ).

How to reproduce: In my journey thought the Lua Engine I had to learn how to be a ril pro Lua scripter (and this means I learnt how to copypaste parts of scritps found around and make them what I wanted to do). Here's a easy crashy script that will work just fine after applying this patch. http://pastebin.com/1vQL80D3 (updated after gossip changes)

Tested:
- storing a ref to a Creature in a global variable, despawning the Creature and accessing the global variable.
- storing a ref to a GameObject in a global variable, despawning the GameObject and accessing the global variable.
- at Stormwind (continent)
- at Black Temple (instance)
- ran on valgrind, no memory issues, no memory leaks (related to this patch)
- Lua functions Unit:SendChatMessage()/Unit:SpawnCreature()/Unit:SpawnGameObject()/Unit:FullCastSpellOnTarget()/gossip functions

I'll be testing this on my server with an active population, though it'll take a while as I'm quite busy today.

Edit:

Everything appears to be working, I noticed how when using flight paths now the units don't start moving to waypoints etc when you load them if the location of the waypoint is not close enough to have loaded yet. This is fine, and I guess the fix you did. When you go back there not on a flight path, all works fine.

SetPlayerLock(0) (or false) no longer works, but I guess this is just something that hasn't been updated in a ArcEmu commit recently. I just updated ~100 revs. I'll look into it.

SetPlayerLock(0) (or false) no longer works, but I guess this is just something that hasn't been updated in a ArcEmu commit recently. I just updated ~100 revs. I'll look into it.

Could you please test that function without my patch to see if it works? If it doesn't work either, please submit it as a ticket on our tracker
Nice to know your players can fly safely now using flypaths

Could you please test that function without my patch to see if it works? If it doesn't work either, please submit it as a ticket on our tracker
Nice to know your players can fly safely now using flypaths

Since this is a pretty vital function for my gameplay functionality, I can't test the patch on my live realm currently with players. :/ However, testing it on localhost and running through all my scripts, playing it as players would has not caused any crashes when originally it would have caused 1-2 crashes by the time the content was exhausted (~10 hours worth).

ticket updates, seems like using 0 as argument with "false" meaning doesn't work anymore. Instead you have to actually write "false", like player:SetPlayerLock(false) (example at http://pastebin.com/EBkPx0T4 , tested and working. I feel infected by this Lua stuff ).
Anyone else up to test this patch? Where are the hordes of Lua scripters ? They went all to luabridge already?

ticket updates, seems like using 0 as argument with "false" meaning doesn't work anymore. Instead you have to actually write "false", like player:SetPlayerLock(false) (example at http://pastebin.com/EBkPx0T4 , tested and working. I feel infected by this Lua stuff ).
Anyone else up to test this patch? Where are the hordes of Lua scripters ? They went all to luabridge already?

Hurp durp.
/facepalm

I could of sworn I tried that. =/ That's what late nights do to you. Tested, working. Experienced a single crash with IsInPhase and the bad pointers, but I think this was a random crash out of the blue as I was unable to reproduce it. Updating my scripts then transferring to my dedi to test in a live environment.

The hordes of Lua scripters don't come to the ArcEmu forums, lol. They go on ac-web and spam people on why there copy and pasted scripts no werk.

Patch updated setting __gc metamethod to nil because it was deleting the pushed Object, while arcemu-world should take care of deleting it. __gc is called when there are no more strong references pointing to the Lua object, but this doesn't mean the Object should be deleted because there are most likely a lot of C++ references. It's up to arcemu-world deal with the lifetime of Object (and derived types). http://pastebin.com/hxMgGkUK (same link of first post).
Tested : same Lua script of first post, ran on valgrind (no memory issues or leaks related to the changes), at Stormwind and Black Temple, deleted with .npc delete the Creature that had the gossip script: everything looks fine.
Untested: My poor knowledge of the Lua Engine (but most likely my laziness) didn't let me test the behaviour of ArcSole when __gc is called, even if nothing should happen has it was set to nil.
Feedback is most welcome

Are you sure those Objects were being garbage collected? Seems unlikely because objects are pushed with gc=false, so "DO NOT TRASH" is in the registry with value "true", which is checked for in the GC function, ultimately leading to the object not being deleted. It is a bad idea to remove this function completely because objects that can be allocated by lua (packets and custom taxipaths for example) will never be deleted.

Also, I believe your current implementation could be improved performance-wise. While the method here will work (for Object and derivatives) it will be doing these checks for existance in the world an awful lot. In this implementation Get() will be called every time an object is accessed through a C++ function (including simple Unit:DoThis() syntax, and whenever an Object is an argument to a C++ function).

For example this lovely gossip script you made earlier http://pastebin.com/BQVyKF6r , whenever the OnGossip function is executed the Get() method will be called 9 times at least!

And, while Get() is a constant time function, I think the lua engine is slow enough.

I made a solution in luabridge (just because it was much easier to implement there rather than lua engine) to the problem without the storing of GUIDs approach. How it works is, when any userdata is pushed, a reference to the pushed object and the userdata is stored in a hash_multimap. A new instance hook is used to detect destructors of Spell/Aura and RemoveFromWorld on Objects. When that hook fires the lua engine searches for any stored userdata references for that object, and sets those values to null. Then when check() is called it just has to check that the userdata is not null instead of checking if the map/instance/guid exists

I think this method can be applied to lua engine too but I'll look at that tomorrow. I hope you enjoy this feedback

Please don't PM me asking to fix, correct or look-over your scripts. Please post a new thread in the Lua Scripting section so others can learn and help.

Are you sure those Objects were being garbage collected? Seems unlikely because objects are pushed with gc=false, so "DO NOT TRASH" is in the registry with value "true", which is checked for in the GC function, ultimately leading to the object not being deleted.

Yes I'm sure as there's a callstack right here saying LuaEngine::ArcSole<Unit>::gc_T was called and tried to delete an in-world Creature (that's what triggered the assert).

hypersniper, on 06 August 2011 - 06:56 AM, said:

It is a bad idea to remove this function completely because objects that can be allocated by lua (packets and custom taxipaths for example) will never be deleted.

If you take a look at my patch you can see ArcSole is used only for Object derived types and they don't need any kind of garbage collector method.

hypersniper, on 06 August 2011 - 06:56 AM, said:

Also, I believe your current implementation could be improved performance-wise.

I'll start even remotely thinking about performance ONLY when crashes will be fixed. It's NOT very useful a fast scripting engine that crashes with just 1 simple gossip script.

hypersniper, on 06 August 2011 - 06:56 AM, said:

In this implementation Get() will be called every time an object is accessed through a C++ function (including simple Unit:DoThis() syntax, and whenever an Object is an argument to a C++ function).

Yeah, I know (that's the whole point of this patch, never store references to Object types but always get them with GUID/mapid/instanceid).

hypersniper, on 06 August 2011 - 06:56 AM, said:

Get() method will be called 9 times at least!

and how much time will a modern CPU take to execute those 9 calls?

hypersniper, on 06 August 2011 - 06:56 AM, said:

I think the lua engine is slow enough.

Define "slow enough": if it's already too slow, then it should be removed. If it's at the limit between acceptable and too slow, then it's the fault of who wrote it before me. I'm a Mechanical Engineering student so I'm used to see graphs, tables and math equations about what lead to a conclusion. Could you please post any of those about your "slow enough" conclusion?

hypersniper, on 06 August 2011 - 06:56 AM, said:

I made a solution in luabridge (just because it was much easier to implement there rather than lua engine).

That's unrelated to this thread as the old Lua Engine is single-threaded with locks everywhere while luabridge has 1 instance for each MapMgr.

hypersniper, on 06 August 2011 - 06:56 AM, said:

A new instance hook is used to detect destructors of Spell/Aura and RemoveFromWorld on Objects. When that hook fires the lua engine searches for any stored userdata references for that object, and sets those values to null.

But but, if I cast a Spell that applies an AoE Aura and if there are 9 enemies, that hook will be called 9 times! Even if I don't have any Lua script! Wouldn't this slow down the luabridge performance?
On a more serious note, that's another approach to solve the issue. As my knowledge of the Lua Engine is very limited and I had very little time to find a solution (I think the Lua Engine caused enough crashes already in the last years, thank to the incompetent developers who couldn't find these issues for reasons that are still unknown to me), I (with dfighter) came up with the "usual" GUID solution, used in AIInterface too. If you can post a patch in the format described at http://arcemu.org/fo...?showtopic=2355 (even in this topic is fine, as this was more like a Patch Workshop) then we may find a proper solution to this crash issues.

As for the feedback, that's always welcome, especially from who knows Lua engines better than me.

Look, I don't want to have a fight with you here, I just wanted to help, because you said you wanted feedback. But, it's ok, it's your choice whether you take my ideas seriously or not.

Well your call stack is convincing, I was just skeptical because it was always working perfectly earlier and such a system is widely used even in vanilla Lunar ( http://lua-users.org...indingWithLunar ).

When I said "slow enough" I was saying that the lua engine is slow compared to some other scripting solutions (no I don't have any tables or equations for this) and further slowing it down on a per-instruction basis may not be the best solution. So I suggested another. Implemented on luabridge? yes. Unrelated to this? no.

While you are correct about the 9 auras thing, what I was trying to illustrate is that there is a way to do it without doing (map/instance/guid) checks on a per-lua-instruction basis.

Please don't PM me asking to fix, correct or look-over your scripts. Please post a new thread in the Lua Scripting section so others can learn and help.

Look, I don't want to have a fight with you here, I just wanted to help, because you said you wanted feedback. But, it's ok, it's your choice whether you take my ideas seriously or not.

You two are not fighting. This is a debate about possible solutions to a problem, at most. Which is welcome on this forum.
We are not those other forums, were you either have to kiss up to the poster or be silent, otherwise you are branded a flamer or troll, so feel free to post your input.
As long as it's not personal insults, criticism is always welcome.

Look, I don't want to have a fight with you here, I just wanted to help, because you said you wanted feedback. But, it's ok, it's your choice whether you take my ideas seriously or not.

I think you missed the last sentence of my (quite long) post:

jackpoz, on 06 August 2011 - 09:22 AM, said:

As for the feedback, that's always welcome, especially from who knows Lua engines better than me.

As you can see, I took your feedback so seriously that I even replied to every sentence you wrote (took me 20 minutes to do that and I even had to tell a friend to wait because I was busy). I was just professional (or at least tried) because World Of Warcraft is the game, not ArcEmu, and these Lua Engine crashes are not fun at all (some folks say ArcEmu is a hobby but the definition of hobby at http://dictionary.re...om/browse/hobby is "an activity or interest pursued for pleasure or relaxation" and I don't get any pleasure seeing my hard work on fixing crashes becoming useless because of some old mistakes of other ppl).

hypersniper, on 06 August 2011 - 06:02 PM, said:

While you are correct about the 9 auras thing, what I was trying to illustrate is that there is a way to do it without doing (map/instance/guid) checks on a per-lua-instruction basis.

This is the typical trade-off scenario: we can choose to slow down the Lua Engine (my solution) or slow down the whole arcemu-world (your solution). Also remember that the old Lua Engine (LHA) is single-threaded so having a container accessed by different threads (each MapMgr thread) when an Object is removed from world/~Spell/~Aura will require some locking and that will pretty much make the WHOLE arcemu-world single-threaded. In luabridge this may work but this thread is about the old Lua Engine. Let's keep the focus to one engine at once

I made a solution in luabridge (just because it was much easier to implement there rather than lua engine) to the problem without the storing of GUIDs approach. How it works is, when any userdata is pushed, a reference to the pushed object and the userdata is stored in a hash_multimap. A new instance hook is used to detect destructors of Spell/Aura and RemoveFromWorld on Objects. When that hook fires the lua engine searches for any stored userdata references for that object, and sets those values to null. Then when check() is called it just has to check that the userdata is not null instead of checking if the map/instance/guid exists

due to a recent crash related to the Lua Engine event system ( https://sourceforge....cemu/ticket/183 ), caused by RegisterTimedEvent() being added to World Runnable (Instance ID -1) but accessing Objects of other MapMgrs, it's now clear that each MapMgr should have its own Lua Engine (just like how luabridge works). With some steps we should get a working Lua Engine sooner or later...