Since I cannot find the original email on lkml, NAK on this.__mutex_owner() is not a general purpose helper function.

Since this code also exists in the current kernel, I need to ask whatrecommended alternatives exist for determining the mutex owner?I imagine we could track the mutex owner separately in the auditsubsystem, but I'd much prefer to leverage an existing mechanism ifpossible.

It's not at all clear to me what that code does, I just stumbled upon__mutex_owner() outside of the mutex code itself and went WTF.

The comment (aside from having the most horribly style) is wrong too,because it claims it will not block when we hold that lock, while,afaict, it will in fact do just that.

Maybe if you could explain how that code is supposed to work and why itdoesn't know if it holds a lock I could make a suggestion...

Since I cannot find the original email on lkml, NAK on this.__mutex_owner() is not a general purpose helper function.

Since this code also exists in the current kernel, I need to ask whatrecommended alternatives exist for determining the mutex owner?I imagine we could track the mutex owner separately in the auditsubsystem, but I'd much prefer to leverage an existing mechanism ifpossible.

It's not at all clear to me what that code does, I just stumbled upon__mutex_owner() outside of the mutex code itself and went WTF.

If you don't want people to use __mutex_owner() outside of the mutexcode I might suggest adding a rather serious comment at the top of thefunction, because right now I don't see anything suggesting thatfunction shouldn't be used. Yes, there is the double underscoreprefix, but that can mean a few different things these days.

Post by Peter Zijlstra... is wrong too, because it claims it will not block when we hold that lock, while,afaict, it will in fact do just that.

A mutex blocks when it is held, but the audit_log_start() functionshould not block for the task that currently holds theaudit_cmd_mutex; that is what the comment is meant to convey. Ibelieve the comment makes sense, but I did write it so I'll concedethat I'm probably the not best judge. If anyone would like to offer adifferent wording I'm happy to consider it.

Post by Peter ZijlstraMaybe if you could explain how that code is supposed to work and why itdoesn't know if it holds a lock I could make a suggestion...

I just spent a few minutes looking back over the bits available ininclude/linux/mutex.h and I'm not seeing anything beyond__mutex_owner() which would allow us to determine the mutex owningtask. It's probably easiest for us to just track ownership ourselves.I'll put together a patch later today.

Post by Peter ZijlstraIt's not at all clear to me what that code does, I just stumbled upon__mutex_owner() outside of the mutex code itself and went WTF.

If you don't want people to use __mutex_owner() outside of the mutexcode I might suggest adding a rather serious comment at the top of thefunction, because right now I don't see anything suggesting thatfunction shouldn't be used. Yes, there is the double underscoreprefix, but that can mean a few different things these days.

Post by Peter Zijlstra... is wrong too, because it claims it will not block when we hold that lock, while,afaict, it will in fact do just that.

A mutex blocks when it is held, but the audit_log_start() functionshould not block for the task that currently holds theaudit_cmd_mutex; that is what the comment is meant to convey. Ibelieve the comment makes sense, but I did write it so I'll concedethat I'm probably the not best judge. If anyone would like to offer adifferent wording I'm happy to consider it.

The comment uses 'sleep' which is typically used to mean anything thatschedules, but then it does the schedule_timeout() thing.

Post by Peter ZijlstraMaybe if you could explain how that code is supposed to work and why itdoesn't know if it holds a lock I could make a suggestion...

I just spent a few minutes looking back over the bits available ininclude/linux/mutex.h and I'm not seeing anything beyond__mutex_owner() which would allow us to determine the mutex owningtask. It's probably easiest for us to just track ownership ourselves.I'll put together a patch later today.

Note that up until recently the mutex implementation didn't even have aconsistent owner field. And the thing is, it's very easy to use wrong,only today I've seen a patch do: "__mutex_owner() == task", where taskwas allowed to be !current, which is just wrong.

Looking through kernel/audit.c I'm not even sure I see how you would endup in audit_log_start() with audit_cmd_mutex held.

Can you give me a few code paths that trigger this? Simple git-grep isfailing me.

Post by Peter ZijlstraIt's not at all clear to me what that code does, I just stumbled upon__mutex_owner() outside of the mutex code itself and went WTF.

If you don't want people to use __mutex_owner() outside of the mutexcode I might suggest adding a rather serious comment at the top of thefunction, because right now I don't see anything suggesting thatfunction shouldn't be used. Yes, there is the double underscoreprefix, but that can mean a few different things these days.

Post by Peter ZijlstraMaybe if you could explain how that code is supposed to work and why itdoesn't know if it holds a lock I could make a suggestion...

I just spent a few minutes looking back over the bits available ininclude/linux/mutex.h and I'm not seeing anything beyond__mutex_owner() which would allow us to determine the mutex owningtask. It's probably easiest for us to just track ownership ourselves.I'll put together a patch later today.

Note that up until recently the mutex implementation didn't even have aconsistent owner field. And the thing is, it's very easy to use wrong,only today I've seen a patch do: "__mutex_owner() == task", where taskwas allowed to be !current, which is just wrong.

Arguably all the more reason why a strongly worded warning isimportant (which I see you've included below, feel free to include myReviewed-by).

Post by Peter ZijlstraLooking through kernel/audit.c I'm not even sure I see how you would endup in audit_log_start() with audit_cmd_mutex held.Can you give me a few code paths that trigger this? Simple git-grep isfailing me.

Basically look at the code in audit_receive_msg(), but I wasn't askingyour opinion on how we should rewrite the audit subsystem, I was justasking how one could determine if the current task was holding a givenmutex in a way that was acceptable to you. Based on your comments,and some further inspection of the mutex code, it appears that is/wasnot something that the core mutex code wants to support/make-visible.Which is perfectly fine, I just wanted to make sure I wasn't missingsomething before I went ahead and wrote a wrapper around the mutexcode for use by audit.

FWIW, I just put together the following patch which removes the__mutex_owner() call from audit and doesn't appear to break anythingon the audit side (you're CC'd on the patch). It has only beenlightly tested, but I'm going to bang on it for a day or so and if Ihear no objections I'll merge it into audit/next.

Post by Peter ZijlstraIt's not at all clear to me what that code does, I just stumbled upon__mutex_owner() outside of the mutex code itself and went WTF.

If you don't want people to use __mutex_owner() outside of the mutexcode I might suggest adding a rather serious comment at the top of thefunction, because right now I don't see anything suggesting thatfunction shouldn't be used. Yes, there is the double underscoreprefix, but that can mean a few different things these days.

Post by Peter ZijlstraMaybe if you could explain how that code is supposed to work and why itdoesn't know if it holds a lock I could make a suggestion...

I just spent a few minutes looking back over the bits available ininclude/linux/mutex.h and I'm not seeing anything beyond__mutex_owner() which would allow us to determine the mutex owningtask. It's probably easiest for us to just track ownership ourselves.I'll put together a patch later today.

Note that up until recently the mutex implementation didn't even have aconsistent owner field. And the thing is, it's very easy to use wrong,only today I've seen a patch do: "__mutex_owner() == task", where taskwas allowed to be !current, which is just wrong.

Arguably all the more reason why a strongly worded warning isimportant (which I see you've included below, feel free to include myReviewed-by).

Post by Peter ZijlstraLooking through kernel/audit.c I'm not even sure I see how you would endup in audit_log_start() with audit_cmd_mutex held.Can you give me a few code paths that trigger this? Simple git-grep isfailing me.

Basically look at the code in audit_receive_msg(), but I wasn't askingyour opinion on how we should rewrite the audit subsystem, I was justasking how one could determine if the current task was holding a givenmutex in a way that was acceptable to you. Based on your comments,and some further inspection of the mutex code, it appears that is/wasnot something that the core mutex code wants to support/make-visible.Which is perfectly fine, I just wanted to make sure I wasn't missingsomething before I went ahead and wrote a wrapper around the mutexcode for use by audit.FWIW, I just put together the following patch which removes the__mutex_owner() call from audit and doesn't appear to break anythingon the audit side (you're CC'd on the patch). It has only beenlightly tested, but I'm going to bang on it for a day or so and if Ihear no objections I'll merge it into audit/next.* https://www.redhat.com/archives/linux-audit/2018-February/msg00066.html

Could you please explain the audit_ctl_lock()/unlock() primitive you areintroducing there? You seem to be implementing some sort of recursive lockingprimitive, but in a strange way.

Post by Peter ZijlstraIt's not at all clear to me what that code does, I just stumbled upon__mutex_owner() outside of the mutex code itself and went WTF.

If you don't want people to use __mutex_owner() outside of the mutexcode I might suggest adding a rather serious comment at the top of thefunction, because right now I don't see anything suggesting thatfunction shouldn't be used. Yes, there is the double underscoreprefix, but that can mean a few different things these days.

Post by Peter ZijlstraMaybe if you could explain how that code is supposed to work and why itdoesn't know if it holds a lock I could make a suggestion...

I just spent a few minutes looking back over the bits available ininclude/linux/mutex.h and I'm not seeing anything beyond__mutex_owner() which would allow us to determine the mutex owningtask. It's probably easiest for us to just track ownership ourselves.I'll put together a patch later today.

Note that up until recently the mutex implementation didn't even have aconsistent owner field. And the thing is, it's very easy to use wrong,only today I've seen a patch do: "__mutex_owner() == task", where taskwas allowed to be !current, which is just wrong.

Arguably all the more reason why a strongly worded warning isimportant (which I see you've included below, feel free to include myReviewed-by).

Post by Peter ZijlstraLooking through kernel/audit.c I'm not even sure I see how you would endup in audit_log_start() with audit_cmd_mutex held.Can you give me a few code paths that trigger this? Simple git-grep isfailing me.

Basically look at the code in audit_receive_msg(), but I wasn't askingyour opinion on how we should rewrite the audit subsystem, I was justasking how one could determine if the current task was holding a givenmutex in a way that was acceptable to you. Based on your comments,and some further inspection of the mutex code, it appears that is/wasnot something that the core mutex code wants to support/make-visible.Which is perfectly fine, I just wanted to make sure I wasn't missingsomething before I went ahead and wrote a wrapper around the mutexcode for use by audit.FWIW, I just put together the following patch which removes the__mutex_owner() call from audit and doesn't appear to break anythingon the audit side (you're CC'd on the patch). It has only beenlightly tested, but I'm going to bang on it for a day or so and if Ihear no objections I'll merge it into audit/next.* https://www.redhat.com/archives/linux-audit/2018-February/msg00066.html

Could you please explain the audit_ctl_lock()/unlock() primitive you areintroducing there? You seem to be implementing some sort of recursive lockingprimitive, but in a strange way.audit_receive() -> audit_receive_msg() -> AUDIT_TTY_SET -> audit_log_common_recv_msg() -> audit_log_start()where we can arrive already holding the lock.I.e. recursive mutex, kinda.What's the thinking there? Neither the changelog nor the code explains this.

I don't really go into great detail in the changelog, or comments inthe code, because I'm not really doing anything new with respect tolocking in this commit. The patch simply wraps the existingmutex_{lock,unlock}() calls so that we can track the mutex owner. Itdoesn't fundamentally change the locking, it's a quick patch to getrid our our __mutex_owner() usage as Peter doesn't want anyone,outside the mutex code, to use that function.

Based on your comments above, I'm guessing some of themisunderstanding comes from the__mutex_owner()/audit_ctl_owner_current() call in audit_log_start().We try to determine the mutex/lock owner in audit_log_start() notbecause we are trying to avoid a recursive lock, we do the check as anoptimization to skip the normal queue managment so that the lockholder isn't subject to the same rescheduling/queue-management (is"queue calming" a term?) as regular tasks.