User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.1 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.1
Steps to reproduce:
• Get on a Mac running macOS 10.12.
• Download and install BookMacster version 2.2.18 from here:
https://sheepsystems.com/bookmacster/BookMacster.2.2.18.zip
(Do not get it from the link given on my website because I hope to soon have it replaced with a newer version 2.3 which contains a new WebExtension instead.)
• Launch Firefox 53.0b9 (64-bit) Build ID 20170403072723.
• Launch BookMacster. Close or cancel any windows that open.
• Click in the menu: BookMacster > Manage Browser Extensions. A window will open.
• In the top "Syncing" section, row for "Firefox", click button "Install", and follow the instructions given.
• Relaunch Firefox 53.0b9.
• Click in the menu Tools > Add-Ons and verify that you have Sheep Systems Sync Extension version 330 loaded.
• Return to the Manage Browser Extensions window in BookMacster and click "Test" in the row for Firefox. This sends a message to the .dylib in my extension.
Actual results:
Firefox 53 crashes immediately. This is reproducible. Typical crash report:
https://crash-stats.mozilla.com/report/index/bp-9b566232-4dd8-448d-90a7-855cf2170407
Expected results:
Something more graceful, such as Firefox preventing the extension from loading or just not responding to the function call.

The crash occurs because the XPCOM dylib in my old extension calls _NS_GetServiceManager. Because there is no such symbol in Firefox 53 due to the FIX of Bug 1306329, Firefox crashes.
Mike Hommey asked if my dylib was weak linked. Maybe so. The old XPCOM interface was quite complicated. I do remember that, at some point, I had to build a "xulrunner" library and link it to my dylib. But I don't have that xulrunner any more.
Now, of course the way it will happen in real life is that users who have been running version 2.2.18 of BookMacster or similar apps, and have installed my old extension, and been happily using it in Firefox 52 or earlier, and have not updated BookMacster before the week of April 18 because they have automatic updating turned off, will find that, after Firefox updates itself to version 53, Firefox will crash whenever they add, delete or change a bookmark in Firefox, or perform certain operations in my app.
Please note that there is apparently nothing in Firefox 53 which prevents my old extension from loading. The install.rdf in my old extension specifies the maxVersion for Firefox = 51.0, but I read somewhere that Firefox has never enforced this :(

So, I took a quick look at your addon, and what it's doing is that it's loading a native library with ctypes. Obviously, anything bad can happen in that case, but we should probably avoid random runtime problems when libs loaded with ctypes use symbols that can't be resolved.
Essentially, the problem is that we're loading the ctypes libs with RTLD_LAZY, which makes missing symbols lead to crashes when they are used, rather than the lib failing to load at all.
I think the obvious fix here is something like:
--- a/js/src/ctypes/Library.cpp
+++ b/js/src/ctypes/Library.cpp
@@ -143,17 +143,17 @@ Library::Create(JSContext* cx, HandleValue path, const JSCTypesCallbacks* callba
pathStr->length(), pathBytes, &nbytes));
pathBytes[nbytes] = 0;
}
libSpec.value.pathname = pathBytes;
libSpec.type = PR_LibSpec_Pathname;
#endif
- PRLibrary* library = PR_LoadLibraryWithFlags(libSpec, 0);
+ PRLibrary* library = PR_LoadLibraryWithFlags(libSpec, PR_LD_NOW);
#ifndef XP_WIN
JS_free(cx, pathBytes);
#endif
if (!library) {
#define MAX_ERROR_LEN 1024
char error[MAX_ERROR_LEN] = "Cannot get error from NSPR.";
But I don't know if some things are actually relying on the RTLD_LAZY behavior... (and would in turn, break ; but they would at least break more gracefully than the opposite breakage)
Benjamin, what do you think?

(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> I'm really only worried about whether this affects startup performance
> because os.file uses ctypes.
As of bug 1349389, we don't load osfile.jsm at startup. (Which caused some regressions because various code expected that we did, but that's another story...)

(In reply to Andrew McCreight [:mccr8] from comment #5)
> As of bug 1349389, we don't load osfile.jsm at startup. (Which caused some
> regressions because various code expected that we did, but that's another
> story...)
Oh, sorry, that's just for content processes, so probably irrelevant to your concern.

(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> Oh wow, I definitely don't think we should be using RTLD_LAZY for ctypes!
> That's a footgun.
>
> I'm really only worried about whether this affects startup performance
> because os.file uses ctypes.
Mmmm I wonder what the linker actually does when an already RTLD_LAZY-loaded lib is then loaded with RTLD_NOW...
(looking glibc source) it looks like, from a quick glance at the code, that it doesn't do anything wrt symbols. The question is still open for OSX, though, but I think it's likely it's the same.
That still leaves us with the xdb.so problem on hazard builds, though :-/

[Tracking Requested - why for this release]:
Given that there is a possibility that a number of add-on ill crash Firefox as soon as their code runs (potentially startup), this should be on the release management radar. It looks like the concrete add-on listed here is now blocklisted but we may want to keep our eyes open to see if there are others.

We can track this for 53, but we're already heading into the RC build today.
This sounds maybe more complicated than we can handle before the release.
glandium, are you intending to take this on or do you know who might?

What's happening is that the hazard analysis runs a script using the JS shell to load in databases constructed earlier, and it uses ctypes to dlopen xdb.so. It looks like xdb.so has a number of undefined gmp symbols that it never actually calls, which was previously harmless but with RTLD_NOw will complain about. The set of symbols (for my local build, at least, which might be using different versions of things):
U __gmpz_add_ui
U __gmpz_clear
U __gmpz_cmp_ui
U __gmpz_fits_slong_p
U __gmpz_get_si
U __gmpz_get_str
U __gmpz_init
U __gmpz_mul_si
U __gmpz_sizeinbase
I don't really understand why, given that I'm linking to gmp statically. But I guess I either need to force it to resolve these symbols against the static lib, or stop it from emitting those symbols in the first place. Looking...

I double checked what happens in practice when a library previously dlopen'ed with RTLD_LAZY is re-dlopen'ed with RTLD_NOW, on both Linux and OSX. In both cases, there is no forced bindings happening on the second dlopen, which means OS.file (or anything else) dlopen'ing libxul through ctypes won't cause symbols to be bound.
FWIW, this is how the behavior is defined in POSIX:
RTLD_NOW
All necessary relocations shall be performed when the object is first loaded.
(...)
If a file is specified in multiple dlopen() invocations, mode is interpreted at each invocation. Note, however, that once RTLD_NOW has been specified all relocations shall have been completed rendering further RTLD_NOW operations redundant and any further RTLD_LAZY operations irrelevant.
(http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html)

Comment on attachment 8856890[details]Bug 1354395 - Update sixgill to a rebuild that links against GMP statically.
Approval Request Comment
[Feature/Bug causing the regression]: This patch fixes a problem that is unveiled by the second patch in this bug.
[User impact if declined]: Hazard build bustage when landing the other patch.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: On autoland, yes.
[Needs manual test from QE? If yes, steps to reproduce]: N/A
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: It shouldn't be.
[Why is the change risky/not risky?]: It's a rebuild of the compiler plugin used for the hazard builds. AIUI, there are unit tests checking that the plugin catches what it's supposed to catch.
[String changes made/needed]: N/A

Comment on attachment 8856891[details]Bug 1354395 - Always bind symbols at load time for ctypes libraries.
Approval Request Comment
[Feature/Bug causing the regression]: Technically, this has been a problem for a long time, but the removal of some core symbols made the problem more apparent in 53. However, it's late in the 53 schedule to fully appreciate the risk of this change. It may break addons that otherwise work fine, while fixing crashes due to addons that load native code with ctypes and use symbols that aren't available.
[User impact if declined]: Crashes when using addons that load native code with ctypes and use symbols that aren't available.
[Is this code covered by automated tests?]: Only for the "normal" case (where ctypes-loaded libraries don't have missing symbols)
[Has the fix been verified in Nightly?]: On autoland.
[Needs manual test from QE? If yes, steps to reproduce]: N/A
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: To some degree.
[Why is the change risky/not risky?]: It may break addons that work fine currently (because for some reason they don't actually use the missing symbols, so lazy binding makes the missing symbols a non-issue), while fixing crashes due to addons that load native code with ctypes and use symbols that aren't available.
[String changes made/needed]: N/A