(In reply to comment #4)
> Giving it P2, this is quite bad.
My thinking on this was to introduce a set of functions in QNetworkReplyHandler or utility function in WebCoreSupport that sniff for HTML content when the Content-Type header is empty. This isn't full on mime-type sniffing but it resolves this bug and might be a good way to dip our toe in the water.
I'm also not totally sure that sniffing should happen in QNAM - maybe actual content versus content reported in headers is something only QtWebKit should worry about. Putting in QNAM seems like a bit of a layering violation, I'm not sure.
I'd like to solve this bug so thoughts on the right place to put the solution welcome!

(In reply to comment #5)
> (In reply to comment #4)
> > Giving it P2, this is quite bad.
>
> My thinking on this was to introduce a set of functions in QNetworkReplyHandler or utility function in WebCoreSupport that sniff for HTML content when the Content-Type header is empty. This isn't full on mime-type sniffing but it resolves this bug and might be a good way to dip our toe in the water.
>
> I'm also not totally sure that sniffing should happen in QNAM - maybe actual content versus content reported in headers is something only QtWebKit should worry about. Putting in QNAM seems like a bit of a layering violation, I'm not sure.
>
> I'd like to solve this bug so thoughts on the right place to put the solution welcome!
I agree that the sniffing should not be in QNAM. Robert, do you plan to work on this issue soon?

Another interesting link (from abarth):
Secure Content Sniffing for Web Browsers, or How to Stop Papers from Reviewing Themselves
Adam Barth, Juan Caballero, and Dawn Song
In Proc. of the 30th IEEE Symposium on Security and Privacy (Oakland 2009)
http://www.adambarth.com/papers/2009/barth-caballero-song.pdf

Attachment 88170[details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style']" exit_code: 1
Source/WebKit/qt/tests/MIMESniffing/tst_MIMESniffing.cpp:20: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
Source/WebCore/platform/network/qt/MIMESniffing.cpp:20: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
Total errors found: 2 in 27 files
If any of these errors are false positives, please file a bug against check-webkit-style.

(In reply to comment #14)
> I just looked quickly over the chromium version and they have good comments on wehre things comes from. You should do the same:
>
> // Whether a given byte looks like it might be part of binary content.
> // Source: HTML5 spec
> static char kByteLooksBinary[] = {
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, // 0x00 - 0x0F
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, // 0x10 - 0x1F
>
> static const char* kSniffableTypes[] = {
> // Many web servers are misconfigured to send text/plain for many
> // different types of content.
> "text/plain",
> // We want to sniff application/octet-stream for
> // application/x-chrome-extension, but nothing else.
> "application/octet-stream",
> // XHTML and Atom/RSS feeds are often served as plain xml instead of
> // their more specific mime types.
> "text/xml",
> "application/xml",
> };
I added some pointers to the spec in comments in the source code.
For example the binaryFlags array came from this:
+-------------------------+
| Binary Data Byte Ranges |
+-------------------------+
| 0x00 -- 0x08 |
| 0x0B |
| 0x0E -- 0x1A |
| 0x1C -- 0x1F |
+-------------------------+
I simplly created a function returning true or false based in this information and then used qDebug() to generate the array you see in the code, considering that it would improve performance. Then I created a static inline method with a descriptive name (isBinaryChar) just to do the table lookup.
I did the same now for white spaces.
>
> etc etc... Things like this makes the code more understandable and somewhat reviewable.
>
> link: http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?view=markup
I tried to give to the methods clear names and in several occasions I created static inline methods with descriptive names to make the code more readable. My understanding is that this is the recommended way to proceed. But some short comments pointing to the spec were added to the code as well.

(In reply to comment #17)
> Attachment 88170[details] did not pass style-queue:
>
> Failed to run "['Tools/Scripts/check-webkit-style']" exit_code: 1
>
> Source/WebKit/qt/tests/MIMESniffing/tst_MIMESniffing.cpp:20: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
> Source/WebCore/platform/network/qt/MIMESniffing.cpp:20: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
> Total errors found: 2 in 27 files
>
>
> If any of these errors are false positives, please file a bug against check-webkit-style.
Those errors were expected. I did it to be able to implement the Qt tests without needing to add too much of WebKit to its include path.
MIMESniffing does not depend on anything from WebKit or Qt.

Comment on attachment 88170[details]
patch
Second round.
View in context: https://bugs.webkit.org/attachment.cgi?id=88170&action=review
Any reason to put MIMESniffing in .../qt/ ?
I would guess other ports might be interested, it could be easier to have it in platform/network.
>> Source/WebCore/platform/network/qt/MIMESniffing.cpp:86
>> +static const char binaryFlags[256] = {
Note to self: some Python says the talbe is correct :)
> Source/WebCore/platform/network/qt/MIMESniffing.cpp:321
> + mimeType = findMimeType(data, dataSize, bomTypes, bomTypesSize);
> + if (mimeType)
> + return mimeType;
Isn't that overkill to use MagicNumbers for this condition?
it could be:
if (dataSize >= 2)
if ((data[0] == 0xFE && data[1] 0xFF) // UTF-16BE BOM
|| (data[0] == 0xFF && data[1] 0xFE)) // UTF-16LE BOM
return "text/plain";
etc
The problem I have is that MagicNumbers is a big beast with masking and the part to skip spaces. You are abusing it here which increase complexity while not increasing code reuse IMHO.
> Source/WebCore/platform/network/qt/MIMESniffing.cpp:477
> +// http://tools.ietf.org/html/draft-abarth-mime-sniff-06#page-6
> +MIMESniffer MIMESniffer::create(const char* advetizedMIMEType, bool isSupportedImageType)
> +{
> + if (!advetizedMIMEType)
> + return MIMESniffer(512, &MimeSniffing::unknownTypeSniffingProcedure);
> +
> + if (MimeSniffing::isTextOrBinaryType(advetizedMIMEType))
> + return MIMESniffer(512, &MimeSniffing::textOrBinaryTypeSniffingProcedure);
> +
> + if (MimeSniffing::isUnknownType(advetizedMIMEType))
> + return MIMESniffer(512, &MimeSniffing::unknownTypeSniffingProcedure);
> +
> + if (MimeSniffing::isXMLType(advetizedMIMEType))
> + return MIMESniffer();
> +
> + if (isSupportedImageType)
> + return MIMESniffer(8, &MimeSniffing::imageTypeSniffingProcedure);
> +
> + if (!strcmp(advetizedMIMEType, "text/html"))
> + return MIMESniffer(512, &MimeSniffing::feedTypeSniffingProcedure);
> +
> + return MIMESniffer();
> +}
This code just determine the parameters of the class internals and pass them to the constructor. It is not clear to me why you need the ::create() function instead of passing (const char* advetizedMIMEType, bool isSupportedImageType) to the constructor and let it populate the internals?
> Source/WebCore/platform/network/qt/QMimeTypeSniffer.cpp:32
> + , m_sniffer(MIMESniffer::create(advertisedMimeType.toLatin1().constData(), isSupportedImageType))
> + , m_isFinished(!m_sniffer.dataSize() || sniff())
I suggest to put those in the function body, not in the initialization.
It is tricky to notice the intialization of m_isFinished call sniff().
I also think using m_sniffer.dataSize() is a bad condition. You must know how the MIMESniffer works to understand this condition. I would rather have something like "isValid()" or "typeMayNeedDynamicSniffing()" or something alike.
> Source/WebCore/platform/network/qt/QMimeTypeSniffer.cpp:44
> + if (!m_reply->isFinished() && m_reply->bytesAvailable() < m_sniffer.dataSize())
> + return false;
Couldn't the sniffing be more dynamic than that? For me it looks like there is no reason to always wait for dataSize(). You just need one invalid character to determine some types.
Let lay you are in "Text or Binary" and the first n (<3) characters are a UTF BOM, you can already say it is text/plain.
This is just an idea to take the decision as early as possible so the rest of the engine can start working.
> Source/WebCore/platform/network/qt/QNetworkReplyHandler.cpp:182
> + if (m_sniffer) {
> + delete m_sniffer;
> + m_sniffer = 0;
> + }
It would be cleaner to store QMimeTypeSniffer with a OwnPtr. I am not a fan of the mutliple copy of this code.
Also note that the if() is unnecessary, it is valid to delete a null pointer.

Attachment 88716[details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebKit/qt/tests/MIMESniffing/tst_MIMESniffing.cpp:20: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
Total errors found: 1 in 27 files
If any of these errors are false positives, please file a bug against check-webkit-style.

(In reply to comment #23)
> (From update of attachment 88170[details])
> Second round.
>
> View in context: https://bugs.webkit.org/attachment.cgi?id=88170&action=review
>
> Any reason to put MIMESniffing in .../qt/ ?
> I would guess other ports might be interested, it could be easier to have it in platform/network.
ok
>
> >> Source/WebCore/platform/network/qt/MIMESniffing.cpp:86
> >> +static const char binaryFlags[256] = {
>
> Note to self: some Python says the talbe is correct :)
>
> > Source/WebCore/platform/network/qt/MIMESniffing.cpp:321
> > + mimeType = findMimeType(data, dataSize, bomTypes, bomTypesSize);
> > + if (mimeType)
> > + return mimeType;
>
> Isn't that overkill to use MagicNumbers for this condition?
> it could be:
> if (dataSize >= 2)
> if ((data[0] == 0xFE && data[1] 0xFF) // UTF-16BE BOM
> || (data[0] == 0xFF && data[1] 0xFE)) // UTF-16LE BOM
> return "text/plain";
> etc
>
> The problem I have is that MagicNumbers is a big beast with masking and the part to skip spaces. You are abusing it here which increase complexity while not increasing code reuse IMHO.
I created a new method for the simple type (the ones that do not need masking or space skipping).
>
> > Source/WebCore/platform/network/qt/MIMESniffing.cpp:477
> > +// http://tools.ietf.org/html/draft-abarth-mime-sniff-06#page-6
> > +MIMESniffer MIMESniffer::create(const char* advetizedMIMEType, bool isSupportedImageType)
> > +{
> > + if (!advetizedMIMEType)
> > + return MIMESniffer(512, &MimeSniffing::unknownTypeSniffingProcedure);
> > +
> > + if (MimeSniffing::isTextOrBinaryType(advetizedMIMEType))
> > + return MIMESniffer(512, &MimeSniffing::textOrBinaryTypeSniffingProcedure);
> > +
> > + if (MimeSniffing::isUnknownType(advetizedMIMEType))
> > + return MIMESniffer(512, &MimeSniffing::unknownTypeSniffingProcedure);
> > +
> > + if (MimeSniffing::isXMLType(advetizedMIMEType))
> > + return MIMESniffer();
> > +
> > + if (isSupportedImageType)
> > + return MIMESniffer(8, &MimeSniffing::imageTypeSniffingProcedure);
> > +
> > + if (!strcmp(advetizedMIMEType, "text/html"))
> > + return MIMESniffer(512, &MimeSniffing::feedTypeSniffingProcedure);
> > +
> > + return MIMESniffer();
> > +}
>
> This code just determine the parameters of the class internals and pass them to the constructor. It is not clear to me why you need the ::create() function instead of passing (const char* advetizedMIMEType, bool isSupportedImageType) to the constructor and let it populate the internals?
sure.
>
> > Source/WebCore/platform/network/qt/QMimeTypeSniffer.cpp:32
> > + , m_sniffer(MIMESniffer::create(advertisedMimeType.toLatin1().constData(), isSupportedImageType))
> > + , m_isFinished(!m_sniffer.dataSize() || sniff())
>
> I suggest to put those in the function body, not in the initialization.
> It is tricky to notice the intialization of m_isFinished call sniff().
I forgot this one in last patch. :( if I receive a r+ I can make this change before landing. if not I will put it in next patch.
>
> I also think using m_sniffer.dataSize() is a bad condition. You must know how the MIMESniffer works to understand this condition. I would rather have something like "isValid()" or "typeMayNeedDynamicSniffing()" or something alike.
>
> > Source/WebCore/platform/network/qt/QMimeTypeSniffer.cpp:44
> > + if (!m_reply->isFinished() && m_reply->bytesAvailable() < m_sniffer.dataSize())
> > + return false;
>
> Couldn't the sniffing be more dynamic than that? For me it looks like there is no reason to always wait for dataSize(). You just need one invalid character to determine some types.
>
> Let lay you are in "Text or Binary" and the first n (<3) characters are a UTF BOM, you can already say it is text/plain.
>
> This is just an idea to take the decision as early as possible so the rest of the engine can start working.ng.
Yes, but we will wait for the whole content anyway. And sniffing will not wait for a large amount of data. I decided to keep it simple.
>
> > Source/WebCore/platform/network/qt/QNetworkReplyHandler.cpp:182
> > + if (m_sniffer) {
> > + delete m_sniffer;
> > + m_sniffer = 0;
> > + }
>
> It would be cleaner to store QMimeTypeSniffer with a OwnPtr. I am not a fan of the mutliple copy of this code.
> Also note that the if() is unnecessary, it is valid to delete a null pointer.
sure. The if was left there after some refactoring as well.
I forgot this one as well. But will put it in code before landing if I receive the r+. otherwise it will be in next patch.

Attachment 88881[details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebKit/qt/tests/MIMESniffing/tst_MIMESniffing.cpp:21: You should add a blank line after implementation file's own header. [build/include_order] [4]
Total errors found: 1 in 27 files
If any of these errors are false positives, please file a bug against check-webkit-style.

(In reply to comment #35)
> > Hum, not a fan of this. Why not add a new setting?
>
> This makes the configuration possible without making it public and without creating compatibility issues. :)
I am often the first to shoot when someone want to add new APIs but I think in this case I think it is ok. I think the detection should be enabled by default, but could be disabled by user if needed for whatever reason.
If you think new API is a bad idea, we should enable it by default, and disable with the property _q_MIMESniffingDisabled. I think MIME sniffing is a good think for our "work out of the box" philosophy.

(In reply to comment #36)
> (In reply to comment #35)
> > > Hum, not a fan of this. Why not add a new setting?
> >
> > This makes the configuration possible without making it public and without creating compatibility issues. :)
>
> I am often the first to shoot when someone want to add new APIs but I think in this case I think it is ok. I think the detection should be enabled by default, but could be disabled by user if needed for whatever reason.
>
> If you think new API is a bad idea, we should enable it by default, and disable with the property _q_MIMESniffingDisabled. I think MIME sniffing is a good think for our "work out of the box" philosophy.
it is enabled by default as it in now, but I will change the property from _q_MIMESniffingEnabled to _q_MIMESniffingDisabled.