This is a follow-up to the conversation in https://bugs.webkit.org/show_bug.cgi?id=111255
I am working on a patch that adds a test case verifying that FormData filenames and MIME types with special characters are escaped properly in HTTP headers.

@Darin Adler: Thank you so much for the quick answer, and I'm really sorry for wasting your time with those mistakes! I'm still getting used to the flow. I hope the patch I just uploaded addresses your feedback.
Can you please take another look? I'll also watch the buildbot status.

Comment on attachment 191393[details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=191393&action=review> LayoutTests/http/tests/local/formdata/send-form-data-filename-escaping-expected.txt:8
> +file1=not-hax.txt:text/plain:1234567890&file3=more%22hax%0D%0A.txt:application/h\"ax:
> +1234567890&file4=also-not-hax.txt:text/plain:1234567890
Tests for dangerous characters in Content-Type were added in bug 99983, so you probably don't need to add them here, unless you noticed an opportunity to extend test coverage.
This test output is not very clear. What are the real pass/fail criteria? I can see a newline in the output - why is that not bad?

@Alexey Proskuryakov: I thought the newline is caused by some line wrapping somewhere, and I realized that's not true. In fact, this was showing that the newline leaks from the MIME type.
The tests that you pointed me to seem to cover the code in XMLHttpRequest::send(Blob* body, ExceptionCode& ec), whereas my tests cover FormDataBuilder::addContentTypeToMultiPartHeader(Vector<char>& buffer, const CString& mimeType)
I changed the test case to sneak in a Content-Length header. Without the FormDataBuilder patch, that header doesn't show up in the MIME type or file contents, so I'm guessing it is sent as a header.

Also, for the record, explanation for my approach to fixing this problem.
The File API spec [1] says that the type parameter must be used as the Content-Type header value even if it's not a valid MIME type. At the same time, the HTTP spec [2] says that a HTTP receiver may replace any LWS (CR+LF in a header value) by one SP (space) without changing the semantics of the HTTP header value. So I'm taking advantage of this and replacing CRLF -> SP in the Content-Type header value.
[1] http://www.w3.org/TR/FileAPI/#constructorParams
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2

I managed to build and test on Chromium for Mac and Linux. It seems like the quote (") in Content-Type shows up as \" everywhere except on Chromium/Linux.
I'm not sure if this is due to the testing environment or some piece of code that completely eludes me, but 1) I think it doesn't really matter, only CRLF can mislead HTTP header parsers and 2) if it does matter, I think it can be the subject of another patch.

> The tests that you pointed me to seem to cover the code in XMLHttpRequest::send(Blob* body, ExceptionCode& ec), whereas my tests cover FormDataBuilder::addContentTypeToMultiPartHeader(Vector<char>& buffer, const CString& mimeType)
Ah, I finally see. So that was about HTTP headers, and this is about MIME headers in multipart form submission. Neat.

Comment on attachment 191509[details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=191509&action=review
I do not see anything in relevant specs that says how to handle invalid types for serialization. What I suggest doing is to check what Firefox does, probably match its behavior, and likely file a bug against XMLHttpRequest specification.
> Source/WebCore/platform/network/FormDataBuilder.cpp:82
> +static void appendStringWithoutCrLf(Vector<char>& buffer, const CString& string)
WebKit style would be "appendStringReplacingCRLF".
> Source/WebCore/platform/network/FormDataBuilder.cpp:84
> + unsigned length = string.length();
CString::length() returns a size_t, not an unsigned.
> Source/WebCore/platform/network/FormDataBuilder.cpp:85
> + for (unsigned i = 0; i < length; ++i) {
So index should be size_t too.
> Source/WebCore/platform/network/FormDataBuilder.cpp:86
> + unsigned char c = string.data()[i];
CString::data() returns a char, not an unsigned char.
> Source/WebCore/platform/network/FormDataBuilder.cpp:87
> + if (c == 0x0d && i + 1 < length && string.data()[i + 1] == 0x0a) {
It is not sufficient to replace CRLF sequences. Receivers are likely to treat isolated CR and LF bytes as line separators (they certainly do that in HTTP headers).
> Source/WebCore/platform/network/FormDataBuilder.cpp:88
> + // HTTP receivers may replace any LWS with a single SP.
I don't think that HTTP rules apply here, as these are MIME headers.
Generation of MIME headers is specified by HTML5 "multipart/form-data encoding algorithm", which quickly defers to RFC 2388. Said RFC doesn't consider the possibility of invalid type, just saying that Content-Type defaults to text/plain if not known.
> LayoutTests/http/tests/local/formdata/send-form-data-filename-escaping-expected.txt:7
> +file1=before-hax.txt:text/plain:1234567890&file2=more%22hax%0D%0A.txt%0D%0A:application/hax content-length: 20 :1234567890&file3=after-hax.txt:text/plain:1234567890
My previous comment still stands - the test does not have clear pass/fail criteria. If something changes in the output five years later, how will we know if the test really fails, or it's a benign change?
I suggest staring with splitting the test into multiple subtests, each with its own output and pass criteria.
> LayoutTests/http/tests/local/formdata/send-form-data-filename-escaping.html:42
> + testFailed("This test is not interactive, please run using DumpRenderTree");
Is this actually true? What prevents the test from passing?
For Content-Type, we don't need to have a real file, we can build the blob dynamically. If some subtests can be run interactively and others can not, please split this into multiple .html tests.

@Alexey Proskuryakov: I tried this out on Firefox Aurora, and it seems that they replace CR and LF with one space, and replace a CRLF sequence with one space.
I uploaded a patch that implements the same behavior, and addresses your other feedback. I also applied your feedback to the chunk of code that I had used as an example.
I restructured the test case. Is this getting close to what you'd like to see here?
Thank you very much for your patience and explanations!

I suggested filing a bug against XMLHttpRequest, however Anne thinks that Blob itself should be restricting characters in MIME types, and posted a suggestion to public-webapps mailing list: <http://lists.w3.org/Archives/Public/public-webapps/2013JanMar/0633.html>. It was then pointed out that WebKit already implements non-compliant restrictions on the type, and perhaps that should be extended to cover newlines, and standardized.
Either matching Firefox or doing this is fine with me. Let's wait a few days to see what the conclusion on the list is.

@Alexey Proskuryakov: Would it be OK to land this change today, if my promise that I will file a patch implementing the outcome of the public-webapps thread when it comes out?
I think the changes based on your feedback to appendQuotedString in FormDataBuilder.cpp make sense even if the CR/LF filtering happens in the Blob constructor.

If there is a reason to land this earlier, please say so. I cannot tell from your e-mail address if you work for a company that ships a WebKit based product, and could need this in an upcoming update.
There is an active discussion on the mail thread, and not waiting for it could mean extra work for us all.
You have improved the test case based on my suggestions, but I would really like to have an interactive test if possible, and to have pass/fail criteria. I still don't see how we would triage changes in this test's output in the future, as the output is quite cryptic.
> I think the changes based on your feedback to appendQuotedString in FormDataBuilder.cpp make sense even if the CR/LF filtering happens in the Blob constructor.
Could you please clarify? Keeping unreachable code in the tree makes it harder to learn how WebKit works, because one would assume that the code is reachable somehow.

Sorry, I was unclear when I wrote the previous comment. appendStringReplacingCRLF is the method I wrote, and that will be unreachable (actually, I will remove it) if the CR/LF prevention ends up being done in Blob.
However, I wrote that method trying to match the implementation in appendQuotedString (an existing method) as much as possible. I believe that your feedback applies to appendQuotedString as well, and that method is used to process the Blob filename, so it will not be dead code, no matter what ends up happening with the MIME type.
Regarding the tests: I'm sorry, I think I'm not getting it. Can you please point me to some examples of good interactive tests and/or documentation on what makes a good interactive test?

An interactive test may not be a very good name for it, I was just using it because of the comment in test, saying "This test is not interactive, please run using DumpRenderTree".
You removed the check in the latest patch (which I admittedly didn't notice at first), but does the test actually work when dragged into a browser window? The idea is that it's easier to debug failing tests in a browser, and it's also valuable to have tests that can be easily run inside other browsers for comparison.
A test that does not run in browser should print an explanation, like this one used to do.
> I believe that your feedback applies to appendQuotedString as well, and that method is used to process the Blob filename, so it will not be dead code, no matter what ends up happening with the MIME type.
This is correct. It does not make sense to wait for Blob discussion to do this.
Ideally, you would file a new bug just for cleanup, which you have a lot of in this patch (changing description of another test, cleaning up shared test script to use dataList[i].type instead of dataList[i]['type']).

Thank you for the guidance, I will file a cleanup bug.
The test "almost" works in a browser. It currently fails due to CORS restrictions when ran from the file:// URL provided by the test result.
I tried serving it off of the http server started by Tools/Scripts/run-webkit-httpd, but it won't work because it uses some files outside of LayoutTests/http (LayoutTest/fast/js/resources/js-test-{pre,post}.js)
On the other hand, I did use this test case to figure out Firefox's behavior, so I think it can be made to work properly in a browser with proper infrastructure, e.g. a variant of multipart-post-echo.php that emits CORS headers. Would it make sense for me to write that?

> I tried serving it off of the http server started by Tools/Scripts/run-webkit-httpd, but it won't work because it uses some files outside of LayoutTests/http (LayoutTest/fast/js/resources/js-test-{pre,post}.js)
These are available to http tests as well. Just use src="/js-test-resources/js-test-pre.js".

@Benjamin Poulain: thank you very much for catching that!
@Alexey Proskuryakov: /js-test-resources/js-test-{pre, post}.js works on Tools/Scripts/run-webkit-httpd, thank you very much for telling me about this!
Unfortunately, that path doesn't seem to work when I run tests with Tools/Scripts/run-webkit-tests. Also, I noticed you're not using that path in your empty filename test.
I used a horrible hack to get tests working in both places -- I have a <script> with the relative path, and one with the /js-test-resources path. I submitted the patch like this to get feedback from you and the buildbots. What do you think I should do?
Also, I reworked the tests. Am I heading in the right direction?

> Also, I noticed you're not using that path in your empty filename test.
This is because these tests are in http/tests/local. This directory is special - http server is available when they are run, but they are loaded from file:// URLs.
HTTP tests should use "/js-test-resources". http/tests/local tests should be avoided unless absolutely necessary, and then they should use relative paths to fast/js/resources.

@Alexey Proskuryakov: thank you for clarifying!
In that case, why don't the synchronous XHRs fail due to CORS? And, if I can't open a test in a browser, how is it interactive?
Sorry I keep asking questions, and thank you for your patience!

Local files have universal access when run as tests (like when disabling restrictions for local files in Safari's Develop menu).
In hindsight, one of my tests in bug 111687 shouldn't have been in http/tests/local.

Created attachment 196767[details]
Archive of layout-test-results from webkit-ews-02 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-02 Port: mac-mountainlion Platform: Mac OS X 10.8.2

Created attachment 196768[details]
Archive of layout-test-results from webkit-ews-01 for mac-mountainlion
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-01 Port: mac-mountainlion Platform: Mac OS X 10.8.2

Created attachment 196774[details]
Archive of layout-test-results from webkit-ews-14 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-14 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.2

The crashes seem to be caused by the "return emptyString()". They don't appear if I replace that with "return contentType" or "return contentType.lower()". Yet, for the life of me, I can't figure out what I'm doing wrong. I've seen other methods with a String type return emptyString() just fine. I'll read more code today or tomorrow.

The crash is actually here:
bool Blob::isValidContentType(const String& contentType)
{
size_t length = contentType.length();
if (contentType.is8Bit()) {
The is8bit function cannot be called on null strings. You should probably add an early return for this case.
> The crashes seem to be caused by the "return emptyString()". They don't appear if I replace that with "return contentType" or "return contentType.lower()".
That's surprising.

Alexey, thank you so much for getting me unstuck!
I tried to add some comments to WTFString, covering your answer and what I learned on my own. I hope this patch works out, and I'll work on a test for Blob.slice().
What do you think of my comments for the spec, do they seem reasonable?

> The File API specification describes a good method for interpreting a Blob's type property as MIME type, in 6.4.1 Paragraph 4. http://dev.w3.org/2006/webapi/FileAPI/#methodsandparams-blob
I'm not sure if this is what this text says, perhaps we are looking at different versions of it. Currently, 6.4.1.4 is about converting an input string to a value for the type attribute, not about using it afterwards. As such, it's entirely internal to the File API spec.
There is certainly a gap in specifications, as they never say that Blob's type should be used in form submission. But I'm not sure how to best address that.

Comment on attachment 197025[details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=197025&action=review
I like the new behavior, but I think that another round is needed to improve style.
> Source/WebCore/fileapi/Blob.h:48
> +// Utilities for handling a Blob's type attribute.
> +namespace BlobType {
It is unusual in WebKit code base to put helpers in a namespace. And we try to not define two entities in one header file.
What you had in the previous iteration was perfectly fine, these functions can be in Blob class.
> Source/WebCore/platform/network/BlobData.h:169
> +// Declared in Blob.h, which includes this header.
> +namespace BlobType {
> +bool isNormalized(const String&);
> +} // namespace WebCore::BlobType
I understand why you needed this, but it's not a great thing to do. The correct things to do would be:
- put the functions in a separate header file,
- or just move setContentType() to BlobData.cpp. It's not hot code, and the function does not need to be inline.
> Source/WebCore/platform/network/BlobData.h:183
> + // FIXME: This would be useful, but it causes a link error in WebKit2.framework
> + // ASSERT(BlobType::isNormalized(contentType));
The reason why this fails to link is that the function needs to be exported using WebCore.exp.in file.
> LayoutTests/http/tests/local/formdata/send-form-data-mimetype-normalization.html:57
> +<script src="/js-test-resources/js-test-post.js"></script>
> +<script src="../../../../fast/js/resources/js-test-post.js"></script>
I still don't understand why both are needed.
> LayoutTests/http/tests/local/formdata/resources/send-form-data-common.js:48
> + if (sendAsAsync)
> + return null;
> + else
> + return dumpResponse(xhr, fileSliced);
WebKit style is to not have an "else" after "return".
I'd still like to have an explanation of the changes to this file in a ChageLog.

Alexey, thank you very much for your patience and feedback!
I hope I addressed all the feedback points this time around.
The double <script>s were hacks to get tests in http/tests/local to work when served from the webkit-httpd server. The right solution was to move them out of local. Is http/tests/xmlhttprequest a reasonable place for them to live in?
The JS "returns" pass the HTTP response text returned from the synchronous XHR back to the main JS test file, so I can check it. I needed this to make the tests interactive.
Does this make sense?

Created attachment 197207[details]
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-09 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.2

Alexey, thank you for the suggestion!
I moved the XHR-FormData-Blob tests to http/tests/fileapi and I fixed the tests that the buildbot complained about, except for fast/repaint/japanese-rl-selection-repaint-in-regions.html. I didn't see anything related to blobs there.