Can his version of Zend_Mail::_encodeHeader() be used to update the official version?

In the meantime, I am subclassing Zend_Mail, overloading Zend_Mail::_encodeHeader() with his version, as I need to send e-mail with Chinese subjects and *also* From and To addresses - Zend_Mail::_encodeHeader() is used in multiple methods.

Thank you for your kind consideration of this change.

Jonathan Maron

Comments

Posted by Tomas Markauskas (tamole) on 2008-08-09T15:05:34.000+0000

This solution would work only for short headers, because the RFC does not allow the encoded words in the headers to be longer than 75 characters. The string has to be split to avoid problems with multibyte characters and therefore has to be split between those characters. I had the same problem and now I have a solution which fixes it for utf-8 string (other charsets later).

I will post a patch in the next few days. It should also close a few other tickets related to Zend_Mail/Zend_Mime.

An encoded-word may not be more than 75 characters long, including
charset, encoding, encoded-text, and delimiters. If it is desirable
to encode more text than will fit in an encoded-word of 75
characters, multiple encoded-words (separated by CRLF SPACE) may be
used.

While there is no limit to the length of a multiple-line header
field, each line of a header field that contains one or more
encoded-words is limited to 76 characters.

The 'encoded-text' in an 'encoded-word' must be self-contained;
'encoded-text' MUST NOT be continued from one 'encoded-word' to
another. This implies that the 'encoded-text' portion of a "B"
'encoded-word' will be a multiple of 4 characters long; for a "Q"
'encoded-word', any "=" character that appears in the 'encoded-text'
portion will be followed by two hexadecimal characters.

Each 'encoded-word' MUST encode an integral number of octets. The
'encoded-text' in each 'encoded-word' must be well-formed according
to the encoding specified; the 'encoded-text' may not be continued in
the next 'encoded-word'. (For example, "=?charset?Q?=?=
=?charset?Q?AB?=" would be illegal, because the two hex digits "AB"
must follow the "=" in the same 'encoded-word'.)

Each 'encoded-word' MUST represent an integral number of characters.
A multi-octet character may not be split across adjacent 'encoded-
word's.

There is still one problem with my patch: the first line of the header will probably exceed the length of 76 characters because we don't count the length of the header name. Zend_Mime::encodeQuotedPrintableHeader should get the length of the header name, so it could reduce the length of the first "encoded-word". I haven't seen any cases where this would cause big problems, so I haven't resolved it.

I would like to know if there's any need for other character sets than UTF-8.
Another option to deal with multibyte characters would be to convert all multibyte character sets to UTF-8 in encodeQuotedPrintableHeader (see attached patch).