Windows Phone 7 data usage: finger-pointing and bad communication

Yahoo was officially outed yesterday as the company whose services were causing excessive data usage on Windows Phone 7 devices. Microsoft also said that Yahoo would make a change in the coming weeks to resolve the issue, implying that the fault lay on the mail provider's end. Windows Phone 7 uses the IMAP protocol to talk to Yahoo Mail, and other IMAP mail servers don't show the data usage patterns that have been demonstrated with Yahoo Mail.

Yahoo issued a statement, however, that said the problem was specific to Microsoft's approach to IMAP mail, and that if Windows Phone 7 would "change to a standard way of integrating with Yahoo! Mail" then the problem would go away—and hence the problem is Microsoft's fault, for doing something nonstandard.

It turns out that Windows Phone 7 does indeed do something strange with its IMAP connections. IMAP gives each message in an inbox a unique ID, but instead of using these unique IDs to identify messages and determine what new messages need to be retrieved, Windows Phone 7 is using the message ID embedded into each e-mail. This is why Windows Phone 7 is requesting so much information from Yahoo's servers: it needs to learn the message ID of each message, and the command to do this results in the excessive output from Yahoo. This problem is Yahoo-specific, because other mail servers are not so verbose in their responses.

Yahoo is encouraging Microsoft to use the unique IMAP IDs instead of message IDs: if it does this, Yahoo Mail accounts will no longer cause excessive data usage and things will go back to normal. Windows Phone 7 doesn't appear to be doing anything wrong as such: message IDs should in principle be unique, so using them to track which messages have been downloaded or not may make sense. It's just not normal behavior: IMAP has a unique ID expressly for the purpose of identifying messages, and Yahoo expects clients to use it.

Microsoft is not entirely at fault for the root issue here. Windows Phone 7 asks IMAP servers for a limited amount of information for each e-mail: the unique ID, status flags (such as "unread"), and the message ID. Most servers, in response to such a query, will return just that limited set of information. Yahoo's servers, however, completely disregard the request for specific information, and return dozens of unwanted details about each e-mail, in addition to the explicitly requested information.
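To make the request concrete, here is a minimal Python sketch of an IMAP FETCH command that asks for exactly those three items. The command Windows Phone 7 actually sends is not reproduced in this article, so the function name `build_poll_fetch` and the use of `BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)]` are assumptions based on the FETCH syntax in RFC 3501.

```python
def build_poll_fetch(tag: str, first: int = 1, last: str = "*") -> str:
    """Build one IMAP FETCH command line (hypothetical helper, not Microsoft's code).

    It asks for the UID, the status flags, and only the Message-ID header
    of every message in the given sequence range.
    """
    items = "UID FLAGS BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)]"
    return f"{tag} FETCH {first}:{last} ({items})\r\n"


command = build_poll_fetch("A001")
print(command.strip())
```

A server that honors the item list only has to send back a few dozen bytes per message for a command like this.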

It's not clear that Yahoo is doing the right thing here: the IMAP specification appears to say that the server should only return the information requested. Even if sending additional, unwanted information is technically within the rules, Yahoo's servers appear to be inconsistent with what other IMAP servers do, and so a good case could be made that it should limit its verbosity and act like any other mail server, telling e-mail clients only what they ask for. This might be the change that Microsoft said the company will make in coming weeks.

Data usage is a big deal on a smartphone

But Microsoft deserves some blame, too. The way the company has communicated news of this problem has been, frankly, appalling. Make no mistake: data usage problems on a smartphone platform are a serious issue. A case could be made that they're at least as bad as, if not worse than, the typical security bugs that the company patches on Patch Tuesday each month. Many of those security flaws never suffer real-world exploitation, and often the result of exploitation is annoyance rather than any real damage. Excessive data usage on a smartphone, however, can lead directly to financial losses. Unlimited data plans are, for many smartphone users, going the way of the dodo, and the charges for breaking the increasingly common limits can be crippling. A bug that can result in a phone burning through 2GB of data in a month has the potential to land unwitting users with a hefty bill.

Even if the blame lies wholly with Yahoo, Microsoft should have been much faster at identifying the issue—Yahoo has been known to cause abnormal levels of data usage since November—and much more forthcoming about which phones are affected and what users can do to avoid overage charges. The company should also be clearer about when the problem will be fixed, and what form that fix will take. Is Yahoo going to change its servers so that they stop sending unwanted information, or will Microsoft have to ship a patch for Windows Phone 7?

The company may have a good reason for using the message ID instead of the unique ID for tracking messages: IMAP servers are not required to make the unique IDs persistent, meaning they are permitted to change each time a client connects to check for e-mail. Message IDs, being coupled to the messages themselves, won't ever change. As such, using message IDs to detect if any messages need downloading to the phone does make some sense. And for servers that return only the limited information that Windows Phone 7 requests, this approach is bandwidth efficient, too.

But if every other widely used mail client, including those for iOS and Android, uses some other technique for figuring out which messages to download, there would be value to Microsoft in following suit—it makes situations like this much less likely to occur. Microsoft might technically be in the right, but being technically right is of little comfort if it means getting a massive mobile data bill. If Yahoo can't deliver a timely fix, Microsoft ought to step in.

Microsoft's end-user communication about Windows Phone 7 has been poor in general. Though it's no secret that an update is due within the first quarter, the company has refused to specify either when, exactly, the update would arrive (February 7 is the current rumor), or what, specifically, it would contain. All we know for certain is that it will improve start-up performance for many applications, that it will provide support for copy-and-paste, and that some performance and crashing issues in the Marketplace application will be addressed. We don't know if it will provide any other features or bug fixes, even though they are sorely needed.

When the subject of the communication is feature improvements, the poor communication is frustrating, but tolerable. When it's something serious, as is the case with this Yahoo issue, it is more than frustrating: it is downright irresponsible. The company can, and should, do better.

This IMHO is more of a Yahoo problem. Microsoft might not be requesting the same information that say iOS does, but it is requesting 3 items and getting back 25 items... That is Yahoo's fault. Now Microsoft should probably just setup their imap settings to be like other devices, but what MS is doing is within the spec and Yahoo is outside the spec. I certainly don't write SQL queries and expect additional data to be returned.

Simple solution: avoid using both Microsoft and Yahoo products/services.

QFT! Both doing something that "normally" works but in some cases leads to bloat somehow epitomizes companies that do not seem to care for nice, clean and efficient products.

Microsoft probably uses the message ID for some things anyway, so it doesn't bother to implement the unique ID as well to save some bandwidth (because let's face it, even if other IMAP servers do return less data when querying e-mail metadata, using the unique ID would still use less data).

And Yahoo implementing functions in a way that work but waste bandwidth as well. As if a SQL server would always return all data of a table regardless which columns are queried.

Conclusion: It's only a small thing and it's easy to read too much into it, but how about using a nice Android or iPhone with Gmail?

/On a side note, I'm done with Yahoo and with ma$$ivesoft's handheld industry. They created the Zune HD (which I love, by the way) and then offer only minimal support for it. Zune marketplace, at least, isn't the absolute resource-hog that iTunes is, and I like the design but they've screwed me over on song purchases more than enough.

Is Microsoft really using the client-generated Message-ID for checking if messages have been downloaded? That could be a bit problematic - first of all Message-IDs are technically optional (as per RFC 5322), and the client does not *have* to generate one at all. Secondly some clients are in the habit of generating potentially non-unique IDs - this is a particular problem with spam, e.g. this is the Message-ID of a spam I received recently:

Message-ID: <d22ae1e467934eefe185d155e3ce486f@10.0.0.3>

The left-hand side probably makes it unique (at least per mailbox). Of course I'm not sure I care if a spam message does not get downloaded :-)

As I understand it, these issues with the non-uniqueness of Message-ID are one of the reasons IMAP has its own unique ID for messages in a mailbox.

I'm going to say Yahoo and Microsoft are both at fault here - if Yahoo is returning extra data in a non-standardized way, then they are screwing up the chances of good interoperation; if Microsoft is implementing its own algorithms when the standard has something different, then likewise they are defeating the purpose of having a standard.

Yes MS screwed the pooch on this and didn't communicate well. However the problem is with IMAP in general and email protocols in specific.

The problem is, and MS recognized this and is using an excellent solution, that there is no globally unique id for a message in IMAP (Or Exchange Server!). Thus if you are syncing contents in folder, your only choice is to get the ids that are available based on the server. Then compare those and delete any that you have that the other side doesn't have, and add any that you don't have.

This works fine with one folder and no deletions etc. However as soon as you delete or move an email to another folder, that email gets a completely new id and thus you have to download the message again!
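The comparison described above amounts to a set difference per folder. The following Python sketch is only an illustration; the function name `diff_ids` and all ID values are made up.

```python
def diff_ids(local_ids, server_ids):
    """Compare the IDs held locally with the IDs the server reports for one folder."""
    local, server = set(local_ids), set(server_ids)
    to_delete = sorted(local - server)    # gone from the server: remove locally
    to_download = sorted(server - local)  # unknown locally: fetch from the server
    return to_delete, to_download


# Message 102 is moved from the inbox to an archive folder,
# where the server assigns it a new ID (7 in this made-up example).
inbox_change = diff_ids([101, 102, 103], [101, 103])
archive_change = diff_ids([5, 6], [5, 6, 7])
print(inbox_change, archive_change)
```

Seen through per-folder IDs alone, the move looks like one deletion plus one brand-new message, which is why the body gets downloaded a second time.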

This is a HUGE waste of bandwidth. Especially on client computers. Ironically Exchange Server does the same, and even with MS's web services support, Exchange Server (which was just an IMAP server when they bought it) still doesn't have a true GUID.

The solution of course is to recognize that every message is PERSISTENT on a server and do two things:

1. Give every message a GUID that is persistent wherever it is in the message store (including across message stores).
2. Give every message a LastModifiedDate field that records, to the millisecond, when its state was last changed.

Then from this information you can simply request all changes from a server from a certain time (which you also get before you start so you're not using the local time of the client and you keep this on file) and get the GUIDs and likely the folder that the message is in.

By doing this you have a single request that is incredibly small, a single response that is incredibly small, and everything you need to determine whether to get the message because you've never seen it, delete it because it was deleted, move it to another folder without re-downloading it, or, if nothing else changed, update its flags (such as read) by simply requesting the flags. (You could do all of this in one request that returns every change with its GUID, flags and folder, which would be even more efficient.)
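A rough Python sketch of the proposed scheme follows. It is not IMAP and not any existing protocol; `Record`, `changes_since` and `apply_changes` are hypothetical names used only to show how a GUID plus a last-modified timestamp would let a client apply moves, flag changes and deletions without re-downloading bodies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    guid: str            # persistent across folders (assumed by the proposal)
    folder: str
    flags: frozenset
    last_modified: int   # milliseconds, measured on the server's clock
    deleted: bool = False


def changes_since(store, since_ms):
    """Server side: every record whose state changed after since_ms."""
    return [r for r in store if r.last_modified > since_ms]


def apply_changes(local, changes):
    """Client side: local maps guid -> (folder, flags).

    Returns the GUIDs whose bodies still have to be downloaded.
    """
    need_body = []
    for r in changes:
        if r.deleted:
            local.pop(r.guid, None)
            continue
        if r.guid not in local:
            need_body.append(r.guid)      # never seen before
        local[r.guid] = (r.folder, r.flags)  # a move or flag change costs no download
    return need_body


store = [
    Record("g1", "Archive", frozenset({"\\Seen"}), 2000),  # moved and read
    Record("g2", "INBOX", frozenset(), 2500),              # new message
    Record("g3", "INBOX", frozenset(), 2600, deleted=True),
]
local = {"g1": ("INBOX", frozenset())}
need_body = apply_changes(local, changes_since(store, 1000))
print(need_body, sorted(local))
```

In this made-up run only `g2` needs a body download; the move of `g1` is applied from metadata alone.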

IMAP is a broken protocol that doesn't work for synced systems the way we have it now, so you have to resort to all kinds of stupid hoop jumping to make it work and minimize data. Heck, it's impossible to know what you have to request, and you can't just use the last id, because messages may have been deleted that you need to delete out of the previous sequence, so in effect you have to get ALL messages per folder and go through them, which is a huge waste of bandwidth. Exchange ActiveSync and EWS are only slightly better in this regard.

MS's solution to using the message-ID is entirely valid because now they don't have to re download messages and the identity of the message is persistent. They don't have to worry about the IMAP id changing. In practice however you have to use the message date as well as the message-ID of the actual message because you will get duplicates. Especially from Yahoo because Yahoo's implementation is broken.

Yahoo returning more than requested is a broken implementation. What MS is doing is right, and given that you request a single piece of information, you should reasonably expect that you get a single piece of information back, not everything.

But this isn't new to Yahoo. They also do not encode inline images properly either. The MIME protocol includes specific instructions for how inline attachments should be noted in the mime stream. They should be noted as inline! However Yahoo does not note them as inline thus you cannot just request inline attachments to display the message without downloading things like Excel spreadsheets. To make matters worse, Yahoo does (sometimes!) mark things like Excel spreadsheets as inline when they are just regular attachments! While that is technically ok, (old versions of Outlook let you drag excel files inline into the message the same way but for a specific purpose) it's a HUGE waste of bandwidth when simply trying to display the body of a message because you have to download the excel file to ensure you have everything.
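For readers unfamiliar with the marking in question, this Python sketch uses the standard library's `email` package to build a message with one inline image and one regular attachment, then separates the parts by their Content-Disposition. The file names and byte contents are placeholders, and `split_parts` is a made-up helper; it shows what a correctly marked message looks like, not what Yahoo sends.

```python
from email.message import EmailMessage


def split_parts(message):
    """Return (inline_names, attachment_names) based on Content-Disposition."""
    inline, attached = [], []
    for part in message.walk():
        disposition = part.get_content_disposition()
        if disposition == "inline":
            inline.append(part.get_filename())
        elif disposition == "attachment":
            attached.append(part.get_filename())
    return inline, attached


msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See the chart and the attached sheet.")
# Image that belongs to the message body: marked inline.
msg.add_attachment(b"placeholder image bytes", maintype="image", subtype="png",
                   filename="chart.png", disposition="inline")
# Ordinary attachment: the default disposition is "attachment".
msg.add_attachment(b"placeholder sheet bytes", maintype="application",
                   subtype="octet-stream", filename="report.xlsx")

inline, attached = split_parts(msg)
print(inline, attached)
```

A client that can trust these markings only needs to fetch the inline parts to render the body; when the markings are wrong, it has to fetch everything.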

Thus: YAHOO MAIL IS COMPLETELY AND UTTERLY SCREWED UP AND NOT STANDARDS COMPLIANT.

1. They are returning huge amounts of data for a small query (the WP7 bug).
2. They encode MIME incorrectly by making inline images standard attachments when they should be marked as inline, which prevents downloading just the images that are part of the message without the other attachments too.
3. They encode MIME incorrectly (sometimes) and mark real attachments as inline when they are not inline at all.

We wrote our own email client, and what we realized is that all mail servers and clients except Yahoo and Apple work exactly the same and follow the standards quite well, even if the standards are broken and wasteful. Thus we wrote our email client based on all of these. We then went along and created custom code paths for Yahoo for their mail servers and then wrote a custom MIME decoding system for both Yahoo and one for Apple Mail to work around their bugs.

WE HAVE TO MAINTAIN SEPARATE CODE JUST TO DEAL WITH THESE TWO HORRIBLY WRITTEN SYSTEMS. It's awful.

I wish someone would publish and the world would adopt a good XML REST based mail system that allowed for optimized sync as outlined above. It would make things so much easier. Then we could write a test system that hit your implementations and tested them like ACID for HTML/Javascript and could shame the culprits into fixing their mess.

Again: THIS IS NOT MICROSOFT'S FAULT. It's Yahoo's. Sad that they didn't disclose sooner, but I would guess that they were in a catch 22 because of their search agreements with Yahoo.

I love how there are all these articles stating the poor sales of the Win7 phone. Then there is an issue that only affects this tiny user base that actually bought Win7 phones so far, and then even only if those users actually have a yahoo email account. And then only if they actually hooked up the yahoo email account on the phone. So that affected like 10 people probably... but hey, any chance to bash something that isn't Apple or Google right???

Microsoft has chosen a non-standard way of implementing a standard and Yahoo has been lazy in implementing a standard. This shouldn't come as much of a surprise to anyone who's been paying any attention in the last couple of decades.

I don't see why Microsoft needs to do their message requests identically to iOS, Android, et al. They realized that having persistent ids would be beneficial to them and they engineered their mail software to capitalize on this. Isn't that what competition is all about? They didn't make anything proprietary; they're still using the standard IMAP rules. I have no idea why Yahoo would give that response for such a simple query.

As for messaging: You can tell that MS has the update all ready, and I imagine carrier testing is what has held up the situation so far. We're not told every detail because Microsoft has been really harping on the underpromise kick, having learned from Vista. If they'd promised a January update (which you can tell they expected, from the url on their update page) then we'd be pretty pissed right now.

Quote:

I love how there are all these articles stating the poor sales of the Win7 phone. Then there is an issue that only affects this tiny user base that actually bought Win7 phones so far, and then even only if those users actually have a yahoo email account. And then only if they actually hooked up the yahoo email account on the phone. So that affected like 10 people probably... but hey, any chance to bash something that isn't Apple or Google right???

Unfortunately for Microsoft, the takeaway from all this is that you should wait before buying a Win7 phone.

Where the fault lies is not the issue. The true issue is that this problem is still unresolved, and that the problem exists only for the Windows Phone 7. There are other - more mature - smartphone choices that are not having this problem with Yahoo.

People will justifiably wonder what other issues exist with the Windows Phone 7 platform, and they will worry that they might get stuck with months of excess data charges if another issue like this pops up. Consumers don't want to hear about IMAP implementation issues. They want a device that they can use and not worry about. Right now, Win7 does not appear to be a platform that you don't have to worry about.

Quote:

I love how there are all these articles stating the poor sales of the Win7 phone. Then there is an issue that only affects this tiny user base that actually bought Win7 phones so far, and then even only if those users actually have a yahoo email account. And then only if they actually hooked up the yahoo email account on the phone. So that affected like 10 people probably... but hey, any chance to bash something that isn't Apple or Google right???

Yeah, because a systematic problem causing undesired data usage on what are typically metered bandwidth plans and a totally botched public announcement are excusable by virtue of limp-wristed consumer uptake. It's not like Ars has ever praised Windows Phone 7 anyway!

Quote:

This IMHO is more of a Yahoo problem. Microsoft might not be requesting the same information that say iOS does, but it is requesting 3 items and getting back 25 items... That is Yahoo's fault. Now Microsoft should probably just setup their imap settings to be like other devices, but what MS is doing is within the spec and Yahoo is outside the spec. I certainly don't write SQL queries and expect additional data to be returned.

There's a little handy thing in software development to prevent things like this from happening. It's called "testing". Microsoft should incorporate it into their development - it REALLY helps a lot - no foolin'

Ugh. Speaking as someone who has written an IMAP email client, using the message-id is a braindead thing to do. That's what the IMAP UID is there for. And you don't actually have to worry that much about them changing; IMAP provides a UIDVALIDITY that MUST change if the UIDs in the folder change, and the spec explicitly states that the UIDs SHOULD NOT change between sessions. So your chances of actually having the UIDs change between sessions should be very, very low. Like in the 'I've only seen it happen when somebody changes IMAP servers' range of low.
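A minimal Python sketch of the UIDVALIDITY check described above, assuming the client has cached the folder's UIDVALIDITY value and the highest UID it has seen. The function name `uids_to_fetch` and the sample numbers are illustrative only; the logic relies on the RFC 3501 rule that UIDs within a mailbox are assigned in strictly ascending order.

```python
def uids_to_fetch(cached_validity, last_seen_uid, server_validity, server_uids):
    """Return (cache_still_valid, uids_the_client_should_fetch)."""
    if cached_validity != server_validity:
        # The server renumbered the folder: cached UIDs mean nothing any more,
        # so every message has to be examined again.
        return False, sorted(server_uids)
    # Same UIDVALIDITY: anything above the highest UID seen so far is new.
    return True, sorted(uid for uid in server_uids if uid > last_seen_uid)


print(uids_to_fetch(7, 40, 7, [38, 40, 41, 45]))
print(uids_to_fetch(7, 40, 8, [1, 2, 3]))
```

In the common case the client only needs the UIDVALIDITY value and the list of UIDs, which is far less data than a header for every message.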

Message-id, meanwhile, is explicitly _not_ unique per copy of an email; if somebody replies to a mailing list and to you directly, you'll probably get two copies of the same message in your inbox. But those are most likely going to share the same message-id, despite the fact that they will exist as two separate messages, and, say, reading one will not mark the other as read. If you think the message-id will represent a unique message, you'll have some serious problems.
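The mailing-list case can be illustrated with a few lines of Python. The UIDs and Message-ID values below are invented, and `group_by_message_id` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_message_id(messages):
    """messages: iterable of (uid, message_id) pairs -> {message_id: [uids]}."""
    groups = defaultdict(list)
    for uid, message_id in messages:
        groups[message_id].append(uid)
    return dict(groups)


inbox = [
    (201, "<abc@example.org>"),  # copy sent to you directly
    (202, "<abc@example.org>"),  # copy delivered via the mailing list
    (203, "<def@example.org>"),
]
groups = group_by_message_id(inbox)
collisions = {mid: uids for mid, uids in groups.items() if len(uids) > 1}
print(collisions)
```

A client keyed on Message-ID alone would treat UIDs 201 and 202 as one message, even though the server tracks their flags separately.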

As for Geminiman's point about moving messages, optimizing for that case as opposed to the reading/downloading of messages is probably not a good idea; 99.9% of people will read messages more often than they move them. Plus you can just special-case that; when you move a message between folders, check the message-id then, then check the UIDNEXT for the target folder, and do the move. Then just look at the headers on the new message with that UID in the folder; if it matches, then you just use your already-downloaded version of the message. Problem solved, again for 99% of the cases.
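The special case might look roughly like the Python sketch below. `FakeFolder` is an in-memory stand-in rather than a real IMAP client, and `move_and_relink` is a made-up name. The sketch also assumes nothing else is delivered to the target folder between reading UIDNEXT and performing the move, which a real client could not take for granted.

```python
class FakeFolder:
    """In-memory stand-in for a server-side folder (illustration only)."""

    def __init__(self, uidnext):
        self.uidnext = uidnext
        self.messages = {}  # uid -> Message-ID

    def append(self, message_id):
        uid = self.uidnext
        self.messages[uid] = message_id
        self.uidnext += 1
        return uid


def move_and_relink(folders, cache, src, dst, uid):
    """Move a message and reuse the cached body if the prediction holds.

    cache maps (folder_name, uid) -> downloaded body and is assumed to
    contain the message being moved.
    """
    message_id = folders[src].messages[uid]   # remember the Message-ID first
    expected_uid = folders[dst].uidnext       # UIDNEXT predicts the new UID
    folders[dst].append(folders[src].messages.pop(uid))  # the "move"
    if folders[dst].messages.get(expected_uid) == message_id:
        cache[(dst, expected_uid)] = cache.pop((src, uid))
        return True
    return False  # prediction failed: fall back to downloading again


folders = {"INBOX": FakeFolder(10), "Archive": FakeFolder(5)}
moved_uid = folders["INBOX"].append("<m1@example.org>")
cache = {("INBOX", moved_uid): "body text"}
relinked = move_and_relink(folders, cache, "INBOX", "Archive", moved_uid)
print(relinked, cache)
```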

Hmm, also, now that I glance back at the IMAP spec, I don't know that the spec says that you have to support just returning a single header. Requesting the envelope is required, but that will return all of the header fields, which sounds like what Yahoo is doing...

Quote:

Ugh. Speaking as someone who has written an IMAP email client, using the message-id is a braindead thing to do. That's what the IMAP UID is there for. And you don't actually have to worry that much about them changing; IMAP provides a UIDVALIDITY that MUST change if the UIDs in the folder change, and the spec explicitly states that the UIDs SHOULD NOT change between sessions. So your chances of actually having the UIDs change between sessions should be very, very low. Like in the 'I've only seen it happen when somebody changes IMAP servers' range of low.

SHOULD NOT is not MUST NOT, though, in RFC speak. RFC 2683 concedes that servers may have no good place to store a UID, and as such that UIDs may change each time a mailbox is selected. Hence the whole UIDVALIDITY mechanism. You do have to worry about it, even if it's not common.

Quote:

Message-id, meanwhile, is explicitly _not_ unique per copy of an email; if somebody replies to a mailing list and to you directly, you'll probably get two copies of the same message in your inbox. But those are most likely going to share the same message-id, despite the fact that they will exist as two separate messages, and, say, reading one will not mark the other as read. If you think the message-id will represent a unique message, you'll have some serious problems.

For the purposes of peeking messages (which doesn't reset their /Read flag), that doesn't matter--there's no value in downloading multiple copies of an identical e-mail.

Quote:

Hmm, also, now that I glance back at the IMAP spec, I don't know that the spec says that you have to support just returning a single header. Requesting the envelope is required, but that will return all of the header fields, which sounds like what Yahoo is doing...

The subset returned by HEADER.FIELDS contains only those header fields with a field-name that matches one of the names in the list; similarly, the subset returned by HEADER.FIELDS.NOT contains only the header fields with a non-matching field-name. The field-matching is case-insensitive but otherwise exact. Subsetting does not exclude the [RFC-2822] delimiting blank line between the header and the body; the blank line is included in all header fetches, except in the case of a message which has no body and no blank line.

Windows Phone 7 is asking for a subset; it should be given back a subset.
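The subsetting rule quoted above is simple enough to sketch in Python. This is an illustration of the rule, not server code; `header_fields_subset` is a made-up name, and the sketch ignores the edge case of a message with no body and no blank line.

```python
def header_fields_subset(raw_header: str, names) -> str:
    """Keep only the header fields whose names match, case-insensitively."""
    wanted = {name.lower() for name in names}
    kept = []
    keep_current = False
    for line in raw_header.split("\r\n"):
        if line == "":
            break  # end of the header block
        if line[0] in " \t":
            # Folded continuation line: it belongs to the previous field.
            if keep_current:
                kept.append(line)
            continue
        field_name = line.split(":", 1)[0]
        keep_current = field_name.lower() in wanted
        if keep_current:
            kept.append(line)
    kept.append("")  # the delimiting blank line is always included
    return "\r\n".join(kept) + "\r\n"


raw = (
    "From: a@example.org\r\n"
    "Subject: hello\r\n"
    " world\r\n"
    "message-id: <x@example.org>\r\n"
    "Date: Sat, 1 Jan 2011 00:00:00 +0000\r\n"
    "\r\n"
)
subset = header_fields_subset(raw, ["MESSAGE-ID"])
print(repr(subset))
```

For a request that names only the Message-ID field, the response body is a single header line plus the blank line, whatever else the message header contains.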

Quote:

Message-id, meanwhile, is explicitly _not_ unique per copy of an email; if somebody replies to a mailing list and to you directly, you'll probably get two copies of the same message in your inbox. But those are most likely going to share the same message-id, despite the fact that they will exist as two separate messages, and, say, reading one will not mark the other as read. If you think the message-id will represent a unique message, you'll have some serious problems.

Quote:

For the purposes of peeking messages (which doesn't reset their /Read flag), that doesn't matter--there's no value in downloading multiple copies of an identical e-mail.

There most certainly is. If you download just one, and assume all the others with the same Message-Id: are identical, then you're opening yourself up to a denial of service. For example, Alice sends Bob and Charlie a message with Message-Id: XYZ. Bob doesn't want Charlie to see the message, so he sends Charlie another message with the same Message-Id:. Now Charlie's MUA doesn't know which version to show him. If Bob knows enough about the MUA, he might even be able to craft his message so as to make it certain that his message will be the one shown.

Quote:

SHOULD NOT is not MUST NOT, though, in RFC speak. RFC 2683 concedes that servers may have no good place to store a UID, and as such that UIDs may change each time a mailbox is selected. Hence the whole UIDVALIDITY mechanism. You do have to worry about it, even if it's not common.

Yeah, but all messages also SHOULD, not MUST, have a Message-id. In my experience, you have a lot more missing/duplicate Message-Ids than you do UIDVALIDITY failures.

Quote:

For the purposes of peeking messages (which doesn't reset their /Read flag), that doesn't matter--there's no value in downloading multiple copies of an identical e-mail.

But if you do a delete, it matters. Also, the rules on changing Message-Ids are not completely clear; like I said, if you get two copies of an email, one directly to you and one via a cc: to a mailing list, the Message-Id probably won't be changed by the mailing list. But the list might do something like strip attachments, which would still be present on the directly-sent message.

Quote:

The rules on subsetting look quite clear to me...Windows Phone 7 is asking for a subset; it should be given back a subset.

Totally agree--I said that I don't know because I had just glanced and didn't have time to look it up properly. Oops.

...Another thing that I notice is that it looks like the Windows client is requesting UID along with the Message-Id. Which implies that they really are doing the right thing, and just using the Message-ID as a backup. That makes a lot of sense--you use the UID primarily, and then if the UIDVALIDITY changes, you have the Message-Id to use as a second-best solution. In which case Microsoft would be doing the right thing, plus making a low-cost optimization in case of failure, and it's just Yahoo's fault for making it high-cost.
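If that reading is right, the fallback could work along the lines of this Python sketch: after a UIDVALIDITY change, cached bodies are re-attached to the new UIDs by Message-ID, and anything with a missing or duplicated Message-ID is downloaded again. Whether Windows Phone 7 does this is a guess; `rebuild_after_renumbering` and the sample data are hypothetical.

```python
def rebuild_after_renumbering(old_cache, server_listing):
    """Re-key a cache after the server renumbered a folder.

    old_cache: {old_uid: (message_id, body)}
    server_listing: [(new_uid, message_id), ...] as reported by the server
    Returns (new_cache, uids_to_download).
    """
    bodies = {}
    for message_id, body in old_cache.values():
        bodies.setdefault(message_id, []).append(body)

    new_cache, to_download = {}, []
    for uid, message_id in server_listing:
        candidates = bodies.get(message_id, [])
        if message_id and len(candidates) == 1:
            new_cache[uid] = (message_id, candidates[0])  # unambiguous match
        else:
            to_download.append(uid)  # missing or ambiguous Message-ID
    return new_cache, to_download


old_cache = {1: ("<a@x>", "A"), 2: ("<b@x>", "B"), 3: ("<b@x>", "B2")}
listing = [(11, "<a@x>"), (12, "<b@x>"), (13, "<b@x>"), (14, "<c@x>")]
new_cache, to_download = rebuild_after_renumbering(old_cache, listing)
print(new_cache, to_download)
```

Treating duplicated Message-IDs as unmatched costs extra downloads, but it avoids attaching the wrong body to a message.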

Yes, but I think this is just for figuring out if there's new mail to download--I'm sure the "interactive" traffic (when you're actually using the mail client) is more thorough. This is just the polling, and to me "no new Message-IDs" seems a decent indicator of "no new mail"--better than new UIDs, since IMAP reserves the right to completely change them with each poll.

Quote:

...Another thing that I notice is that it looks like the Windows client is requesting UID along with the Message-Id. Which implies that they really are doing the right thing, and just using the Message-ID as a backup. That makes a lot of sense--you use the UID primarily, and then if the UIDVALIDITY changes, you have the Message-Id to use as a second-best solution. In which case Microsoft would be doing the right thing, plus making a low-cost optimization in case of failure, and it's just Yahoo's fault for making it high-cost.

That makes sense. In any case, I think it's pretty clear that Yahoo! shouldn't be sending back so much data.

However, it is Microsoft's fault anyway: they should have been evaluating the amount of data transferred during their testing and noticed the problem.

I suppose you think Microsoft is supposed to test against all mail servers in the world just to make sure that one of them isn't functioning against RFC standards in a way that would cause higher data usage? Yea, makes sense. Shame on Microsoft for not doing what no other developer out there would do.

Anyone that says this is Microsoft's fault is clearly, and I mean CLEARLY, just a Microsoft hater or has comprehension problems. Microsoft is not doing anything here that is not standards compliant, yet Yahoo is. So say again why it's Microsoft's fault?

> Microsoft's end-user communication about Windows Phone 7 has been poor in general. Though it's no secret that an update is due within the first quarter, the company has refused to specify either when, exactly, the update would arrive (February 7 is the current rumor), or what, specifically, it would contain.

So by that standard Apple has the worst end-user communication in the world, right?