Posted
by
Soulskill
on Wednesday May 27, 2015 @07:55AM
from the easy-way-to-get-your-kids-to-put-the-phone-down dept.

DavidGilbert99 writes with news that a bug in iOS has made it so anyone can crash an iPhone by simply sending it a text message containing certain characters. "When the text message is displayed by a banner alert or notification on the lockscreen, the system attempts to abbreviate the text with an ellipsis. If the ellipsis is placed in the middle of a set of non-Latin script characters, including Arabic, Marathi and Chinese, it causes the system to crash and the phone to reboot." The text string is specific enough that it's unlikely to happen by accident, and users can disable text notification banners to protect themselves from being affected. However, if a user receives the crash-inducing text, they won't be able to access the Messages app without causing another crash. A similar bug crashed applications in OS X a few years ago.

Years ago, I had a number of Nokia flip phones. I also converted emails to text messages and sent them to the phone (actually, probably MMS, not SMS), so that I could read my emails on a dumb phone.

However, every now and again, I would receive a "text of death". The phone would receive the message, crash, reboot, attempt to download its messages again, crash... and so on. It continued like this until the network gave up trying to deliver that MMS.


I've got a better story than that. Back in the mid-80's, when I was working at IBM, we did almost all of our programming in REXX and APL2 using dumb terminals.

One of the features of the system was the ability to send a message to another user that would appear directly on his or her screen like a text message.

By accident, one of the guys in my group discovered that by sending a certain string of characters to another user, he could force the recipient's terminal to automatically log off. Predictably, this led to a campaign of various people sending the "message of death" to each other, hearing the recipient yell and curse, and then quickly closing any open file before the victim fired back with a message of his own. This went on for about two weeks before we all got tired of it.

And of course I could also talk about the REXX worm that shut down the entire IBM internal email system for more than a day, but that is another story. :-)

I came here to comment on this. There are services and botnets that can send millions of simultaneous texts. I once read a blackhat idea of rebooting millions of phones over and over, or all at once, to see if it would crash the next layers of the network. Maybe the towers couldn't handle it and would go down. And then...

Did Apple already fix this? I immediately tried to crash the phone of every coworker with an iPhone within earshot of me and it didn't work, much to my disappointment. I'm now having to save face by harassing them with pictures of Steve Jobs' license-plate-less car parked across multiple handicapped spots.

What, causing your co-worker's phone to reboot is a sackable offence where you are? What if you accidentally kick the power cord out of their PC? Doesn't sound like a fun place to be. I tried it here and it didn't work, but people here were genuinely interested in whether this was for real or not.

Were you using SMS or iMessage? This is probably fixable in iMessage -- they get routed through Apple's servers and could probably be sanitized so that the offending characters can be byte-aligned to avoid the crash -- but SMSes go directly through the carriers and would require an OS update.

Well, if your parser can't handle something, it should sanitize input before parsing it. E.g., if you use special characters internally to do something, make sure your user input doesn't contain those characters in that order unless you want the user to be doing that thing.

Even if you just add an escape, e.g. "\ " if spaces do special things, so that a user-entered "\" can be stored internally as "\\".
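A minimal sketch of that escaping idea in Python (the function names and the choice of space as the "special" character are just for illustration):

```python
def escape(s: str) -> str:
    # Escape the escape character first, then the "special" space.
    return s.replace("\\", "\\\\").replace(" ", "\\ ")

def unescape(s: str) -> str:
    # Walk the string; a backslash means "take the next character literally".
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

print(escape("a b\\c"))             # a\ b\\c
print(unescape(escape("a b\\c")))   # a b\c
```

The order matters: escaping the escape character first means a user-typed "\ " can't be confused with an escaped space.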

It's not a special character that needs escaping. It's a character that needs multiple bytes to specify the code point. The parser just isn't handling the fact that you can't just crop a character mid code point - it's operating at the byte level when it should be operating at the code point level during a crop operation.
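The difference is easy to see in a quick Python sketch (illustrative only, not Apple's actual code):

```python
# Each Arabic letter below is 2 bytes in UTF-8.
text = "عربى"
raw = text.encode("utf-8")

# Byte-level crop: can cut a code point in half, yielding invalid UTF-8.
try:
    raw[:3].decode("utf-8")
except UnicodeDecodeError:
    print("byte-level crop produced invalid UTF-8")

# Code-point-level crop: always yields valid text.
cropped = text[:2] + "…"
assert cropped.encode("utf-8").decode("utf-8") == cropped
```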


Too bad I don't have mod points because you are absolutely correct! As more and more code points are defined, the number of bytes needed to represent characters increases. Their abbreviation mechanism should at least recognize surrogate pairs and combining characters.

For some reason UTF-8 turns otherwise intelligent programmers into complete morons. Here is another example from Apple. Let me state some rules about how to deal with UTF-8:

1. Stop thinking about "characters"!!!! This is a byte stream. The ONLY reason to think about a "character" is because you are DRAWING it on a display designed for a human to read, and humans do think about "characters". All other software either doesn't care or should treat the text as an opaque sequence of bytes.

Where's the illegal UTF-8 sequence in the message? Is the actual octet sequence in the message different from what's in this Slashdot posting [slashdot.org] (once converted to a sequence of octets), which contains no invalid UTF-8 sequences (yes, I went through them all by hand)?

And to sanitise the input, what process would you need to perform on the input? Is it called parsing? And would you need to sanitise the sanitiser's parser...

Yes, you could do it with a simpler parser, e.g. delete all non-Latin characters from user input because the people who designed our parser were noobs. Or go on a case-by-case basis: this character is used internally for such and such, so if user input has this character, put an escape character in front of it, or whatever.

For example, a fun gag on a new Linux user is to create a file called " -rf" and ask them to delete it via the command line. If they naively type "rm -rf" then it gets parsed as an option for rm rather than as a filename.

Problem with this is that people who don't speak languages that use Latin characters are going to be pissed that they can't get texts in their language.

The bigger problem is that Unicode is hard, and it's probably some weird unchecked buffer overflow or something equally stupid. What's strange is that when another text is sent, the problem case solves itself and the messaging app becomes usable again.

What's strange is that when another text is sent, the problem case solves itself and the messaging app becomes usable again.

It's not that weird. The iOS messaging app shows messages organized by sender with the most recent message summarized under each sender's name. If a message is being cropped incorrectly and causing the crash, then when another message comes in from that sender, the new message will replace the problematic one in the summary.

And what exactly do you mean by 'sanitized its input' in this case? Since we're talking about truncating a string of non-Latin characters with variable-length encodings for the code points, the problem is the parser not handling this correctly and assuming fixed-length code point values. Changing the input is most definitely the wrong way to handle a bug like this.

There's no law that says they can't pad the variable-length input to a fixed length.

I'm not sure you quite understand the problem: it's not the input length, it's the encoding of each of the characters. So are you suggesting turning all single-byte encoded characters into multi-byte encodings of some arbitrary maximum length? If you can already identify the problem at this level, then you would just do that in the parser that is truncating the string.

Just because it's unlikely with a real text string doesn't mean that any of the text is invalid for a message. The text string should still not need to be changed. The bug only affects notifications, and it's clear that the text can be displayed just fine in conversation view.

This is almost certainly due to splitting multibyte characters on sub-character boundaries. That's a design problem, not a sanitizing problem.


This is almost certainly due to splitting multibyte characters on sub-character boundaries.

Or mishandling combining characters; the screenshot geminidomino provided shows several combining characters, as indicated by the dotted-line circles in some of the glyphs (and I suspect some of the marks above the Arabic characters come from combining characters as well).

Because writing in Chinese or Arabic is something to be sanitized? According to the summary (no, I didn't RTFA - this is /. after all), having a long enough message so that the ellipsis lands between those non-Latin characters causes the problem. Sucks to be a user who writes/communicates in other languages!

Yes, it is. Any input that will crash your library needs to be sanitized. You need to truncate the message on display, at the bad character. That doesn't mean you change the source message or stop people from communicating in whatever script they use - you as the fucking programmer SANITIZE YOUR INPUT, because otherwise, you fuck the user.

If you need to truncate after X characters, you don't just truncate after X*8 bits. Sure, that works if you're using an 8-bit encoding, but we're talking about variable-length, multi-script encodings like UTF-8 here. You truncate after X code points when dealing with those; it's not a fixed number of bytes, and sanitizing your input (which I'm sure they're already doing) does not protect you against cutting a multi-byte character in half if you're counting bytes for truncation.

No you don't. You are demonstrating the typical moronic attempts to deal with UTF-8.

Here is how you do it:

Go X bytes into the string. If that byte is a continuation byte, back up. Back up a maximum of 3 times. This will find a truncation point that will not introduce more errors into the string than are already there.

BUT BUT BUT I'm sure you are sputtering about how this won't give you exactly X "characters". NOBODY F**KING CARES!!!! If you want the string to "fit" you should be *measuring* it, not counting out some fixed number of characters.
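For what it's worth, the back-up rule works because every UTF-8 continuation byte matches the bit pattern 10xxxxxx. A sketch in Python, operating on raw bytes (the function name is mine):

```python
def truncate_utf8(raw: bytes, limit: int) -> bytes:
    """Truncate a UTF-8 byte string to at most `limit` bytes
    without cutting a code point in half."""
    if len(raw) <= limit:
        return raw
    end = limit
    # Back up past continuation bytes (0b10xxxxxx), at most 3 times.
    for _ in range(3):
        if end > 0 and raw[end] & 0xC0 == 0x80:
            end -= 1
        else:
            break
    return raw[:end]

s = "日本語".encode("utf-8")  # 9 bytes, 3 bytes per character
print(truncate_utf8(s, 4).decode("utf-8"))  # 日
```

Three backups suffice for 4-byte sequences (one lead byte plus three continuation bytes), which is the maximum in current UTF-8.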

This only works for UTF-8, and theoretically fails with the older type of UTF-8 (when you could have up to 6 bytes, by spec).

So you probably will have to go through it character by character, not byte by byte, exactly as Brons said. If you go N bytes into the string, and the string was just a ton of kanji, then you might truncate a VERY short message indeed- if you go looking 40 bytes in, you could be looking at a 10 character string or something for no reason, when your display would happily fit 40.

You guys that are talking about sanitizing the input don't understand the bug. There is absolutely nothing wrong with the input. This is not injection of escape codes. It's valid text in a different language, and changing it at input would constitute a second defect. The problem is in the eliding algorithm.

And in fact, arguing about where in the code the problem lies is not that helpful anyway. The bigger problem is that there wasn't a test case for it.

The string doesn't just need funny characters in it, it needs them at a precise position (and apparently not just any character will break, so it needs to have a particular expansion in whatever they used to encode their unicode). A test case would have solved it, but it doesn't sound like a reasonable test case to expect.

And yes, if you call a library that does some buggy truncation, you need to guard it on input.

Yes, it is. Any input that will crash your library needs to be sanitized. You need to truncate the message on display, at the bad character.

Where has it ever been stated that the message, as sent to the phone, contains a bad character? Everything I've read indicates that the problem is that the code that's displaying the message is inserting an ellipsis in the middle of a perfectly valid character, making the resulting string invalid.

Yes, technically there is a way to execute phone specific code with specially crafted text messages. This is not doing that.
It's not executing a program. The system is trying to abbreviate the contents of the message to display in a notification banner or on the lock screen through a widget (or whatever Apple calls them).
The system is doing something it's designed to do, but due to lack of foresight or just shoddy development, they never bothered testing this with special characters. And some clown obviously found the bug. This is actually pretty big. So in the past few months I've learned about 3 important issues with iOS devices, even those running the current release:
1) They are still including a Chinese root cert that has been delisted for handing out forged Google certs, and who knows what else.
2) A specially crafted access point being in range of your iOS device can cause it to become unstable and eventually crash, even if you have not connected to that network.
3) A specially crafted text message can crash your phone upon receiving it.
Let's be clear, I'm not saying Android doesn't have some major issues as well, so don't try to fanboy me. But this is not what I expect from Apple. This is just bad. Lack of sanity testing? Keeping their users at risk seemingly just to say FU to Google?

Yes, technically there is a way to execute phone specific code with specially crafted text messages. This is not doing that. It's not executing a program.

Generally, if a carefully-crafted input can cause your application to crash, similarly-crafted data may be able to exploit the same bug and cause execution of malicious code. If, as is usually the case, the crash is due to a buffer overflow and I can stomp over your app's memory, I may be able to place my code in the right place and it will be executed as part of the app...

Generally, if a carefully-crafted input can cause your application to crash, similarly-crafted data may be able to exploit the same bug and cause execution of malicious code. If, as is usually the case, the crash is due to a buffer overflow and I can stomp over your app's memory, I may be able to place my code in the right place and it will be executed as part of the app...

This is only true for certain classes of memory management defects. There are many different kinds of defects, and many of them cannot be leveraged for code execution.

Correct. As a random example, a NULL pointer read - certainly the most common class of memory error I've seen, probably the most common by far in general - is almost never exploitable (for arbitrary code execution). You can use it to crash programs (denial of service) but usually not for anything else.

I'd be willing to bet that the unicode library they were using was UTF-16. Either that or they were handling unicode in a straight binary string with something homebrewed. Both are horribly dangerous - the latter for obvious reasons, but the former in particular because it makes it easy to code something that "just works" for 99.99% of cases, but those rare 0.01% side cases involving 32-bit unicode characters slip through testing and come back to bite you down the road. It's amazing how many apps have incorrect behavior with 4-byte unicode characters, on a wide range of platforms.

Both should be considered bad practice, and programming languages should evolve to standardize on UTF-8 for any string format that is to handle unicode. C++, for example, needs to introduce something along the lines of "std::ustring" that makes unicode string ops "just work" with a UTF-8 backend, at the cost of some memory and performance vs. std::string, which should be seen as exclusively for ascii and binary string operations. std::wstring should be obsoleted.

It's not that NSString itself is broken; it's that 99.99% of the time an NSString is one 16-bit code unit per glyph, so apps using it rarely test the case where it's two code units per glyph. So a person goes in and writes an app that inserts a new character at a particular byte offset and it works 99.99% of the time, but if it happens to get stuck in the middle of a multi-code-unit glyph, the program breaks.

The documentation is no help. First off, it lies:

Conceptually, a CFString object represents an array of Unicode characters (UniChar) along with a count of the number of characters. The [Unicode] standard defines a universal, uniform encoding scheme that is 16 bits per character.

As we all should know, that's simply not true. Unfortunately, a lot of people don't know better. Unicode is not a universal, uniform encoding scheme that is 16 bits per character. Even UTF-16 isn't that.

A string object presents itself as an array of Unicode characters. You can determine how many characters a string object contains with the length method and can retrieve a specific character with the characterAtIndex: method. These two “primitive” methods provide basic access to a string object.

characterAtIndex returns a 16-bit integer. So obviously it has no way to actually represent wider unicode characters. The length method is not the number of characters on the screen, but the number of code units, which is different, but highly misleading to programmers. They're, again, the same thing 99.99% of the time, but those rare cases where they're not generally slip through testing. And this is why UTF-16 is such a hazardous encoding to use.
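The surrogate-pair hazard is easy to demonstrate. Python strings hide the code-unit level, so this sketch drops down to an explicit UTF-16 encoding:

```python
ch = "\U0001F600"  # 😀, U+1F600, outside the Basic Multilingual Plane
units = ch.encode("utf-16-le")
print(len(units) // 2)  # 2 -> two 16-bit code units: a surrogate pair

# Cutting after the first code unit leaves an unpaired high surrogate,
# which is not valid UTF-16:
try:
    units[:2].decode("utf-16-le")
except UnicodeDecodeError:
    print("split surrogate pair: invalid UTF-16")
```

Any API that reports length in code units while letting callers cut at arbitrary indices invites exactly this class of bug.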

Yes, NSString is old. And that's part of the problem. It was made at a time where many thought that unicode was only going to be 16 bits. It hasn't aged well. And it's caused a lot of bugs over its time. And now I'd bet that it or something similar has created a brand new iPhone-equivalent of Winnuke.

Programmers really need two types of strings, and only two, for the lion's share of tasks. One, binary strings, where a char is always 8 bits and operations can be optimized to heck and back. And two, unicode strings, where a char always represents a whole unicode character that you would display, and the count of characters represents the count of display characters and so forth. None of this "99.99% of the time it's one thing, but every so often it's another...". That's asking for bugs.

They do not help at all in doing text manipulation, because Unicode code points are *not* "characters" or any other unit that users think about. This is due to combining characters and invisible characters such as bidi indicators. A pair of regional-indicator code points turns into a single country flag! It is a huge mess.
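Both effects can be seen with nothing but the standard library (a quick illustrative sketch):

```python
import unicodedata

# Two regional-indicator code points render as one flag:
flag = "\U0001F1FA\U0001F1F8"  # U+1F1FA U+1F1F8 -> 🇺🇸
print(len(flag))  # 2 code points, one visible symbol

# 'e' plus a combining acute accent is one user-perceived character:
e_combining = "e\u0301"
print(len(e_combining))  # 2 code points
# NFC normalization folds it into the single precomposed code point é:
print(unicodedata.normalize("NFC", e_combining) == "\u00e9")
```

Counting code points therefore over-counts what the user sees; grapheme-cluster segmentation (UAX #29) is the unit users actually think in.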

Far more important is that they all lack the ability to store errors that are in a UTF-8 string in a lossless way.

Pretty much all Unicode handling in framework libraries is UTF-16, and has been for quite a long time. Windows "wide" strings, Java String, .NET String, Qt QString, ICU UnicodeString, etc. Of course some libraries may support choosing between the two, and many newer libraries do opt for UTF-8. Serialized formats are also more likely to be UTF-8. However, UTF-16 is still far more common as the "in memory" representation.

Since multi-element characters are far less common in UTF-16 than in UTF-8, I can see how bugs like this slip through testing.

I think you hit the nail on the head when you observed "they never bothered testing."

As long as software vendors have zero liability for defects, we'll probably continue to see easy-to-catch and easy-to-exploit bugs in software. Even software out of large, mature dev groups that should really know better.

This isn't as difficult to find as you might think. You do not have to test millions or billions of random text strings.

Software security testing works by breaking inputs into categories, and assuming that if you test one or two items in the category, then the category is covered. Categories are derived from the software specifications.

To clarify: the iPhone has been out for (almost) 8 years, and only now the offending string was found.

No testing provides 100% coverage, especially for the number of combinations of possible Unicode characters in a 160-character/byte message. Only a complete moron would blame this bug on lack of testing.


Let's not forget that Unicode is a standard that's constantly evolving - new glyphs are constantly added (there's already a proposed set for Unicode 9 including glyphs for "selfie", "avocado" and others).


People keep arguing that /. doesn't support Unicode, when it really does - it just uses a narrow whitelist of characters. The reason for this is obvious if you think about it - to prevent situations like this from happening.

Heck, there might be strings out there that will crash any Unicode library implementation, just we haven't found them yet because the search space is huge.

Hmmm... That tempts me to try a test using a couple of file names on this machine that are two of the names for a Mandarin-English dictionary: 普通话.html and Pǔtōnghuà.html (and also Pu3Tong1Hua4.html for systems that can only accept ASCII ;-). Those names aren't in any sense obscure or tricky; they're strings you'd expect to see in online discussions of text handling in various languages. If you can't handle at least these trivial Chinese strings, you've failed pretty badly. Of course, they look fine in this Comment: panel, and will likely survive the Preview button.

Let's see how /. handles them...

Nope; the 3 Hanzi characters didn't show at all, and only the à showed correctly in the second name. But everything looks correct in this second editing widget. This proves that /. hasn't damaged the actual text in the Preview. Let's see what happens when I try to post it...

I see that the "Comment:" edit widget for this message does have the Hanzi and marked 'u' and 'o' characters missing. So the damage is done after you hit the Submit button. There's no excuse for this. None of those characters have any special meaning to the code, and text containing them can't do any damage to anything. If damage happens, it's the fault of the crappy software handling the text, not the fault of the creator of the text. The right thing to do is to correct the crappy software, not to damage the user's text.

I'm not denying it, I'm asking for a link to an article about it, as I legitimately don't recall having heard or read about it at the time. How about being helpful, instead of being a douche, for once?

Oh! It was this [slashdot.org]? Huh, interesting. FTFA: "Users would then see notifications about the finished downloads and would click on them, prompting the malicious application to install if their devices had the "unknown sources" setting enabled"

Yes, iOS is protected from this sort of attack by simply not allowing the user to install from unknown sources - a setting which is disabled by default on Android, incidentally. In other words, one specifically has to make oneself vulnerable to this attack in order to be affected.

Never. Ever. Do. That. Again. Or I will mark you as a troll if I have mod points. And frankly, I hope somebody does that to this one.

If you're going to post an informative link from Wikipedia, go straight to Wikipedia, not that wikiwand crap. Using that link to a site pushing a formatting extension that changes the way Wikipedia's UI looks is trolling for users to hijack with a MitM attack. This is fucking /. The general population here knows better than to install random extensions from unverified sources. Go peddle your crapware on reddit.

No, it's 1985 again. Or even earlier. 1985 was when I found an escape sequence that would reboot the HP100 portable computer my boss used to access the message system on the HP 3000 minicomputer. Cue me sending an email with it in the subject. The reboot took so long that the messaging system logged you off, and handily, when you log back in, it prints the subjects of your unread emails, and around you go again.

Yeah, that was the point I was trying to get at. Most people take the privacy of their most intimate secrets for granted - they keep it in their email, on their mobile devices, etc. And while these things are pretty well guarded, from a technological standpoint a single bug can lead to the mass subversion of a whole ecosystem. It seems to me that the day all of Gmail - or some other major email provider, or the private data on everyone's iPhones - is hacked and made public will be a historic event.

Yeah, that was the point I was trying to get at. Most people take the privacy of their most intimate secrets for granted -

Or they simply don't care as much as you do. Seriously, do you think anyone really wants to see photos of your bum? Who is the market for all these tonnes and tonnes of useless information?
If all the world's secrets were published tomorrow, I think the reaction would be "meh, what's for dinner?" It's actually worse if it only happens to one person; if it happens to everyone, then that becomes the new norm and no one cares.