A string, as in other languages, is simply a sequence of characters. Like most entities in Ruby, strings are first-class objects. In everyday programming, we need to manipulate strings in many ways. Ruby makes most of these tasks easy, as does Hal Fulton in this sample chapter.

Atoms were once thought to be fundamental, elementary building blocks of nature; protons were then thought to be fundamental, then quarks. Now we say the string is fundamental.

—David Gross, professor of theoretical physics, Princeton University

A computer science professor in the early 1980s started out his data structures class with a single question. He didn't introduce himself or state the name of the course; he didn't hand out a syllabus or give the name of the textbook. He walked to the front of the class and asked, "What is the most important data type?"

There were one or two guesses. Someone guessed "pointers," and he brightened but said no, that wasn't it. Then he offered his opinion: The most important data type was character data.

He had a valid point. Computers are supposed to be our servants, not our masters, and character data has the distinction of being human readable. (Some humans can read binary data easily, but we will ignore them.) The existence of characters (and thus strings) enables communication between humans and computers. Every kind of information we can imagine, including natural language text, can be encoded in character strings.

A string, as in other languages, is simply a sequence of characters. Like most entities in Ruby, strings are first-class objects. In everyday programming, we need to manipulate strings in many ways. We want to concatenate strings, tokenize them, analyze them, perform searches and substitutions, and more. Ruby makes most of these tasks easy.

Most of this chapter assumes that a byte is a character. When we get into an internationalized environment, this is not really true. For issues involved with internationalization, refer to Chapter 4, "Internationalization in Ruby."

2.1 Representing Ordinary Strings

A string in Ruby is simply a sequence of 8-bit bytes. It is not null-terminated as in C, so it can contain null characters. It may contain bytes above 0x7F, but such strings are meaningful only if a particular character set (encoding) is assumed. (For more information on encodings, refer to Chapter 4.)
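As a rough illustration of the byte-oriented view, the sketch below builds a string holding a null byte and a byte above 0x7F, then inspects it. It assumes Ruby 2.0 or later (for String#b), and the choice of ISO-8859-1 as the assumed encoding is purely illustrative.

```ruby
# A string with an embedded null byte and one byte above 0x7F.
# String#b returns a copy tagged as binary (ASCII-8BIT).
raw = "Hi\x00!\xE9".b

byte_count  = raw.bytesize   # the null byte does not end the string
byte_values = raw.bytes      # integer value of each byte

# The high byte means something only once an encoding is assumed.
# Here we assume (for illustration) that the data is ISO-8859-1.
latin1  = raw.dup.force_encoding("ISO-8859-1")
as_utf8 = latin1.encode("UTF-8")

puts byte_count              # prints 5
p byte_values                # prints [72, 105, 0, 33, 233]
```

Under a different assumed encoding, the same 0xE9 byte could be a different character or not a valid character at all.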

The simplest string in Ruby is single-quoted. Such a string is taken absolutely literally; the only escape sequences recognized are the single quote (\') and the escaped backslash itself (\\):
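A minimal sketch of single-quoted strings follows; the variable names and sample text are illustrative only.

```ruby
s1 = 'This is a string'      # taken literally
s2 = 'Mrs. O\'Leary'         # \' yields a single quote
s3 = 'Look in C:\\TEMP'      # \\ yields one backslash
s4 = 'No newline here: \n'   # \n is NOT an escape here: backslash, then n

puts s2   # prints Mrs. O'Leary
puts s3   # prints Look in C:\TEMP
puts s4   # prints No newline here: \n
```

Because so little is interpreted, single quotes are a reasonable choice when the text should appear exactly as typed.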

A double-quoted string is more versatile. It allows many more escape sequences, such as backspace, tab, carriage return, and linefeed. It also allows control characters to be embedded as octal numbers:
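A short sketch of a few of these escapes follows; the sample strings are illustrative, and the octal values shown are the standard ASCII codes for tab (011) and bell (007).

```ruby
s1 = "This is a tab: (\t)"
s2 = "Some backspaces: xyz\b\b\b"
s3 = "This is also a tab: \011"              # octal 011 is ASCII 9 (tab)
s4 = "And these are both bells: \a \007"     # octal 007 is ASCII 7 (bell)
s5 = "Line one\r\nLine two"                  # carriage return, then linefeed

p s3[-1].ord        # prints 9
p s4.count("\a")    # prints 2
```

Using p rather than puts shows the escaped form of each character instead of sending the control character to the terminal.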