Semi-structured data is a form of structured data that does not conform with the formal structure of tables and data models associated with relational databases but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data...

. An S-expression may be a nested list of smaller S-expressions. S-expressions are probably best known for their use in the Lisp

Lisp programming language

Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older...

family of programming languages. They are typically represented in text by parenthesized, whitespace-separated sequences of character strings, as in (= 4 (+ 2 2)), which represents the Boolean

Boolean logic

Boolean algebra is a logical calculus of truth values, developed by George Boole in the 1840s. It resembles the algebra of real numbers, but with the numeric operations of multiplication xy, addition x + y, and negation −x replaced by the respective logical operations of...

C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

and related programming languages.

In Lisp, the first element of every S-expression is an operator and any remaining elements are treated as data. This is called "prefix notation" or "Cambridge Polish notation

Polish notation

Polish notation, also known as prefix notation, is a form of notation for logic, arithmetic, and algebra. Its distinguishing feature is that it places operators to the left of their operands. If the arity of the operators is fixed, the result is a syntax lacking parentheses or other brackets that...

". Due to the close association between S-expressions and written representations of data, the term "S-expression" in casual conversation often refers to the written representation of an S-expression.

Other uses of S-expressions are in Lisp-derived languages such as DSSSL

Document Style Semantics and Specification Language

Document Style Semantics and Specification Language is a computer language for specifying stylesheets for SGML documents, based on a subset of the Scheme programming language. It is specified by the standard ISO/IEC 10179:1996...

John McCarthy was an American computer scientist and cognitive scientist. He coined the term "artificial intelligence" , invented the Lisp programming language and was highly influential in the early development of AI.McCarthy also influenced other areas of computing such as time sharing systems...

The Common Business Communication Language is a communications language proposed by John McCarthy that foreshadowed much of XML. The language consists of a basic framework of hierarchical markup derived from S-expressions, coupled with some general principles about use and extensibility...

In computer programming, M-expressions were intended to be the expressions used to write functions in the Lisp programming language. Data to be manipulated using M-expressions was to be written using S-expressions...

s, but the first implementation of Lisp was an interpreter of S-expression encodings of M-expressions, and Lisp programmers soon became accustomed to using S-expressions for both code and data.
This means that Lisp is homoiconic, that is, the primary representation of programs is also a data structure in a primitive type of the language itself.
Definition
S-expressions are defined recursively

Recursion

Recursion is the process of repeating items in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion. The term has a variety of meanings specific to a variety of disciplines ranging from...

as either single data objects called "atoms" or lists of S-expressions. Numbers, arrays, character strings, and symbols

Symbol (Lisp)

A symbol in computer programming is a primitive datatype whose instances have a unique human-readable form. Symbols can be used as identifiers. In some programming languages, they are called atoms....

are all atoms.

An S-expression may be a list of other S-expressions, which may also be lists, so S-expressions may be arbitrarily nested to any depth. In Lisp, S-expression lists are constructed from small data structures called cons pairs, written as (x . y). The first element of the cons (x) is the first element of the list, and the second element (y) is the remainder of the list. Longer lists are made up of nested cons pairs, for example(1 . (2 . (3 . nil))). The special symbol nil marks the end of the list. Cons-based lists are usually written more compactly as (1 2 3). Nested lists can also be written as S-expressions: ((milk juice) (honey marmalade)) is a two-element S-expression whose elements are also two-element S-expressions. The whitespace-separated notation used in Lisp (and this article) is typical. Line breaks (newline characters) usually qualify as separators.

A non-atomic S-expression in which the second element is an atom other than nil is a "dotted pair" (e.g., (a . b)); a "dotted list" (e.g., (a b c . d)) is defined recursively as an s-expression in which the second element is either a dotted pair or another dotted list, while a "list" (e.g., (a b c), which is equivalent to (a b c . nil)) is defined as either the reserved atom nil, or a non-atomic s-expression in which the second element is a list.

Example

In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

Common Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 , . From the ANSI Common Lisp standard the Common Lisp HyperSpec has been derived for use with web browsers...

:
(defun factorial (x)
(if (zerop x)
1
(* x (factorial (- x 1)))))

S-expressions can be read in Lisp using the function READ. READ reads the textual representation of an s-expression and returns Lisp data. The function PRINT can be used to output an s-expression. The output then can be read with the function READ, when all printed data objects have a readable representation. Lisp has readable representations for numbers, strings, symbols, lists and many other data types. Program code can be formatted as pretty printed S-expressions using the function PPRINT.

Lisp programs are valid s-expressions, but not all s-expressions are valid Lisp programs. (1.0 + 3.1) is a valid s-expression, but not a valid Lisp program, since Lisp uses prefix notation and a floating point number (here 1.0) is not valid as an operation (the first element of the expression).

An S-expression preceded by a single quotation mark, as in 'x, is syntactic sugar

Syntactic sugar

Syntactic sugar is a computer science term that refers to syntax within a programming language that is designed to make things easier to read or to express....

for a quoted S-expression, in this case (quote x).
Standardization
Standards for some Lisp-derived programming languages include a specification for their S-expression syntax. These include Common Lisp

Common Lisp

Common Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 , . From the ANSI Common Lisp standard the Common Lisp HyperSpec has been derived for use with web browsers...

ISLISP is a programming language in the LISP family standardized by ISO working group ISO/IEC JTC 1/SC 22/WG 16 . The primary output of this working group was an International Standard, ISO/IEC 13816:1997, published by ISO. The standard was updated in 2007 and republished as ISO/IEC 13816:2007...

Ronald Linn Rivest is a cryptographer. He is the Andrew and Erna Viterbi Professor of Computer Science at MIT's Department of Electrical Engineering and Computer Science and a member of MIT's Computer Science and Artificial Intelligence Laboratory...

submitted an Internet-Draft to be considered for publication as an RFC

Request for Comments

In computer network engineering, a Request for Comments is a memorandum published by the Internet Engineering Task Force describing methods, behaviors, research, or innovations applicable to the working of the Internet and Internet-connected systems.Through the Internet Society, engineers and...

. The draft defined a syntax based on Lisp S-expressions but intended for general-purpose data storage and exchange (similar to XML

XML

Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

) rather than specifically for programming. It was never approved as an RFC, but it has since been cited and used by other RFCs (e.g. RFC 2693) and several other publications. It was originally intended for use in SPKI

Simple public key infrastructure

Simple public key infrastructure was born out of a joint effort to overcome the overcomplication and scalability problems of traditional X.509 public key infrastructure. It is specified in two Internet Engineering Task Force Request For Comments specifications—RFC 2692 and RFC 2693—from the IETF...

.

Rivest's format defines an S-expression as being either an octet-string (a series of byte

Byte

The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s) or a finite list of other S-expressions. It describes three interchange formats for expressing this structure. One is the "advanced transport", which is very flexible in terms of formatting, and is syntactically similar to Lisp-style expressions, but they are not identical. The advanced transport, for example, allows octet-strings to be represented verbatim (the string's length followed by a colon and the entire raw string), a quoted form allowing escape characters, hexadecimal

Hexadecimal

In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation...

, or placed directly as a "token" if it meets certain conditions. (Rivest's tokens differ from Lisp tokens in that the former are just for convenience and aesthetics, and treated exactly like other strings, while the latter have specific syntactical meaning.) Another interchange format, intended to be more compact, easier to parse, and unique for any abstract S-expression, is the "canonical representation" which only allows verbatim strings, and prohibits whitespace as formatting outside strings. Finally there is the "basic transport representation", which is either the canonical form or the same encoded as Base64 and surrounded by braces

Bracket

Brackets are tall punctuation marks used in matched pairs within text, to set apart or interject other text. In the United States, "bracket" usually refers specifically to the "square" or "box" type.-List of types:...

, the latter intended to safely transport a canonically-encoded S-expression in a system which might change spacing (e.g. an email system which has 80-character-wide lines and wraps anything longer than that).

M-expression

In computer programming, M-expressions were intended to be the expressions used to write functions in the Lisp programming language. Data to be manipulated using M-expressions was to be written using S-expressions...

Car and cdr

car and cdr are primitive operations on cons cells introduced in the Lisp programming language. A cons cell is composed of two pointers; the car operation extracts the first pointer, and the cdr operation extracts the second.Thus, the expression evaluates to x, and evaluates to...

Cons

In computer programming, cons is a fundamental function in most dialects of the Lisp programming language. cons constructs memory objects which hold two values or pointers to values. These objects are referred to as cells, conses, non-atomic s-expressions , or pairs...

Canonical S-expressions

A Canonical S-expression is a binary encoding form of a subset of general S-expression. It was designed for use in SPKI - to retain the power of S-expressions, while achieving the compactness of a binary form and maximizing the speed of parsing.An S-expression is composed of atoms, which are byte...

Comparison of data serialization formats

This is a comparison of data serialization formats, different ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.-Overview:*a. The current default format is binary....

Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...