Summary
In this installment I pave the way to the heart of Lisp, i.e. to the famous *code is data* concept. In order to do that, I will have to introduce two fundamental data types first: symbols and lists.

Scheme and Lisp have a particular data type which is missing in most
languages (with the exception of Ruby): the symbol.

From the grammar point of view, a symbol is just a quoted identifier,
i.e. a sequence of characters corresponding to a valid identifier
preceded by a quote. For instance, 'a, 'b1 e 'c_ are
symbols. On the set of symbols there is an equality operator eq?
which is able to determine if two symbols are the same or not:

> (define sym 'a)
> (eq? sym 'b)
#f
> (eq? sym 'a)
#t

#f e #t are the Boolean values False and True respectively,
as you may have imagined. The equality operator is extremely
efficient on symbols, since the compiler associates to every
symbol an integer number (this operation is called hashing) and stores
it in an interal registry
(this operation is called interning): when the compiler checks the identity
of two symbols it actually checks the equality of two integer
numbers, which is an extremely fast operation.

You may get
the number associated to a symbol with the function symbol-hash:

It is always possible to convert a string into a symbol and viceversa
thanks to the functions string->symbol and symbol->string,
however conceptually - and also practically - symbols in Scheme
are completely different from strings.

The situation is not really different in Python. It is true that
symbols do not exist as a primitive data type, however strings
corresponding to names of Python objects are actually treated as
symbols. You can infer this from the documentation about the builtin
functions hash e intern, which says: normally, the names used in
Python programs are automatically interned, and the dictionaries used
to hold module, class or instance attributes have interned keys.
BTW, if you want to know exactly how string comparison works in Python
I suggest you to look at this post:

Scheme has much more valid identifiers than Python or C, where
the valid characters are restricted to a-zA-Z-0-9_ (I am ignoring
the possibility of having Unicode characters in identifiers, which is
possible both in R6RS Scheme and Python 3.0). By convention, symbols
ending by ? are associated to boolean values or to boolean-valued
functions, whereas symbols ending by ! are associated to functions
or macros with side effects.

The function eq?,
is polymorphic and works on any kind of object, but it may surprise you
sometimes:

> (eq? "pippo" "pippo")
#f

The reason is that eq? (corrisponding to is in Python) checks
if two objects are the same object at the pointer level, but it does
not check the content. Actually, Python works the same. It is only by
accident than "pippo"is"pippo" returns
True on my machine, since the CPython implementation
manages differently "short" strings from "long" strings:

>>> "a"*10 is "a"*10 # a short string
True
>>> "a"*100 is "a"*100 # a long string
False

If you want to check if two objects have the same content you should
use the function equal?, corresponding to == in Python:

> (equal? "pippo" "pippo")
#t

It you know the type of the objects you can use more efficient
equality operators; for instance for strings you can use string=?
and for integer numbers =:

> (string=? "pippo" "pippo")
#t
> (= 42 42)
#t

To be entirely accurate, in addition to eq and equal, Scheme
also provides a third equality operator eqv?. eqv? looks to me
like an useless complication of the language, so I will not discuss
it. If you are interested, you can read what the R6RS document says
about it.

The original meaning of LISP was List Processing, since lists were the
fundamental data type of the language. Nowadays Lisp implements
all possible data types, but still lists have a somewhat privileged
position, since lists can describe code.
A Lisp/Scheme list is a recursive data type such that:

the list is empty: '()

the list is the composition of an element and a list via the
cons operation (cons stands for constructor): (conselemlst).

For instance, one-element lists are obtained as composition of an
element with the empty list:

> (cons 'x1 '()); one-element list
(x1)

Two-elements lists are obtained by composing an element with a one-element
list:

but the expression (listx1...xN) is nothing else
than syntactic sugar for the fully explicit expression in terms of
constructors.

There are also the so-called improper lists, i.e. the ones where the
second argument of the cons is not a list. In that case the
representation of the list displayed at the REPL contains a dot:

> (cons 'x1 'x2) ; improper list
(x1 . x2)

It is important to remember that improper lists are not lists,
therefore operations like map, filter and similar do not work
on them.

As we anticipated in episode 4, the first element of a list (proper
or improper) can be extracted with the function car; the tail of
the list instead can be extracted with the function cdr.
If the list is proper, its cdr is proper:

> (cdr (cons 'x1 'x2))
x2
> (cdr (cons 'x1 (cons 'x2 '())))
(x2)

At low level Scheme lists are implemented as linked list, i.e. as
couples (pointer-to-sublist, value) until you arrive at the null
pointer.

Here case-lambda is a syntactic form that allows to define
functions with different behavior according to the number of arguments.
for-all is an R6RS higher order function: (for-allpredlst) applies the predicate pred to the elements of list lst,
until a false value is found - in that case it returns #f -
otherwise it returns #t. Here the assertion checks at
runtime that all the passed arguments n0, n and s are
numbers, with s non-zero.

The first let defines a comparison
function cmp which is >= if the step s is positive, or
<= if the step s is negative. The reason is clear: if s
is positive at some point the index i will get greater than
n, whereas if s is negative at some point i will get
smaller than n.

The trick used in the loop is extremely common: instead of modifying
a pre-existing list, at each iteration a new list is created ex-novo
by adding an element (consiacc); at the end the accumulator
is reversed (reverseacc). The same happens for the counter i
which is never modified.

This is an example of functional loop;
imperative loops based on mutation are considered bad style in Scheme
and other functional languages. Notice by contrast that in Common Lisp
imperative loops are perfectly acceptable and actually
there is a very advanced LOOP macro allowing you to do everything
with imperative loops.

Actually, the range function defined here is more powerful than
Python's range, since it also works with floating point numbers:

> (range 1.3 2.5 .25)
(1.3 1.55 1.8 2.05 2.3)

As always, the only way to really get Scheme lists is to use them.
I suggest you try the following exercises:

I will show the solutions in the next episode. If you are lazy and you
want an already written list library, I suggest you to give a look at the
SRFI-1 library, which is very rich and available practically in all
Scheme implementations. Many of the SRFI-1 features are built-in in
the R6RS standard, but many other are still available only in the SRFI-1 .

Whenever I've read articles about LISP, there's often a focus on the idea that LISP doesn't make much distinction between code and data. This is generally touted as a really awesome feature but that always leaves me a little puzzled.

Back when I was getting my degree my favorite CS professor hammered it into our heads that as far as the machine was concerned, there was no distinction between instructions and data. Instructions are just data that is interpreted by the machine. So that's the first thing that puzzles me: why is this treated as if it's something invented by LISP? It's a fundamental feature of the computer.

So given that this is not really special, why is it that most languages don't work this way? The answer, as far as I can tell, is that people don't want them to. It's a powerful feature, no doubt, but it's also a really dangerous one. The distinction between code and data is deliberate. On a side note, it's also a core concern of computer security specialists.

I recently had a colleague mention to me that he was working on a system I had helped implement. He mentioned that he wanted to get rid of a bunch of stuff that he didn't think was useful. I agreed that it was an option but that the stuff in question had been added because it addressed a major, inherent problem that had come up in the past. I can't get past the feeling that the 'code is data' aspect of LISP is a similar situation. Maybe someone can help me understand.

> Back when I was getting my degree my favorite CS professor> hammered it into our heads that as far as the machine was> concerned, there was no distinction between instructions> and data. Instructions are just data that is interpreted> by the machine. So that's the first thing that puzzles> me: why is this treated as if it's something invented by> LISP? It's a fundamental feature of the computer.

Right, but only in LISP code is naturally represented as nested lists and only in LISP you have powerful built in facilities to manage lists. Scheme adds pattern matching as the natural way of defining macros, but you will see all of that in the next episodes.

> So given that this is not really special, why is it that> most languages don't work this way? The answer, as far as> I can tell, is that people don't want them to. It's a> powerful feature, no doubt, but it's also a really> dangerous one. The distinction between code and data is> deliberate.

I agree. There may be various reasons for that, more or less valid (people do not stand parenthesis, you can actually do a lot even in languages without the code==data concept, debugging a program without macros is simpler etc.)

> On a side note, it's also a core concern of> computer security specialists.

I do not follow you. Are you referring to the security riskof evaluating untrusted code? But macros affect the user program, not code of unknown origin.

> I recently had a colleague mention to me that he was> working on a system I had helped implement. He mentioned> that he wanted to get rid of a bunch of stuff that he> didn't think was useful. I agreed that it was an option> but that the stuff in question had been added because it> addressed a major, inherent problem that had come up in> the past. I can't get past the feeling that the 'code is> data' aspect of LISP is a similar situation. Maybe> someone can help me understand.

Not clear what you mean. Anyway, if your language does not support the code==data concept, you have to reinvent it somehow. For instance, consider GUI builders, where you write an XML representation of our widgets (remember, XML is nothing else than a verbose implementation of s-expressions) and there is a framework which is able to transform your XML representation in your hosting languages classes. Here the framework is converting data (the XML) in code. In Lisp this functionality is built-in and you do not need a big framework and another language to learn.

This in theory: in practice it may be easier to use a GUI builder written by others in a mainstream language than to write your own in LISP, but this it another question, the well know old issue of libraries.

> > On a side note, it's also a core concern of> > computer security specialists.> > I do not follow you. Are you referring to the security> risk> of evaluating untrusted code? But macros affect the user> program, not code of unknown origin.

He's referring to the fact that the machine's unwillingness to distinguish between code and data is the enabling factor behind buffer overflow exploits. C and the underlying machine are quite willing to let you write past the end of an input buffer, overwrite the return address and jump into the middle of the stack and run the exploit payload.

If, on the other hand, the machine distinguished between code and data, it would complain when you overwrote the return address or at least when you jumped into the middle of what, in theory, is supposed to be a string.

> Not clear what you mean. Anyway, if your language does not> support the code==data concept, you have to reinvent it> somehow. For instance, consider GUI builders, where you> write an XML representation of our widgets (remember, XML> is nothing else than a verbose implementation of> s-expressions) and there is a framework which is able to> transform your XML representation in your hosting> languages classes. Here the framework is converting data> (the XML) in code. In Lisp this functionality is built-in> and you do not need a big framework and another language> to learn.

I could quibble over 'reinvent' but yes, sort of. But there is a distinction. These kinds of facilities are targeted. They only do a small set of things. They are not fully functional languages. They are restricted (ideally) to operations that are known to be safe. Of course there is a trade-off that extra effort must be put into doing these things as opposed to what is required in LISP.

The key point is that it seems to me that a lot of LISP people assume that this is just a oversight or error on the language designers part. There isn't any consideration that perhaps there was a specific reason for not supporting it. Just like my colleague never considered that the 'unnecessary' stuff that was in our design was there for a specific an important reason.

> > > On a side note, it's also a core concern of> > > computer security specialists.> > > > I do not follow you. Are you referring to the security> > risk> > of evaluating untrusted code? But macros affect the> user> > program, not code of unknown origin.> > He's referring to the fact that the machine's> unwillingness to distinguish between code and data is the> enabling factor behind buffer overflow exploits. C and the> underlying machine are quite willing to let you write past> the end of an input buffer, overwrite the return address> and jump into the middle of the stack and run the exploit> payload.

That's one manifestation of the issue at hand but it's a more general idea. All professional developers that work with SQL in code (should) know that you should almost always use prepared statements against the database. There's a performance benefit but more importantly, it's much safer.

I was in a discussion not that long ago about using XML in logs and someone pointed out that XML is basically a crappy reinvention of LISP (which I agree with if you are taking about XML used for writing code) and that writing logs in LISP was a really effective approach because you could execute the logs.

I have to admit something like that is really sexy and cool but it also strikes me as extremely dangerous. Maybe there's a reason that prevents bad things from happening, but it seems to me that this kind of thing has the same fundamental flaw that allows for SQL injection. You are executing code that was written based on the actions of someone (or something) that you don't control. Maybe an attacker knows enough about the system to force it to write certain log statements that compromise the system or perhaps there are corner cases that cause problematic code to be written to the logs. I don't think humans can effective traverse all the possibilities and when you have an open ended system, you are playing with fire.

AS I understand it, macros can write macros and I don't see anything that prevents them from acting like viruses. I may be making a big deal out of nothing. It's just something that I think about when I read about LISP. A more mundane concern is that these features would make code unmaintainable. Maybe not in a one person project but on team projects, I don't see how much use of these features could be sustained.

> AS I understand it, macros can write macros and I don't> see anything that prevents them from acting like viruses.

I see a bit of paranoia here. I do not think this is an issue.

> I may be making a big deal out of nothing. It's just> t something that I think about when I read about LISP. A> more mundane concern is that these features would make> code unmaintainable. Maybe not in a one person project> but on team projects, I don't see how much use of these> features could be sustained.

You have a point here. Actually I am not an advocate of macros exactly for that reason. Macros are very cool for a language implementor, less so for an application programmer.Notice that the goal of my *Adventures* is to spread cool ideas, but "cool" does not necessarily means "good".Anyway, episode #12 will be entirely devoted to discussing these issues. You have to wait a bit, because first I have to explain macros, and that will take episodes #9,#10 and #11 (I will explain just the easiest things). Episode #8will discuss the code==data equation.

> > AS I understand it, macros can write macros and I don't> > see anything that prevents them from acting like> viruses.> > I see a bit of paranoia here. I do not think this is an> issue.

But the more general issue of executing things like logs. Do you not see a risk? My understanding from previous discussions was that there was a known risk and that there are approaches used to mitigate them. If true, that tells me that the risk is real.

> But the more general issue of executing things like logs.> Do you not see a risk? My understanding from previous> s discussions was that there was a known risk and that> there are approaches used to mitigate them. If true, that> tells me that the risk is real.

Of course executing untrusted code is risky, but this is not an argument against macros. BTW executing logs is not such a common technique (I never heard of it); a typical technique is to execute configuration files, but this is fine since they should be edited by trusted users (programmers, sysadmins) and you can always restrict the environment your code is evaluated in. There is an example in the next episode.

> Of course executing untrusted code is risky, but this is> not an argument against macros. BTW executing logs is not> such a common technique (I never heard of it); a typical> technique is to execute configuration files, but this is> fine since they should be edited by trusted users> (programmers, sysadmins) and you can always restrict the> environment your code is evaluated in. There is an example> in the next episode.

I didn't realize this article was about macros. The word 'macro' only appears twice and in what read to me as side-bars. It seemed to me to be about muddying the separation between code and data. As I mentioned before, this is something that concerns security specialists. You don't have to take my word for it, it's a well known principle of computer science. Here's a couple links that came up in Google:

"Pascal, like its creator, believes firmly in the separation of code and data. It therefore (at least in its original form) has no ability to create initialized data. This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored­program computer. Code and data are the same, or at least they can be. How else can you explain how a compiler works?"

Security (i.e. not trusting one's own programs) was not of much of a concern in Turing and von Neumann's times, of course :-)

> So given that this is not really special, why is it that> most languages don't work this way? The answer, as far as> I can tell, is that people don't want them to. It's a> powerful feature, no doubt, but it's also a really> dangerous one. The distinction between code and data is> deliberate. On a side note, it's also a core concern of> computer security specialists.

I'm certainly not a historian but I think the reason more languages don't support it is more due to an historical division. Lisp has been around quite a while but wasn't efficient for a long time. (Techniques like tail call recursion didn't make their way in until Scheme as far as I know.) I think the division is more a consequence of languages modeling pushes of "data" and "instructions" into certain registers in order to be maximally efficient.

> I was in a discussion not that long ago about using XML in> logs and someone pointed out that XML is basically a> crappy reinvention of LISP (which I agree with if you are> taking about XML used for writing code) and that writing> logs in LISP was a really effective approach because you> could execute the logs.

Why would you execute log files?! :)

> AS I understand it, macros can write macros and I don't> see anything that prevents them from acting like viruses.> I may be making a big deal out of nothing.

You are :).

> It's just> t something that I think about when I read about LISP. A> more mundane concern is that these features would make> code unmaintainable. Maybe not in a one person project> but on team projects, I don't see how much use of these> features could be sustained.

If you took the deep-dive and learned about how they work, you would find that these nightmare scenarios aren't very likely to occur.

Also keep in mind, I've seen nightmare scenarios occur in plenty of language *without* macros, and I'm sure you have, too! :)