The Other Codex

I did okay for myself this first year at my job and had a little bit of money left over. I told myself I was allowed to buy one moderately ridiculous luxury item as a reward. And ever since I was like nineteen there’s only been one moderately ridiculous luxury item I really wanted.

So now I’ve got it:

I kind of want to explain, but part of me knows that no one can tell you what the Codex Seraphinianus is. You have to see it for yourself.

(the above is a 50 MB high-resolution PDF file I’m hosting on my old website. When that goes down under the strain, you can switch to a more manageable version here)

I was prepared to pay $500 for it, which is what it cost five years ago when I first looked into purchasing it, but to my delight there was a new version selling on Amazon for only $80. Getting the only ridiculous luxury item I’ve ever really wanted for $80 seems pretty good.

[That wasn’t originally intended to be an ad. But I realized it was stupid to accidentally advertise something without getting paid for it, so I signed up for the Amazon Affiliates program. So if you thought that sounded like an advertisement, you’ve been Gettier-cased.]

Yes, that’s all well and good. But is it racist? Sexist? Ultra-progressive and intolerant? Genocidal? Authoritarian? Creepy and dogmatic? Does it make a mockery of the principles of social justice that we all hold dear?

We need to turn this discussion back to the important matters of the day! Imma real happy for you, Scott, but please don’t derail!

Ouch, sorry then! Seriously! I was just kind of (self-)satirizing and not appreciating the risk of yet another SJ flamewar right in this thread. I mean, hell, can’t we even have an SSC version of r/circlejerk? I love r/circlejerk!

#whywecanthavenicethings

Now, in a serious effort to de-escalate:

I’ve always felt like the Codex Seraphinianus was just a *little* too obviously tongue-in-cheek compared to the legitimate ethereal otherness of the Voynich manuscript. (Then again, I also felt like ‘Fantastic Planet’ was trying a little too hard.)

This is the sort of tripe I’m talking about. Not even trying to be funny, but instead repeating the same cliche phrases to the point where it’s beyond dull. You remind me of those who spam sex chats with “Who wants to talk about my sex wife” or “Does anyone want to discuss brutal thai prisons”. No finesse. No creativity. Only a mechanical churn of crap.

look at us! we’re a thede! we have injokes! we’re a ‘we’! we’re so much of a ‘we’ you guys check this out here’s an injoke! and then another injoke! and then another injoke! we sure do get these things don’t we! we we we we we

I saw that, but it looks to me like someone saying “What if we rounded off each Codex character to its nearest Latin equivalent?” without showing there’s any reason to do so or that it produces any useful results.

Also, it’s hosted on a site about ancient aliens, so our priors for careful thought are pretty low.

In the next few pages I will prove that an alphabetical language bridge, found inside the Matrix, furnished for our particular use, and modeled on earthly language systems, is what the aliens provided for us to communicate with them.

— at which point i started giggling uncontrollably; i had been acting under the assumption that “alien” was just an adjective to describe the world depicted.

Suppose you wanted to make a Codex or a Voynich Manuscript. The text should be meaningless (ie not an obvious cipher) but it should also resemble a natural language enough that linguists can’t be sure it’s meaningless, and talk a lot about how all the letter frequencies and grammatical features are right.

i’d imagine that you’d make an alphabet along with a couple of common “words”, and then write a hundred or so pages of text — so as to detach the “choosing what to write next” part of writing from consciousness. i suspect that the patterns that would be naturally converged upon would be close to actual written text, although i have no proof for this.

in the specific case of the codex, serafini says that he practiced automatic writing. he did have to heavily edit it afterwards to remove extraneous characters and such, though, so i think that anything non-labour-intensive is probably going to be pretty imperfect.

of course the only non-speculative way to make text have all the right letter frequencies is to actually become a linguist, and know what the right letter frequencies are.

Scott, I’m pretty sure you could get output like this using current NLP technologies. I think if you took some NLP technologies, and asked them to generate a bunch of text in what they thought was English, you could get the sort of thing you’re looking for. I’m not sure this would get you grammatical features that linguists would like, but it could probably get you letter frequencies.

And now I really want to do this. =P I’ll let you know if I ever get around to it.

First you would define your stochastic recursive context-free grammar, with analogues of ‘nouns’, ‘verbs’, ‘prepositions’ and whatever other categories you choose to come up with, which define possible sentence structures and the probabilities associated with each production rule.

You would also generate dictionaries from a chosen arbitrary script, respecting whichever distributions of letters and combinations thereof you want, for each of the categories in your grammar (potentially through a similar SRCFG method to the language itself).

Then it reduces to the task of repeated Markov runs on your grammar until you have enough text.
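The scheme above can be sketched in a few lines of Python. Everything here is invented for illustration: the categories, the production probabilities, and the nonsense lexicon are all made up, standing in for whatever grammar and dictionaries you would actually design.

```python
import random

# A toy probabilistic context-free grammar: each nonterminal maps to a list of
# (production, probability) pairs. Terminal categories draw from small invented
# "dictionaries" of nonsense words.
GRAMMAR = {
    "S":   [(["NP", "VP"], 1.0)],
    "NP":  [(["Det", "N"], 0.7), (["Det", "N", "PP"], 0.3)],
    "VP":  [(["V", "NP"], 0.6), (["V"], 0.4)],
    "PP":  [(["P", "NP"], 1.0)],
    "Det": [(["zor"], 0.5), (["mek"], 0.5)],
    "N":   [(["talu"], 0.4), (["brenik"], 0.3), (["shov"], 0.3)],
    "V":   [(["quarn"], 0.5), (["velith"], 0.5)],
    "P":   [(["ost"], 1.0)],
}

def expand(symbol):
    """Recursively expand a symbol by sampling its production rules."""
    if symbol not in GRAMMAR:          # a terminal word: emit it as-is
        return [symbol]
    rules, weights = zip(*GRAMMAR[symbol])
    chosen = random.choices(rules, weights=weights)[0]
    out = []
    for sym in chosen:
        out.extend(expand(sym))
    return out

def sentence():
    return " ".join(expand("S")) + "."

# Repeated runs give you as much "text" as you want.
print(" ".join(sentence().capitalize() for _ in range(5)))
```

Running `sentence()` over and over is the “repeated Markov runs” step; the hard part, as the comment says, is choosing rules and dictionaries whose statistics look natural.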

Darcey, you could form an n-gram model of English and generate stuff that looks like English, but this would be obviously pseudo-English. Even if you permuted the letters, it would be trivial to reconstruct the alphabet and then it would be obvious what you did.

What you want is an n-gram model that is like those you get from natural languages. But what does that mean? Probably someone has an answer. This comes up often, such as people studying this very manuscript. Or ancient inscriptions that may or may not be language. I haven’t read the literature, but my impression is that the answers people have are not very good.
You could form n-gram models of many languages and see what they have in common, but I don’t know how to go about that, at least not in an automated fashion.
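A letter-level n-gram model of the kind being discussed is only a few lines; a rough sketch, where the corpus, the order n, and the output length are all placeholder choices:

```python
import random
from collections import defaultdict, Counter

def train_char_model(text, n=3):
    """Count which characters follow each (n-1)-character context."""
    model = defaultdict(Counter)
    padded = " " * (n - 1) + text
    for i in range(len(text)):
        context = padded[i:i + n - 1]
        model[context][padded[i + n - 1]] += 1
    return model

def generate(model, n=3, length=200):
    """Sample characters one at a time, conditioned on the last n-1 characters."""
    context = " " * (n - 1)
    out = []
    for _ in range(length):
        counts = model.get(context)
        if not counts:                 # unseen context: restart from padding
            context = " " * (n - 1)
            continue
        chars, weights = zip(*counts.items())
        c = random.choices(chars, weights=weights)[0]
        out.append(c)
        context = (context + c)[1:]
    return "".join(out)

corpus = "the quick brown fox jumps over the lazy dog " * 50  # stand-in corpus
model = train_char_model(corpus)
print(generate(model))
```

Trained on real English this produces the “obviously pseudo-English” output described above; the open question is what to train it on, or how to distort it, so the result looks like *a* language without looking like English.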

Incidentally, it is common for people to generate pseudo-English based on word n-gram models (n=2,3), but it is rare to do so on the letter level. I suspect that it is a good idea. It would be fun to make a text where the model slowly changed, probably from sense to nonsense. If you combined a word model with a letter model, you could shift from words to letters, shifting from real words to pronounceable fake words. More crudely, you could reduce n. I’d like to switch languages, but I don’t have much idea how to do it. I guess it should go through an intermediate phase of words that are plausible in both languages. Maybe a weighted geometric mean of the probabilities? P(w|English)^a · P(w|French)^(1-a)
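The weighted-geometric-mean idea can be sketched with unigram letter models standing in for the full n-gram models (a toy, not a serious interpolation scheme; the two “corpora” here are tiny stand-ins):

```python
import random
from collections import Counter

def char_probs(text):
    """Unigram letter probabilities (a crude stand-in for a real n-gram model)."""
    counts = Counter(text)
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}

def blend(p_a, p_b, a):
    """Weighted geometric mean P(c|A)^a * P(c|B)^(1-a), renormalized."""
    chars = set(p_a) & set(p_b)            # only characters both models know
    raw = {c: p_a[c] ** a * p_b[c] ** (1 - a) for c in chars}
    z = sum(raw.values())
    return {c: p / z for c, p in raw.items()}

english = char_probs("the cat sat on the mat and the dog ran")
french = char_probs("le chat est sur le tapis et le chien court")

# Slide a from 1 to 0 to shift the model from 'English' toward 'French'.
for a in (1.0, 0.5, 0.0):
    probs = blend(english, french, a)
    chars, weights = zip(*probs.items())
    print(a, "".join(random.choices(chars, weights=weights, k=40)))
```

With real letter n-gram models in place of `char_probs`, sliding `a` over the course of a text would give the gradual language-switch described above.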

@sviga: oh, interesting point. I was thinking about generating from existing models, and existing models need to be tractable to learn from data, so I don’t think any of them do phonology and syntax at the same time. But if all you want to do is generate random crap from the model, then you are totally right, there’s no reason not to combine a CFG with a phonetic model.

@Douglas: what I had in mind was to take an actual phonetic model trained on actual English (or some other language), and wiggle the parameters slightly to get something that was plausible but not recognizably English, even with permutations. Alternatively, a lot of these models are Bayesian, so they have priors that help guide the parameter learning. In case you’re not familiar, the idea is that the prior gives a distribution over languages, and you’re trying to figure out from data which language actually got drawn from the distribution. But instead of doing that, you could also draw a totally new language and then generate things from it! Of course, these models are highly unrealistic, so if you just draw something totally random from the prior, it probably won’t look much like language. The tradeoff between “draw something totally random” and “just take the parameters you learned on English and wiggle them a bit” will probably depend highly on the model, and you may want to do a little of each.
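At the crudest level, the “wiggle the parameters” idea might look like this: take a learned letter distribution and multiply each probability by random lognormal noise, then renormalize. The unigram simplification and the sigma value are arbitrary choices for illustration; a real attempt would perturb a full phonetic or n-gram model.

```python
import random
from collections import Counter

def perturb(probs, sigma=0.5):
    """Wiggle each probability by a random lognormal factor, then renormalize,
    so the result is plausible but no longer recognizably the source language."""
    wiggled = {c: p * random.lognormvariate(0, sigma) for c, p in probs.items()}
    z = sum(wiggled.values())
    return {c: p / z for c, p in wiggled.items()}

# Learn a (toy) letter distribution from English text, then wiggle it.
counts = Counter("the quick brown fox jumps over the lazy dog")
total = sum(counts.values())
english = {c: k / total for c, k in counts.items()}

fake = perturb(english)
chars, weights = zip(*fake.items())
print("".join(random.choices(chars, weights=weights, k=60)))
```

Larger sigma moves you toward the “draw something totally random” end of the tradeoff; sigma near zero stays close to English.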

“Suppose you wanted to make a Codex or a Voynich Manuscript. The text should be meaningless (ie not an obvious cipher) but it should also resemble a natural language enough that linguists can’t be sure it’s meaningless, and talk a lot about how all the letter frequencies and grammatical features are right.

What would the easiest way to do this be?”

In Neal Stephenson’s “Cryptonomicon”, Waterhouse uses the Riemann zeta function to give the appearance of encrypted text, when in fact it’s sort of a PRNG. I suppose we should call it a pseudo *non*-random number generator, or PNRNG.

But that’s been done now.

Assuming that the Voynich Manuscript does not mean anything, and since we know its creator did not have access to modern cryptology, its persistence seems to indicate either that fooling people into thinking nonsense is real language (or lightly encrypted real language) is quite easy, or that its inventor was very good at it.

Suppose you wanted to make a Codex or a Voynich Manuscript. The text should be meaningless (ie not an obvious cipher) but it should also resemble a natural language enough that linguists can’t be sure it’s meaningless, and talk a lot about how all the letter frequencies and grammatical features are right.

What would the easiest way to do this be?

Construct a language and translate a very long text into it?

If I wanted to make a codex I’d just take some text in a dead language no one could recognize and write it in a different alphabet. Or an old and outdated reconstruction of a dead language, if I wanted to be really sure no one could recognize it.

The Derivizer is a tool designed for conlangers to expand the corpora of conlangs using derivation. It might also be helpful for creating new non-words that are related to existing non-words in ways that resemble how words are related in natural languages.

I would expect there to be other resources for conlanging that are applicable to making up a natural-seeming but meaningless non-language.

That would not work, and it would be ‘deciphered’ immediately, by anyone interested. A simple substitution cipher is a joke.

If it’s done in a dead or obscure language and not a constructed one, there’s a chance of it being deciphered. If it’s done in a constructed one, very little chance.

If it’s done in a dead or obscure language, it would be possible to obscure the text, so it wouldn’t be a simple substitution cipher. Easiest way to do this would probably be a defective script. If you use English but don’t mark consonant voicing and conflate tense and lax vowels, it would be that much harder to determine that it’s English.
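As a crude orthographic stand-in for that defective-script transform (the letter merges below are illustrative, not a real phonological analysis of English):

```python
# Collapse voiced/voiceless consonant pairs and merge some vowels, as a rough
# spelling-level stand-in for the phonological transform described above.
MERGE = {
    "b": "p", "d": "t", "g": "k", "v": "f", "z": "s",  # devoice obstruents
    "e": "i", "o": "u",                                 # conflate some vowels
}

def defective(text):
    return "".join(MERGE.get(c, c) for c in text.lower())

print(defective("The bad dog dozed in the garden"))
# → "thi pat tuk tusit in thi kartin"
```

Each output letter now covers several English phonemes, so frequency analysis sees a smaller, distorted alphabet and a simple substitution-cipher attack no longer recovers English directly.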

If you want to get it done with minimal processing, find an obscure language and collect a few tens of thousands of words of folktales, conversation, descriptions of objects, etc. If the source text is original, it takes even less effort.

Also, there are very few (if any) dead languages with book-length works in them.

The Voynich manuscript has 35,000 words. Wikipedia says that the typical mystery novel has 60,000–80,000 words. Any language with half a mystery novel of content would work. That probably includes any language with even one written epic or religious text.

I recall reading of statistical tests applied to dolphin calls, which had been unable to distinguish them from language (but *did* distinguish monkey calls). So if I were constructing such a thing I might just get some dolphin recordings and make up a transcription alphabet.
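Statistical tests of this kind are often Zipf-style rank-frequency checks: natural languages tend toward a log-log slope near −1. A minimal sketch, assuming the calls have already been transcribed into tokens:

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs log(rank);
    natural language tends toward a slope of roughly -1."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

text = "the cat sat on the mat the cat ran and the dog sat".split()
print(zipf_slope(text))
```

A transcribed alphabet built from dolphin recordings could be run through the same check; of course, passing Zipf’s law is a weak test, since plenty of non-linguistic processes pass it too.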

How cryptographically secure is a foreign, unknown language? For example, suppose you landed on an alien planet and found a library of (text-only) books written in Klingon – and you don’t know any Klingon, you’ve never met a Klingon, you’ve never even heard of a Klingon, and, of course, you don’t have any Klingons around to interact with. All you’ve got is all the Klingon text you could ever want. Could you decipher any of it?

Alternatively, could modern codebreaking techniques have deciphered the messages sent by the Navajo code talkers of World War II without using any prior knowledge of the Navajo language?

I thought there were European languages for which only one text existed, but a fast search hasn’t turned up any examples.

Supposing that there is such a language with only one or a few texts, it’s embedded in a cultural context– it’s related to other languages, and it’s about human beings on our familiar planet. The texts have been studied.

My assumption is that those texts wouldn’t be a great starting point for a hard-to-decipher language.

This is probably as good a point as any to say: you’re a really good writer. There are other people in the LW cluster who write arguments I can’t find holes in but the only thing that convinces me is that they’re smarter than me; you clarify complex things and come up with the arguments I wish I’d thought of myself.

So, like, you have an exceptional talent, and one that’s probably valuable to a wide range of organisations. That value will go somewhere, but you should probably claim some of it for yourself rather than Faceless Bureaucracy X. And, y’know, market only allocates resources efficiently if everyone prices selfishly. There’s a trap I fell into for a while where you think “I’m earning above the average so I’m already getting more than is fair.” Don’t think that way; not only does it not help you, it doesn’t help others either.

I’m not saying join an evil PR firm or whatever (unless you want to). Just… be aware of your value. Apologies if this is all obvious/condescending/fanboyish/etc.

While looking at Amazon affiliates I found some information about ads on blogs and was kind of shocked at the amount of money I might be able to make with my current traffic (~200,000 hits/month). If anyone has any ideas for how to maximize money-to-intrusiveness ratio, or wants to set something up for a cut of the profits, let me know.

(I promise I will not add anything annoying like popups or autoplay sounds or ads for unethical anything. If people are mostly opposed to any ads at all, I won’t add them.)

The least obnoxious ads I’ve ever seen were on Giant in the Playground, although granted they were a little distracting sometimes. I haven’t been there in a while so things might have changed, but you may still want to ask Rich how he handles his advertisers. If nothing else he’s a pretty savvy guy when it comes to making money.

You might also want to look out though, since some of the bigger advertisers expect to be able to control content. I think you’re probably familiar with what happened to TVTropes and how they had a crazy deletion/banning spree when Google Adsense decided that there were too many trope entries on lolicon animes/mangas for their liking.

Putting up an amazon affiliate link would probably be the least intrusive way to make money. I (and lots of other people) buy a bunch of stuff on amazon anyway, and it’s no extra cost or inconvenience, so we might as well support a blogger we like. Right between the ‘search’ field and ‘Links’ would be a good place for the affiliate link.

On the other hand, do you think you’d end up biasing yourself towards outrageousness if you had a monetary incentive to draw traffic? Might be a bit dangerous. Cost-benefit analysis probably still supports adding advertisements, just be careful.

You can put a PayPal link in the sidebar and just take direct donations. Like most, I’m a little slothful on such things, but people will send you bits and pieces.

Let me say that it is a good idea to do this. It’s better than ads (though I’m not opposed to ads that aren’t for The Way to Happiness or somesuch) or it can be with ads. An Amazon sidebar is really unobtrusive and can make some cash.

You are a great writer, and you’re trying to follow the data. Props for that. You should make some money on the way. Just don’t sell the blog to Kevin Trudeau once you have an income stream.

(Aside, on the book thing: At Powell’s in Portland, I found a 50-year-old signed copy of The Psychosomatic Genesis of Coronary Artery Disease, which was by a University of Kansas associate professor and which extolled the healing virtues of the mind via Jesus. It has charts and everything.)

Ultimately, money from advertising comes from your readers. If you don’t want money from poor readers, don’t pressure your readers to all donate a fixed amount. But why make it hard for rich readers to give you money? Don’t you believe in price discrimination and progressive taxation?

Affiliate links do have the advantage of skimming money off of something that already occurs, but it looks to me that you don’t normally link to amazon.

I’m not sure you’re correct. Like you say, I already link to Amazon. Amazon doesn’t charge people more money for buying from affiliates, so my cut comes out of their profit margin.

If I display ads, then one of a couple things happens. Either people click on the ad and decide not to buy anything (DEAR GOOGLE: I AM NOT RECOMMENDING ANYONE DO THIS DELIBERATELY), in which case the money comes totally from the company, or else people click on the ad and decide they want it and do buy it, in which case I am increasing my readers’ utility by helping them get a product that was apparently worth more to them than the money they paid for it.

In neither case is it a zero sum transfer where readers lose utility and I gain it.

Secondly, should readers feel symmetrically bad about taking up your free time, of which you probably have less than most of us? That’s a pretty sad situation, especially if you deny the opportunity to assuage these feelings.

Thirdly, your writing is good, but let’s be honest — it’s not heroin. I doubt anyone is going to give you all their money and starve on the streets, so there doesn’t seem to be a moral duty for you to save people from their self-destructive spending choices.

I think it is fairly common for people to spend money on things that are worth less to them than the money they paid.

This is particularly true in the presence of advertising, which in practice works out to be the “science” of causing people to overestimate the value of things they can buy.

(I’m not necessarily arguing that you shouldn’t ever have advertising, just that I don’t agree with the statement that showing people ads is neutral or helpful to them. If ads were neutral or helpful, nobody would use adblockers.)

Perhaps consider Patreon? I gather it’s similar to having a paypal donate button, but it’s higher-status.

Sometimes I see movies and it costs ten bucks. It’s rarely as good as this site. I would be happy to send a small amount a few times a year, and so would many others. If you’re opposed, you’re opposed, but at least think about it. It’s not coercive, and many would be happy to support this blog.

If you’re worried about being horrible to your readers, putting involuntary ads on the site is worse than putting up a voluntary donation button. Even opt-out ads (because most of your readers are smart enough to use adblock if they want) are worse than opt-in donations.

There’s an argument to be made that even opt-in ads are worse than opt-in donations inasmuch as advertising is inefficient and promoting advertising culture is evil, but I don’t think we need to reach that argument to see that a donation button is kinder.

(The Amazon affiliate program is the kindest of all – it’s essentially free money, so long as you’re virtuous enough to withstand the temptation to optimize your content for affiliate money, and so long as you don’t mind being part of Amazon’s nefarious plot to conquer the world.)

If you start hosting ads on this site, let us know and I’ll disable my adblock on here. I’m fine with pretty much anything short of Patheos’ level of intrusiveness. And I certainly don’t mind at all book/art recommendations like this post!

If you do go to ads, please no ‘sliders’ or other animations. Otoh, Google’s all-text ads are too easy to mistake for part of content. I find the less intrusive thing is a sidebar of ads that look like ads — with modest layout and non-moving pictures.

This webcomic apparently makes significant amounts of income from “Support – shop at Amazon [via ]” – at least, enough income that the link is regularly mentioned in blog posts, and enough income that the combination of that, merchandise and ads supports a family. (Amazon profits to the extent that people shop at Amazon rather than at competitors, so this is not just taking money from huge corporations.)

Adding ads makes the site feel… commercial… to me. Of course, it’s rather easy for me to tell you not to earn money. In penance, I promise to send a (small) amount of money if you do put up a donation button, or accept paypal at your gmail. (Also, are you sure you want to put that address on your other website?)

“There are other people in the LW cluster who write arguments I can’t find holes in but the only thing that convinces me is that they’re smarter than me; you clarify complex things and come up with the arguments I wish I’d thought of myself.”

I’ve felt like SSC was different from LW for a long time but I had always assumed it was the focus on different subjects; reading this, I realize otherwise. And it has effectively summed up a major feeling of wrongness I’ve had with many posts on LW.

I think Scott changes his mind more often than Eliezer does, at least publicly, and this is one reason I trust his arguments more.

Additionally, Eliezer tends to be self-deprecating by ironically dressing himself in grandiose claims, while Scott tends to avoid self-deprecation or to do it only directly. I like Scott’s approach better, because Eliezer’s sometimes leaves me feeling like he’s trying to double bluff me.

Scott also shows more imagination. The way that Scott’s essays turn unexpected corners and draw structure from analogy is fantastic, an incredibly rare skill. The Last Psychiatrist is the only other I’ve ever seen do that, but Scott does it better (though, TLP is not exactly concerned about comprehensibility). In contrast, the Sequences involve more spoonfeeding, which means they can’t demonstrate flexible thought as easily.

Those are not necessarily the most important differences, but they’re three that spring to mind immediately.

However, I’ve also reread some posts from the sequences that I initially wasn’t impressed with, and found them to be more insightful than I had initially thought. I think that LessWrong, although sometimes troubling, still has useful basic knowledge to be eked from it.

Eliezer has changed his mind an exceptional amount a few times I can think of.

I know when he was younger he thought Friendly AI was a waste of time and he was annoyed that people were trying to delay the creation of superintelligence for silly ethical reasons.

I’m five years younger than he is, so you’re probably catching me at a much more formative stage.

(and I am very complimented by the comparison to TLP, whose style I deliberately borrowed from more than I like to admit)

I share the feeling of never being sure whether Eliezer’s grandiose claims are straight grandiosity, an attempt to countersignal humility, or an attempt to double-bluff us because he’s actually grandiose after all. But I got used to it pretty quickly and now just find it hilarious.

A challenge: pretend that Time Cube was actually meant to be something like the Codex Seraphinianus – a blog written by a man from a weird alternate Universe who somehow ended up here on Earth. Write a story set in the Time Cube World, where all the weird not-quite-physics (4-corner days and the like) are actually true and accepted by everyone.

Many people already do write stories (and, ahem, “non-fiction”) premised on Time Cube’s… race surrealism, and related positions, to be true. For instance, Moldbug is certainly very serious about defying the evil ONEist god.

The decline of Time Cube from “fun to read, endearingly kooky” to “hateful and depressing” was sad to watch — he didn’t become monomaniacally racist and anti-Semitic until ~2007, IIRC. (Sure, he was always verbally aggressive, but that was just because WE’VE BEEN EDUCATED STUPID.)

I dunno, I’d be more interested in a facsimile of the Voynich manuscript. It’s likely the same thing, an elaborate objet d’art/hoax, but at least it’s got pedigree. The author of this thing is even still alive?

As for ads, an occasional plug for something you like in the sidebar, with a reminder that people can buy whatever they like and you still get a cut, would work ok. I’ve got a few amazon payments over the years; it’s easy money if you have a dedicated readership. (Alas, I no longer do.)

A recent paper (written jointly by a botanist and a cryptographer) proposes that the manuscript might be, or be based on, a sixteenth-century Mesoamerican botanical codex. News story here, original paper here.

The authors are admirably cautious — they’ve found a promising, previously unexplored avenue of research, not a definitive solution — but it seems at least plausible.

Congrats Scott! I first heard of the codex from your old livejournal a few years ago. I think you said something along the lines of grabbing a copy of it being on your list of life goals in between securing a career and starting a family. This sort of made my day, actually.

Yeah, this is very sweet. I was looking through Scott’s old Livejournal entries and saw him mention that it was a long-term ambition of his to get a copy of the Codex. I wondered whether he ever had; it’s rather gratifying to know that you eventually did, Scott.

For me, the Codex Seraphinianus was the one that got away for a long time– I’d bought a copy of the previous edition (about $70) as a gift, and while I didn’t regret the gift as the price went up and up (I think it was $800 at one point), I did regret not getting one for myself.

In my experience, if I buy a limited edition to get a copy to read, the odds are extremely high that there will be a trade edition eventually. Is there any way to estimate the odds on that one?

However, the book that really haunted me was Under Milk Wood by Dylan Thomas, lettered and illuminated by Sheila Waters (page down for the entry)– I finally got it, after about 20 years, and it’s gorgeous. Less intellectual content than the Codex, but wonderful calligraphy.

(I’m not expecting a trade edition at this point, and if there were, the printing probably wouldn’t be as good– I want to be able to see the hairlines on the calligraphy.)

Pro-tip– if you care about what edition you’re getting, don’t deal with amazon.

Here’s the one I’m not getting: The Red Book by Jung. The illustrations are unspeakably wonderful, but the truth is that I’d be unlikely to read the whole thing, let alone study it.

For me, luxuries are also books and such, though generally gaming-related ones; for example, I’m contemplating buying the Oxyd Book. I also have started to collect copies of old D&D modules, such as Expedition to the Barrier Peaks; one I’d like to get, but is quite pricy, is Dead Gods.

One rare book I am very happy I managed to get is a Russian original copy of M. Bongard’s Pattern Recognition; for anyone who’s read Gödel, Escher, Bach, that’s the book that introduced Bongard problems to the world, and is generally a fascinating, if quite technical, work.

I would definitely be interested in S1 and possibly in B1/X1, though I can’t currently justify spending enough to pay as much as those are reasonably worth. Email me at achmiz s (without the space) at gmail dot com.
