
If two points (or posts) make a trend, interactive data visualizations of the Hebrew calendar are a thing I blog about now. In the long and storied tradition of single-use novelty sites, I’ve created isyontefearlythisyear.com (and its evil twin, isyonteflatethisyear.com). Now you can point to real data when the conversation inevitably comes up before every holiday.

Being Jewish, I immediately thought of the calendrical connection between Easter and Passover. Specifically, since Easter is usually around Passover, does the 19-year cycle of Hebrew leap years play a role in when Easter falls?

Very briefly (and approximately), a solar year is aligned with the seasons (because a year is one orbit of the earth around the sun), but the Hebrew calendar is based on a lunar calendar in which a month is determined by one cycle through the phases of the moon. The solar year is approximately 365 days, while 12 lunar months are approximately 354 days, or 11 days shorter. If the Hebrew calendar were a pure lunar calendar, over time the months would drift around the year. To make up for this shortfall, a 30-day leap month is added to the Hebrew calendar every two to three years, seven times in a 19-year cycle (years 3, 6, 8, 11, 14, 17, and 19). (30 days × 7 years ≈ 11 days × 19 years. Hey, I said this explanation is approximate.)
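The bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the cycle arithmetic, not a calendar implementation: the function names are my own, and the position calculation assumes cycles are counted so that 5777 is year 1 of a cycle.

```python
# Positions within the 19-year cycle that get a leap month.
LEAP_POSITIONS = {3, 6, 8, 11, 14, 17, 19}


def cycle_position(hebrew_year):
    """Return the position (1-19) of a Hebrew year within its 19-year cycle."""
    return (hebrew_year - 1) % 19 + 1


def is_hebrew_leap_year(hebrew_year):
    """True if the year gets the extra 30-day month."""
    return cycle_position(hebrew_year) in LEAP_POSITIONS


# The approximate balance sheet: days lost vs. days added back per cycle.
shortfall_days = 11 * 19  # roughly 11 days short in each of 19 years
added_days = 30 * 7       # seven 30-day leap months
print(shortfall_days, added_days)
```

Under that assumption, 5776 lands on position 19 (a leap year) and 5777 on position 1 (not a leap year).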

To see the effect of Hebrew leap years on Easter dates, I recreated iamreddave’s graph, but with larger points for leap years and points colored by position in the 19-year cycle.

What jumps out to me is that all of the late Easter dates are Hebrew leap years, which is what you’d expect when an additional month has recently been inserted, but all of the early Easter dates are also Hebrew leap years.

Passover, on the other hand, always occurs late in a leap year, as you’d expect:

Toggling between the two, it looks like it’s years with the latest Passovers that get leap-year–early Easters.

Zoom in a bit and you’ll find that the early Easter dates are always years 8, 11, and 19 of the 19-year cycle:

I thought maybe this happens because the Christian 19-year cycle is shifted by three years from the Jewish cycle (2014 was the first year of the Christian cycle, while 2017/5777 is the first year of the Jewish cycle), but this isn’t the case. Here’s what seems to be happening:

Easter is (by definition) the first Sunday after the full moon after the vernal (in the northern hemisphere) equinox. Typically, that’s the full moon of Nissan (the Hebrew month which contains Passover), but in those three years the leap month pushes Passover so late that it’s a full month later than the equinox. In other words, in those years the new moon that marks the start of Nissan is at least ~14 days after the equinox, which puts a full moon very shortly after the equinox, which is still in Adar II (the month before Nissan).
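To poke at this yourself, here is a minimal sketch that pairs each Gregorian Easter date with the position of the corresponding Hebrew year in the 19-year cycle. It uses the widely published "anonymous Gregorian" (Meeus/Jones/Butcher) computus, which is not necessarily what produced the graphs in this post. The helper names are mine, and the Hebrew year is approximated as the Gregorian year plus 3760, which holds for spring dates.

```python
from datetime import date


def gregorian_easter(year):
    """Date of Western Easter via the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def hebrew_cycle_position(gregorian_year):
    """Position (1-19) of the Hebrew year whose Passover falls in this spring."""
    hebrew_year = gregorian_year + 3760  # valid for spring dates
    return (hebrew_year - 1) % 19 + 1


for y in (2005, 2008, 2016, 2017):
    print(y, gregorian_easter(y), hebrew_cycle_position(y))
```

The March Easters of 2005, 2008, and 2016 come out at positions 8, 11, and 19, matching the pattern described above.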

With parshat Shmot coinciding with the inauguration (err, Put-in) of Donald Trump, this image from Yossi Fendel has been making the rounds on social media. It quotes the eighth verse of the parsha (and book):

“Let us deal shrewdly with them, so that they may not increase; otherwise in the event of war they may join our enemies in fighting against us and rise from the ground.”

The liturgy talks a lot about the exodus from Egypt, but focuses far less on why the Israelites became enslaved in the first place. The answer, this parsha makes clear, is fear. Fear of shifting demographics. Fear of an ethnic group that looked different, spoke differently, and had different practices and customs — yet served an important economic function by doing the job no Egyptian was willing to do.

Faced with that fear from shifting demographics, the Pharaoh had at least a couple of courses of action. He could have pushed an agenda of multiculturalism, encouraging the Egyptians and Israelites to get to know one another, thereby mitigating their fear. Instead, he felt that it was more important to maintain what he considered the fundamentally Egyptian character of Egypt.

The United States — at least in theory — was founded not as “a place for a people”, but as a place for all people. Sadly, there are people who believe that America was a white country (back when it was great or something 🙄), and they are now feeling the same fear and oppressive urges the biblical Pharaoh felt.

This is precisely the danger that comes along with ethnic, racial, or religious nationalism. A nation founded as “a place for a people” cannot simultaneously offer full and equal rights/privileges to all, and continue to exist should that people become a minority. And the only ways to maintain the “desired” demographics are exclusion and oppression. Whether it’s in the context of Trump-emboldened white nationalism here in America, or Zionism, its moral equivalent, let’s learn from this week’s well-timed parsha: national ideals that depend on maintaining certain demographics are inherently oppressive.

In a place like America, although changing demographics can bring up a natural fear of the stranger, it also provides us with an opportunity to not be like Pharaoh and to strive for a multicultural ideal. The Torah reminds readers that, because the Israelites were strangers in Egypt, not only is one forbidden to oppress the stranger [1, 2], but it explains how: by loving that stranger [3]. But loving the stranger is abstract. Perhaps it’s better to take a cue from the JPS translation and befriend the stranger. Friends are way less scary than strangers.

I just read Madison’s Federalist Paper #10. Very interesting stuff. At a high level, the purpose of electors is to mitigate the effect of “factions”, which he defines (all emphasis in block quotes is mine):

By a faction, I understand a number of citizens, whether amounting to a majority or a minority of the whole, who are united and actuated by some common impulse of passion, or of interest, adverse to the rights of other citizens, or to the permanent and aggregate interests of the community.

A p’shat interpretation through a contemporary lens would seem to be a strong argument in favor of the electors being unfaithful and voting against Trump. After all, he explicitly threatened the rights of several groups of citizens, and his authoritarian tendencies pose a threat to the “aggregate interests of the community.”

Indeed, it is common to say that the purpose of the Electoral College is to protect the public good from the irresponsible or uneducated will of the people, and that’s also true:

The effect of [a Republic is], on the one hand, to refine and enlarge the public views, by passing them through the medium of a chosen body of citizens, whose wisdom may best discern the true interest of their country, and whose patriotism and love of justice will be least likely to sacrifice it to temporary or partial considerations. Under such a regulation, it may well happen, that the public voice, pronounced by the representatives of the People, will be more consonant to the public good, than if pronounced by the People themselves, convened for the purpose.

However, Madison’s actual concern, it seems, is that non–land-owning voters [1] would overwhelm the landed class. He even explicitly calls out “an equal division of property” as exactly the type of “wicked project” a representative republic can protect against.

But the most common and durable source of factions has been the various and unequal distribution of property. Those who hold, and those who are without property, have ever formed distinct interests in society. Those who are creditors, and those who are debtors, fall under a like discrimination. A landed interest, a manufacturing interest, a mercantile interest, a moneyed interest, with many lesser interests, grow up of necessity in civilized nations, and divide them into different classes, actuated by different sentiments and views. The regulation of these various and interfering interests forms the principal task of modern Legislation, and involves the spirit of party and faction in the necessary and ordinary operations of the Government.

It’s also impossible to ignore the effects of media and technology. We are hardly a united country, but the divisions depend on sociological environment (racial, religious, and ethnic diversity, wealth, rural–urban, etc.), not proximity.

The influence of factious leaders may kindle a flame within their particular States, but will be unable to spread a general conflagration through the other States: A religious sect may degenerate into a political faction in a part of the Confederacy; but the variety of sects dispersed over the entire face of it, must secure the National Councils against any danger from that source.

A conflagration can now easily spread across the continent.

Here in 2016 we have a situation where the Electoral College is about to vote for a candidate who is “adverse to the rights of other citizens, or to the permanent and aggregate interests of the community” when they are supposed to be the ones “whose wisdom may best discern the true interest of their country.” Therefore, it is easy to argue that they should vote counter to the will of the voters in their states. On the other hand, had Bernie Sanders been elected (if only!), someone reading the very same document could argue that the citizens whose rights are being infringed upon are the wealthy 1% whose property would be at risk of “[more] equal division”.

My take is that the threats to the Republic in the face of a Trump presidency are sufficient, and the adverse effects on the rights of citizens substantial enough, that the electors should vote for Hillary Clinton. The argument that a more left-leaning economic policy would infringe on the right of the 1% to hold their wealth breaks down because the effect would not be sufficiently “adverse”, and a better-functioning, more equitable economy is in “the true interest of their country.”

I’m sure there are other historical documents arguing for and against the Electoral College, but based on this one, I believe the electors should elect Hillary Clinton.

[1] Franchise was being slowly extended to non–land-owning white men in various states at the time (source: Wikipedia).

What’s happening at the Republican National Convention doesn’t feel real, but it’s real. The self-aggrandizing nominee for president claimed, “I alone can fix it.” Later, chants of “Yes you will, yes you will.” This is not about policies; it’s fear and cult of personality.

Earlier this year I published a couple of blogposts with some descriptive statistics of trop in the Torah. One of the biggest shortcomings of those posts was that they didn’t deal with the order of trop at all. This is a pretty big shortcoming when you consider that many trop come in pairs/groups, or that certain trop frequently or necessarily follow certain other trop. So, this time around I created an interactive tool I’m calling (for lack of creativity) the Trop Sequence Explorer. If you haven’t checked it out yet, I’d suggest playing around with it a bit; it’ll give you context for the rest of this post.

Basically, it shows each trop listed in order from most to least common. When you click one, it shows you all trop that can follow it and how often each one occurs in that sequence. In other words, it shows transition probabilities to each trop conditional on all trop that come before it in a sequence. There’s also a graph at the bottom that shows how often the selected sequence occurs in each perek of the Torah. Clicking a bar in the graph shows the text of the p’sukim in that perek that contain the current sequence.
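As a rough sketch of the idea (not the Explorer's actual code), the conditional counts can be computed by scanning each pasuk for the selected sequence and tallying whatever comes next. The function names and the sample data here are hypothetical, and I'm assuming a sequence may match anywhere within a pasuk rather than only at its start.

```python
from collections import Counter


def next_trop_counts(psukim, prefix):
    """Tally the trop that immediately follow `prefix` wherever it occurs."""
    counts = Counter()
    n = len(prefix)
    for trop in psukim:
        # Stop one short of the end so there is always a "next" trop.
        for i in range(len(trop) - n):
            if trop[i:i + n] == prefix:
                counts[trop[i + n]] += 1
    return counts


def transition_probabilities(psukim, prefix):
    """Convert the follow-on counts into conditional probabilities."""
    counts = next_trop_counts(psukim, prefix)
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()} if total else {}


# Hypothetical toy data: each pasuk is a list of trop names in reading order.
sample = [
    ["merkha", "tipkha", "etnakhta", "merkha", "tipkha", "sof pasuk"],
    ["zarka", "segol", "munakh", "etnakhta", "tipkha", "sof pasuk"],
    ["zarka", "munakh", "segol", "merkha", "tipkha", "sof pasuk"],
]
print(next_trop_counts(sample, ["merkha", "tipkha"]))
print(transition_probabilities(sample, ["zarka"]))
```

Each click deeper into the tree corresponds to calling this with a prefix that is one trop longer.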

What follows is a bit of the thought process that went into its creation, some issues I ran into, and some interesting observations. Feel free to jump to the section that’s most interesting to you.

The Jewish Nerd section

Back in the fall, I was gabbaiing and noticed two tevirs in a row. “How often does that happen?”, I wondered. Seven times, it turns out. It’s pretty well known that a zarka has to be followed by a segol or a munakh segol, but it turns out that the latter is actually more common (by a 13-point margin).

Beyond the factoids, there are other fun things to come across. Parallel sentence structures often have parallel trop, even when the trop itself is not that common. In B’midbar 26, gadol is used at a much higher rate than normal, mostly on names in a genealogy; it really pops out in the bar graph.

One of the most surprising things for me, though, is how relatively unique each pasuk is. Once you get more than three or four levels deep in the tree, there are surprisingly few p’sukim that match that sequence. This is even true for seemingly common sequences. A pasuk that is merkha tipkha etnakhta merkha tipkha sof pasuk only happens 43 times in the entire Torah.

As I was creating the Sequence Explorer, I encountered some challenges and needed to make some decisions about how it used trop data. One question several people have raised is: Why are there ever trop following a sof pasuk? Shouldn’t a sof pasuk, by definition, be the end of a pasuk? The answer is that there are two sets of trop used for the 10 Commandments: the takhtonim, which are used for private study, and the elyonim, which are used for public readings. I chose to use the elyonim because I wanted to examine how trop are read out loud. The problem is that the two sets of trop also have different pasuk divisions. Even though I used the elyon trop, I had to use the takhton pasuk divisions, because the takhton divisions seem to be more standard, and are the ones returned by the Sefaria API, which is what I used to pull in the actual pasuk text when you click on a perek’s bar in the bar graph. Perhaps at some point I’ll add a setting so people can explore both versions.

Many authorities consider munakh legarmeh a separate trop. I decided not to count it separately for two reasons. The simple technical reason is that there is not a different Unicode character for it (distinct from munakh), so I would have to detect it based on context. The other is that, by definition, the munakh legarmeh is a munakh that precedes another munakh. Since that’s exactly the type of data this app shows, it felt both redundant and somewhat circular to distinguish a trop by what follows it. If you click the munakh, the number of munakhs that follow it should be equal to the number of munakh legarmehs.

Seeing sequences also helped me find issues in the data that I couldn’t see otherwise. For example, I found a couple instances where the data showed four pashtas in a row, but this wasn’t really the case. Trop typically indicate where the stress should fall in a word, but some trop must be placed at either the beginning or the end of a word regardless of stress. To help readers, many sources, including — I found out — the Tanach.us data source I used, put such trop on a word twice: once in the required position, and once where the stress falls. I cleaned out those doublings by searching for any word with two trop on it, and if the two trop were the same, I deleted one of them. Hopefully there was no collateral damage from that.
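The cleanup described above might look something like the following sketch. It assumes that trop are the code points U+0591 through U+05AE (the accents in Unicode's Hebrew block) and that a word is a plain string. The function names are hypothetical, and dropping the second copy rather than the first is an arbitrary choice for illustration.

```python
# Hebrew cantillation accents in Unicode: U+0591 through U+05AE (assumed range).
TROP_CODE_POINTS = range(0x0591, 0x05AF)


def is_trop(ch):
    """True if the character is a cantillation accent."""
    return ord(ch) in TROP_CODE_POINTS


def remove_doubled_trop(word):
    """If a word carries exactly two identical trop, drop the second copy."""
    marks = [ch for ch in word if is_trop(ch)]
    if len(marks) == 2 and marks[0] == marks[1]:
        last = word.rindex(marks[0])
        return word[:last] + word[last + 1:]
    return word


# Two letters (alef, bet), each carrying U+0599 (pashta).
doubled = "\u05D0\u0599\u05D1\u0599"
print(len(doubled), len(remove_doubled_trop(doubled)))
```

Words with two different trop, or with only one, pass through unchanged, which limits the potential for collateral damage to words that legitimately repeat a trop.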

Another oddity was that there were ten tsinnorits and one geresh mukdam. This was odd because those trop aren’t used in the Torah, even if their lookalikes, zarka and geresh, are. It seems like they were used for typesetting reasons — their placement on a word is slightly different — so I just lumped them in with their respective lookalikes.

There were also a number of p’sukim with no sof pasuk. I’m not sure exactly why, but I fixed them. Being able to see the bar graph across the bottom was hugely helpful in seeing that this was an issue.

Speaking of the bar graph at the bottom, aggregating by perek is somewhat arbitrary. At some point I would like to try aggregating in other ways, such as by parshah.

The Design Nerd section

I knew pretty early on that I wanted to do some sort of Markov chain–like visualization of transition probabilities, but I set the idea aside to do real work, which, fortunately, happened to involve learning D3. When I turned my attention back to this, I realized two things:

Pairwise transition probabilities aren’t that interesting in isolation; sequences are much more interesting. (In other words, you need memory in your Markov chain.)

As in the previous posts, we have the complete dataset. Descriptively exploring that is very different from wanting to make predictions or generate new sequences, which is a more typical use of Markov chains.

So, I settled on the basics of a design, but without a few key features. The original idea was a tree, where each level would show the conditional probability of going to a particular trop given all those that had come before it. The plan was just to show simple squares with a trop symbol, its name, conditional probability, and conditional count. And, there was no bar graph at the bottom to show where a given sequence occurred.

It wasn’t until I was sketching out the visual design for the squares — well after I had it actually working — that I came up with the idea of shading them in, making them into a histogram of sorts. Since they seem to follow something not entirely unlike a Poisson distribution, I thought about log-weighting them, but decided it would be more straightforward not to since I’m also showing raw counts.

Once I could play with building sequence trees, I pretty quickly wanted to know where in the Torah those sequences were. And so, the bar graph at the bottom was born. For most of the time I was building it, clicking a bar would just open that perek on Sefaria. Using the Sefaria API to pull in the text of the actual p’sukim was one of the last features to go in.

The Programming Nerd section

When I first started thinking about how to implement this, my intuition was to have the data structure match the tree structure of the interface. It felt elegant, and it seemed like a good idea at the time. I wrote a recursive function (after fighting with mutable container objects in Python) to go through the trop strings and build a giant JSON file shaped like this:

Well, that turned out to be 8.6 MB — way too big to download as part of a web app. A similar file that listed which prakim had which sequences was over two gigabytes. I wrote most of the UI (locally) with these two files. Thankfully, I finally realized that I could just download a 760 kB list of raw trop strings and search for sequences on demand in the browser. And that, folks, is why I’m in HCI, not real computer science. Derp.
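For the curious, searching on demand is not much code. Here is a minimal sketch in Python (the app itself does this in the browser), assuming each pasuk is stored as a string with one character per trop and tagged with its sefer and perek. That record layout and the function names are my own invention for illustration.

```python
from collections import Counter


def count_occurrences(trop_string, sequence):
    """Count occurrences of `sequence` in `trop_string`, overlaps included."""
    if not sequence:
        return 0
    count = 0
    start = trop_string.find(sequence)
    while start != -1:
        count += 1
        start = trop_string.find(sequence, start + 1)
    return count


def occurrences_by_perek(psukim, sequence):
    """Sum occurrences per (sefer, perek); psukim holds (sefer, perek, trop_string)."""
    totals = Counter()
    for sefer, perek, trop_string in psukim:
        hits = count_occurrences(trop_string, sequence)
        if hits:
            totals[(sefer, perek)] += hits
    return totals


TEVIR = "\u059B"   # U+059B, tevir
MERKHA = "\u05A5"  # U+05A5, merkha

# Hypothetical records, not real data.
sample = [
    ("Bereshit", 1, MERKHA + TEVIR + TEVIR),
    ("Bereshit", 1, TEVIR + TEVIR + TEVIR),
    ("Bereshit", 2, MERKHA + TEVIR),
]
print(occurrences_by_perek(sample, TEVIR + TEVIR))
```

A linear scan like this over a few thousand short strings is cheap, which is why precomputing the whole tree turned out to be unnecessary.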

Finally, D3 was great to work with. Being able to define a simple linear scale like this

// Flip the output range so that 0 maps to the right edge (right-to-left layout).
var x = d3.scale.linear()
    .domain([0, width])
    .range([width, 0]);

even made it easy to work right-to-left when SVG objects have their origins in the upper left-hand corner.

Future work

I’m a grad student, so how can I resist a Future Work section? There are a number of features I’d like to add at some point. As I hinted at earlier in this post, it would be nice to be able to aggregate the bar graph by parshah instead of just perek. Combining other aggregations, like sefer, with the ability to limit sequence queries to certain parts of the text would open the door to adding the rest of the Tanakh. (The Emet books would be outta control!) And color coding disjunctive and conjunctive trop would be a nice way to see more structure in sequences. If you want to take a stab at any of these things, have a look at the issues list for this project on GitHub.

If you have a SIM card from a carrier other than your current carrier, follow these steps:

Remove your SIM card and insert the new SIM card.

Complete the setup process.

If you don’t have another SIM card you can use, follow these steps to complete the process:

Back up your iPhone.

When you have a backup, erase your iPhone.

Restore your iPhone from the backup you just made.

Wait, what? Why would I want to unlock the phone if I didn’t have a SIM from another carrier? And isn’t doing a full restore kind of a lot to ask?

As far as I can tell, here’s what’s going on: when you request an unlock from your original carrier, they don’t unlock your phone; they tell Apple’s activation server that your phone is now unlocked. In order to finish the unlock process, your phone has to check in with Apple’s activation server.

There are apparently only two ways to force an iPhone to re-activate with the server: put in a new SIM, or restore the phone. But why would you go the restore route? Because if you’re traveling abroad, when you arrive at your destination and install your newly acquired SIM, it’ll try to contact the activation server. But it can’t reach the activation server because you don’t have data service on your new carrier yet. At this point you’ll be dropped into Activate mode and won’t be able to do anything with your phone until you activate it. If you happen to be somewhere with wifi that doesn’t require any sort of web-based authentication (so, not most airports or hotels) you may be able to activate that way. Otherwise, you’ll have to use iTunes on your computer — assuming your computer can get wifi.

If you won’t be traveling with your computer, or may not have access to a non-cellular internet connection, you’ll want to do that restore at home before your trip. Otherwise, skip the restore and activate through iTunes or wifi.

A lot of discussion around my last post was about the role of sentence structure. For example, there’s a heuristic that psukim with fewer than five words don’t have an etnakhta, while those with more than five words do. This visualization lets you explore these types of relationships.

We see that, indeed, etnakhtas do approximately follow this pattern, while other trops’ counts naturally vary more linearly with word count (e.g., mapakh and pashta). Other trop, though, like tipkha, quickly hit a ceiling regardless of how long a pasuk gets.

Note that I’ve cut off the x axis at 33 words. While there are much longer psukim, there aren’t enough of them to get meaningful averages.
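The averaging behind a chart like this can be sketched as follows. This is an illustration rather than the code that generated the visualization: the function name and input layout are hypothetical, with each pasuk represented as a word count plus a dictionary of trop counts.

```python
from collections import defaultdict


def mean_count_by_length(psukim, trop, max_words=33):
    """Average number of `trop` per pasuk, grouped by pasuk length in words."""
    sums = defaultdict(int)
    sizes = defaultdict(int)
    for word_count, counts in psukim:
        if word_count > max_words:
            continue  # too few long psukim for a meaningful average
        sums[word_count] += counts.get(trop, 0)
        sizes[word_count] += 1
    return {n: sums[n] / sizes[n] for n in sorted(sizes)}


# Hypothetical toy data: (words in pasuk, {trop name: count}).
sample = [
    (3, {"etnakhta": 0}),
    (3, {"etnakhta": 1}),
    (7, {"etnakhta": 1, "tipkha": 2}),
    (40, {"etnakhta": 1}),
]
print(mean_count_by_length(sample, "etnakhta"))
```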

Click in the legend to turn a trop on and off; double-click to solo it. As with the first post, there’s nothing revolutionary here, but I think it’s still interesting to see and explore. (Also, I'm no expert on D3/NVD3, so don’t judge me too harshly. And if you’re on IE and it doesn’t work, tough luck.)

When read publicly, the Torah is often sung using a system of cantillation marks, or trop in Yiddish. There are many different cantillation marks, each of which has a name and a unique sound (or sounds) and comes in combination with other trop.

When the cycle of readings started over this year after Simchas Torah, it seemed like there were more telisha gedolahs in Bereshit (Genesis), whereas there were more telisha ketanas in D’varim (Deuteronomy). I decided to find out whether or not this was really the case.

First, I needed a dataset. Tanach.us offers the entire Tanakh in XML form, including trop and vowels. I was only interested in the Torah, so I downloaded XML files for each of the five sfarim (books). I went through the XML and tabulated how many of each trop were present in each pasuk (sentence).
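The tabulation step can be sketched like this. The XML layout here (chapters as `c` elements, psukim as `v` elements, words as `w` elements, each numbered with an `n` attribute) is my assumption about the files, the inline sample is made up, and the accent range U+0591 through U+05AE leaves out the sof pasuk punctuation mark (U+05C3), which would need to be counted separately.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in the assumed layout: two psukim in one perek.
SAMPLE_XML = (
    '<book><c n="1">'
    '<v n="1"><w>\u05D0\u0596</w><w>\u05D1\u0591</w></v>'
    '<v n="2"><w>\u05D2\u0596</w><w>\u05D3\u0596</w></v>'
    '</c></book>'
)


def trop_counts_by_pasuk(xml_text):
    """Map (perek, pasuk) to a Counter of the accent characters it contains."""
    root = ET.fromstring(xml_text)
    table = {}
    for chapter in root.iter("c"):
        for verse in chapter.iter("v"):
            text = "".join(verse.itertext())
            key = (int(chapter.get("n")), int(verse.get("n")))
            table[key] = Counter(
                ch for ch in text if 0x0591 <= ord(ch) <= 0x05AE
            )
    return table


for key, counts in trop_counts_by_pasuk(SAMPLE_XML).items():
    print(key, dict(counts))
```

Summing these per-pasuk counters by sefer gives the totals compared in this post.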

Aggregating by sefer to consider my original question about the relative frequencies of telisha gedolahs and telisha ketanas, we see that my intuition was somewhat correct: while ketanas outnumber gedolahs throughout, there are more ketanas overall in D’varim.

However, the ratio of telisha gedolah to telisha ketana is actually not substantially different in D’varim and Bereshit. So while overall counts are higher, the relative frequencies are not so different.

Aggregating by sefer is interesting, but I wanted to see more continuous variations. Looking at a series of what for most trop would be zeros and ones, with an occasional two or three, isn’t that useful, but Zach (a Ph.D. student in Statistics) suggested a moving average, and that worked quite nicely. We used a 500-pasuk-wide window, which struck a balance between detail and low-pass filtering. (I come from a signal processing background, not time-series analysis.)
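A moving average over per-pasuk counts is simple to compute with a running sum. This sketch is only illustrative: it returns one value per full window and makes no claim about how the actual plots handled the edges or aligned the window.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values, using a running sum."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


# Hypothetical per-pasuk counts of a single trop.
counts = [0, 1, 0, 2, 1]
print(moving_average(counts, 2))
```

With real data, `window` would be 500 and `values` would be the count of a given trop in each pasuk, in order.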

As with the initial bar graph, you can really see the number of telisha ketanas explode in D’varim. But more interestingly, we can get a sense of how they track each other through the Torah.

Seeing how different trop track each other is fun. There are some things that you’d expect. For example, munakh is often associated with katan, revi’i, and mapakh–pashta, and we see that clearly here.

Particularly striking is the tight correlation between zarka and segol.

Other combinations, though, like darga–tevir, are more loosely correlated.

While these patterns are intuitive, the fact that trop — especially common ones like merkha and tipkha — aren’t uniformly distributed across the Torah was, to me, somewhat less expected. A big reason for this is changes in sentence structure. This becomes extremely obvious when looking at etnakhta, which essentially functions as a comma.

The reason for the rather dramatic plunge toward the beginning of B’midbar seems to be a shift in sentence structure. This part of the Torah contains quite a bit of genealogy, with many single-phrase sentences (“So-and-so begat so-and-so”), and many occurrences of the common pasuk “וידבר ה׳ אל־משה לאמר”.

I did a bit of digging into this, and, oddly, it looks like the drop in words per pasuk actually lags the drop in etnakhtas. I’m not sure why.

I could imagine running a logistic regression to see whether words per pasuk predicts the presence of an etnakhta, but I’m going to cut myself off now.

If you’re interested in playing around with this yourself, everything is on GitHub. If you just want to cut to the chase, here’s a CSV file of the raw data. And here’s an IPython Notebook.

Histories of computing, computing culture, and the politics of computers often make this basic claim: the very purpose of the Personal Computer was so individuals could benefit from computation without having to rely on corporate- or government-controlled mainframes. But it’s pretty clear to me that we’ve come full circle and are over-reliant on remote servers. The founders of the PC movement would probably say that we’ve lost our way. (Lazyweb: can someone dig up a quote of Woz saying so?)

Of course it’s not black and white, but we’ve gone overboard. I see this from two directions:

Web apps. I like to give friends on Twitter a hard time because I strongly believe in native apps over web apps. This is partially because of the superior performance and user experience, but in large part it’s because there is simply no reason for most apps to not run on my computer. A word processor or text editor does not need to run on someone else’s computers (which makes me both reliant and vulnerable). Nor does it need to run in a web browser, which, twenty-some years on, is still not particularly well suited to applications. Google’s vision of dumb terminal Chromebooks takes this needless remote execution to its (il)logical extreme.

(Needless?) intermediation. In addition to needless remote code execution in an inferior UI framework and runtime environment, why should my IMs or video calls go through Google or Skype servers between me and their destination? I’m reading The Master Switch by Tim Wu, and one of his points is that the underlying architecture of the internet is helping it resist the corporate forces that have brought about consolidation in other information industries.

That architecture, it seems to me, is one in which every machine on the network can access any other machine on the network. But that no longer seems to be the case. AIM used to have a feature called Direct Connect; years before Google Docs, SubEthaEdit allowed for collaborative editing directly between computers. So why is that not how we communicate now? It was finicky, and we (rightfully) like things that Just Work™. It was finicky because most of us don’t have public/external IP addresses. That meant having to route network traffic through IP masquerading NATs, and getting a two-way route is hard. (Even Skype’s so-called P2P protocol uses “supernodes”, which are located in Skype/Microsoft datacenters, as intermediaries between NATed/firewalled clients.)

So because most end users do not have public IP addresses, it’s more reliable to have most traffic routed through a server. This breaks the very property of the internet that makes it so unique. Perhaps this is a natural stage of development in the network, because we’re running out of IPv4 addresses. Maybe with IPv6, everyone can have a public IP, and we’ll be able to collaboratively edit documents peer-to-peer, from my text editor to yours.