Because the rest of your life will happen in the future.

When thinking about the terrorism by white supremacists in Charlottesville yesterday, it’s easy to overlook one aspect of their hateful chants: Why were they also chanting anti-Jewish slogans?

It’s natural to lump this in with their other racist and xenophobic chants. Yet, the protest was organized around the removal of a statue of Robert E. Lee. What does this have to do with Jews and Judaism? A lot, apparently. According to this article in Ha’aretz, white supremacy and anti-Semitism have a long and intertwined history.

At the root of American white supremacy is the belief that white, Christian males are being “replaced” as a majority in America by other racial ethnicities, religions and genders. This outrages white supremacists, and since that outrage needs to be directed somewhere, Jews and racial minorities are easy, visible targets.

But what does it really mean to be “replaced?” It means that white, Christian males no longer have a stranglehold on stable, good-paying jobs. An easily identified culprit is those who are seemingly now occupying these jobs, e.g. immigrants, women, Jews and other non-white, non-Christian groups. The real culprit, though, is Artificial Intelligence (AI).

AI is replacing jobs at an unprecedented rate, and will make the Industrial Revolution seem like child’s play in comparison. The best solution we have right now to disappearing jobs is a Universal Basic Income, where robots do all the mundane jobs and humans are free to either do interesting, creative work or not work at all. But as anyone who has read the exceptional writings of Calum Chace or William Hertling can tell you, in a future with billions of people well-fed and taking art classes all day, it is easy to imagine that those same people will also lack a sense of purpose, which will breed discontent and unrest (much as widespread unemployment does).

Seeing where this discontent could be leading, I was originally going to segue this blog post into a discussion of Jews in Europe leading up to World War II. The “smart people” got out before it got too bad to leave: Einstein, Freud, Fermi, Chagall. And thus, I was going to conclude, maybe it’s high time to flee America, as well. After all, every empire crumbles, and the people who survive are those who got out before the impending collapse.

But the AI Revolution is going to be all-encompassing. It will not leave any safe refuges that are unaffected by AI. To be sure, things are going to get worse before they get better. In America, things will almost certainly get worse for non-white, non-Christian, non-males in the immediate future. Although I sometimes feel powerless in the face of America’s current administration and seemingly hate-filled zeitgeist, it is more important than ever to remember the power that every individual wields:

Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it’s the only thing that ever has. – Margaret Mead (probably)

Sometimes, yes, when a relationship is going sour, the best solution isn’t to try and repair it, but rather make a clean exit. In this case, however, there is nowhere to go. If you’re feeling powerless and frustrated today, do something about it. 2 things I’m doing:

Even though I live in a very liberal Congressional district, I’m not too far away from a handful of toss-up, currently Republican districts. I’d bet that your state Democratic party would be happy to have your help right now.

Go to a protest event tonight. The best counterattack to a hate-filled protest is a peace-filled protest right afterwards, that is 100 times larger. You can look up local events here.

Below is my disappointing list of 16 books read in 2016 (but at least it corresponds to the year…). I felt I did a lot of reading, but I guess it didn’t amount to a lot of books. Some of them took me a while to get through. For example, The First Fifteen Lives of Harry August was by far the best book I read in 2016, but it also took a while to get through. I suppose that my year primarily consisted of:

Applying to grad school programs and attending visiting days/weekends all across the country.

Writing an MA thesis

Moving to a brand new city

Starting a new PhD program in linguistics

So it’s not that I’m without excuses. But still, I wish I had read more. I do think I read a lot of linguistics/machine learning/cognitive science papers, and wish I had tracked those better.

As always, credit for this idea goes to Robin. Her 2015 list can be found here. The only useful tool I’ve found for converting an Amazon library to a list can be found here. Surprisingly, or not, Amazon makes exporting this list tremendously difficult.

Nonetheless, here are the books I read this year, in reverse chronological order:

After Trump’s shocking victory, many of our professors began class with an opportunity for us to voice any fears or feelings we were harboring. One of my professors spoke about how studying linguistics is a way to study what unites us as humans: this strange ability called “language.” Despite all of our languages looking and sounding different, all humans have this amazing ability to learn complex rules and thousands of words in our first few years of existence. Moreover, we do this without being prodded to learn and without much explicit instruction. Language is something that should, at its core, unite us, not divide us.

Earlier this week, Google Research announced a breakthrough in “zero-shot” machine translation. What this means is that Google Translate can now perform translations on unseen pairs of languages. Typically, a machine translation algorithm needs to be trained on each language pair, e.g. English <–> French and French <–> Spanish. But Google’s latest results can perform translations from, e.g., English <–> Korean by only being trained on pairs of other languages (see Google’s visual representation below). In essence, they are only training the machine on the “gist” of language or language relationships, rather than a specific pairing.

The Google team calls this “interlingua.” For linguists, this underlying abstract form has been the basis of their field since Chomsky’s earliest writings. “Deep Structure” or D-structure, is distinct from “Surface Structure,” or S-structure; where Deep Structure is something like the Platonic form, the S-structure is the concrete realization in the phonetic sounds of a sentence. For example, the sentences I love New York and New York is loved by me both have essentially the same meaning. According to Chomsky, the D-structure of both of these sentences is the same, and the deep structure is transformed in different ways en route to the different respective surface realizations.

The field of generative syntax has been primarily concerned with elucidating the rules and constraints that each and all languages undergo during this transformational process. If we can unwind these transformations, peeling back layer upon layer of surface structure, then we can uncover the deep structure underlying all of language.

And now, it’s my turn to be speculative: For the last 20 years, computational linguists have been trying to apply the rules and constraints of generative syntax to the computational tasks of natural language understanding and translation. However, rules-based accounts have been less successful than the more flexible probability-based algorithms. The result has been that many “language engineers” have become dismissive of the rules-based Chomskian community.

But if we (speculatively) assume that Google’s algorithms have uncovered an underlying interlingua, then perhaps this means that Chomsky’s notion of D-structure has been right all along; we’ve just been going about the process of uncovering it in the wrong way. Whereas generative syntacticians base most of their findings on patterns in a single language or single collection of languages, maybe the real findings lie in the space between languages, the glue that binds it all together.

Of course, the findings of many deep learning-based systems are notoriously difficult to tease apart, so we don’t really know what the features of this possible interlingua look like. While this is frustrating, I suppose it also means there is still plenty of work left for a budding computational linguist. And if we can start to elucidate the ties that linguistically bind us, maybe we can elucidate the ties that bind humanity, as well.

Obviously there is a lot of uncertainty in the polling numbers coming out of the election. Most articles on fivethirtyeight.com can be reduced to “we’re pretty sure what’s going on, but also absolutely anything could happen.” Regardless, if we take even the worst numbers for Trump, he’s somewhere around 40% of the popular vote. Taking the 2012 voter turnout as our benchmark (which may be optimistic), 40% of 129,000,000 is 51,600,000. This means there are at least 50 MILLION PEOPLE in our country who are so scared for their jobs, their culture and their identity that they would rather see a Trump presidency than any alternative.

And on Wednesday morning, no matter who wins, these 50 million people will still feel the same. They will still be scared that their way of life is slowly disappearing, and that any (real or imagined) former sense of security is gone. While I disagree with that notion, as well as the larger sentiment, we also must acknowledge that this is a very real contingent of the US population.

When I go to lectures, classes and meetings, I use real-time captioning to understand what’s being said. Sometimes I joke with my captioners about how my Natural Language Processing classes are teaching me how to automate their job, replacing manual captioning with automatic speech understanding. We laugh and move on, but really this is a microcosm of a larger reality: Technology is steadily and unflinchingly mowing down the forest that is many long-standing, safe and high-skill occupations. The people I work with and the people in Silicon Valley are giddy about all of this.

In William Hertling’s amazing series of novels about AI, the final book takes us to 2043, where humans and AI live in a precarious balance with AI doing most jobs, and most people free to do what they’d like, be it taking classes, painting or traveling. But the people still rise up in revolt against the machines, because they can’t stand a life without purpose. (Interestingly, the Dalai Lama expressed a very similar sentiment in the New York Times 2 days ago.)

So what’s the takeaway?

I don’t think we should slow down the pace of innovation, or compromise our progress in some way. Further, I don’t think the takeaway is some cliched parable about how no development has only an upside, without leaving people behind.

Rather, I think that this presents a unique opportunity for those working in tech, Silicon Valley, NLP, AI, etc. The 50 million people voting for Trump are a sign of a huge untapped market. They are a sign that technological benefits have only been targeting a specific segment of our society.

On November 9th, we need to decide how we’re going to interact with the 50 million people who voted for Trump, regardless of the outcome. Should we be kind and understanding? Sure, because it’s the right thing to do. But at the same time, let’s also acknowledge our own blindspots that they are forcefully drawing attention to, and work to reduce these. I don’t see how we can move forward as a society without both.

The interwebs are severely lacking any objective comparisons of the two major captioned (landline) telephones on the market: The ClearCaptions Ensemble and the CapTel 840i.

I’ve been using the CapTel for a few years, but the ClearCaptions Ensemble has a 90-day trial period. So I figured I had nothing to lose by trying it out.

Before I get to the captioning quality, which is admittedly the most important aspect, here are a few notes on other aspects of the phones:

Appearance/User Interface

CapTel 840i

The CapTel phone is decidedly unsexy. It is pretty old, large and clumsy. I’ve been using the 840i, which does not have a touch screen, for a few years. However, when I went to the CapTel site, I saw they now have a touchscreen version, the 2400i. I may have to try that in the future.

ClearCaptions Ensemble

The ClearCaptions Ensemble looks much nicer. It is also all touchscreen, except for the power button. However, the touchscreen interface is horrendous. As I said to a friend, “It has a touch interface, but you wish that it didn’t.” When dialing a number, there is no delete if you make a mistake. In addition, the dialpad changes to an awkward double row of numbers when you’ve already entered a few of the numbers you’re trying to call. In short, there is zero usability advantage in the fact that the Ensemble has a touchscreen, and usually it is less usable than the clunky CapTel 840i.

Captioning Quality

Obviously this is the most critical aspect of a captioned phone. Below I’ve posted a video with a side-by-side comparison. I used a YouTube video of a person speaking, to ensure that the audio was identical for each trial. Go ahead and check out the video first. I apologize in advance for some of the shaky camera work. My hands were starting to get very tired (see below for an explanation).

As you can see, the speed and accuracy of the CapTel phone are superior to those of the ClearCaptions phone. Not seen here are the dozen or so trials I did with the ClearCaptions phone, using a different, lower quality video that better portrayed a one-sided phone call. Most of the time, the ClearCaptions phone did not caption anything, and I had to start the call again. The CapTel phone never had any issues with the other video. (This is why my hands (and I) were getting so tired/shaky.)

Additionally, one of the aspects of the ClearCaptions phone that I was excited about is that it supposedly integrates human captioning with automatic/computer-generated captioning. This supposedly makes it faster. As a computational linguist/NLPer, this sounded great! However, as can be seen above, there is no speed or accuracy advantage. When making real calls with the ClearCaptions phone, there are many times when the automatic captions are completely incomprehensible.

Conclusion

While I love sleekness and gadgetry in my smartphone, the most important aspect of a captioned landline phone is reliability: It just has to work. The CapTel phone works faster and more consistently. That’s really all I need to know.

Early in the primary season, when Trump was just starting to gain significant votes, I said to a friend, “I genuinely don’t understand the appeal of Trump.” He replied, quoting one of our favorite movies, “Some people just want to watch the world burn.” I remembered the line from The Dark Knight but didn’t really have an appreciation of how this relates to politics, until a few months later.

As any member of the CUNY community knows, the university has problems. Lots of problems. From its wildly bloated CUNYfirst, to its recent fiasco over its budget and handling of anti-Semitism on campus, the 24-school, 500,000+ student system is in trouble. Big changes are clearly necessary. When CUNY sent the students information about organized protests over tuition increases and professor contract disputes, my initial reaction was that maybe CUNY needs some tough love. In other words, rather than fighting for every nickel and dime, maybe we should “Let CUNY Burn.” In the same way that a healthy forest ecosystem needs occasional forest fires to allow for new growth, maybe CUNY needed the same medicine.

But after the fire, then what?

And that’s when I finally understood the enormous (yuuuuge) disconnect between Donald Trump’s appeal and the reality of the grown-up world. For Americans who perceive their situation as hopeless, with their jobs and their culture “disappearing” without any control, a cleansing forest fire is appealing. Unlike a forest ecosystem, though, a government or an educational institution does not naturally regrow once the fire has abated.

Does CUNY have problems? Yes. Does it have big problems? Yes. But in the real world, problems do get solved by fighting for every nickel and dime, and then making incremental changes along the way. It’s not sexy, and it’s not exciting. It doesn’t make for good campaign slogans.

When The West Wing was still a well-written show (Make Bartlet Great Again?), there was a perfect moment when President Bartlet was debating his opponent. The opponent had a quick, canned answer to why he would reduce taxes. President Bartlet retorts about this “10 word answer”:

“What are the next ten words of your answer? Your taxes are too high? So are mine. Give me the next ten words. How are we going to do it? Give me ten after that, I’ll drop out of the race right now…”

The CUNY system is large and complex. It encompasses everything from The Graduate Center, granting top-tier PhDs, to open-admission community colleges that have to accept literally anyone with a high school diploma. There are no ten words in the English language that can fix a system that complex.

Solving a problem has two components: 0.1% is naming the problem, and 99.9% is fixing the problem. The latter is frustrating and inelegant and often without immediate rewards. But this is how businesses, colleges and governments are built.

It’s not easy for a politician to get elected for being pragmatic. But as tough as it is, we have to try, bit by bit.

In the past 6 weeks, I have interviewed or attended Open Houses at 8 different schools around the country. Don’t get me wrong, I am flattered and humbled by the positive responses I received from my PhD applications.

But: It. Was. Exhausting.

Nonetheless, it provided an opportunity to try out different captioning systems and see what captioning is like in places that are not New York City.

First off, at every school I visited, I was able to secure captioning accommodations. It’s a good lesson that as long as you’re proactive and explain exactly what you need, most schools are able to comply. Thank you to all of the administrators and coordinators who helped set this up.

That being said, all captioning is not created equal. The experience made me realize that I’ve been pretty spoiled in New York City, with a relative abundance of well-qualified captionists at my disposal. The following bullet points are largely to serve as a comparison of CART captioning and C-Print, because after extensive googling I found zero qualitative comparisons.

The first observation is not a comparison. Rather, it is a direct experience with the phenomenon of “discount captioners,” as described by Mirabai Knight, one of the most proficient and strongly activist captionists I’ve used. So-called “CART firms” will troll court reporting schools for mid-level students and use them to charge cash-strapped schools extremely low rates. The result is a terrible experience for students, and a blemish on the reputation of CART captioning.

At one school, I actually pulled a professor aside as we were changing rooms and said, “I’m going to have to rely 100% on reading your lips, because I have literally no idea what the captioner is writing.” As Mirabai’s article explains, this is unfortunately all too common, as many schools do not realize that only highly-proficient, highly-trained captioners can provide a sufficient experience for deaf and hard-of-hearing students.

CART vs C-Print

Mirabai provides a bunch of great reasons why C-Print can fall short of CART captioning. I only used C-Print twice, whereas I’ve been using CART multiple times a week for the better part of 3 years. I’d strongly encourage anyone interested to check out Mirabai’s article.

Overall, C-Print was…fine. But when it comes to hearing, “fine” ≠ “adequate.”

C-Print does not advertise itself as a literal, word-for-word transcription. Rather, it only “ensures” that the most important things are transcribed. But “importance” is completely at the discretion of the captioner. There were a few occasions where I know the C-Print captioner did not transcribe words that I would consider important, such as the name of an institution where a researcher was located.

A C-Print captionist uses a QWERTY keyboard, and depends on a program where they type many abbreviations that the program expands to full words. This usually works well enough, but C-Print is definitely at least 1-2 seconds slower than CART. While 1-2 seconds may not sound like a long time, I would defy you to try having a conversation with someone where things lag 1-2 seconds behind. You’ll quickly see just how significant 1-2 seconds can be.

C-Print can be advantageous in noisy situations where an in-person captioner is not available. I used C-Print at a lunch, in an environment that definitely could not have used remote captioning. In this case, a slower, more summarizing transcription is better than a word-for-word transcription that cannot eliminate a high level of background noise.
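The abbreviation expansion mentioned above can be pictured as a simple dictionary lookup. This is only a rough sketch of the general idea, not C-Print's actual software: the abbreviation table and the `expand` helper are invented for illustration, and the real system's dictionary and rules are not described in this post.

```python
# Hypothetical abbreviation table, invented for illustration only.
# It is NOT C-Print's real dictionary.
ABBREVIATIONS = {
    "bc": "because",
    "sm1": "someone",
    "ppl": "people",
    "abt": "about",
}

def expand(typed):
    """Expand each whitespace-separated token that is a known abbreviation."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in typed.split())

print(expand("sm1 asked abt the ppl"))
```

Even in this toy version, nothing reaches the screen until a token is finished and looked up, which hints at one place where a lag might come from.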

tl;dr: C-Print captioning is an okay substitution when in-person captioning is not available. But in no way should an institution feel that providing C-Print captioning is the equivalent of providing the transcription provided by CART captioning.

[Warning: These findings are minimal and preliminary. A much more thorough analysis needs to be done, and many many more statistical tests need to be run.]

I’ve been meaning to study variation in language production, specifically on a word-by-word basis. For example, how does one typist or one population of typists produce the word giraffe versus another typist or population of typists?

The first thing to note is that pauses before a word are much longer than pauses within a word. This finding is well-established, though.

More interesting (to me, at least) is what happens at syllable boundaries. In the two compound words because and someone, the pause at the syllable boundary is more pronounced. An unpaired t-test shows a significant difference between syllable-liminal and syllable-internal pause times (p < 0.01), whereas differences between other syllable-internal pauses are not significant.
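For anyone who wants to see the mechanics of an unpaired comparison like this one, here is a minimal sketch using only the Python standard library. It assumes Welch's variant of the t-test (the post does not say which variant was run), the pause times are made-up placeholders rather than the data behind these findings, and `welch_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unpaired t-test on two samples."""
    # Squared standard error of each sample mean
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Placeholder pause times in milliseconds, NOT the data from this post.
boundary_pauses = [210, 245, 198, 260, 230, 225]
internal_pauses = [150, 162, 140, 171, 158, 149]

t, df = welch_t(boundary_pauses, internal_pauses)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The sketch stops at the test statistic and degrees of freedom. Turning those into a p-value needs the t distribution, which a statistics package (for example SciPy's `ttest_ind` with `equal_var=False`) handles directly.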

In typing research, a more pronounced pause time indicates “more cognition” is happening. There is some process, such as downloading a word into the lexical buffer, that causes a slowdown in figuring out which key to strike next. It is possible that we are observing a phenomenon where lexical retrieval occurs at the syllable level when a word is made up of multiple words, even if those words do not “compose” the compound word.

Specifically, the word someone can reasonably be decomposed into some + one. It might make sense that someone is downloaded syllable-by-syllable, and we see that delay in typing as the next word/syllable is retrieved.

More surprisingly, though, we do not think of because as being composed of be + cause, even though these are two perfectly good words. Nonetheless, we see something happening when the next word/syllable is retrieved.

None of these delays, though, are observed in the words people and about, although I suppose about = a + bout.

tl;dr: Something fun is going on with multisyllabic, compound words. It needs a lot more investigation, and I plan on doing just that over the holidays.

Here are the books I read in 2015, with some statistics below. I neglected to do this last year, which was disappointing. Here is my 2013 list. I got this idea completely from Robin. Here is her 2014 list. [Update 12/26/16: And Robin’s 2015 list.]

The table is arranged by the order in which I read the books.

Title | Author
The Last Firewall | William Hertling
How We Got to Now | Steven Johnson
The Innovators | Walter Isaacson
Breakfast of Champions | Kurt Vonnegut
The Windup Girl | Paolo Bacigalupi
Measuring Up | Daniel M Koretz
The Language of Food | Dan Jurafsky
Wool | Hugh Howey
The Turing Exception | William Hertling
Sphere | Michael Crichton
Good Omens | Neil Gaiman
Seveneves: A Novel | Neal Stephenson
Neuromancer | William Gibson
The Martian | Andy Weir
Armada | Ernest Cline
Old Man's War | John Scalzi
The Fold | Peter Clines
Station Eleven | Emily St. John Mandel
We, the Drowned | Carsten Jensen
American Gods | Neil Gaiman
Modern Romance | Aziz Ansari
The New York Nobody Knows | William B. Helmreich
The Little Drummer Girl: A Novel | Le Carre

Statistics/Notes

In 2015, I read 23 books, for an average reading time of 15.9 days per book.

Of the 23 books, 17 (74%) were fiction.

Of the 17 fiction books, 5 (29%) can be classified as dystopian stories. I wonder if this is more a reflection of the overall zeitgeist, or just my own reading interests. While I’m none too happy about the current state of politics and policy, I consider myself an optimist at heart.

During 2015, I read 2 of the 4 books in William Hertling’s amazing Avogadro series, about an A.I. singularity in the not too distant future. The books are engaging and well thought out. Hertling knows his technology, and doesn’t try to create a completely ridiculous/far flung singularity. Rather, the cause of the singularity is subtle and seems within reason, and the far-reaching consequences are profound and well thought out. I mention this for 2 reasons:

Read his books! He’s fun, his books are cheap and he deserves a lot more readers.

This is a good insight into how capricious the publishing industry is. I think Hertling is just as good as an Ernest Cline or Peter Clines; he just hasn’t been “discovered” yet.

This is a continuation of a series of posts exploring the process of relearning language and sound processing with my new hearing implants, Auditory Brainstem Implants. The first two posts can be found here and here. Although it’s difficult to distill my experiences down to a single theme, I am slowly realizing that a vast amount of understanding speech comes down to making useful discrimination of phonemes.

What is a phoneme, you might ask?

Great question! A phoneme is one of the most basic units of sound within phonology. The word red, for instance, consists of three distinct phonemes: /ɹ/, /ɛ/ and /d/. However, there is not always a one-to-one correspondence of letter to phoneme. For example, the word through also consists of only three phonemes, corresponding to th, r, and oo (/θ/, /ɹ/ and /u/ in IPA).

At the most basic level, we can discriminate between two different words if there is at least one different phoneme. When only one phoneme differentiates the pronunciation of two different words, these words are known as a minimal pair. The words knit (/nɪt/) and gnat (/næt/) are a minimal pair, because they only differ by one phoneme, the middle vowel.

But each phoneme can actually have different variations, called allophones. For example (stolen straight from Wikipedia), the /p/ phoneme is actually pronounced differently in pin (/pʰɪn/) versus spin (/spɪn/). Most native speakers are unaware of these variations in pronunciation, and if a different allophone is used for the same phoneme, the word will probably still be understandable, but just sound “weird.” Two different words will always differ by at least one phoneme, not by just one allophone. For the sake of this post, I’ll call discriminating between allophones “non-useful sound discrimination.”

Useful Sound Discrimination

If some sound discrimination really isn’t all that useful, then what is useful? The ability to discriminate between phonemes that have a high neighborhood density. And what is neighborhood density? From a recent paper by Susanne Gahl and colleagues: “two words are considered neighbors if they differ by deletion, insertion, or substitution of one segment” (Gahl, et al. 2012). For instance, the word tad has a bunch of phonological neighbors, such as rad, fad, dad, toad and add. The word osteoporosis, on the other hand, has no phonological neighbors.
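The neighbor definition quoted above translates almost directly into code. Here is a minimal sketch, assuming each word is written as a tuple of phoneme symbols; the transcriptions are rough guesses at General American pronunciations, and `are_neighbors` is a hypothetical helper, not something taken from the Gahl et al. paper.

```python
def are_neighbors(a, b):
    """True if phoneme sequences a and b differ by exactly one
    deletion, insertion, or substitution of a segment."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return False
    # Deletion/insertion: dropping one segment of the longer word gives the shorter
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

# Assumed transcriptions, for illustration only
tad = ("t", "æ", "d")
candidates = {
    "rad": ("ɹ", "æ", "d"),
    "toad": ("t", "oʊ", "d"),
    "add": ("æ", "d"),
    "zoo": ("z", "u"),
}

for word, phonemes in candidates.items():
    print(word, are_neighbors(tad, phonemes))
```

Counting how many words in a pronunciation dictionary pass this check against a given word would give a rough estimate of that word's neighborhood density.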

For me, the important thing is to relearn how to discriminate between phonemes that often live in the same phonological neighborhood. This is something that normal hearing individuals do effortlessly, and our very sophisticated auditory system is an expert at differentiating between these different frequencies in a sound signal.

For my limited auditory system, consisting of an ABI that replaces tens of thousands of hair cells with a few dozen electrodes, this discrimination is a nontrivial task. This hit home for me during a therapy session in which I could not, for the life of me, differentiate between the sounds /oo/ and /mm/. For my ABI, both of these sounds activated the exact same electrode pattern.

When I am practicing phoneme discrimination, my therapist covers his mouth, so I cannot also use lipreading. When I can use lipreading, discriminating between /oo/ and /mm/ becomes easy. Moreover, /oo/ and /mm/ rarely are phonological neighbors. That is, there are very few words where /oo/ could be replaced with /mm/, and this would result in a different, intelligible word. The only exception might be an addition/deletion, such as zoo and zoom. Nonetheless, zoo and zoom are not contextual neighbors, i.e. I cannot think of a sentence where zoo and zoom could be used to fill the same slot (The rhinoceros at the zoo stole my lollipop vs *The rhinoceros at the zoom stole my lollipop).

So, am I screwed?

Probably not. What’s remarkable about human communication and information transmission is that we find ways to adapt and filter out the most critical information. For instance, Esteban Buz and Florian Jaeger (Buz & Jaeger, 2012) found that context also plays a significant role in how much or how little we articulate or hyper-articulate certain words. And as long as you’re not a “low talker,” I should be fine.