The following is an invited tribute to Dr. Norman Paskin that will appear in an upcoming Data Science Journal special issue on persistent identifiers (PIDs)…

We were shocked and saddened to learn of the death of our longtime friend and colleague Dr. Norman Paskin in March 2016.

Norman Paskin

Norman will be remembered as the thoughtful and tireless founding director of the International DOI Foundation (IDF). Some of us were fortunate to have known him in the earliest days, during those formative pre-DOI Foundation meetings when technical and political views came together to form the foundation of what the DOI is today. The early days of DOI involved many lengthy, sometimes heated, email and face-to-face discussions, and we fondly remember Norman as a sensible voice calling out in the wilderness.

Establishing sound technical foundations for the DOI was surely only a first step; the DOI’s long-term success and sustainability would depend on its widespread adoption, which in turn would require a clear message, sensible policies that would benefit a wide range of stakeholders, and constant evangelism. To the surprise of no one — except, perhaps, the man himself! — Norman Paskin was chosen in 1998 as the founding director of the IDF, and set out to spread the gospel of persistent identifiers while defining the mission of the IDF. Norman conveyed the message so well that twenty years later it is hard to imagine arguments against the DOI; indeed, its example is so compelling that in domains that can’t directly adopt the DOI, we see parallel object identifier systems emerging, modeled directly after the DOI.

A critical component of the DOI’s success is the robustness of its underlying infrastructure, the Handle System(tm), created and administered by Bob Kahn’s team at Corporation for National Research Initiatives (CNRI). Not long into the life of the DOI and IDF, it became clear that the long-term success of the DOI and other emerging object naming systems based on the Handle System would in turn depend on a well-considered set of Handle System governance policies. In order to consider the needs of a range of current and future stakeholders, a Handle System Advisory Committee (HSAC) was formed in early 2001; on the HSAC Norman naturally represented the interests of the IDF and its members, but also understood the perspectives of CNRI, then the operator of the Handle System, as well as other Handle System adopters.

It was our pleasure to work directly with Norman on DOI matters, including early technology demonstrators that we demoed at the Frankfurt Book Fair and other conferences in the late 1990s, and later mutually participating in HSAC meetings and various DOI strategy sessions. Whenever we saw each other, in New York, Oxford, Washington, London or Frankfurt, we would resume conversations, from yesterday to last year, via email or in person. To all who knew him, Norman Paskin set the standard both literally and figuratively; his friends and colleagues miss him tremendously, but he will persist in our professional memories and in our hearts.

Over the past few days there has been renewed discussion of the controversial W3C Encrypted Media Extensions proposal with the publication of a revised draft (07 Jan 2014). Today I’d like to provide a bit of background, based on my long experience in the digital rights management “game” and my familiarity with the W3C process.

Who are the players? The primary editors of the W3C EME draft are employed by Google, Microsoft and Netflix, but corporate affiliation really only speaks to one’s initial interest; W3C working groups try to work toward consensus, so we need to go deeper and see who is actually active in the formulation of the draft. Since W3C EME is a work product of the HTML Working Group, one of the W3C’s largest, the stakeholders for EME are somewhat hidden; one needs to trace the actual W3C “community” involved in the discussion. One forum appears to be the W3C Restricted Media Community Group; see also the W3C restricted media wiki and mailing list. A review of email logs and task force minutes indicates regular contributions from representatives of Google, Microsoft, Netflix, Apple, Adobe, Yandex, a few independent DRM vendors such as Verimatrix, and of course the W3C. Typically these contributions are highly technical.

A bit of history: The “world” first began actively debating the W3C’s interest in DRM as embodied by the Encrypted Media Extensions in October 2013, when online tech news outlets like InfoWorld ran stories about W3C director Tim Berners-Lee’s decision to move forward and the controversy around that choice. In his usual role as anti-DRM advocate, Cory Doctorow first erupted that October, but the world seems to be reacting with renewed vigor now. The EFF has also been quite vocal in its opposition to the W3C entering this arena. Stakeholders blogged that EME was a way to “keep the Web relevant and useful.”

The W3C first considered action in the digital rights management arena in 2001, hosting the Workshop on Digital Rights Management (22-23 January 2001, INRIA, Sophia Antipolis, France), which was very well attended by academics and industrial types including the likes of HP Labs (incl. me), Microsoft, Intel, Adobe, RealNetworks, several leading publishers, etc.; see the agenda. The decision at that time was Do Not Go There, largely because it was impossible to get the stakeholders at that time to agree on anything “open,” but also because in-browser capability was limited. Since that time there have been considerable advancements in support for user-side rendering technologies, not to mention the evolution of JavaScript and the creation of HTML5; it is clear that W3C EME is a logical, if controversial, continuation in that direction.

What are these Encrypted Media Extensions? The most concise way to explain EME is that it is an extension to HTML5’s HTMLMediaElement that enables proprietary controlled content handling schemes, including encrypted content. EME does not specify a specific content protection scheme, but instead allows vendor-specific schemes to be “hooked” in via API extensions. Or, as the editors describe it,

“This proposal allows JavaScript to select content protection mechanisms, control license/key exchange, and implement custom license management algorithms. It supports a wide range of use cases without requiring client-side modifications in each user agent for each use case. This also enables content providers to develop a single application solution for all devices. A generic stack implemented using the proposed APIs is shown below. This diagram shows an example flow: other combinations of API calls and events are possible.”
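As a rough illustration of the flow the editors describe, here is a hedged JavaScript sketch using the EME APIs in their modern form (early drafts differed in detail). The key system name “org.w3.clearkey” is the spec’s baseline Clear Key system; the function name and the license server URL parameter are hypothetical, and this is a sketch of the pattern rather than production code:

```javascript
// Hedged sketch of the EME license/key exchange flow. Assumes a browser
// environment; the function name and licenseServerUrl are hypothetical.
async function setupProtectedPlayback(video, licenseServerUrl) {
  // 1. Ask the user agent for a Content Decryption Module (CDM)
  //    matching our configuration ("org.w3.clearkey" is the spec's
  //    baseline key system).
  const access = await navigator.requestMediaKeySystemAccess("org.w3.clearkey", [{
    initDataTypes: ["cenc"],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
  }]);

  // 2. Create a MediaKeys instance and attach it to the media element.
  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  // 3. When encrypted media is encountered, open a key session and
  //    drive the license/key exchange from script.
  video.addEventListener("encrypted", async (event) => {
    const session = mediaKeys.createSession();

    // 4. The CDM emits a "message" (e.g. a license request) that the
    //    application relays to its license server.
    session.addEventListener("message", async (msg) => {
      const response = await fetch(licenseServerUrl, {
        method: "POST",
        body: msg.message,
      });
      // 5. Hand the returned license/keys back to the CDM.
      await session.update(await response.arrayBuffer());
    });

    await session.generateRequest(event.initDataType, event.initData);
  });
}
```

The design point is visible here: the browser standardizes only the plumbing (steps 1-5), while the actual content protection logic lives in the vendor-supplied CDM selected by the key system string.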

Why is EME needed? One argument is that EME allows content providers to adopt content protection schemes in ways that are more browser- and platform-independent than before. DRM has a long history of user-unfriendliness, brittle platform dependence and platform lock-in; widespread implementation could improve user experiences while giving content providers and creators more choices. The dark side of course is that EME could make content protection an easier choice for providers, thereby locking down more content.

The large technology stakeholders (Google, Microsoft, Netflix and others) will likely reach a consensus that accommodates their interests, and those of stakeholders such as the content industries. It remains unclear how the interests of the greater Internet are being represented. As an early participant in the OASIS XML Rights Language Technical Committee (ca. 2002) I can say these discussions are very “engineer-driven” and tend to be weighted to the task at hand — creating a technical standard — and rarely are influenced by those seeking to balance technology and public policy. With the recent addition of the MPAA to the W3C, one worries even more about how the voice of the individual user will be heard.

John Erickson is the Director of Web Science Operations (DirWebSciOps) with the Tetherless World Constellation at Rensselaer Polytechnic Institute, managing the delivery of large-scale open government data projects that advance Semantic Web best practices. Previously, as a principal scientist at HP Labs, John focused on the creation of novel information security, identification, management and collaboration technologies. As a co-founder of NetRights, LLC, John was the architect of LicensIt(tm) and @ttribute(tm), the first digital rights management (DRM) technologies to facilitate dialog between content creators and users through the dynamic exchange of metadata. As a co-founder of Yankee Rights Management (YRM), John was the architect of Copyright Direct(tm), the first real-time, Internet-based service to fully automate the complex copyright permissions process for a variety of media types.

…(D)espite the insistence by the president and other senior officials that only “metadata,” such as phone numbers and email addresses, is being collected, 63% think the government is also gathering information about the content of communications – with 27% believing the government has listened to or read their phone calls and emails…Nonetheless, the public’s bottom line on government anti-terrorism surveillance is narrowly positive. The national survey by the Pew Research Center, conducted July 17-21 among 1,480 adults, finds that 50% approve of the government’s collection of telephone and internet data as part of anti-terrorism efforts, while 44% disapprove. These views are little changed from a month ago, when 48% approved and 47% disapproved.

A famous conclusion of the 9/11 Commission was that a chronic and widespread “failure of imagination” led to the United States leaving its defenses down and enabling Bin Laden’s plot to succeed. This is a bit of an easy defense, and history has shown it to not be completely true, but I think in general we do apply a kind of double-think when contemplating extreme scenarios. I think we inherently moderate our assumptions about how far our opponents might go to win and the range of methods they will consider. How we limit our creativity is complex, but it is in part fueled by how well informed we are.

The Pew results would be more interesting if the same questions had been asked before the Edward Snowden revelations, because it would have created a “baseline” of sorts for how expansive our thinking was and is. What the NSA eruption has shown us is that our government is willing to collect data at a much greater scale than most people imagined. The problem lies with that word, imagined. What if we asked instead, “What is POSSIBLE?” Not “what is possible within accepted legal boundaries,” but rather “what is possible, period, given today’s technology?” For example, what if the NSA were to enlist Google’s data center architects to help them design a state-of-the-art platform?

Key lawmakers no doubt were briefed on the scale of the NSA’s programs years ago, but it is unlikely most of the legislators or their staffers were or are capable of fully appreciating what is possible with the data collected, esp. at scale. One wonders who is asking serious, informed questions about what is possible with the kind and scale of data collected? Who is evaluating the models, etc? Who is on the outside, using science to make educated guesses about what’s “inside?”

Many versions of the web science definition declare our motivation ultimately to be “…to protect the Web.” We see the urgency and the wisdom in this call as we watch corporations and governments construct massive platforms that enable them to monitor, analyze and control large swaths and facets of The Global Graph. It is incumbent upon web scientists to not simply study the Web, but to use the knowledge we gain to ensure that society understands what influences the evolution of that Web. This includes the daunting task of educating lawmakers.

Why study web science? Frankly, because most people don’t know what they’re talking about. On the issues of privacy, tracking and security, most people have no idea what is possible in terms of large-scale data collection, what can be learned by applying modern analytics to collected network traffic, and what the interplay is between technological capabilities and laws. Fewer still have a clue how to shape the policy debate based on real science, especially a science rooted in the study of the Web.

Web science as a discipline gives us hope that there will be a supply of knowledgeable — indeed, imaginative — workers able to contribute to that discussion.

Thank you for contacting me about the need to reform the Computer Fraud and Abuse Act (CFAA). I appreciate your writing to me about this pressing issue.

In my position as Chairman of the Senate Judiciary Committee, I have worked hard to update the Computer Fraud and Abuse Act in a manner that protects our personal privacy and our notions of fairness. In 2011, I included updates to this law in my Personal Data Privacy and Security Act that would make certain that purely innocuous conduct, such as violating a terms of use agreement, would not be prosecuted under the CFAA. This bill passed the Judiciary Committee on November 22, 2011, but no further action was taken in the 112th Congress. I am pleased that others in Congress have joined the effort to clarify the scope of the CFAA through proposals such as Aaron’s law. Given the many threats that Americans face in cyberspace today, I believe that updates to this law are important. I am committed to working to update this law in a way that does not criminalize innocuous computer activity.

As technologies evolve, we in Congress must keep working to ensure that laws keep pace with the technologies of today. I have made this issue a priority in the past, and will continue to push for such balanced reforms as we begin our work in the 113th Congress.

Again, thank you for contacting me, and please keep in touch.

Sincerely,

PATRICK LEAHY
United States Senator

Thanks again for your great service to Vermont and the United States, Sen. Leahy!

Like many other civil liberties advocates, I’ve been annoyed by how the media has spilled more ink talking about Edward Snowden than the issues that he’s trying to raise. I’ve grumbled at the “Where in the World is Carmen Sandiego?” reality show and the way in which TV news glosses over the complexities that investigative journalists have tried to publish as the story unfolded. But then a friend of mine – computer scientist Nadia Heninger – flipped my thinking upside down with a simple argument: Snowden is offering the public a template for how to whistleblow; leaking information is going to be the civil disobedience of our age.

For several weeks I’ve debated with friends and colleagues over whether Mr. Snowden’s acts indeed represent civil disobedience and not some other form of protest. I’ve argued, for example, that they might not because he didn’t hang around to “face the consequences.” danah’s post provoked me to examine my views more deeply, and I sought out a more formal definition (from the Stanford Encyclopedia of Philosophy) to better frame my reflection. Based on how Mr. Snowden’s acts exhibit characteristics including conscientiousness, communication, publicity and non-violence, I do now see his whistleblowing as an example of civil disobedience.

Conscientiousness: All the evidence suggests that Mr. Snowden is serious, sincere and has acted with moral conviction. To paraphrase the Stanford Encyclopedia, he appears to have been motivated not only out of self-respect and moral consistency but also by his perception of the interests of his society.

Communication: Certainly Mr. Snowden has sought to disavow and condemn US policy as implemented by the NSA and has successfully drawn public attention to this issue; he has also clearly motivated others to question whether changes in laws and/or policies are required. The fact that he has legislators from both sides of the aisle arguing among themselves and with the Obama Administration is testimony to this. It is not clear to me what specific changes (if any) Mr. Snowden is actually seeking, and he certainly has not been actively engaged in instigating changes e.g. behind the scenes, but I don’t think this is required; his acts are clearly about effecting change by committing extreme acts of transparency.

Publicity: This is an interesting part of the argument; while e.g. Rawls and Bedau argue that civil disobedience must occur in public, openly, and with fair notice to legal authorities, Smart states what seems obvious: to provide notice in some cases gives political opponents and legal authorities the opportunity to suppress the subject’s efforts to communicate. We can safely assume that Mr. Snowden did not notify his superiors at the NSA, but his acts might still be regarded as “open,” as they were closely followed by an acknowledgment and a statement of his reasons for acting. He has not fully disclosed what other secret documents he has in his possession, but it does not appear he has anonymously released any documents, either.

Non-violence: To me this is an important feature of Mr. Snowden’s acts; as far as we know, Mr. Snowden has focused on exposing the truth and not on violence or destruction. This is not to say that forms of protest that do result in damage to property (e.g. web sites) are not civil disobedience; rather, the fact that he did not deface web sites or (to our knowledge) violate access control regimes does qualify his acts as non-violent.

I have no idea whether Mr. Snowden read Thoreau’s Civil Disobedience or even the Wikipedia article, but his acts certainly exhibit the characteristics of civil disobedience and may serve as a “template” for whistleblowers moving forward. As a technologist, my fear is that his acts also provide a “use case” for security architects, raising the bar for whistleblowers who aim to help us (in danah’s words) “critically interrogate how power is operationalized…”

…To publish LOD which is interesting for the usage beyond research projects, datasets should be specific and trustworthy (another example is the German labor law thesaurus by Wolters Kluwer). I am not saying that datasets like DBpedia are waivable. They serve as important hubs in the LOD cloud, but for non-academic projects based on LOD we need an additional layer of linked open datasets, the Trusted LOD cloud…

Due mostly (I think) to language and/or cultural barriers, the core message — or what I believe is the core message — is not coming across very well. I believe the core point is this: data published without explicit expressions of (a) provenance and (b) rights is of limited use, especially to commercial (and presumably responsible) consumers. In a “private” cloud it might be easier to make explicit assertions, but to be honest, following linked data best practices it really shouldn’t be that hard to do today.

We’ve been here before: The problem of missing or ambiguous rights and provenance metadata was an issue back when content — images, audio, etc — first went online, with a notable difference: content usually has inherent utility without metadata, but data usually doesn’t. Back in the day, some of us used to talk about “copyright as an enabler,” evangelizing the idea that decorating content with useful rights metadata would be a great thing because it would facilitate communications with the people “behind” that content (Esther Dyson’s notion of Intellectual Value). Such an argument really only resonates with responsible derivative users who want to “do the right thing” w.r.t. copyrights, which is likely a tiny percentage of users and producers, and indeed is increasingly moot with the popular use of Creative Commons licenses. With published data, this becomes more critical; many types of data are simply not valid without at least an understanding of their provenance, and usually also whatever rights have been asserted by the creator.

Early on (i.e. 2009…) Leigh Dodds and several others created versions of the LOD cloud illustrating the known rights domains. I think the key argument in the ongoing “No Money in Linked Data” thread is the uncertainty imposed by unknown licensing state, which is clearly a big problem when one studies these diagrams…

My thanks to Michael Pendleton of the EPA for provoking me to write this…

UPDATED 07 Jan 2016: Since late May 2009 I have been a Linux fanboy. My initial motivation for taking the plunge was learning that I would soon be euphemized from the research arm of a major computer corporation and would be on my own later that year. I was also interested in migrating toward a more researcher-friendly environment; many of the reference implementations for radical new directions in Web technology, including and especially Linked Data, were easier to get working on either a Linux derivative or MacOS, and I was increasingly frustrated by Windoze, the official corporate platform.

I first dipped my toe in the Linux pond ten years earlier, having set up Red Hat Linux on a test machine as a platform for breaking (mostly) server-side code, but was not comfortable with it for “primetime” use. All that changed with my first evaluation of Ubuntu Jaunty Jackalope (ca. April 2009). I found the shell to be more than usable; the selection of open source code was amazing, literally every application I needed; the performance on my tired machine was a radical improvement over Windoze; and certain essential tasks that had been extremely difficult under Red Hat (esp. VPN) were now clean and easy. I “sandblasted” my main work machine and haven’t gone back. For my remaining months with Giganticorp, if I needed to execute some stodgy Windoze-only crapware I fired up Windoze on VirtualBox, ever-amazed that it actually worked.

I’ve become an Ubuntu and esp. Linux Mint evangelist among my friends. Since the Linux kernel is so much more efficient than Windoze, I show anyone who will listen how they can prolong the life, and generally decrapulate their computing experience, by sandblasting their machine and installing the most recent release of Ubuntu. I continually win converts, to my utter amazement! My ultimate “feat-of-strength” is probably sandblasting a ca. 1999 iMac G3 “Blueberry” and successfully installing Ubuntu, thus (in theory) prolonging its life.

Sadly, good things can be negatively affected by entropy. With Natty Narwhal the geniuses in charge started messing around with the shell (previously GNOME), introducing an abomination called Unity with 11.04, ultimately committing to it with Oneiric Ocelot. This is when Linux Mint sauntered by my office window; I was soon out of my chair and chasing it down the street!

I think of Mint as “a more careful release of Ubuntu, without the crap and knee-jerk changes.” For a recent feature comparison see Linux Mint vs. Ubuntu. Mint is self-described as being “conservative” with updates and being sensitive to its users, especially from the developer community. The key is that Mint uses Ubuntu’s code repositories seamlessly, so the user does not sacrifice anything by choosing Mint over Ubuntu. Currently all my machines are running Linux Mint 17.3 “Rosa” with the MATE desktop.

John’s Linux Mint customizations: Immediately after installing a new distribution of Mint I install the following “essential” applications, using either the command line or Synaptic Package Manager:

NOTE: Be sure to disconnect external monitors before installing Linux Mint on laptops. If you don’t, the installer may get confused and mess up the hardware configuration. Linux Mint handles external monitors nicely after installation.

Dries Buytaert — the original creator and project lead for the Drupal open source web publishing and collaboration platform, and president of the Drupal Association — shares his experiences on how he grew the Drupal community from just one person to over 800,000 members over the past 10 years, and, generally, how large communities evolve and how to sustain them over time.

As Dries recounts in his talk, the Drupal platform has experienced massive growth and adoption over the past decade, including significant penetration among web sites hosting open government data around the world — including the United States Data.gov site and numerous other federal government sites.

I highly recommend this talk to those interested in Drupal, in the open source ecosystem, and generally in the care and feeding of communities. I found Dries’ thoughts on the economic relationship between the platform, its developers and their level of commitment to be particularly interesting: if developers depend upon a platform for their income, they are more likely to be passionate about advancing it as loyal contributors.

Drupal seems to be more than that; there seems to be an ethic that accepts refactoring of the platform to keep it and the Drupal community current with new technologies, giving developers the opportunity to explore new skills. There is a fascinating symbiotic relationship between economics and advancing technology that favors adopters and contributors passionate about being on the cutting edge.

This talk “re-factored” my own thinking about Drupal, and tweaked my thinking about the open source ecosystem!

This morning on my town’s listserv a neighbor quoted an Estonian colleague who observed (during a recent conference call),

“Internet access is a human right.”

I’m very familiar with this meme but was curious if the right to access communications infrastructure (of any kind) had any official standing.

Although the freedom to participate in communications networks is not specifically mentioned in the Universal Declaration of Human Rights, in June 2011 the UN Human Rights Council did release a report declaring the Internet to be “an indispensable tool for realizing a range of human rights, combating inequality, and accelerating development and human progress” and that “facilitating access to the Internet for all individuals, with as little restriction to online content as possible, should be a priority for all States.” See analysis here and here. You may remember that this caused headlines like “Internet access is a human right” to go around the world; you may also remember Secretary of State Hillary Clinton’s earlier remarks regarding Internet freedom. Here is a powerful excerpt from her statement:

There are many other networks in the world. Some aid in the movement of people or resources, and some facilitate exchanges between individuals with the same work or interests. But the internet is a network that magnifies the power and potential of all others. And that’s why we believe it’s critical that its users are assured certain basic freedoms. Freedom of expression is first among them. This freedom is no longer defined solely by whether citizens can go into the town square and criticize their government without fear of retribution. Blogs, emails, social networks, and text messages have opened up new forums for exchanging ideas, and created new targets for censorship.

In reading through the UDHR I was a bit surprised that speech is mentioned only once, in the Preamble, as what seems like an aspirational goal, and never in the thirty articles. Does anyone know the history of this omission? When the UDHR was written, was actual freedom of speech too much of a hot button? And, what official status do these UN reports have?

BTW: Vint Cerf, the co-inventor (with Bob Kahn) of the Internet (and current VP at Google), opined in Jan 2012 that while access to the Internet may be an enabler of human rights, access to the Internet itself is not. As I read the UN report and Hillary Clinton’s remarks, I believe the notion of Internet-as-enabler is their larger point, and Vint Cerf is perhaps splitting hairs…

After a short tutorial period by TWC RPI staff and distinguished guests, participants will compete with each other to develop Semantic Web mashups using linked data from TWC and other sources, web APIs from Elsevier SciVerse, and visualization and other resources from around the Web.

Prizes
The contest will encompass building apps utilizing the SciVerse API and other resources in multiple categories, including Health and Life Sciences and Open classes. Overall, there will be three winners:

First place: $1500

Second place: $1000

Third place: $500

Judging
A distinguished panel of judges has been assembled that includes domain experts, faculty and senior representatives from Elsevier: