Brad says that he’s not a success story. If you want to know how to make millions, thousands, or even hundreds, you should write a book about how to write successful books. Or vampires, he says. Instead, he’s going to give us thoughts about authorship and publishing.

1. Feudal: pre-modern, from antiquity to production publishing. It relied upon patrons who offered a living wage, and could bring interest and favor to the works. In return, the author might offer a celebration of the patron in the work, as Virgil did when he traced the lineage of Emperor Augustus all the way back to the gods. Or the author might dedicate the work; Brad points to Sterne’s dedication in Tristram Shandy. But this arrangement produces resentment: the authors feel they are the creators, but the patrons take some of the glory. (He reads a scathing letter to Chesterfield from Samuel Johnson, who lived on the cusp of Phase 2, written in response to a request for a dedication.)

2. The industrialization of publishing. It put the means of reproduction (Marxist pun intended by Brad) into the hands of the publishers. Thus, authors were once again dependent. This is because there’s always a super-abundance of manuscripts trying to get into the market. This selection process has of course become highly professional. “The problem is that we didn’t choose these people to be the gatekeepers…Ultimately their responsibility is to their shareholders.” This works better than the Feudal system, but the criterion is what an editor thinks will sell. (Brad points out that his work was rejected by publishers.) “The superabundance problem persists.” There are now two barriers of entry to works of fiction: Works have to come from a literary agent before publishers will consider them. “If you want to be a writer, you’ll probably be better off writing for yourself and buying scratch tickets, because you won’t be as frustrated when the scratch ticket tells you that you’ve lost.”

So, he asks, is there any hope for someone like him, who thinks his works are good but who cannot get a publisher to publish them? Yes, he says, digital publishing is the hope. “We can make our works directly available to readers. We don’t need publishers any more.”

But, readers rely on publishers to winnow away at the super-abundance of manuscripts. Without publishers, “we move the slush pile to around the ankles of readers.” “We can create a ground-based, critical reader culture” in which people can publish their own reviews, accrue authority, etc. “Amazon does this a bit of course…but we can be more substantive than that.” “Everyone has the means of reproduction. So, hooray.”

So, why did it take him 11 years to publish his own work? “I’ve got all sorts of excuses…but the truth is that traditional publishing offered a better prospect for me.” First, digital reading hasn’t been as appealing. That’s obviously beginning to change. Second, publishers put their chosen works on the fast track. If you can get two people to like your work — agent and publisher — you can cut to the front of the line. So, he tried for ten years to sell his books. His agent was very good at getting flattering rejection letters from publishers. His first novel, In Defense of Cactus Kelly in the late ’90s, didn’t get a publisher. He blogged the second book — NJFTPW — and added popup multimedia. But no one came.

Time passed. Self-publishing became a more promising prospect because of the emergence of digital marketplaces where people can find what they want to read. At a certain point, he decided to just publish NJFTPW. He uploaded it, pressed the buttons about royalty schemes, and it’s up on Amazon. “But then there’s the super-abundance problem.”

The book is currently at #164,296 at Amazon. A couple of days ago, it was over #300,000. “It doesn’t take much to bump up your book.” “If you can use social media to overthrow an Egyptian dictator, you can probably get people to buy my book,” Brad says, adding “These are probably at comparable levels of difficulty.” He has a handful of followers at Twitter. He’s posted some ads at Facebook, and has 421 Likes. “But Likes on FB don’t translate to sales and reading of your book. Maybe they translate at a 1% rate.” Brad isn’t willing to conclude anything about the effectiveness of social media, since he is “ham-handed” in its use.

He shows his sales from the last month on Kindle, which was his worst week: 4. But in the three days he had a promo offering it for free, he had 350 downloads. The promotions get you channeled into Kindle’s promotions. During the promo, he was in the top 20 for literary fiction, along with public domain classics. He thinks he did that well in part because he has all 5-star reviews [one of which is mine].

This gets him thinking about the reader-based review culture. People do write blog posts about books, some on book sites. “Even the reviewing culture suffers from the super-abundance problem. If you want a good book blogger to review your book, you have to pitch them.” The Kirkus Indie program wants $425 to review your book. “I stand here fairly clueless…but hopeful in a general sense that we’re on the cusp of creating a situation in which publishers are not the final answer….Readers need to believe that books that are not traditionally published can still be a good book. Readers need to look outside the walled garden.” “Writers need to trust that readers will do these things.” If so, those who own printing presses won’t get to decide what we get to read.

Q&A

Q: How did you pick Kindle, and not Nook, etc.?

A: It was my choice for an initial platform. You can participate in Amazon’s free promos if you commit to exclusivity to Kindle Select for 90 days. It also lets your books be lent for free to Kindle Prime program. You get paid pro rata for those loans. I am thinking about printing on demand.

Q: In the spiritual self-help area, a lot of people promote their books via their blogs. They refer to one another mutually.

Q: I appreciate your intersection of analysis and emotional experience. What you say about publishing is the same as in music. And Louis C.K. And Patton Oswalt a couple of days ago gave a keynote called “A Letter to Gatekeepers,” saying that if they continue to think narrowly, they’ll kill their industry. Also, on FB you can pay to promote your post. Finally, people want to participate in things that other people are participating in. That can work for us or against us in the attention economy. Finally finally, a combination of all three of your phases: fan-funding, kickstarter.com, etc. This gets people in as patrons, and then they evangelize for you.

A: Publishers encourage you in their rejections not as a tactic to maintain hegemony, but because they’re being polite. BTW, my agent left the biz, and went back to school in anthropology.

Q: What about copyright? People can disseminate it without your knowledge. We’re looking at self-publishing because the royalties are better, but are you protected?

A: I’d take the trade in a minute. It’s not a coincidence that the first copyrights were given first to the publishers (“stationers privileges”). They wanted to avoid undercutting each other, and the Crown wanted to keep an eye on what was being published. The copyright concerns come first and foremost from publishers…

Q: Creative people are concerned also.

A: I won’t say categorically they’re not. But many of us would put it out for free, since I’m not depending on my books to make a living.

Q: [doc searls] Cluetrain is free online but still sells well. But, Brad, why not just make it freely available in an open format, and put out a tip jar? How comfortable do you feel inside the silo that is Amazon?

A: I’m trying to understand how useful it is to have Amazon. It might be a deal with the devil.

Q: [me] How many of you here in the audience are going to buy the book? [About 5 hands go up.] Why not?

Answers: It’s not on Nook. …I’ve got too much to read…I don’t know enough about it…

Q: Publishers play an important curatorial function. I’d love to circumvent it because they look for a formula. But putting it on line isn’t enough. Where is the inter-connect?

Q: I edit an online literary magazine. Finding folks who are already reading at open mics, making a connection is great. We have gatekeepers of a sort, but they’re made up of writers and readers already in the community. Also, there are independent publishers who are not motivated by profit. Getting the novel excerpted in a journal like ours helps. Also: BestIndieLitNewEngland.org There’s something inbetween self-publishing as an individual and commercial success. There are communities.

A: Yes, my social media work was aimed at creating a community.

Q: Have you tried open mic readings? Or do you need to be a published author?

A: One of the reasons I write is because I do it better than I speak. A judge once told me to find a job where I write things to people, rather than talking to them. I elected to take it as a compliment. I still see myself as someone who’ll put something out and broadcast it, stand behind it. That’s not getting me to where I need to be. I thought maybe I’d get NJFTPW out of the way so I could write the next thing to submit to a conventional publisher. Now I’m not sure. I’m trying throwing out more content.

Q: Your expectations of traditional publishers are overstated. Publishers often do nothing but print. Also, digital publishing has taken us to a place as bad as traditional publishing. Charlie Stross (sf writer, former sw guy) has an excellent analysis of what Amazon is doing to the market. Single publisher, single format, own their own hardware.

A: Traditional publishing has worked wonderfully for us. People can make a living as a writer. The Amazon issue is a trade-off, which I re-examine all the time. People complain that there’s too much junk at Amazon, e.g. people re-selling Wikipedia content. Rather than putting in a spam button, let people write reviews.

Q: I’m writing a book for a publisher. Even with a publisher, it’s up to the author to build a market. I’m writing a memoir of my father, a queer poet, self-published before the digital age. It was all shoe leather: printing stuff up, going to bookstores, doing readings. It was about finding community, promoting writers like himself, and putting out ideas.

A: Copyright is an incentive for people to do something creative, but I don’t think it’s anything close to the whole ball of wax. E.g., I enjoy communicating to myself — re-reading something I wrote when younger. But, more important, I want to communicate something.

“According to the 2011 Library Journal E-Book Survey, 82% of libraries currently offer access to e-books, which reflects an increase of 10 percentage points from 2010. … Libraries maintain an average of 4,350 e-book copies in a collection.”

“[T]he publisher-to-library market across all formats and all libraries (e.g., private, public, governmental, academic, research, etc.) is approximately $1.9B; of this, the market for public libraries is approximately $850M”

92% of libraries use OverDrive as their e-book dealer

Of the major publishers, only Random House allows unrestricted lending of e-books.

Rick Klau points [g+] to a feature of Google Docs spreadsheets I didn’t know about (although I’m far from a spreadsheet maven): It can automatically include a table from any HTML document accessible on the Web. It turns out it can also include the contents of lists.

It’s not the most intuitive feature. Into a cell you type:

=ImportHTML("[URL]","[query]",[index])

Except you put in the HTML page’s url instead of [URL], “table” or “list” instead of [query], and the number of the table or list you have in mind instead of [index]. For example:

=ImportHTML("http://www.hyperorg.com/blogger/index.html","list",1)

gets the first list (ul or ol) on Joho The Blog (this page you’re reading), which turns out to be the one on the left called “Other Stuff.” If you ask for 2 instead of 1, you’ll get my blogroll.

That imports AccuWeather’s table of weather for Anchorage (where Rick is headed for vacation.)

The data updates every time you open the spreadsheet.
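
To make the [query] and [index] arguments a bit more concrete, here is a toy Javascript sketch of the kind of lookup the formula describes: find every table or list in a page's HTML and hand back the one at a 1-based position. This is emphatically not how Google implements ImportHTML. The names pickElement and listItems are made up for illustration, and the regular expressions assume simple, un-nested markup.

```javascript
// Toy stand-in for the lookup ImportHTML describes; NOT Google's implementation.
// query is "table" or "list"; index is 1-based, as in the spreadsheet formula.
function pickElement(html, query, index) {
  var pattern = query === "list"
    ? /<(ul|ol)\b[^>]*>[\s\S]*?<\/\1>/gi   // a "list" is a ul or an ol
    : /<(table)\b[^>]*>[\s\S]*?<\/\1>/gi;
  var matches = html.match(pattern) || [];
  return matches[index - 1] || null;       // null when there is no such element
}

// Hypothetical helper: returns the text of each <li> in a list, tags stripped.
// Assumes the list is not nested.
function listItems(listHtml) {
  var items = [];
  var re = /<li\b[^>]*>([\s\S]*?)<\/li>/gi;
  var m;
  while ((m = re.exec(listHtml)) !== null) {
    items.push(m[1].replace(/<[^>]+>/g, "").trim());
  }
  return items;
}
```

Asking for index 2 with "list" returns the second ul or ol on the page, which mirrors the way changing 1 to 2 in the formula switches from one list to the next.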

ImportData does the same for CSV data. And Kingsley Idehen explains how you can update your spreadsheet with Linked Open Data by going through SPARQL. (SPARQL lets you query a database for linked data.) (Yes, it’s over my head.)

Wouldn’t it be useful to be able to import a single element into a Google spreadsheet, even if it’s not in a list or a table? For example, suppose I want to get the headline of the first posting at, say, DailyKos.com. That element has an id of “article-1”. (I know this because I looked at the source.) So, why not let me specify the url and the id, and plop the contents into a cell in the spreadsheet? Or suppose I want the content of one particular cell of a table?

No, we’re never satisfied.

Two seconds after I pressed the “Publish” button, Rick Klau responded to my questions on the Google Plus thread where he talks about this feature. He suggests importXML for grabbing an item by its id. And to get a frozen copy of the data, copy and paste it. He also points to a post from 2007 about these features. (Oh, yeah, you can trust Joho to stay on top of the news!) In fact, that post gives an example of how to obtain the latest headline from the NYT:

Wow. An adviser has explained to the Brits that Romney better understands and appreciates the UK because Romney is Anglo-Saxon:

“We are part of an Anglo-Saxon heritage, and he feels that the special relationship is special,” the adviser said of Mr Romney, adding: “The White House didn’t fully appreciate the shared history we have”.

This is as close to a casually racist remark as we’re likely to get, at least I hope. I’m finding it hard to take it otherwise. So, maybe the adviser thought he (she?) was making a positive statement about shared heritage, the way President Clinton might have talked about feeling a special bond with Ireland because of his Irish heritage. But I think this goes beyond tone deafness. This is not a statement of warm feeling, but a negative statement that without that shared heritage, you can’t really understand the UK. It is (to me) very clearly an attempt to boost Romney while declaring Obama to be Other: Obama can’t understand America because he’s not really one of us, where the “us” means Anglo Saxons. If there’s a more charitable way of taking this and its implications, let me know.

I only wish that the first stop had been Germany so that the adviser could have talked about how to fully appreciate the shared history we have with that country, we need an Aryan president.

“‘It’s not true. If anyone said that, they weren’t reflecting the views of Governor Romney or anyone inside the campaign,’ she told CBSNews.com in an email. Saul did not comment on what specifically was not true.”

Christie Moffatt of the National Library of Medicine talks about a project collecting blogs talking about health. It began in 2011. The aim is to understand Web archiving processes and how this could be expanded. Three examples: Wheelchair Kamikaze. Butter Compartment. Doctor David’s Blog. They were able to capture them pretty well, but with links to outside sites, out-of-scope content, and content protected by passwords, there’s a question about what it means to “capture” a blog. The project has shown the importance of test crawls, and attending to the scope, crawling frequency and duration. The big question is which blogs they capture. Doctors who cook? Surgeons who quilt? Other issues: Permissions. Monitoring when the blogs end, change focus, or move to a new url. E.g., a doctor retired and his blog changed focus to fishing.

Terry Plum from Simmons GSLIS talks about a digital curriculum lab. It was set up to pull in students and faculty around a few different areas. They maintain a collection of open source applications for archives, museums, and digital libraries. There are a variety of teaching aids. The DCL is built into a Cultural Heritage Informatics track at Simmons.

Daniel Krech of Library of Congress works at the Repository Development Center. The RDC works with people managing collections. The RDC works on human-machine interfaces. One project involves “sets” (collections). “We’ve come up with some new and interesting ways to think about data.” They use knot, set, and hyper theory, but they also sometimes use a physical instantiation of a set — it looks like knotted yarn — to help understand some very abstract ideas.

Kelsey [Keley?] Shepherd of Amherst represents the Five College Digital Task Force. (She begins by denying that the Scooby Gang was based on the five colleges.) They don’t share a digital library but want to collaborate on digital preservation. They are creating shared guidelines for preservation-ready digital objects. They are exploring models for funding and organizational structure. And they are collaborating on implementing a trusted digital preservation repository. But each develops its own digital preservation policy.

Jefferson Bailey talks about Personal Digital Archiving at the Library of Congress. He talks about the source diary for A Midwife’s Tale. That diary sat on a shelf for 200 years before being discovered as an invaluable window on the past. Often these archives are the responsibility of the record creators. The LoC therefore wants to support community archives, enthusiasts, and citizen archivists. They are out and about, promoting this. See digitalpreservation.gov

Carol Minton Morris with DuraSpace and the NDSA (National Digital Stewardship Alliance) talks about funding archiving through “hip pocket resources.” They’re looking into Kickstarter.com. Technology and publishing projects at Kickstarter have only raised $9M out of the $100M raised there; most of it goes to the arts. She points to some other microfinance sites, including IndieGoGo and DonorsChoose.org. She encourages the audience to look into microfinancing.

Kristopher Nelson from LoC Office of Strategic Initiatives talks about the National Digital Stewardship Residency, which aims at building a community of professionals who will advance digital archiving. It wants to bridge classroom education and professional experience, and some real world experience. It will start in June 2013 with 10 residents participating in the 9-month program.

Moryma Aydelott, program specialist at LoC talks about Tackling Tangible Metadata. The LoC’s digital data is on lots of media: 300T on everything from DVDs to DAT tapes and Zip disks. Her group provides a generic workflow for dealing with this stuff — any division, any medium. They have a wheeling cart for getting at this data. They make the data available “as is.” It can be hard to figure out what type of file it is, and what application is needed to read it. Right now, it’s about getting it on the server. They’ve done about 6.5T of material, 700-800 titles, so far. But the big step forward is in training and in documenting processes.

Michael begins with a comparison to environmentalism: Stewardship of valuable resources, and long-term planning. There are cognitive challenges, and issues in providing institutional incentives. (He recommends sucking in as much data as possible, and worrying about adding the metadata later, perhaps through crowdsourcing.)

The court in Golan upheld Congress’ right to restore copyright for works published outside the US. This puts the public domain at risk, he says. He also points to the Hathi case in which they’ve been sued for decisions they made about orphan works. There is a dangerous argument being made there that if archiving occurs within the library space, fair use goes away. The legal environment is thus unstable.

Now that copyright is automatic and lasts for 70 years after the author’s death, managing the rights in order to preserve the content is fraught with difficulty.

He reminds us that making a copy to preserve the work is unlikely to cause market harm to the copyright owner, and thus ought to be legal under fair use. “You ought to have a bias toward believing you have a Fair Use right to preserve things.”

He asks: “Can the preservation community organize itself to be the voice of tomorrow’s users on issues of copyright policy and copyright estate planning?” For orphan works, copyright term shortening, exceptions to DRM rules, good practices open licensing in the long term…

And he asks: How can you get the FBs and Googles et al. to support long-term preservation? Michael suggests marking things that are already in the public domain as being in the public domain. Otherwise, the public domain is invisible. And think about “springing” licenses, e.g. an open license that only goes into effect after a set time or under a particular circumstance.

Anil Dash (one of my heroes, and also hilarious) is talking at a Library of Congress event on Digital Preservation, part of the National Digital Information Infrastructure and Preservation Program. Anil’s talk is called “Make a Copy.” (Anil is now at ThinkUp.)

Live Blogging

Getting things wrong. Making fluid talks sound choppy. Missing important points. Not running a spellpchecker. This is not a reliable report. You have been warned, people!

Anil says he’s a geek interested in the social impacts of tech on culture, govt, and more. He started Expert Labs a few years ago to enable tech to talk with policy makers. Expert Labs built ThinkUp. He wants to talk about the issues that this group of archivists confronts every day that the tech community doesn’t know about. He warns us that this means he’s starting with depressing stuff. So…

…Picture the wholesale destruction of your wedding photos, or other deeply personal mementos. They are being destroyed by an exclusive, private, ivy league club: Facebook. FB treats memories as disposable. “Maybe if I were a 25 year old billionaire, I’d think of these as disposable, too.” “The terms of service of digital social networks trumps the Constitution in terms of what people can share and consume.” Our ordinary conversations are treated as disposable, at Facebook, Twitter, Microsoft, etc. They explicitly say that they can delete all of your content at any time for any reason. “100s of millions of Americans have accepted that. That should be troubling to those of us who care about preservation.”

You can opt out, but not without compromising your career and having severe social cost. And you can’t rely upon the rest of the Web, because “there’s a war raging against the open Web.” “The majority of time spent on the Web in the US is spent in an application,” not on pages. Yet we’re still archiving Web pages but not those applications. “They are gaslighting the Web,” Anil says, referring to the old movie. E.g., you can leave FB comments on Anil’s blog, but when you click from FB to his blog, FB gives you a warning that the site you’re going to is untrustworthy. “I don’t do that to them,” he says, even though they’ve consistently “moved the goal posts” on privacy, and he has registered his site with FB.

After blogging this, Anil got a message from a tech at FB saying that it was a bug that’s being fixed. But suppose he hadn’t blogged it, or FB had missed it? “The best case scenario is that we’re left fixing their bugs.” He adds, “That’s pretty awful, because they’re not fixing our bugs. And we’re helping them to extend their prisons over the Web.” And is the only way to get our words preserved to agree to Twitter’s ToS so that we’ll get archived by the Library of Congress, which has been archiving tweets? Anil says that he’s conscientiously tried to archive his own works for his new baby, but it shouldn’t rely on that much effort by an individual.

And, he says, that’s just the Web, not the apps. You can’t crawl his phone and preserve his photos. And when FB buys Instagram which has a billion photos, and only 5% of the content FB has bought has been preserved…? And yet the Instagram acquisition is considered a success by the Valley. If you’re a Pharaoh, your words are preserved. Anil is worried about the rest of the conversations.

“If I were to ask you what is the most watched form of video, what would you say?” Anil guesses that it’s animated gifs. And we don’t archive them. “We’re talking about the wrong things.” We’re arguing that we should be using Ogg Vorbis, but the proprietary forms are the ones that are most used. The standards ecology is getting more complicated. “We need to reflect back to the tech community that they have an obligation to think about preservation.” They’ve got money and resources. Shouldn’t they be contributing?

We’re losing metadata, he says. You can’t find Instagram photos because they have no Web presence and are short on metadata. Flickr, on the contrary, has lots of metadata. The Instagram owners are now multi-millionaires and are undermotivated to fix this problem. Maybe we’ll get something in 5 years, but then we will have lost a full decade of people’s photos. There’s no way to assign Instagrams open licenses at this point.

Indeed, “they are bending the law to make archiving illegal.” You can’t hack your own phone. You can’t copy your own photos from one device to another.

“Content tied to devices dies when those devices become obsolete.” The obsolescence cycle is becoming faster every year.

So, what should we do?

The technologists building these devices don’t know about the work of archivists. They don’t know that what this group is doing is meaningful. Many are young and don’t yet have experiences they want to preserve. They may not have confronted their own mortality yet.

But, the Web at its base level is about making copies. So, if we get things on the Web as opposed to in apps, we win. Apps should be powered by, or connected to, a Web experience. How can we take advantage of the fact that every time you go to a Web page, you’re copying it? How can we take advantage of the CDNs, which are already doing a lot of the work needed for preservation?

“There is also a growing class of apps that want to do the right thing.” E.g., TimeHop, that sends you an email reminding you of what you tweeted, etc., a year ago. This puts a user experience around the work of preservation. They’re marketing the value of the preservation community, but they don’t know it yet. Or Brewster, an iPhone address book that hooks up to all the address books you have on social services, reminding you to connect with people you haven’t touched in a while. This is a preservation app, although Brewster doesn’t know.

Then, how do we mine our personal archives? (He notes that his company’s tool, ThinkUp, is in this space.) His Nike fuel band captures data about his physical activity. The Quantified Self movement is looking at all sorts of data. “They too are preservationists, and they don’t know it.”

Then there are institutions. People revere the Library of Congress. Senior people at Twitter speak in a hushed voice when they say, “The tweets go to the LoC.” Take advantage of the institution’s authority. Don’t be shy. Meet them halfway. And say, “By the way, look at my cool email address.”

“PR trumps ToS.” ThinkUp archived the FB activity of the White House. At the time, FB’s ToS forbade archiving it for more than 24 hours. But the WH policy requires it. I said, “Please, FB, please cut off the White House.” It turns out that FB was already planning on revising the policy. “What a great conversation we would have gotten to have.” You are our advocates, says Anil. You have an obligation to speak on our behalf.

The public is already violating “Intellectual Property” rules. “We don’t look at YouTube as the Million Mixers March, but that’s what it is.” It’s civil disobedience: People violating the law in public under their own names. These are people who recognize the value of preserving cultural works that otherwise would disappear. Sony won’t sell you a copy of Michael Jackson’s Thriller, but there are copies on YouTube. The heart and soul of those posting those videos is preservation. “All they want to do is what you do: make a copy of what matters to them.”

Hallelujah! For years — literally years — I’ve been limping along with a blender full of spaghetti to do something that should be really simple: capture control key combos (like CTRL-S or CTRL-I) via Javascript in all the major browsers. I finally found some simple code that seems to work beautifully.

The problem is that the browsers don’t agree about what’s going on when a user presses a control key and another key simultaneously, which is, after all, the usual thing people do with the control key. Some of the browsers think that it’s two events, so you have to record the control keypress, remember it, and treat the next keypress differently. Other browsers think of it as a single keypress that you can just process as if CTRL-S were a unique key. Then, depending, you may or may not have to nullify the S press. The way I was doing it (cribbed from multiple sources, of course) involved first checking on which browser the Javascript was running in, and then processing keystrokes, looking for an initial control press. Pain in the butt, and it was fragile.

I am certain that this is not a problem for actual developers. For example, jquery handles keystrokes, although I had trouble getting it to work (because, if it’s not clear, I am a ham-fisted hobbyist who mainly just copies in other people’s code. Thank you, other people!)

First, include jQuery. Place the following block into your Javascript. Put it toward the top, and don’t put it inside a function. You want it to run whenever your Javascript loads. (Well, you could put it into an initialization function if you want.)
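
A minimal sketch of what a block like this could look like, which may well differ from the code this post was pointing to. It assumes jQuery is already loaded, and it leans on two things jQuery events provide: the ctrlKey flag and a normalized which key code, so a combination such as CTRL-S shows up as one keydown event. The names ctrlCombo, actions, and handleKeydown are hypothetical, and the two actions are placeholders.

```javascript
// Hypothetical helper: turns a keydown event into a name such as "ctrl-s",
// or returns null when CTRL is not held or the key is not a letter.
function ctrlCombo(event) {
  if (!event.ctrlKey) return null;
  var code = event.which || event.keyCode;           // jQuery normalizes event.which
  if (!(code >= 65 && code <= 90)) return null;      // only A-Z in this sketch
  return "ctrl-" + String.fromCharCode(code).toLowerCase();
}

// Hypothetical table of actions, keyed by combo name. Placeholders only.
var actions = {
  "ctrl-s": function () { return "save"; },
  "ctrl-i": function () { return "italic"; }
};

// Runs the matching action and returns its result, or null if none matches.
function handleKeydown(event) {
  var combo = ctrlCombo(event);
  if (combo === null || !actions.hasOwnProperty(combo)) return null;
  // Keep the browser from also doing its own thing with CTRL-S, etc.
  if (typeof event.preventDefault === "function") event.preventDefault();
  return actions[combo]();
}

// Top level, outside any function, so it runs when the script loads.
// Skipped quietly if jQuery is not present.
if (typeof jQuery !== "undefined") {
  jQuery(document).on("keydown", handleKeydown);
}
```

Keeping the combo lookup in a plain function means it can be tried out with a fake event object, with no browser involved.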

It’s a testament to Doc and also a hopeful sign of the times that the WSJ today features on its weekend cover a story by Doc about the theme of his new book, The Intention Economy. The title of the piece is “The Customer as a God,” a headline Doc didn’t write and isn’t entirely comfortable with. But the piece is strong. And getting it on the cover of WSJ is like getting a story about VRM on the cover of CRM Magazine. Which Doc also did.

A sample:

big business continues to believe that a free market is one in which customers get to choose their captors. Choosing among AT&T, Sprint, T-Mobile and Verizon for your new smartphone is like choosing where you’d like to live under house arrest. It’s why marketers still talk about customers as “targets” they can “acquire,” “control,” “manage” and “lock in,” as if they were cattle. And it’s why big business thinks that the best way to get personal with customers on the Internet is with “big data,” gathered by placing tracking files in people’s browsers and smartphone apps without their knowledge—so they can be stalked wherever they go, with their “experiences” on commercial websites “personalized” for them.

It is not yet clear to the perpetrators of this practice that it is actually insane…

Congrats, Doc.

The headline brings to mind the most embarrassing headline I ever found one of my articles placed under. The article was about the need for human leeway in decisions about what constitutes copyright infringement. The title Wired supplied without my knowledge (that’s how magazines work) was: “Copy protection is a crime against humanity.” I can see the pun they intended, but taken at face value, it implies I think copy protection is on a par with genocide. I of course don’t even think copy protection is a crime.

And, yes, I am aware that the title for this post is also guilty of wild overstatement. I’m assuming — no offense, Doc — that even casual readers will understand that it’s hyperbole for humorous effect. Haha.

I’m at the “Symposium on Digital Curation in the Era of Big Data” held by the Board on Research Data and Information of the National Research Council. These liveblog notes cover (in some sense — I missed some folks, and have done my usual spotty job on the rest) the morning session. (I’m keynoting in the middle of it.)

Alan Blatecky [pdf] from the National Science Foundation says science is being transformed by Big Data. [I can’t see his slides from the panel at front.] He points to the increase in the volume of data, but says we haven’t paid enough attention to the longevity of the data. And, he says, some data is centralized (LHC) and some is distributed (genomics). And our networks are unable to transport large amounts of data [see my post], making where the data is located quite significant. NSF is looking at creating data infrastructures. “Not one big cloud in the sky,” he says. Access, storage, services: how do we make that happen and keep it leading edge? We also need a “suite of policies” suitable for this new environment.

He closes by talking about the Data Web Forum, a new initiative to look at a “top-down governance approach.” He points positively to the IETF’s “rough consensus and running code.” “How do we start doing that in the data world?” How do we get a balanced representation of the community? This is not a regulatory group; everything will be open source, and progress will be through rough consensus. They’ve got some funding from gov’t groups around the world. (Check CNI.org for more info.)

Now Josh Greenberg from the Sloan Foundation. He points to the opportunities presented by aggregated Big Data: the effects on social science, on libraries, etc. But the tools aren’t keeping up with the computational power, so researchers are spending too much time mastering tools, plus it can make reproducibility and provenance trails difficult. Sloan is funding some technical approaches to increasing the trustworthiness of data, including in publishing. But Sloan knows that this is not purely a technical problem. Everyone is talking about data science. Data scientist defined: Someone who knows more about stats than most computer scientists, and can write better code than typical statisticians :) But data science needs to better understand stewardship and curation. What should the workforce look like so that the data-based research holds up over time? The same concerns apply to business decisions based on data analytics. The norms that have served librarians and archivists of physical collections now apply to the world of data. We should be looking at these issues across the boundaries of academics, science, and business. E.g., economics work now rests on data from Web businesses, the US Census, etc.

[I couldn’t liveblog the next two — Michael and Myron — because I had to leave my computer on the podium. The following are poor summaries.]

Michael Stebbins, Assistant Director for Biotechnology in the Office of Science and Technology Policy in the White House, talked about the Administration’s enthusiasm for Big Data and open access. It’s great to see this degree of enthusiasm coming directly from the White House, especially since Michael is a scientist and has worked for mainstream science publishers.

Myron Gutmann, Ass’t Dir of the National Science Foundation, likewise expressed commitment to open access, and said that there would be an announcement in Spring 2013 that in some ways will respond to the recent UK and EC policies requiring the open publishing of publicly funded research.

After the break, there’s a panel.

Anne Kenney, Dir. of Cornell U. Library, talks about the new emphasis on digital curation and preservation. She traces this back at Cornell to 2006 when an E-Science task force was established. She thinks we now need to focus on e-research, not just e-science. She points to Walters and Skinner’s “New Roles for New Times: Digital Curation for Preservation.” When it comes to e-research, Anne points to the need for metadata stabilization, harmonizing applications, and collaboration in virtual communities. Within the humanities, she sees more focus on curation, the effect of the teaching environment, and more of a focus on scholarly products (as opposed to the focus on scholarly process, as in the scientific environment).

She points to Youngseek Kim et al. “Education for eScience Professionals”: digital curators need not just subject domain expertise but also project management and data expertise. [There’s lots of info on her slides, which I cannot begin to capture.] The report suggests an increasing focus on people-focused skills: project management, bringing communities together.

So, what are research libraries doing with this information? The Association of Research Libraries has a job announcements database. And Tito Sierra did a study last year analyzing 2011 job postings. He looked at 444 job descriptions. 7.4% of the jobs were “newly created or new to the organization.” New mgt level positions were significantly higher, while subject specialist jobs were under-represented.

Anne went through Tito’s data and found 13.5% have “digital” in the title. There were more digital humanities positions than e-science. She posts a list of the new titles jobs are being given, and they’re digilicious. 55% of those positions call for a library science degree.

Anne concludes: It’s a growth area, with responsibilities more clearly defined in the sciences. There’s growing interest in serving the digital humanists. “Digital curation” is not common in the qualifications nomenclature. MLS or MLIS is not the only path. There’s a lot of interest in post-doctoral positions.

Margarita Gregg of the National Oceanic and Atmospheric Administration begins by talking about challenges in the era of Big Data. They produce about 15 petabytes of data per year. It’s not just about Big Data, though. They are very concerned with data quality. They can’t preserve all versions of their datasets, and it’s important to keep track of the provenance of that data.

Margarita directs one of NOAA’s data centers that acquires, preserves, assembles, and provides access to marine data. They cannot preserve everything. They need multi-disciplinary people, and they need to figure out how to translate this data into products that people need. In terms of personnel, they need: Data miners, system architects, developers who can translate proprietary formats into open standards, and IP and Digital Rights Management experts so that credit can be given to the people generating the data. Over the next ten years, she sees computer science and information technology becoming the foundations of curation. There is no currently defined job called “digital curator” and that needs to be addressed.

Vicki Ferrini at the Lamont-Doherty Earth Observatory at Columbia University works on data management, metadata, discovery tools, educational materials, best practice guidelines for optimizing acquisition, and more. She points to the increased communication between data consumers and producers.

As data producers, the goal is scientific discovery: data acquisition, reduction, assembly, visualization, integration, and interpretation. And then you have to document the data (= metadata).

Data consumers: They want data discoverability and access. Increasingly they are concerned with the metadata.

The goal of data providers is to provide access, preservation and reuse. They care about data formats, metadata standards, interoperability, the diverse needs of users. [I’ve abbreviated all these lists because I can’t type fast enough.]

At the intersection of these three domains is the data scientist. She refers to this as the “data stewardship continuum” since it spans all three. A data scientist needs to understand the entire life cycle, have domain experience, and have technical knowledge about data systems. “Metadata is key to all of this.” Skills: communication and organization, understanding the cultural aspects of the user communities, people and project management, and a balance between micro- and macro perspectives.

Challenges: Hard to find the right balance between technical skills and content knowledge. Also, data producers are slow to join the digital era. Also, it’s hard to keep up with the tech.

Andy Maltz, Dir. of the Science and Technology Council of the Academy of Motion Picture Arts and Sciences. AMPAS is about arts and sciences, he says, not about The Business.

The Science and Technology Council was formed in 2005. They have lots of data they preserve. They’re trying to build the pipeline for next-generation movie technologists, but they’re falling behind, so they have an internship program and a curriculum initiative. He recommends we read their study The Digital Dilemma. It says that there’s no digital solution that meets film’s requirement to be archived for 100 years at a low cost. It costs $400/yr to archive a film master vs $11,000 to archive a digital master (as of 2006) because of labor costs. [Did I get that right?] He says collaboration is key.

In January they released The Digital Dilemma 2. It found that independent filmmakers, documentarians, and nonprofit audiovisual archives are loosely coupled, widely dispersed communities. This makes collaboration more difficult. The efforts are also poorly funded, and people often lack technical skills. The report recommends the next gen of digital archivists be digital natives. But the real issue is technology obsolescence. “Technology providers must take archival lifetimes into account.” Also system engineers should be taught to consider this.

Among his controversial proposals: Require higher math scores for MLS/MLIS students since they tend to score lower than average on that. Also, he says that the new generation of content creators have no curatorial awareness. Executives and managers need to know that this is a core business function.

The next gen of executive leaders needs to understand the importance of this.

Digital curation and long-term archiving need a business case.

Q&A

Q: How about linking the monetary value of the metadata to the metadata? That would encourage the generation of metadata.

Q: Weinberger paints a picture of a flexible world of flowing data, and now we’re back in the academic, scientific world where you want good data that lasts. I’m torn.

A: Margarita: We need to look at how that data are being used. Maybe in some circumstances the quality of the data doesn’t matter. But there are other instances where you’re looking for the highest quality data.

A: [audience] In my industry, one person’s outtakes are another person’s director’s cuts.

A: Anne: In the library world, we say if a little metadata would be great, a lot of it would be great. We need to step away from trying to capture the most to capturing the most useful (since we can’t capture the most). And how do you produce data in a way that’s opened up to future users, as well as being useful for its primary consumers? It’s a very interesting balance that needs to be struck. Maybe short-term need is a higher thing and long-term is lower.

A: Vicki: The scientists I work with use discrete data sets, spreadsheets, etc. As we get along we’ll have new ways to check the quality of datasets so we can use the messy data as well.

Q: Citizen curation? E.g., a lot of antiques are curated by being put into people’s attics…Not sure what that might imply as a model. Two parallel models?