The first, Qentis, was covered here previously. Qentis isn't actually a company. It appears to be the trolling byproduct of artist Michael Marcovici. The "company" claims to be algorithmically generating millions of photos and pages of text at a rate that will soon see it creating copyrighted material faster than the creators themselves. At some point, Qentis will hold the copyright of everything that can possibly be created, making every new creation instantly infringing.

Never mind the fact that no one has the computing power to generate photos and text at the rate Qentis is claiming it can, or the fact that algorithmically banging out creative works in advance of others doesn't make independent creations automatically infringing. Never mind pretty much all of it because the claims are so blatantly false as to be laughable, especially considering the source.

On the other hand, Cloem's business model seems a bit more grounded in reality. VentureBeat describes Cloem -- and its aims -- this way:

[A] company that provides software (not satirically, it appears) to linguistically manipulate a seed set of a client’s patent claims by, for example, substituting in synonyms or reordering steps in a process, thereby generating tens of thousands of potentially patentable inventions.

Cloem describes its team as a mixture of patent experts and "computer linguistic specialists." The key element of its potentially-patentable variations lies within "seed lists," which draw from a variety of sources, including (according to Cloem) "70,000,000 patent documents." Its algorithms then brute force together lists of "new" patent claims, which can then be filed and used offensively or defensively.
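
Mechanically, what's described is simple to sketch. A toy version of synonym substitution over a seed claim might look like the following (the thesaurus entries here are invented; Cloem's actual seed lists and algorithms are not public):

```python
from itertools import product

# An invented micro-thesaurus standing in for Cloem's "seed lists"
# (the real ones reportedly draw on tens of millions of patent documents).
SYNONYMS = {
    "fastener": ["fastener", "connector", "coupling member"],
    "rotating": ["rotating", "pivoting", "revolving"],
    "shaft": ["shaft", "rod", "elongated member"],
}

def generate_variants(seed_claim):
    """Brute-force every combination of synonym substitutions."""
    choices = [SYNONYMS.get(word, [word]) for word in seed_claim.split()]
    return [" ".join(combo) for combo in product(*choices)]

variants = generate_variants("a rotating shaft secured by a fastener")
print(len(variants))  # 27 mechanical "claims" from a single seed
```

Three substitutable words with three alternatives each yields 3 × 3 × 3 = 27 texts; scale the thesaurus up and the combinatorics produce the "tens of thousands" of variants the article describes, with no inventing involved.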

Cloem's business model seems custom-built for patent trolls, who will be able to "expand" their already-broad patents to nail down even more IP turf. Cloem's service also makes it easy for non-inventors to jam up patent offices with me-too "inventions" based on minor iterations of existing patents. While there's a good chance some of these will be tossed due to prior art, more than a few will inevitably make their way past examiners. With millions of patents just waiting to be iterated into "new" methods, Cloem's service further separates "inventing" from "invention."

It's a system that's built for abuse, but Cloem doesn't see it that way. In response to a somewhat critical post at RatioIP, Cloem's rep offers up the defense of "Hey, we just make the tool. We can't control how it's used."

In our view, Cloem is a logical and natural evolution of the patent system. The technology in itself is neutral. Like a tool, we can use it in many ways, both offensive and defensive. It may well be that we could help to “raise the bar” and get rid of undue patents. Some see our system as an embodiment of the “skilled person” (i.e. which indicates what “routine work” can produce and reach), although we do think that cloem texts can be inventive, that is not excluded from patentability.

And that's mostly true. Entities wishing to protect their prior inventions could "fence off" adjacent territory and deter future lawsuits by producing and filing very closely-related patents. But a tool like this -- if it creates anything patentable at all -- will always be more attractive to the "offensive" side of the equation.

Cloem's pitch sets the company at the forefront of an IP revolution, but its envisioned future is no more heartening than Qentis' dystopian, IP-generating machines of loving grace. At least Qentis is a joke. Cloem's taglines only read like jokes.

With Cloem, you can invent more, faster and cheaper.

Except there's no "invention" taking place. Nothing generated by Cloem's algorithms will be any more "inventive" than all the re-skins and palette swaps clogging up the "Games" section in mobile app stores. Cloem hopes to bridge the gap between its "silos of knowledge" and its silos of synonyms, somehow coming up with worthwhile patents in the process. Sure, previous knowledge always informs new creations, but it takes more than swapping the sentence "a plurality of discrete content items arranged chronologically" around in the method description to generate inventive, worthwhile patents.

DailyDirt: Will Computers Have 20/20 Vision?
by Michael Ho (Tue, 17 Mar 2015)
https://www.techdirt.com/articles/20100803/10502110475/dailydirt-will-computers-have-2020-vision.shtml

[…] computer vision is still very different from how humans look at images. Computers aren't capable of describing an image as well as a typical 5-year-old, but they can sift through millions of images before a kid can blink. Here are just a few examples of algorithms getting better at seeing the same things that we see.

If you'd like to read more awesome and interesting stuff, check out this unrelated (but not entirely random!) Techdirt post via StumbleUpon.

DailyDirt: Computers Like To Sit In Front Of Computers And Play Games All Day, Too
by Michael Ho (Wed, 14 Jan 2015)
https://www.techdirt.com/articles/20100708/11123510129/dailydirt-computers-like-to-sit-front-computers-play-games-all-day-too.shtml

[…] own games. Games like Connect Four and Checkers are already solved, and while we humans might like to point out that there are games like Othello, Go, Diplomacy and Calvinball that still favor human players, it may only be a matter of time before computers outwit us at those games, too. Check out a few more games that algorithms are learning to play better than human brains.
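
"Solved" has a precise meaning here: a program can compute the game's value under perfect play by exhausting the game tree. Tic-tac-toe is small enough to solve with a plain negamax search, sketched below (Connect Four and Checkers needed far more machinery, but the principle is the same):

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def negamax(board, player):
    """Game value for `player` to move: +1 forced win, 0 draw, -1 loss."""
    opponent = "O" if player == "X" else "X"
    if winner(board) == opponent:  # the previous move ended the game
        return -1
    if "." not in board:           # full board, no winner: draw
        return 0
    best = -1
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            best = max(best, -negamax(child, opponent))
    return best

print(negamax("." * 9, "X"))  # 0: tic-tac-toe is a solved draw
```

The memoized search visits a few thousand positions and proves the familiar result that perfect play from an empty board always ends in a draw.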


No, Tech Companies Can't Easily Create A 'ContentID' For Harassment, And It Would Be A Disaster If They Did
by Mike Masnick (Wed, 10 Dec 2014)
https://www.techdirt.com/articles/20141209/03232129366/no-tech-companies-cant-easily-create-contentid-harassment-it-would-be-disaster-if-they-did.shtml

[…] tech companies could "end online harassment" and that they could do it "tomorrow" if they just had the will to do so. How? Well, Valenti claims, by just making a "ContentID for harassment."

If Twitter, Facebook or Google wanted to stop their users from receiving online harassment, they could do it tomorrow.

When money is on the line, internet companies somehow magically find ways to remove content and block repeat offenders. For instance, YouTube already runs a sophisticated Content ID program dedicated to scanning uploaded videos for copyrighted material and taking them down quickly – just try to bootleg music videos or watch unofficial versions of Daily Show clips and see how quickly they get taken down. But a look at the comments under any video and it’s clear there’s no real screening system for even the most abusive language.

If these companies are so willing to protect intellectual property, why not protect the people using your services?

See? Just like that. Snap your fingers and boom, harassment goes away. Except, no, it doesn't. Sarah Jeong has put together a fantastic response to Valenti's magical tech thinking, pointing out that ContentID doesn't work well and that harassment is different anyway. As she notes, the only reason ContentID "works" at all (and we use the term "works" loosely) is because it's a pure fingerprinting algorithm, matching content against a database of claimed copyright-covered material. That's very different than sorting out "harassment" which involves a series of subjective determinations.
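
The difference is easy to see in code. Fingerprint matching is, at bottom, a lookup against a database of registered works; the sketch below uses a plain content hash as a stand-in for YouTube's far more robust perceptual fingerprints:

```python
import hashlib

claimed_db = set()  # fingerprints of registered, "claimed" works

def fingerprint(content: bytes) -> str:
    # Real systems hash perceptual features so re-encodes still match;
    # a cryptographic hash is the simplest possible stand-in.
    return hashlib.sha256(content).hexdigest()

def register(content: bytes) -> None:
    claimed_db.add(fingerprint(content))

def is_claimed(content: bytes) -> bool:
    # An objective question with a database answer.
    return fingerprint(content) in claimed_db

register(b"official music video, episode 412")
print(is_claimed(b"official music video, episode 412"))  # True
print(is_claimed(b"you should shut up, idiot"))          # False: a match
# engine cannot tell whether this is harassment; that calls for a judgment
# about context, target and intent, not a lookup.
```

There is no equivalent database to match harassment against, which is exactly Jeong's point: the copyright case reduces to comparison, the harassment case does not.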

Furthermore, Jeong goes into great detail about how ContentID isn't even particularly good on the copyright front, as we've highlighted for years. It creates both Type I and Type II errors: pulling down plenty of content that isn't infringing, and still letting through plenty of content that is. Add in an even more difficult task of determining "harassment" which is much less identifiable than probable copyright infringement, and you would undoubtedly increase both types of errors to a hilarious degree -- likely shutting down many perfectly legitimate conversations, while doing little to stop actual harassment.

The more aggressive the tool, the greater the chance it will filter out communications that aren’t harassing — particularly, communications one wishes to receive. You can see this in the false positives flagged by systems like Content ID. For example, there’s the time that Content ID took down a video with birds chirping in the background, because it matched an avant-garde song that also had some birds chirping in the background. Or the time NASA’s official clips of a Mars landing got taken down by a news agency. Or the time a livestream was cut off because people began singing "Happy Birthday." Or when a live airing on UStream of the Hugo Awards was interrupted mid-broadcast as the awards ceremony aired clips from Doctor Who and other shows nominated for Hugo Awards.

In the latter case, UStream used something similar but not quite the same as Content ID—one in which blind algorithms automatically censored copyrighted content without the more sophisticated appeals process that YouTube has in place. Robots are not smart; they cannot sense context and meaning. Yet YouTube’s appeals system wouldn’t translate well to anti-harassment tools. What good is a system where you must report each and every instance of harassment and then follow through in a back-and-forth appeals system?

None of this is to suggest that harassment online isn't a serious problem. It is. And it's also possible that some enterprising folks may figure out some interesting, unique and compelling ways of dealing with it, sometimes via technological assistance. But this sort of "magic bullet" thinking is as dangerous as it is ridiculous -- because it often leads to reframing the debate, sometimes to the point of shifting the actual liability of the issue from those actually responsible (whether copyright infringers or harassers) to intermediaries who are providing a platform for communication.

The idea that tech companies "don't care enough" about harassment (or, for that matter, infringement) to do the "simple things" to stop it is an argument of ignorance. If there were some magical silver bullet to make online communications platforms more welcoming and accommodating to all, that would be a huge selling point, and one that many would immediately embrace. But the reality is that some social challenges are problems that can't just be solved with a dollop of JavaScript, and pretending otherwise is a dangerous distraction that only leads to misplaced attacks, without taking on the underlying problems.

Facebook To Ruin Our Good Time With 'Satire' Disclaimer; The Onion Responds With Satire
by Timothy Geigner (Tue, 19 Aug 2014)
https://www.techdirt.com/articles/20140818/11570528243/facebook-to-ruin-our-good-time-with-satire-disclaimer-onion-responds-with-satire.shtml

Satire: some people just don't get it. More specifically, some folks out there don't have the capacity to read what is an obviously satirical news piece and/or headline and recognize it as such. You all know what I'm talking about: you jump on Facebook and see an article shared by a "friend" that contains the headline, "Barack Obama Admits To Being A Muslim Terrorist Puppy-Puncher" and the accompanying "I told you so!" commentary from your friend sends you into a snigger as you see that it's a link to The Onion, Clickhole, or Infowars. You know, sites that are clearly filled with joke articles that nobody in their right minds would believe. This is one of the great joys of Facebook and social media in general: watching your friends fall for bullshit. In fact, I'm pretty sure that's what Facebook is for.

We can only assume this was implemented as a reaction to users believing that Onion links are nonfiction reports (you can lose hours flipping through Literally Unbelievable, a site that catalogs such boneheaded moments), but we're not sure what compelled Facebook to go so far as to assert editorial control. What's more confusing is this limited implementation, which itself takes a while to explain. Original posts on friends' feeds and The Onion's official Facebook page don't come with a tag. If users save the article to a read-later list, the tag will vanish as well. And other satiric sites, particularly The Onion's newest sibling site, Buzzfeed-spoof Clickhole, are immune to the tag.

Forget confusing, this is yet another inch down the slippery slope in the war on humor and me-getting-to-make-fun-of-people, and I won't stand for it, damn it. People I haven't seen since high school getting fooled by The Onion has been one of the great pleasures in my life and it's just not right for Facebook to chip away at that fun just because it appears to have finally acknowledged that its users are, by and large, idiots.

DOYLESTOWN, PA—Describing him as frequently frustrated and overwhelmed, sources confirmed Monday that local Facebook user Michael Huffman is incredibly stupid. “I need stuff easy,” said the absolute dipshit, adding that he finds many things confusing, and that those things must be changed so that they make sense to him. “I like looking at things on Facebook, but I don’t understand a lot. Help, please.” At press time, someone had reportedly fixed everything for the goddamn imbecile.

Funny, but here's an idea. Instead of ruining everyone's righteous good time by tagging satire articles for people, how about instead we work on some kind of integration between Facebook and Snopes? That would be twice as useful and none of the nonsense I regularly combat with Snopes on Facebook makes me laugh, so no harm no foul. Guys? Yes?

Free Speech, Filters, Algorithms & Net Neutrality: How Big Company Nudging Can Influence Your World View
by Mike Masnick (Fri, 15 Aug 2014)
https://www.techdirt.com/articles/20140814/12493628214/free-speech-filters-algorithms-how-big-company-nudging-can-influence-your-world-view.shtml

[…] algorithmic filtering plays a role in how we view the world -- with a specific focus on what's happening in Ferguson, Missouri. As more than a few people have pointed out, much of the public discussion about the mess in Ferguson was happening on Twitter -- while it seemed eerily absent from Facebook (and the mainstream media at first...):

And then I switched to non net-neutral Internet to see what was up. I mostly have a similar composition of friends on Facebook as I do on Twitter.

Nada, zip, nada.

No Ferguson on Facebook last night. I scrolled. Refreshed.

She notes that eventually the story did break through on Facebook, but not until the next morning when Facebook's algorithm finally caught up to the idea that something important was happening.

This morning, though, my Facebook feed is also very heavily dominated by discussion of Ferguson. Many of those posts seem to have been written last night, but I didn’t see them then. Overnight, “edgerank” — or whatever Facebook’s filtering algorithm is called now — seems to have bubbled them up, probably as people engaged them more.

But, as she notes, it's entirely possible that Facebook's algorithm wouldn't have ever found it important if the story wasn't gaining more and more attention on Twitter. And, of course, even as the story was being told on Twitter, there are questions about whether or not Twitter's algorithms suppressed some of it as well. "#Ferguson" only very briefly trended nationally, though it did trend in certain local markets.

So, there were fewer chances for people not already following the news to see it on their “trending” bar. Why? Almost certainly because there was already national, simmering discussion for many days and Twitter’s trending algorithm (said to be based on a method called “term frequency inverse document frequency”) rewards spikes… So, as people in localities who had not been talking a lot about Ferguson started to mention it, it trended there though the national build-up in the last five days penalized Ferguson.
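
The parenthetical explains the mechanism: a spike detector scores a term by how much its current volume exceeds that term's own recent baseline, so days of steady discussion raise the baseline and suppress the score. A toy version with invented numbers (Twitter's actual algorithm is not public):

```python
def trending_score(history, current):
    """Ratio of current mentions to the term's own recent average.
    Scores high only when usage spikes relative to its baseline."""
    baseline = sum(history) / len(history) + 1.0  # +1 smooths an all-zero history
    return current / baseline

# A term that was silent for days, then bursts: huge spike, it trends.
print(trending_score([0, 0, 0, 0, 0], 500))                      # 500.0

# A term discussed heavily for days ("#Ferguson"), still growing: the
# built-up baseline penalizes it despite identical current volume.
print(round(trending_score([400, 450, 480, 490, 500], 500), 2))  # 1.08
```

Same 500 mentions today, wildly different scores: exactly the behavior Tufekci describes, where a slow national build-up keeps a story off the trending bar.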

As she points out: Algorithms have consequences.

This is not unlike Eli Pariser's idea of the "filter bubble" and the idea that companies may be effectively nudging you in ways that may not actually be that great. Frankly, that argument is a little strained, since it suggests that everyone only lives within these bubbles, and doesn't do things that expose them further, but there is a valid point at the core of it worth exploring.

Tufekci notes, however, that this is also why net neutrality is so important. Because without it, not only do you have to worry about internet services determining what's important to you, but the broadband infrastructure as well. And both will be focused on what enables them to profit the most. She points out the example of locals live-streaming what the police in Ferguson were doing -- including when the police announced over loudspeakers to "turn off their cameras" (a fairly chilling request in its own right). And she ponders what happens to those live streams on a non-neutral network:

But I’m not quite sure that without the neutral side of the Internet—the livestreams whose “packets” were as fast as commercial, corporate and moneyed speech that travels on our networks, Twitter feeds which are not determined by opaque corporate algorithms but my own choices—we’d be having this conversation.

Obviously, there are lots of other issues at play in Ferguson that go well beyond the internet and things like net neutrality. But they are related. The discussion of those issues -- race, police brutality, police militarization, free speech, etc. -- are all enabled and enhanced by the issues of the internet and what it enables... and what it stifles. If the police could have kept this story from getting attention, it's likely that (1) there would have been even more abuse and (2) all of those other discussions wouldn't be happening. Who knows if many of those discussions will be able to create real change, but you at least need to have that discussion to start the process of change. And if the technology is getting in the way of that, through non-neutral networks or algorithms that ignore important events like this, it seems like a problem worth solving, if only to speed up all those other important conversations as well.

DailyDirt: Computers Are Really Good At Math, So When Will Shalosh B. Ekhad Get Tenure?
by Michael Ho (Thu, 14 Aug 2014)
https://www.techdirt.com/articles/20100527/0930259602/dailydirt-computers-are-really-good-math-so-when-will-shalosh-b-ekhad-get-tenure.shtml


DailyDirt: Waiting In Line Isn't Fun
by Michael Ho (Mon, 4 Aug 2014)
https://www.techdirt.com/articles/20100705/14594810069/dailydirt-waiting-line-isnt-fun.shtml

Retailers of all kinds are interested in this kind of math because it can improve customer satisfaction and get more products out the door. Apple reduces long cashier lines with employees who can accept payments anywhere in its stores. Fry's Electronics has the giant single line that feeds into a massive array of cashiers (aka the serpentine line). There are self-checkout lanes at the grocery store, but there's no silver bullet to eliminate waiting in lines. Here are just a few more links on this problem of civilization.
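
The serpentine line's advantage is easy to check numerically: with identical arrivals and service times, one pooled line feeding several cashiers beats separate per-cashier lines, since nobody gets stuck behind a slow customer while another register sits idle. A rough simulation with made-up rates:

```python
import heapq
import random

random.seed(1)
N, CASHIERS = 20000, 4

# Same customers for both layouts: Poisson arrivals (~3.6/min) and
# exponential service times (~1 min each) give roughly 90% utilization.
arrivals, t = [], 0.0
for _ in range(N):
    t += random.expovariate(3.6)
    arrivals.append(t)
services = [random.expovariate(1.0) for _ in range(N)]

# Layout 1: one serpentine line; the next free cashier takes the next customer.
free_at = [0.0] * CASHIERS
heapq.heapify(free_at)
pooled_wait = 0.0
for arr, svc in zip(arrivals, services):
    start = max(arr, heapq.heappop(free_at))
    pooled_wait += start - arr
    heapq.heappush(free_at, start + svc)

# Layout 2: separate lines; each customer picks a cashier at random.
lane_free = [0.0] * CASHIERS
split_wait = 0.0
for arr, svc in zip(arrivals, services):
    lane = random.randrange(CASHIERS)
    start = max(arr, lane_free[lane])
    split_wait += start - arr
    lane_free[lane] = start + svc

print(f"serpentine line: {pooled_wait / N:.1f} min average wait")
print(f"separate lines:  {split_wait / N:.1f} min average wait")
```

Queueing theory predicts the same thing analytically (an M/M/4 queue waits far less than four independent M/M/1 queues at the same load), which is why Fry's layout works.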


Latest CAFC Ruling Suggests A Whole Lot Of Software Patents Are Likely Invalid
by Mike Masnick (Fri, 18 Jul 2014)
https://www.techdirt.com/articles/20140717/14503027923/latest-cafc-ruling-suggests-whole-lot-software-patents-are-likely-invalid.shtml

[…] ruling last month in the Alice v. CLS Bank case, there has been some question about how the lower courts would now look at software patents. As we noted, the Supreme Court's ruling would seem to technically invalidate nearly all software patents by basically saying that if a patent "does no more than require a generic computer to perform generic computer functions" then it's no longer patentable. But that, of course, is basically all that software does. Still, the Supreme Court's ruling also insisted that plenty of software was still patentable, but it didn't give any actual examples.

Now in the first post-Alice ruling on a software patent at CAFC (the appeals court that handles all patent cases, and which is infamous for massively expanding the patentability of software over the years), the court has smacked down a patent held by one of the many (many, many) shell companies of patent trolling giant Acacia. The shell, Digitech Image Technologies, got control of US Patent 6,128,415, which had originally been held by Polaroid. The patent supposedly describes a setup for making sure images are consistent on a variety of different devices. Acacia/Digitech did what patent trolls do and basically sued a ton of companies, including NewEgg, Overstock, Xerox, Toshiba, Fujifilm and more.

A lower court kicked out the patent, and now CAFC has upheld that ruling, making use of the Alice ruling to make it doubly clear this isn't patentable. The court doesn't waste too much time, as the ruling is quite short. The key bits:

There is no dispute that the asserted method claims describe a process. Claims that fall within one of the four subject matter categories may nevertheless be ineligible if they encompass laws of nature, physical phenomena, or abstract ideas.... The Supreme Court recently reaffirmed that fundamental concepts, by themselves, are ineligible abstract ideas. Alice Corp. v. CLS Bank... In determining whether a process claim recites an abstract idea, we must examine the claim as a whole, keeping in mind that an invention is not ineligible just because it relies upon a law of nature or mathematical algorithm. As noted by the Supreme Court, “an application of a law of nature or mathematical formula to a known structure or process may well be deserving of patent protection.” ... A claim may be eligible if it includes additional inventive features such that the claim scope does not solely capture the abstract idea.... But a claim reciting an abstract idea does not become eligible “merely by adding the words ‘apply it.’”

The method in the ’415 patent claims an abstract idea because it describes a process of organizing information through mathematical correlations and is not tied to a specific structure or machine.

[... Discussion of specific claim in the patent ...]

The above claim recites a process of taking two data sets and combining them into a single data set, the device profile. The two data sets are generated by taking existing information—i.e., measured chromatic stimuli, spatial stimuli, and device response characteristic functions—and organizing this information into a new form. The above claim thus recites an ineligible abstract process of gathering and combining data that does not require input from a physical device. As discussed above, the two data sets and the resulting device profile are ineligible subject matter. Without additional limitations, a process that employs mathematical algorithms to manipulate existing information to generate additional information is not patent eligible. “If a claim is directed essentially to a method of calculating, using a mathematical formula, even if the solution is for a specific purpose, the claimed method is nonstatutory.”

Consider Google's famous PageRank patent, which covers the algorithm at the heart of Google's search engine. In the language of the Federal Circuit, it claims the use of "mathematical algorithms" (involving eigenvectors) to "manipulate existing information" (a list of links between web pages) to "generate additional information" (a ranking of the pages).
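
Stripped of scale, that computation fits in a few lines: repeatedly apply the link structure to a rank vector until it converges toward the dominant eigenvector. A minimal power-iteration sketch over an invented four-page web (not Google's production ranking, which layers much more on top):

```python
# links[page] = pages it links to, for an invented four-page web.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
N, damping = 4, 0.85

# Power iteration: the rank vector converges to the dominant
# eigenvector of the damped link matrix.
rank = [1.0 / N] * N
for _ in range(100):
    new = [(1 - damping) / N] * N
    for page, outs in links.items():
        for out in outs:
            new[out] += damping * rank[page] / len(outs)
    rank = new

# "Existing information" (links) in, "additional information" (a ranking) out.
best = max(range(N), key=lambda i: rank[i])
print(best)  # 2: the page everyone links to ranks highest
```

Note that every step is exactly what the Federal Circuit's language describes: mathematical manipulation of existing data to produce new data, with no physical device required.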

The number of software patents out there that use algorithms to manipulate existing information to generate additional information is... rather large. And they may all be invalid.

Bell Labs is working on single pixel, lensless cameras. The technique used here is called "compressive sensing" and relies on a randomized array of apertures to collect multiple snapshots that can re-create a high-resolution image. The applications aren't exactly obvious, but perhaps astronomers or photographers of slow-moving subjects would be interested. [url]


About half of all the edits on Wikipedia are made by bots. Algorithms keep spam links from flooding the site, and they also create whole entries based on online data, as well as perform tedious tasks such as grammar and spelling corrections. Not surprisingly, the biggest bot job on Wikipedia is detecting vandalism. [url]


DailyDirt: Making Robot Musicians
by Michael Ho (Tue, 25 Mar 2014)
https://www.techdirt.com/articles/20100331/1408148815/dailydirt-making-robot-musicians.shtml

[…] evil robots killing off music anymore, but as more and more technology gets into the field of music, there could be a new wave of neo-Luddite musicians. Software can compose music, and robots can play some musical instruments. What's left for humans to do? Check out some of these robot musicians, and you'll see why human musicians aren't that worried about losing their jobs to robots any time soon.


DailyDirt: Faster Than A Speeding Bullet...
by Michael Ho (Tue, 15 Oct 2013)
https://www.techdirt.com/articles/20110815/04302615527/dailydirt-faster-than-speeding-bullet.shtml

[…] "flash crash" caused the stock market to plunge for a few minutes, and the SEC published a report on its findings of what happened on that day, but there may be a lot more market instability caused by machines -- and we're only starting to recognize the implications. Checkmate, humans!


DailyDirt: Can Computers Grade Written Essays?
by Michael Ho (Thu, 9 May 2013)
https://www.techdirt.com/articles/20110226/12421713271/dailydirt-can-computers-grade-written-essays.shtml

[…] calls for automated grading software from various organizations (like the Hewlett Foundation). But at the same time, the National Council of Teachers of English argues that computers simply can't grade essays. Here are just a few more links on this debate over the use of algorithms instead of English professors (or grad students).


Programming The News: The Future Of Reporting Is Algorithms
by Tim Cushing (Wed, 3 Apr 2013)
https://www.techdirt.com/articles/20130331/21015322519/programming-news-future-reporting-is-algorithms.shtml

This may seem like the sort of statement usually delivered by an overblown narrator as rockets and lasers go zooming* by, but here goes: In the world of journalism, the future is now! Granted, it's the kind of future that often makes waves in the present and raises at least as many questions as it answers, but if you wanted a bright, problem-free future, you'd have to travel back to the divergence point somewhere between Philip K. Dick and The Jetsons... and then eliminate the dystopians.

*Yes, I realize lasers don't make noise or "zoom" by, but that hasn't prevented George Lucas from becoming insanely rich, has it?

Journalist Ken Schwencke has occasionally awakened in the morning to find his byline atop a news story he didn’t write.

No, it’s not that his employer, The Los Angeles Times, is accidentally putting his name atop other writers’ articles. Instead, it’s a reflection that Schwencke, digital editor at the respected U.S. newspaper, wrote an algorithm — that then wrote the story for him.

Instead of personally composing the pieces, Schwencke developed a set of step-by-step instructions that can take a stream of data — this particular algorithm works with earthquake statistics, since he lives in California — compile the data into a pre-determined structure, then format it for publication.

His fingers never have to touch a keyboard; he doesn’t have to look at a computer screen. He can be sleeping soundly when the story writes itself.
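
The pipeline described in the quote (structured data in, templated prose out) is short enough to sketch in full. This is an illustration of the technique, not Schwencke's actual Quakebot code, and the record shape here is hypothetical:

```python
def quake_story(event):
    """Compile one structured record into a publishable paragraph."""
    if event["magnitude"] >= 6.0:
        size = "strong"
    elif event["magnitude"] >= 4.5:
        size = "moderate"
    else:
        size = "minor"
    return (
        f"A {size} magnitude {event['magnitude']} earthquake was reported "
        f"{event['distance_mi']} miles from {event['place']} on "
        f"{event['when']}, according to the U.S. Geological Survey."
    )

# A hypothetical record in the shape of a seismic data feed entry.
print(quake_story({
    "magnitude": 4.7,
    "place": "Westwood, California",
    "distance_mi": 6,
    "when": "Tuesday morning",
}))
```

Point such a function at a live data feed and the byline's owner really can be asleep when the story publishes; everything "written" was decided in advance, in the template.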

This isn't exactly new news. (Then again, neither is the morning paper, but that's a discussion for another time...) Algorithmic story generation has been around for a few years now, with Narrative Science leading the field. A couple of years ago, Narrative Science was the story, rather than just the automated recap. George Washington University's website had covered a GWU baseball game with a longish recap that only got around to mentioning the opposing pitcher's perfect game in the seventh (out of eight) paragraph. Speculators wondered if a bot was behind this "ignoring the forest for the trees" recap. Narrative Science's techies were highly offended and responded by producing two algorithmically-generated recaps -- one from the home team POV and a more neutral piece.

The first concern with robo-journalism is often expressed by the journalists themselves: are we getting pushed out?

This robonews tsunami, he insists, will not wash away the remaining human reporters who still collect paychecks. Instead the universe of newswriting will expand dramatically, as computers mine vast troves of data to produce ultracheap, totally readable accounts of events, trends, and developments that no journalist is currently covering.

This is somewhat echoed by L.A. Times reporter Schwencke, who sees the algorithmic output as a boon for busy journalists.

Schwencke says the use of algorithms on routine news tasks frees up professional reporters to make phone calls, do actual interviews, or dig through sophisticated reports and complex data, instead of compiling basic information such as dates, times and locations.

“It lightens the load for everybody involved,” he said.

Schwencke's "bot" is rather simple, functioning best with a limited dataset and a minimum of formatting. Narrative Science's output is a bit more complex, allowing customers to adjust the "slant" of the generated stories. Not only that, but the software can cop an attitude, if requested.

The Narrative Science team also lets clients customize the tone of the stories. “You can get anything, from something that sounds like a breathless financial reporter screaming from a trading floor to a dry sell-side researcher pedantically walking you through it,” says Jonathan Morris, COO of a financial analysis firm called Data Explorers, which set up a securities newswire using Narrative Science technology. (Morris ordered up the tone of a well-educated, straightforward financial newswire journalist.) Other clients favor bloggy snarkiness. “It’s no more difficult to write an irreverent story than it is to write a straightforward, AP-style story,” says Larry Adams, Narrative Science’s VP of product. “We could cover the stock market in the style of Mike Royko.”
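Narrative Science's system is proprietary and far more sophisticated than anything shown here, but the "tone knob" idea is easy to picture: the same underlying facts rendered through different template libraries. A toy sketch, with every tone, ticker, and phrase invented for illustration:

```python
# Toy illustration of "tone" as a template parameter.  The tones,
# phrasing, and sample facts are all invented; this is not how
# Narrative Science's proprietary system actually works.

TEMPLATES = {
    "newswire": (
        "{ticker} closed {direction} {change:.1f} percent at "
        "{price:.2f} on volume of {volume:,} shares."
    ),
    "breathless": (
        "{ticker} went on a wild ride today, finishing {direction} a "
        "whopping {change:.1f} percent at {price:.2f}!"
    ),
    "snarky": (
        "Shocking no one, {ticker} drifted {direction} {change:.1f} "
        "percent to {price:.2f}. Try to contain your excitement."
    ),
}

def stock_recap(tone, **facts):
    """Render the same facts in the client's chosen tone."""
    return TEMPLATES[tone].format(**facts)

facts = dict(ticker="ACME", direction="up", change=2.3,
             price=41.17, volume=1_250_000)
for tone in TEMPLATES:
    print(stock_recap(tone, **facts))
```

The facts never change; only the wrapper does -- which is exactly why an irreverent story is "no more difficult" to generate than an AP-style one.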

This leads to the ethical quandary presented by the use of bots. Is robo-generated journalism really journalism, and is the use of algorithms a betrayal of readers' trust, especially when a familiar name is on the byline? If factual errors are discovered, does the blame lie with the software, or with the journalist who agreed to let the article "write itself?"

The answer here isn't simple (and the question likely isn't even fully formed yet), but the key is transparency.

“People are already reading automated data reports that come to them, and they don’t think anything of it,” said Ben Welsh, a colleague of Schwencke’s at the Times.

Welsh says that responsibility for accuracy falls where it always has: with publications, and with individual journalists.

“The key thing is just to be honest and transparent with your readers, like always,” he said. “I think that whether you write the code that writes the news or you write it yourself, the rules are still the same.”

“You need to respect your reader. You need to be transparent with them, you need to be as truthful as you can… all the fundamentals of journalism just remain the same.”

Questions involving intellectual property are also raised, although they aren't discussed in these articles. Who holds the copyright on the generated articles? In Schwencke's case, these rights are likely retained by the L.A. Times. In the case of Narrative Science, ownership is probably defined by contractual terms, presumably with the generated articles' copyright reverting to the end user once the contract is up.

Schwencke's homebrewed algorithm is a different IP animal. If he switches papers, does he retain the rights to the "bot?" Or is that algorithm, developed while employed with the L.A. Times, considered a "work for hire," and thus, the paper's property? Arguably, his algorithm is an extension of him, covering his area of expertise and designed to emulate his reporting. What if Schwencke writes a similar piece of software for his new employer? Would he be permitted to do this, or would this be prevented by additions to "non-compete" clauses? Is it patentable?

The more ubiquitous "robo-journalism" becomes, the more issues like these will arise. Hopefully, IP turf wars will remain at a minimum, allowing for the expansion of this promising addition to the journalist's toolset. With bots handling basic reporting, journalists should be freed up to pursue the sort of journalism you can't expect an algorithm to handle -- longform, investigative, etc. This is good news for readers, even if they may find themselves a little unnerved (at first) by the journalistic uncanny valley.

DailyDirt: Looking For Love In Some Of The Wrong Places (by Michael Ho)

By some counts, we're on the third iteration of improvement for internet dating. So that means we should be pretty close to perfecting these services, right? (Third time's the charm?) Matching algorithms will probably get better and better with time, but then so will expectations. Here are just a few interesting links for geeky singles out there.

Speech-Via-Algorithm Is Still Speech, And Censoring It Is Still Censorship (by Mike Masnick)

In the past, I've often found Tim Wu to be thoughtful and (very) insightful on various topics concerning internet policy and regulations. But, at times, he seems to go off the deep end, such as with his recent claims that big automatically means a monopoly. His latest piece in the NYTimes, though, goes way further than anything I've seen before: claiming that search results shouldn't get First Amendment protection because it's "computers" speaking, not humans.

Is there a compelling argument that computerized decisions should be considered speech? As a matter of legal logic, there is some similarity among Google, Ann Landers, Socrates and other providers of answers. But if you look more closely, the comparison falters. Socrates was a man who died for his views; computer programs are utilitarian instruments meant to serve us. Protecting a computer’s “speech” is only indirectly related to the purposes of the First Amendment, which is intended to protect actual humans against the evil of state censorship. The First Amendment has wandered far from its purposes when it is recruited to protect commercial automatons from regulatory scrutiny.

This is wrong. And dangerous.

Let's be clear here: what search engines do is present opinions. They are opinions based on data, produced by algorithms programmed by humans. There is nothing special about the fact that a computer cranks through the data to output the opinion. Taken to its logical conclusion, Wu's argument is that an uninformed opinion formed without any algorithmic exploration of the data deserves protection -- but the second you add in a computer to crunch the numbers, that same opinion is no longer protected. Contrary to Wu's assertion that this "wanders" from the point of the First Amendment, I'd argue the exact opposite. The First Amendment should protect all kinds of speech, and we should be especially happy to protect the speech that is grounded in data.

Two responses to Wu's piece help highlight this point nicely. Julian Sanchez notes that Wu's argument seems to suggest that any computer generated content doesn't get First Amendment protections -- and that would include computer-generated video games and movies:

Consider an argument for denying First Amendment protection to movies and video games. Human beings, we all agree, have constitutional rights—but mere machines do not. When the computer in your game console or DVD player “decides” to display certain images on a screen, therefore, this is not protected speech, but merely the output of a mechanical process that legislatures may regulate without any special restrictions. All those court rulings that have found these media to be protected forms of expression, therefore, are confused efforts to imbue computers with constitutional rights—surely foreshadowing the ominous rise of Skynet.

Probably nobody finds this argument very convincing, and it hardly takes a legal scholar to see what’s wrong with it: Computers don’t really autonomously “decide” anything: They execute algorithms that embody decisions made by their human programmers.

Similarly, Paul Levy takes Wu to task by comparing his argument to things like university rankings:

As the alumnus of a college that proudly rejects the proposition that the quality of educational institutions can be “measured by a series of data points,” I will take any opportunity to denigrate the Useless News and World Distort rankings of colleges, law schools and institutions of higher education. But it would never have occurred to me to offer a “speech by computer” theory as a basis for denying that the ranking is speech or that it is protected opinion. Maybe a stupid opinion, but that is not a basis for shutting the raters down, or enjoining them to change their rating criteria. Indeed, this theory seems to me absurd — it is not the computers that have free speech rights, any more than printing presses have free speech rights. It is the media companies that own the printing presses that have free speech rights, and by the same token it is the people and companies who program the computers and publish the results of their calculations that enjoy protection under the First Amendment.

Pretending that Google is a computer that magically generates answers, absent humans' regular and consistent input into its algorithm, is a strange position to take, and it really does suggest that we should only protect opinions that don't rely on a computer to analyze the data. I can't see how that's smart policy or anything close to what was intended in the concept of free speech.