Posted by samzenpus on Thursday June 09, 2011 @08:05AM from the all-due-credit dept.

bizwriter writes "Google announced that it will support authorship HTML tags, a way to associate Web content with the individuals who create it. Suddenly, search engines know when one person was responsible for a body of work, no matter where content appears on the Web. If Google incorporates this into page relevance and ranking, as it is considering, the result could change the balance of power between those who create and those who publish."

So you can add these tags, which mean Google will direct people to the original author rather than your click-through blog - but why would you?

Because anything that helps put Gawker Media out of business is OK by me.

More seriously, because if I'm reading your blog's link to an article, it's because I want your commentary on the article. I might want the Fark thread about it, but I certainly don't want Gawker's take on BoingBoing's post about that dude on Reddit who read a NASA press release. If you're ju

It is made to sound more uncontrolled than it is. This is what really happens:

The markup uses existing standards such as HTML5 (rel=”author”) and XFN (rel=”me”) to enable search engines and other web services to identify works by the same author across the web.

This is handy, allowing search engines to find content by a specific author. It's not like Google will automatically decide what content links to which author.
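To make the quoted markup concrete, here is a minimal sketch of how a crawler might pull rel="author" and rel="me" links out of a page. It uses only Python's standard html.parser; the class name, function name, and sample URLs are made up for illustration and say nothing about how Google actually does it.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect href values of <a> and <link> elements, grouped by rel token."""

    def __init__(self):
        super().__init__()
        self.links = {"author": [], "me": []}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel holds a space-separated list of tokens; compare case-insensitively
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        for token in rel_tokens:
            if token in self.links and href:
                self.links[token].append(href)


def find_authorship_links(html):
    """Return {'author': [...], 'me': [...]} for one HTML document."""
    parser = RelLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical sample page
page = """
<p>Written by <a rel="author" href="/authors/jane">Jane Example</a></p>
<p><a rel="me nofollow" href="http://other.example/jane">My other profile</a></p>
<p><a href="/about">About</a></p>
"""
print(find_authorship_links(page))
# {'author': ['/authors/jane'], 'me': ['http://other.example/jane']}
```

Note that the page itself has to declare these links, which is the point made above: nothing here guesses at authorship.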

We can't expect Google to give purely weighted search results based on this either. More like they will keep their existing page rankings, and include this extra author meta-data in specialized searches.

We know that great content comes from great authors, and we’re looking closely at ways this markup could help us highlight authors and rank search results.

The bnet article seems to overdramatize it, possibly due to a lack of understanding of what this means for content creators.

Yes, but does that apply to the source code or to the displayed content? Copyright law doesn't seem to support HTML tags, whereas a direct statement "Copyright 2011 by Firstname Lastname" passes muster.

(Note that in the USA we all know you don't need a copyright statement to have the copyright. That's not what this is about.)

Yes, but does that apply to the source code or to the displayed content?

I just checked, and the answer is in the link provided to you. But I'm not going to tell you what the answer is, because that would be enabling your asshat behavior.

By my reading of the law... it makes no distinction between source or displayed content, but I see nothing in the law that would prohibit a copyright holder from claiming that someone else was the author. Perhaps some other law would, particularly if the claim could be construed as defamation, but I don't see anything in copyright law that addresses this issue.

Will this help or hurt? A little before the turn of the century I researched Quake and Quake II console commands, tested them all, and wrote short descriptions of how to use them and what they did. It was copied on dozens of other web sites, word for word, usually with no attribution and usually with someone else's name on it.

Meta tags were badly misused to spam search engines. And what if you're putting content [slashdot.org] on someone else's site and have no control over the meta tags?

Will this help or hurt? A little before the turn of the century I researched Quake and Quake II console commands, tested them all, and wrote short descriptions of how to use them and what they did. It was copied on dozens of other web sites, word for word, usually with no attribution and usually with someone else's name on it.

I'm not sure that would even be covered by copyright law. You aren't allowed to copyright "facts" or "factual data". Maybe if your "short descriptions" were long enough, or expounded on the command beyond being a simple summary, it could be considered an original work. But for the most part, a simple compilation or list of factual information is not considered a copyrightable work.

The data can't be copyrighted, but its presentation can be. If you write a book about chemistry I can read it, learn from it, and write my own chemistry book using the facts from your book as long as I present those facts in my own words. The plagiarists copied the entire thing whole cloth, even using the same IP address I used in one of the examples. Although my question here is about plagiarism rather than copyright infringement (I had no problem with someone republishing it provided they gave me credit and a l

Now I wonder - it's an HTML5 tag. Should I already implement it on my own website, which isn't HTML5, or would Google then just ignore it? I can already put it on my own site, blog, Facebook, etc., but if it's going to be ignored then I won't bother.

Google's engine does not distinguish between the various versions of HTML. As long as Google successfully detects the page as HTML (and it is quite good at determining that), you can use any feature from any version and Google will not care.

For what it is worth, this markup is also valid HTML 4, but HTML 4 simply does not define the meaning of the "me" or "author" values of the rel attribute, while HTML 5 does define the meaning (although I have not actually verified that).

Which is exactly what will happen. Current link farms will cross pollinate each other and it will be nearly impossible to tell who really wrote anything. Least likely will be the person who did write the original content.

So you can totally spoof random people's names into any webpage? So searches for author=Obama come up with doctored pics of Osama-Obama slash or something?

Thanks for the imagery, but what is it that makes you think you can't _already_ claim any random person wrote something? Do you think the normal non-tag text in an HTML document is under a magic spell that prevents misattribution?

Of course twerps can claim stuff. So far people can just laugh stuff off.

Now the obvious use of the tag is for the copyright police... they're gonna try to make the author tag a statement almost akin to under oath. So all those tv show clips on youtube that don't have the network=author tag are instant slam-bait.

But now the more dangerous case is when Da Gov wants to do False Flag cases, and posts pics of Democrats sharing lingerie, and they put "A

I pick a respected author, perhaps academic, who writes about similar things as me. I publish my crap whitepaper claiming to be him. It's likely that no human will notice the deception. Depending on my goals, the human-readable text of the whitepaper will claim the author to be him or me.

I used a little humor. But yes, you absolutely have a clear case - you submit something in an intelligent style, and the first pass no one notices, until it accidentally gets picked up and then they slam the original creator.

What if, for example, that math paper that got hosed last week was *spoofed*? It's bad enough if the original author goofed, but since he got pulverized for "not checking", what if it was a classy defamation attack?

That's probably true. But if I understood this right, the point is to make the authors more visible on the internet - for example if I find a blog I like, I can easily find more writings by the same author, no matter what site they're on.

That's probably true. But if I understood this right, the point is to make the authors more visible on the internet - for example if I find a blog I like, I can easily find more writings by the same author, no matter what site they're on.

Unless the author has a common name like John Doe...

The only way a tag like this *might* work would be to make the tag value a public-key signature of the content enclosed inside the tag. Which would allow you to see that content A was signed by key XYZ, as was conten

Judging from the Google blog, this sounds less like rip-off protection and more like a way to allow searches like "Show me everything else the author of this article has written". That said, rip-off protection should be possible if they marked the first page they find with a given piece of content as the original and everything else with the same content as a copy.

They can't. The fact that it is just basic HTML means that detect-and-strip will be downright trivial; but there is nothing (outside of the darkest fantasies of the "trusted computing" set) that could actually stop such activity.

It seems like this falls into the category of 'potentially useful incremental change'. It isn't resistant to rip-offs (but neither was the status quo) and it makes it somewhat easier for good-faith actors to make a pertinent piece of metadata easily accessible. The metadata dreams

If you include the host domain in the digital signature, you'd be able to prevent people from re-hosting the work (or at least detect it and ignore copies). You'd still need the priority system you suggested to identify THE author (otherwise, as you say, somebody could rip and re-sign the content for a new host).

It's probably too much work for the benefit you'd get, but it might be worth the experiment, and Google is exactly the people to do that experiment. It means a vast amount of crunching, possibly t

This tells search engines: "The linked person is an author of this linking page." The rel="author" link must point to an author page on the same site as the content page. For example, the page http://example.com/content/webmaster_tips could have a link to the author page at http://example.com/authors/mattcutts. Google uses a variety of algorithms to determine whether two URLs are part of the same site. For example, http://example.com/content, http://www.example.com/content, and http://news.example.com can all be considered as part of the same site, even though the hostnames are not identical.
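The same-site rule quoted above can be sketched roughly as follows. The quote only says Google uses "a variety of algorithms", so the heuristic here (compare the last two labels of the hostname) is purely an assumption for illustration, and it would get suffixes like .co.uk wrong. The function names are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def site_key(url):
    """Naive site identifier: the last two labels of the hostname.

    Assumption for illustration only; breaks on suffixes such as .co.uk.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def author_link_on_same_site(page_url, author_href):
    """Check that a rel="author" href resolves to the same site as the page."""
    author_url = urljoin(page_url, author_href)  # resolve relative links
    key = site_key(page_url)
    return key != "" and key == site_key(author_url)


page = "http://example.com/content/webmaster_tips"
print(author_link_on_same_site(page, "/authors/mattcutts"))                  # True
print(author_link_on_same_site(page, "http://news.example.com/authors/mc"))  # True
print(author_link_on_same_site(page, "http://example.org/authors/mc"))       # False
```

The practical upshot is that a rel="author" link pointing at somebody else's domain would simply not count under this rule.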

Most people add their HTML to a server in one way or another. Isn't that publishing? It isn't like there are private web sites with articles that were written by an author then transferred to HTML to be posted to the web. Oh wait. No. AOL isn't that way any longer.

See details here [google.com], where it is explained that all works authored by someone in a domain should be linked to a unique author page at that domain, and that authors can associate/link their author pages between various domains using reciprocal linking.
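A rough sketch of why the reciprocal part matters: a profile is only accepted if it links back. The data structure (a dict mapping each author page to the set of pages it links to with rel="me"), the function name, and the choice to follow confirmed links transitively are all assumptions made up for this example.

```python
def confirmed_profiles(me_links, start):
    """Return the pages connected to `start` through reciprocated rel="me" links.

    me_links maps a page URL to the set of URLs it links to with rel="me".
    """
    confirmed = {start}
    frontier = [start]
    while frontier:
        page = frontier.pop()
        for target in me_links.get(page, set()):
            # A link only counts if the target page links back to this page
            if page in me_links.get(target, set()) and target not in confirmed:
                confirmed.add(target)
                frontier.append(target)
    return confirmed


# Hypothetical crawl data: jane links to two profiles, only one links back
me_links = {
    "http://example.com/authors/jane": {
        "http://blog.example.org/about",
        "http://famous.example.net/profile",
    },
    "http://blog.example.org/about": {"http://example.com/authors/jane"},
    "http://famous.example.net/profile": set(),
}
print(sorted(confirmed_profiles(me_links, "http://example.com/authors/jane")))
# ['http://blog.example.org/about', 'http://example.com/authors/jane']
```

In this sketch, claiming to be somebody famous does nothing unless the famous person's own author page links back to you.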

I was wondering when it would be possible to quote and requote the amazing debate that will change our society as we know it and transform us all into peace loving philanthropists who respect life. Oh wait! that debate happened already in irc chat.

that it will be easy to randomize/ spoof/ rip off, and a stupid tag doesn't change anything:

FIRST APPEARANCE of author tag means something. and no, it doesn't mean i can change the publish date on the file to June 1st, 1896 and always be the first author: when did SEARCH ENGINES first see content XYZ with author tag ABC?

that's case closed, right there. you can't spoof this system, unless you have a time machine, or you can hack google
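A minimal sketch of that "first appearance" idea, assuming the search engine keys on a fingerprint of the text and trusts only its own crawl clock, never the date the page claims. The class, the normalization step, and the integer timestamps are all hypothetical; a real engine would need fuzzier duplicate detection than an exact hash.

```python
import hashlib


def fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FirstSeenIndex:
    """Remember the earliest crawl at which each piece of content was seen."""

    def __init__(self):
        self._first = {}

    def record(self, text, author, url, crawl_time):
        """Store a sighting and return the (crawl_time, author, url) credited so far.

        crawl_time comes from the crawler's clock, not from the page itself.
        """
        key = fingerprint(text)
        current = self._first.get(key)
        if current is None or crawl_time < current[0]:
            self._first[key] = (crawl_time, author, url)
        return self._first[key]


index = FirstSeenIndex()
index.record("Quake console commands, tested and described.",
             "original-author", "http://example.com/quake", 100)
credited = index.record("quake console  commands, tested and DESCRIBED.",
                        "copier", "http://copy.example.net/quake", 250)
print(credited)
# (100, 'original-author', 'http://example.com/quake')
```

Credit here follows crawl order, so it is only as fair as the crawler's schedule.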

now, if anyone rips off your content, you will be able to point to google'

In that case, ripped off content of sites that Google scans hourly will get credit while real authors who maintain sites that are scanned less frequently won't get the credit they deserve. The new SEO will be "use blogger" which gets scanned (at least by Google) when you press "Publish". Unless Google can collaborate with other sites which allow users to publish data for the "first published" data? Does WordPress have hooks for such a collaboration? Would such a system be able to track plagiarism that i

besides, the problem is easily corrected: if you write something valuable to you that you fear someone will rip off, you ACTIVELY submit the page to the search engines, rather than waiting for them to be passively scanned