What if the linked-to text changes?

Several people pointed to the New York Times Emphasis project, which builds IDs from initial letters of sentences in a paragraph to provide some degree of resilience against the linked-to text being changed. It also tries using edit distances if it can't find the text.

Whether you still want to link to changed text is a tricky problem - if it has completely been removed, then the annotation or point of linking may have gone (pointing out a typo, or misstatement). Even a small change (adding the word 'not' for example) can mean that the point of linking has changed, so my first thought is that changing the text breaking the link can be reasonable.

If you want some fuzzy matching to go on, having more of the linked-to text in the fragment can only help the linked-to page identify where in the text was intended. Indeed, if enough is included, you could show the difference between what was linked to and what is there now.

What if the linked-to text occurs more than once?

By default, go to the first instance. If you want a different link, use more words to create a unique reference. While there have been proposals to link to the nth occurrence using more complex syntax, I don't think this is actually a natural choice, and likely to be more fragile. The NYT Emphasis tool mentioned above switched from a nth sentence type model to a content dependent one for this reason; fragmentions simplify and extend this idea.

I have tried to come up with a use case that fits this goal - the closest I can think of is referring to a particular repetition of a line of poetry, for example in a villanelle.

The only reason I can think of to link to specific lines would be to discuss them in the context of surrounding lines, so I think this works adequately.

If a tool is made to allow readers to construct a link to a specific phrase, indicating that that phrase is not unique to encourage them to choose a longer phrase may be worth it.

Could you combine an id and a fragmention?

A link of form #id##some+words has been suggested, but again I'm not sure I see the utility. This is the nth occurrence idea in a different guise. It combines two addressing models in one, making it harder to construct and more fragile to resolve.

Can we get rid of ## and just use #?

The value must be unique amongst all the IDs in the element's home subtree and must contain at least one character. The value must not contain any space characters.

There are no other restrictions on what form an ID can take; in particular, IDs can consist of just digits, start with a digit, start with an underscore, consist of just punctuation, etc

is that we may not need the ## (which technically makes an invalid URL) at all.

If an HTML5 id cannot contain a space, then a fragment that contains one like #two+words can never match an id (as id="two words" would be invalid). If it can't be an id it should be treated as a fragmention.

If you really want a one-word fragmention a trailing space like #word+ could be used.

This means that the idea of fragmention could be simplified: if a fragment contains a space it MUST be a fragmention, and should be searched for in the text. If it doesn't match any IDs in the page, it COULD be a fragmention and should be searched for in the text anyway.

Fragmentions become a fallback to be used when an id can't be found.

Do not go gentle into that good night

Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.

Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.

Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.

Grave men, near death, who see with blinding sight
Blind eyes could blaze like meteors and be gay,
Rage, rage against the dying of the light.

And you, my father, there on the sad height,
Curse, bless, me now with your fierce tears, I pray.
Do not go gentle into that good night.
Rage, rage against the dying of the light.

About Me

Kevin Marks works on IndieWeb and open web tech. From 2011 to 2013 he was VP of Open Cloud Standards at Salesforce. From 2009 to 2010 we was ay BT as VP of Web Services. From 2007 to 2009, he worked at Google on OpenSocial. From 2003 to 2007 he was Principal Engineer at Technorati responsible for the spiders that make sense of the web and track millions of blogs daily. He has been inventing and innovating for over 20 years in emerging technologies where people, media and computers meet. Before joining Technorati, Kevin spent 5 years in the QuickTime Engineering team at Apple, building video capture and live streaming into OS X. He was a founder of The Multimedia Corporation in the UK, where he served as Production Manager and Executive Producer, shipping million-selling products and winning International awards. He has a Masters degree in Physics from Cambridge University and is a BBC-qualified Video Engineer.One of the driving forces behind microformats.org he regularly speaks at Conferences and Symposia on emergent net technologies and their cultural impact.