Re-Thinking In-Line Linking: DITA Devotees Take Note!

The question of whether links on a web page should be inline in the text or relegated to one of the margins is not a new one. It deserves re-examination because the increasing popularity of the Darwin Information Typing Architecture (DITA) introduces a technical bias into the question. Managing inline linking in reusable content in DITA is complex, which makes it expensive, which makes it rare. It is commonly considered a DITA best practice to avoid inline links in favor of links created by external reltables, which means that the links are listed at the bottom of the page. (DITA 1.2 keyrefs provide some support for inline linking, but at the cost of significant complexity).
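For readers who have not worked with DITA, the contrast looks roughly like this. An inline link is an `xref` placed directly in the topic text, while a reltable lives in the map, outside the topics it links. (A minimal sketch; the topic file names here are invented for illustration.)

```xml
<!-- Inline linking: an xref embedded directly in the topic text. -->
<p>Before you begin, review the
  <xref href="widget-concepts.dita">widget concepts</xref>.</p>

<!-- Reltable linking: defined in the map, outside any topic.
     Topics in the same row are linked to each other, and processors
     typically render the links in a "Related links" list at the
     bottom of each topic. File names are hypothetical. -->
<map>
  <reltable>
    <relheader>
      <relcolspec type="task"/>
      <relcolspec type="concept"/>
    </relheader>
    <relrow>
      <relcell><topicref href="configuring-widgets.dita"/></relcell>
      <relcell><topicref href="widget-concepts.dita"/></relcell>
    </relrow>
  </reltable>
</map>
```

Because the reltable sits in the map, a topic reused in a different map can be given a different set of related links, which is exactly why reuse-heavy DITA shops favor it over inline xrefs.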

This strikes me as the tail wagging the dog — a design approach being dictated by the limits of a particular tool. Naturally, DITA advocates justify this approach by referring to those studies and style guides that come down on the no inline-linking side of the question. But I think it is a design approach with serious flaws and dubious justification, and this limitation should give people pause when they consider jumping on the DITA bandwagon.

There have been several small studies on the appropriate location of links in a web page, and they have produced conflicting results. There are a few reasons for this, including the time at which the studies were done, the design of the studies, and the presumptions that the researchers made about what kind of reading web pages should be optimized for.

The IBM Style Guide (a very comprehensive guide for writing software documentation, and now available to the general public) recommends placing links in a list at the bottom of a topic (page) rather than inline. Their research has found that a reader is less likely to become distracted when the links are at the bottom, where they won’t entice the reader to go off-topic but where they still can be used if needed.

I don’t know how old the research is on which this recommendation is based, but I do know that 15 years ago I certainly found inline links distracting. My brain saw the underlining and the color change as emphasis in the text. But my brain is now fully used to seeing links (which are generally more subtly presented today) and I read right past them without any sense that the text is emphasised or inflected in any way. Concern on that front, therefore, seems somewhat out of date.

There are other ways links could be considered distractions, but I will get to them later. Another criticism of inline linking is that readers may fail to see the links. This is the exact opposite worry: the worry that inline links are so inconspicuous that people won’t even notice them, let alone be distracted by them. This opposite fear leads, interestingly, to the same conclusion: place the links at the bottom of the page.

This conclusion is contradicted, however, by a 2001 study from SURL (The Software Usability Research Lab at Wichita State University), which found that links were equally easy to find, no matter where they were located. Here is where we can begin to see how the design of the experiment influences the results. In this study, the participants were asked questions which required them to follow the links to find the answers. Thus they had to explicitly look for links. This suggests that when people have to find links, it doesn’t much matter where they are. But the success rate of people explicitly looking for links does not tell us whether people not looking for them would be more likely to notice them if they were inline or in the margins.

Another problem with this type of study is that, precisely because it is a study, and the participants know that it is a study, they will expect that the answers are to be found within the content or its links. The text being studied exists in an implicit box, from which no participant is likely to break out by Googling for the answers on the test. But in real life, where there is no guarantee that the text before you contains all the answers you seek, Google beckons constantly.

Interestingly, while the study found no difference in the ability to find links based on their location, it found a strong subjective preference for inline linking:

[T]here were significant subjective differences between the link arrangements favoring the embedded links. That is, participants indicated that they believed that embedding the links within a document made it easier to navigate, easier to recognize key information, easier to follow the main idea of the passages, and promoted comprehension. Moreover, participants significantly preferred the embedded link arrangement to the other arrangements. Conversely, placing links at the bottom of a document was perceived as being the least navigable arrangement, and was consequently least preferred.

This strong user preference is apparently at odds with studies that actually set out to test comprehension and retention, at least according to Nicholas Carr’s alarmist The Shallows (an excerpt here from Wired). Carr worries that the web is full of distractions, such as inline links, which are rewiring our brains and causing us not to think deeply or retain what we read. All this is deeply at odds with the many tracts that tell us that Web users don’t read, they skim, and that therefore we should make every page a frenetic mix of titles, tables, lists, graphics, and other geegaws, lest the sight of two plain paragraphs in succession should send the reader spinning off into space.

On the web, you can’t win, it seems. Readers won’t read plain text, and they won’t retain text filled with distractions.

All of this concern seems to me to rest on two deeply flawed presumptions: that reading on the web should be like reading on paper, and that people read in order to retain the text. Here’s what’s wrong with both these ideas:

Distinguishing information seeking from information consumption

All the studies of reading on the web that I have seen seem to miss one very basic fact. Before information consumption comes information seeking.

In the book world, information seeking begins with a trip to the library or to the bookstore. It then involves a lot of flipping through card catalogs, a lot of walking through the stacks with your head tilted sideways reading the spines, and a lot of flipping through introductions, TOCs, and indexes as you select which books to actually borrow or buy. Finally, you proceed to the checkout, take your stack home, settle into your easy chair, and start reading. That is a lot of time, effort, and cash invested in information seeking before any real reading begins.

By contrast, on the web, information seeking starts with a Google search, followed by opening all the likely looking search results in browser tabs, followed by going through the tabs one by one skimming for promising content, closing unpromising tabs, and finally going back and reading more thoroughly the tabs that seemed promising (at least, that’s my process). All this involves a lot of reading that is pure information seeking, equivalent to the time spent in the card catalog and the stacks at the library or bookstore.

But when people compare web reading to book reading, they count all of the information seeking behavior as “reading” on the web, whereas they only count the time after you settle into your easy chair as “reading” for the book world (if they measure it at all). This is a gross distortion. In the library you find your way to texts; on the web you find your way through texts.

Another difference between the library and the web is that once you have found a likely text, your investment in that text in the book world is much greater than your investment in the web world. Even if the books you got from the bookstore or library prove disappointing at first, you are likely to stick with them because it will take a lot more time, effort, and money to replace them. Sticking with them to see if they will eventually prove at least adequate for your needs is a good economic strategy because the cost of starting over is high.

On the other hand, if you find a text on the web that proves disappointing, your investment is very small, and it makes much better economic sense to look for a better text, since doing so is very inexpensive. This does not represent a loss of the ability to concentrate, just a rational change in behavior based on the reduced cost of acquiring alternate texts.

So, most of the eyeballs on your text are not in settled reading mode, they are in wayfinding mode. They will skip and skim, because that is what you do in wayfinding mode. (This is not a web thing at all, since you do the same thing standing in the stacks at the library. But the eye tracking tools can’t measure you there.)

In fact, the mostly-wayfinding/sometimes-reading pattern is exactly what shows up in the studies. It is simply that the studies all seem to interpret it as a difference between reading paper and reading on the web when a simpler explanation is that information finding in a paper world mostly takes place before the seeker sits down to read.

Some people think of links as exit points, but for someone in wayfinding mode, anything you are not interested in or don’t understand is an exit point. If I am reading (or skimming) a text, and it makes a reference to a concept I don’t understand, or a task I don’t know how to perform, what do I do? My choices are:

1. Go back to Google and find another text written more to my background.

2. Highlight the text that refers to the unfamiliar concept or task, then right click and select “Search Google for…” (that is, make my own link).

3. Go hunting around the margins of the page to see if there are helpful links there.

4. Click on the handy link that the thoughtful author has provided right on the troublesome reference and go to content on the same site that clarifies the concept or describes the task.

Options 1 and 2 both take the reader away from your site, perhaps for good. Option 3 is improbable, given that:

Most sites don’t do this, so why would you be in the habit of looking for them?

Even if there are links, their relevance may not be obvious, and I have to go through the entire list to find one that might be relevant.

Option 2 is easier and works universally.

Only option 4 keeps the user on your site. A useful link turns an exit point into a re-entry point. It also has the added bonus that it is quite difficult to select text that is a link, so it actively discourages option 2. Not providing a link is not likely to keep the disappointed wayfinder on your page; providing a link creates the possibility of keeping them on your site.

The goal of most reading is not retention

Five or six times a year I make a six hour drive to visit one of the scattered parts of my family. Along the way, I read hundreds of road signs, yet by the end of my journey, I don’t remember any of them.

Did those road signs fail as communication devices because I don’t retain them? Of course not. Those signs exist to inform me of specific local situations requiring specific local decisions and actions on my part. For them to do their job, all this is required from me is that I decide and act correctly in the moment. Once I have taken the correct action, there is no reason to remember the sign. Forgetting past signs is actually a good thing, allowing me to concentrate on the immediate road ahead.

Most of our reading is like this — in the moment. When we read a novel, we read for the visceral impact of the imagined events, not to retain the text (unless we are reading for school, which, of course, makes the reading joyless).

When we read technical documentation, we read to complete our immediate task, not to commit the procedure to memory (if we do remember a procedure, it will be through repetition, not reading to retain).

When we read the news or a family letter, what we read may pass into the mass of experiences that shape our hopes and opinions, our sense of how the world works, but it is seldom individually recalled, unless highly affecting and frequently repeated, like the images of the planes hitting the twin towers.

No one reads tweets for retention.

When we read articles or blogs in our professional sphere we read mostly to confirm or to dispute. Let’s be honest about this: if you are pro inline linking, you are reading this to confirm your preference, and if you are anti, you are searching for weaknesses, omissions, and flaws in the argument so that you can rebut. And I would suggest this is a perfectly healthy way to approach this. Antagonism leads to debate, and debate leads to discovery and new understanding.

Even in a school setting, we don’t generally read for lifetime learning; we read for test passing. If we are going to talk about communications media rewiring our brains, we have to acknowledge that the school system rewires our brains to retain texts up till exam time, and to discard them as soon as the exam is complete.

And while we are on the subject of schools, we should note that a great many of the studies on this issue use what is essentially a school model of testing. The subjects are assigned to read a text and then to answer questions, just as they would be if they were in school. As in school, they are not motivated by their own tasks or their own interests, and, as in school, they are consequently easy prey to distraction. Purpose and focus make all the difference in concentration, retention, and resistance to distractions, but purpose and focus are hard to reproduce in laboratory conditions.

Reading to do

It is comparatively rare, then, that we actually read with the goal of retention and long-term learning, either in school or out of it. Most of the time we are reading to complete a specific short-term task. Hypertext and search are great for this because they allow us to move very quickly through a large body of content to narrow in on the piece of information we need to complete our immediate task. There is no reason to read thoroughly every text we encounter on this search and no reason to remember any of them once we have completed the task. To complain that people reading for this purpose do not read thoroughly or retain what they have read is to entirely miss the point of why they are reading in the first place. They are reading to do, not to learn.

Reading to learn

On those relatively rare occasions when we do actually sit down to read with the intention of learning and retaining, I think we naturally fall into the mode of reading thoroughly and with attention. At least, I do, and I, at least, don’t find that I even notice links in the text when I am in that mode. I am motivated by a goal to which I am committed and in which I am deeply interested. I’m not seeking distractions, and I am not easily distracted.

On the other hand, if I do come across a concept in what I am reading which is not explained in the text, I do want to take a side trip to understand that concept so that I can return to what I am reading and continue with full comprehension. Again, therefore, an inline link is welcome.

That said, I do find that it is when I am reading in this mode that I tend to move away from the web and pick up a book or an e-book. If I am set upon a course of study, it is generally easier to pick up a good long-form volume that is designed specifically to instruct in that subject area. I will usually keep a tablet nearby in case I need to look up something that the author does not explain, but generally, I do this type of reading off the grid.

This is important too, because if other people are like me in this regard, that means that an even higher percentage of reading on the web is done not for retention but for action, immediate experience, reference, or debate. To condemn inline linking on the grounds that it is detrimental to retention, therefore, is to apply an almost entirely irrelevant standard. The issue should be: does inline linking support navigation, information finding, and getting to action?

Escaping the limits of paper

In the book world, it was reasonable to assume that your reader was going to read your text in a single narrative flow. This was due to the limits of paper and the effort required to switch to another text. Linear reading was an optimal strategy in the paper-and-libraries world, and so it became a cultural norm, something that people were trained to do, not because it was optimal in abstract, but because it was optimal within the limits of the available technology.

On the web, it is not reasonable to expect the same linear reading behavior to be the norm, nor should we wring our hands or imagine that civilization is coming to an end because people are discovering new ways to optimize their information finding in an online world.

Helping the unprepared reader

Readers come to texts from impossibly diverse backgrounds. Even in the most carefully planned book-based curriculum, no two students arrive equally prepared (or identically unprepared) for the same text. With web texts, arrived at by search, link, or social curation, readers, if anything, arrive even more unprepared.

The only way that the unprepared reader is going to successfully comprehend your text is by filling in the gaps in their preparation. Keeping their attention focussed on the single narrative line of your text is not the right answer, because they are not prepared to understand that narrative line.

Forcing them to continue can only produce an uncomprehending recollection of the text itself (a phenomenon anyone who has spent time in a classroom is thoroughly familiar with). But on the Web, there is no way to force them to continue. They can leave anytime they want to. And once they leave, why should they come back to your site, since it failed them? But if you provide them with a rich set of links, you provide them with a way to fill their gaps and equip themselves to comprehend your text. You thus give them a reason to stay, and a reason to return.

Sometimes not linking can be the best strategy

This is not to say that you should always prefer inline linking. It is certainly possible to imagine situations in which inline links should be avoided. Ginny Redish points out, for instance, that links are distracting for low-literacy readers.

Of course, we don’t design most content for low-literacy readers. We don’t design most things for novices. If we did, the Tour de France would be contested on tricycles. We design things for regular users, and expect novices to push through their novice difficulties and learn. But even so, there will be times when you design a site for low-literacy readers.

Different types of content may also call for different approaches to linking. More contemplative content may be better served by fewer links, while more action-oriented content may benefit greatly from rich linking. Commercial content clearly benefits from rich linking that keeps the reader browsing and seeing more products. Public service or compliance-oriented content, on the other hand, may benefit from providing a fixed path (provided the content itself works well enough to keep the reader on it).

If you choose not to link, however, don’t fool yourself that the mere absence of links will induce people to read your content to the end. Google is always just a swipe and a click away, and everyone but a rank novice web user knows exactly how to make a link for themselves anytime they want one.

A good link strategy is important

Given the choice to link, it is, of course, important to link in the right way. Links should provide context so that the reader has a reasonable expectation about where the link may lead. Links should be useful, rewarding, and deliver on what they promise. A poor or inconsistent linking strategy, or poor selection of link targets, will frustrate the reader and send them away from your site.

Fundamentally, links should fulfill a useful purpose for the reader and/or the writer. Purposes may differ for different sites, but the main purposes I see are these:

Assist the wayfarer for whom the current text is not their final destination to reach the text they really need (preferably on your site).

Assist the reader who is not fully prepared for the text to fill in gaps in their knowledge or preparation so they can complete their task using your text (as opposed to someone else’s).

Support your arguments or claims with evidence or argument from other sites. (Yes, this involves sending the reader to another site, but if doing so improves your credibility, that may still be a win.)

Keep the reader on your site as long as possible. (Your site is what matters here, not the individual page. For most commercial purposes, having the reader visit multiple pages on your site, rather than just one, is highly desirable.)

Achieving a consistent link strategy

I began this essay by complaining about link strategy being dictated by the limitations of tools. Unfortunately, most tools on the market limit your linking strategy in one way or another. Sometimes this means limiting the kinds of links you can create or maintain, but the more common problem is that they simply make the creation and management of links so expensive that in practice organizations cannot afford to provide rich linking.

Whatever your linking strategy, you want to make sure that that strategy is implemented consistently. You don’t want your tools to dictate your strategy to you, and you don’t really want to leave it all in the hands of individual authors either, especially if you are reusing content in different places where you might want to implement different linking strategies.

Link management does not — or should not — mean letting individual authors create links higgledy piggledy and then deploying expensive CMS systems and/or exotic mapping systems to keep them from breaking. What link management should mean is having a consistent linking strategy and being able to apply it in a consistent way.

This is really hard to do if your tools don’t support central control of link creation and you depend on individual authors to adhere to your linking policy. And it creates significant overhead for the author if they have to make these kinds of decisions as they write. If authors have to concern themselves with when to link, what to link to, and where to locate the links on the page, chances are they will not create very many links, and you certainly will not get consistency across the site, especially if you have a large number of authors.

One solution is to use a soft linking approach, such as that supported by the SPFE architecture. Soft linking lets authors quickly mark up mentions of concepts, objects, and tasks that you might want to provide links for, without creating actual links. At build time, you can then apply a consistent link policy to these potential link points, selecting when to link, where to place the links, and which resource to link to. All this is done algorithmically based on available metadata, which means you get highly consistent linking, and the correct density and location of links for each media or publication, without having to police individual authors, and without your strategy being dictated by your tools.
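To make the idea concrete, a soft-linking setup might look something like the sketch below. Note that the element and attribute names here are hypothetical illustrations, not actual SPFE markup. The author marks up what a phrase is; a separate policy decides at build time whether and how it becomes a link.

```xml
<!-- In the topic source: the author marks what a phrase refers to,
     not where it should link. Element names are invented for
     illustration. -->
<p>Before installing, <task-mention>back up the database</task-mention>
and review the <concept-mention>replication topology</concept-mention>.</p>

<!-- In a separate link policy, applied at build time: rules decide
     whether, where, and to what each kind of mention links. -->
<link-policy>
  <rule match="task-mention" link-to="task-topics"
        placement="inline" max-per-page="1"/>
  <rule match="concept-mention" link-to="concept-topics"
        placement="related-links"/>
</link-policy>
```

Targets are resolved against available metadata at build time, so the same source can be published with dense inline linking in one output and end-of-topic link lists in another, without touching the topics themselves.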

We are still adapting to hypermedia systems. This applies equally to readers and writers, though we all seem to adapt faster as readers than we do as writers, behaving one way when we read, but still writing as if everyone else read the old way. But hypermedia are very different from the old fixed media, and we cannot continue to think of the content we push to the web the same way as the content we used to print and bind and throw in the box. On the web, links are not a decorative gloss that we lay over static publications. They are the nerves and sinews of hypermedia. We need to stop dismissing links as distractions or exit points, and start using them strategically as mechanisms for navigation, empowerment, and reader retention.

About The Author

Mark Baker, President of Analecta Communications Inc., is a twenty-year veteran of the technical communication industry, with particular experience in developing task-oriented, topic-based content. He has worked as a technical writer, a publications manager, a structured authoring consultant and trainer, and as a designer, architect, and builder of structured authoring systems. He blogs at everypageispageone.com.

About Scott Abel

Known as The Content Wrangler, Scott Abel is an internationally recognized global content strategist, a faculty member at University of California Berkeley, School of Information, and a vibrant speaker who is often a featured presenter at content industry events around the globe.
Scott's message is clear: Content is a business asset worth managing efficiently and effectively. He works to help content-heavy organizations adopt the tools, technologies, and techniques needed to connect content to customers.
Scott is a founding member of Content Management Professionals and co-produces the annual Intelligent Content Conference and Content Strategy Workshops. In addition, Scott writes regularly for content industry publications and was listed by EContentmag.com as one of the top 50 content marketing influencers.

First of all, thanks for the great read. I would like to expand on a couple of points.

Novice vs. Advanced Users

One thing you briefly danced around is how novice and advanced web users differ. Something I have noticed, completely anecdotally, is that advanced web users open links in new tabs, to be assimilated immediately if necessary or at their leisure if not.

Novice users, again in my completely anecdotal experience, get confused by tabs and have trouble following when interrupting the main goal of the page to check tabbed reference content even when the reference content addresses comprehension gaps.

If you are writing for a technologically advanced audience, like, say, the readers of this blog, then in-line links shouldn’t be much of a problem.

A novice audience would likely understand the content better if it flows more like in a print document with any reference links at the end.

Content Strategy, Links and SEO

You can’t talk about web content and links without mentioning search engine optimisation. Links and content are the foundation of search rankings.

The very short, and grossly over-simplified, summary of how web pages rank in search results is this: the pages with the most external (more important) and internal (less important) links pointing to them with the same anchor text will rank highest for queries matching that anchor text, as long as the query term appears somewhat prominently on the page.

That is the short version, but anything that simple is easily manipulated and the search engines have had to evolve to avoid manipulation by poor-quality or irrelevant sites.

The search engines have invested a lot in analyzing links to devalue any manipulative links. Footer links, for example, are not particularly worthwhile anymore because of their abuse.

The two pages below have some particularly good coverage of search engine patents related to valuing links based on their position within a document.

While there is no hard evidence that listing links at the end of a document will devalue the ranking power of those links, I cannot in good faith recommend limiting links to a section that is easily detected by machine intelligence. A list of links in the same position of an XML document like an HTML page is the definition of easily detected by machine intelligence.

SEO Risks of Automating Links

As I mentioned earlier, pointing lots of links to a page with the same anchor text is important to ranking a page for a query matching that anchor text.

As a rule, if it is important to ranking in search engines, it has been automated.

And anything automated is a target for search penalty.

Google just rolled out an over-optimisation penalty and one of the things that they are targeting is anchor text over-optimisation.

While I don’t think automating internal linking is enough to trigger the current penalty, it would be smart to give writers the leeway to create their own links (while automatically surfacing pages that are likely related to the document because they won’t put in the effort on their own) in combination with a soft linking approach in order to limit any automation footprint and future-proof the SEO value of a document.

While automation presents risks, it isn’t an issue in the context of most large, authoritative sites where various sections are controlled by different interests with different goals and processes.

But that shouldn’t absolve anyone involved in content from the need to be aware of the issues.

Of course, nothing in a DITA topic precludes the use of semantic “mention markup” and associated soft linking–it just doesn’t exist in most of the standard build tools yet. The beauty of the newer “baby DITA” implementations is that they have been targeted at classes of users who are able to make use of the more natural connotative linking strategies that lighter workflows or delivery requirements allow. As you point out, the query-based links work very well with XQuery- or XPath-aware databases, so I expect to see more of this kind of exploration roll out among newer DITA uptake scenarios.

Which is also to say that I don’t expect much notice to be taken by those already invested in directed link management–those users have a reason to manage high volumes of entitled content with reliably determined, warrantable behaviors that only a rich CMS can provide. I think the ripe domains for soft linking would be activities like knowledge capture for support centers and SMEs, software architects and programmers, and high-level designers for the types of content where implicit relationships beg to be discovered.

Hard linking has its place too–I’m not about to replace my bookmarks file with a custom taxonomy that may not get me back to the precise resource I had in mind! But I do see things headed that way as the social media world progressively builds retrieval on likes and associations rather than on solid URLs. Let’s watch this space.
—
Don

The best advice I ever got about software was, “Don’t fight with the tool.” Either go along with how a particular tool/technology/product wants you to do things, or find a different tool. Trying to impose your preferred way of doing things on a system that was not designed to work that way is trouble.

Thus, spending a lot of time in DITA is likely to lead to a bias away from soft links and perhaps inline links (those are supported, but as you point out, annoying to implement.)

That said, I find myself limiting the use of inline links in more narrative documents. I don’t want my reader to get distracted (“SQUIRREL!!”) by a link when I am carefully trying to construct a complex argument. The links in the first paragraph of this article are a good example. I might use footnotes and park the links there, but I don’t want a reader to start reading and then wander off to the DITA specification. After all, the gruesome details of how reltables work are not critical to your argument.

In a document I just wrote, I had a section that compared several different tools. I provided links to each tool’s web site, but I stuck them in a related topics section at the bottom of the section, so that the reader could read the overall assessment (4-6 paragraphs) and then decide which tools to research further.

However, I’m not sure any of these approaches make sense for task-based documentation.

I, as the editor, am responsible for many of the additional hyperlinks that Mark didn’t include in his original article. I find inline links commonplace and think (using my psychic powers) that users are accustomed to them and that if they need them, they use them.

Actually, I don’t use psychic powers at all. In the past I used http://www.usabilitytest.com/, which provided me with a video of each user’s cursor movements (as well as clicks and scrolls) as they navigated the site. I was SHOCKED to see how many features I thought didn’t matter were actually the ones site visitors used. I also watched in horror as they wandered aimlessly around the site trying to find something. Or were they just info-browsing/info-snacking?

Anyway, the point is, I agree that links can be both useful and interruptions to a narrative flow, but I don’t believe there is one best way to do it, given the internet and the wide variety of users, experience levels, expectations, cultural differences, devices used to access sites, and the like.

I think that it’s important to remember that where links generated from relationship tables are located is PURELY a style sheet choice. They can be at the bottom of the page, at the beginning, in a tasteful sidebar, or someplace else. Just because the DITA Open Toolkit renders them by default at the bottom of a topic, there’s no reason (other than style development time) to use the default rendering.

The reader’s potential for distraction may depend on one’s literacy level, but their method of surfing the web may affect reading behavior even more. I notice that StumbleUpon readers and mobile users (granted these are apples and oranges categories) behave quite differently than others. Photo links seem to attract these readers more than text links. In general, I’m not sure many of us, independent of literacy level, actually read articles. We scan headlines and topic sentences! If it’s good stuff, I go back. On this fine article, I found myself going back.

Another great, discussion-provoking article, Mark. In the context of tech comms, I think inline links work really well for some things such as term definitions and optional procedures, or pre-requisite procedures that a user may or may not have completed. For more tangential topics, the bottom of the page may be better. In any case, there needs to be a consistent link strategy, as you say.

SPFE’s way of doing soft links is interesting and has lots of potential; I’m looking forward to learning more about it as the SPFE-OT develops.

An alternative approach is to enable soft links in the delivery platform itself (for example a Web CMS or a client app). The metadata persists through processing of the source and is still there when the content gets to the delivery platform, which then creates the links. That way it’s easy to dynamically link various pieces of content which were prepared in different ways and at different times. I’m aware of a couple of DITA-sourced implementations like this. Though the ones I know of display related topic listings below videos/articles or in menus, it wouldn’t be too much of a stretch to do this for inline links.

As for keyrefs, they can be made quite usable, though it takes thoughtful tool design. They really come into their own where authors need to link to a given logical topic: one that may vary greatly among its different concurrent versions, but fulfills the same general function wherever it appears. This is particularly useful where the relationship between the topics isn’t amenable to rules; for example, it isn’t a term/definition or procedure/overview kind of relationship, but perhaps a one-off link to a prerequisite procedure. In this situation, a DITA-aware CMS could be used to define the canonical values for logical topics.

Actually, whatever linking techniques are used, authors always need a way to be sure their links will work OK (if not at the time when the link is created, at least when there is something available to link to). If soft linking relies on the spelling of element contents or attribute values, preferably there is something more sophisticated than a spellchecker or copying/pasting to ensure that strings match. On the other hand, if values are centrally controlled, it needs to be easy to update them, or people will find ways to circumvent the system. A CMS isn’t essential for this, but it can really help.

@Damon — Thanks for the comment. The issue of novice web users is an interesting one. The problem is: how do you figure out whether the readers of your content are going to be web novices? You may know that you are writing for novices in your product or service, but that does not necessarily correlate with their being novices on the web. The two issues may be entirely orthogonal.

I think we have to recognize two kinds of SEO. One is following the appropriate standards for metadata and page structure — in other words, doing what Google wants you to do to make it easier to analyse your content — and the other is gaming the search algorithm, which, like the war between spammers and spam filters, is an endless (and time-consuming) chess match. My view is that, in most cases at least, the best SEO is great content. I would not generally encourage writers to distract themselves from what they are writing, nor would I compromise the effectiveness of links for readers for the sake of tracking the latest wrinkle in the Google ranking algorithm.

@Don — you could certainly add soft linking to DITA, though the hard part is not the mention markup, but identifying the resources to link to. You need to design for that to make it effective. However, I’m not sure it would be wise to add yet more complexity into the standard.

But I disagree with you when you say: “reliably determined, warrantable behaviors that only a rich CMS can provide”. Given content that is created to support it, soft linking will be more reliable and more warrantable than links created and managed by individual human effort. A rich CMS attempts to mitigate the inherent fallibility of hand-woven linking, but at the expense of a lot of overhead. No other system that requires reliable warrantable linking relies on humans to manage individual links — they all generate links algorithmically from warrantable metadata.

The problem is that most content systems are not set up to capture warrantable metadata at authoring time and so have to hand weave relationships between the pieces later in the process.
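A minimal sketch of what I mean by generating links algorithmically from metadata. The topic IDs, metadata fields, and matching rule are all invented for the example; the point is only that once topics carry warrantable metadata, link formation becomes a lookup rather than a hand-managed relationship:

```python
# Hypothetical example: topics declare what they define and what they mention.
topics = [
    {"id": "configure-widget", "type": "task", "mentions": ["widget", "control panel"]},
    {"id": "widget-overview", "type": "concept", "defines": ["widget"]},
    {"id": "control-panel-ref", "type": "reference", "defines": ["control panel"]},
]

def build_link_index(topics):
    """Map each defined term to the topic that defines it."""
    index = {}
    for topic in topics:
        for term in topic.get("defines", []):
            index[term] = topic["id"]
    return index

def generate_links(topic, index):
    """Resolve each mention against the index; unmatched mentions stay unlinked."""
    return {term: index[term] for term in topic.get("mentions", []) if term in index}

index = build_link_index(topics)
print(generate_links(topics[0], index))
# {'widget': 'widget-overview', 'control panel': 'control-panel-ref'}
```

Notice that no individual link was ever authored: add a new defining topic and every mention of its term starts linking to it automatically.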

@Sarah, I am with you absolutely on not fighting the tool. I think there is a real object lesson in the rise of JSON in the web service space. It was assumed that XML would be the lingua franca of the web because you could use it to represent any kind of data. But in being general, it is also verbose and complex, so frustrated JavaScript programmers, looking for improved speed and simpler code, switched to JSON for much of the data they move around the web. JSON obviously will not work for complex text, but for most web data, it is proving to be the right tool for the job. Just because you can accomplish something with a tool does not mean it is the best tool for the job.

I agree, also, that this kind of content, which is making a general argument, benefits less from rich links than does task content, except for links to supporting information, like the studies I mentioned. But I’m not convinced that postponing links does much to keep people on the straight and narrow. If people are not fully engaged with your content, they can hop out any time they want. If they are going to hop out, I’d rather they did it using my link than by hopping over to Google.

@Scott — I think this is one of the most difficult parts of assessing any kind of analytics data. We know what people look at and click on in enormous detail, but we don’t know what their motives or objectives are. We don’t know if they are in the right place or the wrong place. We don’t know if they are browsing or buying. We don’t know if they are getting it, or baffled; eating it up or disagreeing. We use the web for so many things that it is very hard to know how their actions relate to their motives.

And given that the audience is so diverse, I think we do have to just make a choice on how to write and organize information: we will never accommodate every pair of eyeballs that alights on our words. But the other factor here is that writers are often novices when it comes to contributing content to hypertext systems. I actually suspect that the audience is more sophisticated in navigating hypertext than writers are at creating it. The most potent argument for creating rich linking may be that we need to do it so that we learn how.

You wrote, “The question of whether links on a web page should be inline in the text or relegated to one of the margins … deserves re-examination”

I disagree. I think that question is unimportant. The real question you should be asking is, “What purpose does it serve to link from this topic to another, and how can I best present that link to my user?”. Without the question of purpose, any answer is meaningless.

Here are some examples of why that matters.

1. If a user is at a task topic, and there is a required prerequisite task they must have completed, the purpose of the link is to inform them of the requirement and to make it easy for them to accomplish it.

2. If a user is at a concept topic, one type of link you might want to provide is tasks they can accomplish with their new understanding of the concept.

3. If the user is at a reference topic, one type of link you might want to provide is related references (as in an API guide, where you might want to link from a function to the reference topic about the return type of the function).

Each of those three needs, in my opinion, a different type of link.

1. Pre-req links should be front and center for the user. They should be something that is hard to miss.

2. Tasks related to a concept are best at the end of the conceptual material.

3. Reference-to-reference links should be where the user needs that information, i.e., inline.

If you approach your links like that, the question of inline or end of topic disappears. The links go where they belong.

By the end of this article, I think you are saying much the same thing.

That leads to another bit you wrote: “If authors have to concern themselves with when to link, what to link to, and where to locate the links on the page, chances are they will not create very many links.” I don’t think that’s a bad thing. An author should have to think carefully about whether to put a link in. No matter what system you use, managing links is challenging, at the very least from an information design perspective, and a link should only be added because it’s worth the effort of maintaining it.

I don’t believe that an information seeker will often bail out of a topic if, while scanning, they see a concept they don’t understand. In my experience, the rapidity of a user leaving a topic is directly related to the number of links in the topic, not how much they understand in the body of the topic. Clicking a link is an active, exciting task. Reading, even scanning, isn’t. It’s work. Given the choice, people will usually click the link rather than continue scanning.

I guess my point boils down to this: I don’t agree that readers are good at navigating through hypertext to find the information they need to accomplish their goal. They are very good at clicking links. Those are completely different things.

@Kristen Agreed. The only place you can’t render reltable links is inline. They are, effectively, links from the topic as a whole rather than from a word or phrase in the topic.

That is actually a distinction worth noting. The difference between inline links and marginal links is not simply one of placement, but one of context. An inline link is a text to topic link, expressing a relationship between a particular word or phrase and one or more topics. A marginal link is a topic to topic link, expressing a relationship between one topic as a whole and another topic as a whole.

That’s enough of a difference to raise questions about more than the potential for a link to be distracting or to be overlooked. The user may well understand the link differently depending on whether it appears in a textual context or in a general topic context. Where topic-context links appear within the topic is, as you say, a matter of style.

@Corky, I think this is the nature of information finding on the web. We read very little of what we scan. Then again, in the paper world we bought very few of the books whose blurbs we read in the bookstore.

A brilliant programmer once told me that the secret to information processing is fail fast/succeed slow. What he meant was that in most information processing, you are looking for a very small number of data points in a vast sea of data. You need to be able to eliminate the uninteresting pieces very quickly, and therefore you optimize your code to detect failure cases as quickly as possible. This provided much higher performance than optimizing the recognition of success cases, which are relatively rare.
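The fail fast / succeed slow idea can be sketched in a few lines. This is just an illustration with invented record fields: order the cheap, most-discriminating rejection tests first, so that the expensive success test only runs for the few survivors:

```python
def matches(record, wanted_type, wanted_terms):
    """Fail fast: reject uninteresting records with the cheapest tests first."""
    # Cheapest check first: most records fail here immediately.
    if record.get("type") != wanted_type:
        return False
    # Slightly more expensive check next.
    text = record.get("text", "")
    if len(text) < 10:
        return False
    # The expensive "success" test runs only for the few remaining candidates.
    return all(term in text.lower() for term in wanted_terms)

records = [
    {"type": "comment", "text": "short"},
    {"type": "article", "text": "Soft linking and reltables in DITA"},
    {"type": "article", "text": "An unrelated piece about gardening"},
]
hits = [r for r in records if matches(r, "article", ["soft linking"])]
print(len(hits))  # 1
```

The ordering of the checks is the whole point: in a vast sea of data, almost every record is eliminated by the first, cheapest test.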

Given the sheer volume of the web, readers have to adopt the same strategy. The reason most readers spend only seconds on a page is that they are using a fail fast approach. We should not let that behavior stop us from writing and designing for the people for whom our content is a success case. They are, in the end, the only people we can influence, and the only people that can get any value from our content.

@Joe, I agree that there is a place for both text-context (and therefore inline) links and topic-context links (which could be anywhere, including inline). Not sure if “tangential” is the criterion, though. It may depend on whether the link is tangential to an individual phrase or tangential to the topic as a whole.

It is definitely within the design intent of SPFE to support soft linking at runtime. This can address cases in online help where a user may have a combination of modules, perhaps at different version levels, on their machine. Rather than ship content with links to modules that may or may not be installed, you ship content with mention markup intact and link on the fly based on the link catalogs of the installed doc components. The beauty of this is that if they upgrade one component, all the help topics for all the other components automatically start linking to the new docs.
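As a rough illustration of that runtime resolution idea (the component names, catalog structure, and paths are all made up for the example), mentions in shipped content are only turned into links if some installed component’s catalog can answer for them:

```python
# Hypothetical link catalogs shipped with each installed doc component.
installed_catalogs = {
    "widget-docs-2.0": {"widget setup": "help/widget/v2/setup.html"},
    "panel-docs-1.1": {"panel basics": "help/panel/v1.1/basics.html"},
}

def resolve_mention(mention):
    """Return a link target if any installed catalog covers the mention."""
    for catalog in installed_catalogs.values():
        if mention in catalog:
            return catalog[mention]
    return None  # no installed component covers it: leave as plain text

print(resolve_mention("widget setup"))   # help/widget/v2/setup.html
print(resolve_mention("not installed"))  # None
```

Upgrading a component would mean swapping in its new catalog, at which point every mention of its topics resolves to the new targets with no change to the shipped content.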

When you say, “Actually, whatever linking techniques are used, authors always need a way to be sure their links will work OK” I think you are highlighting a pervasive problem in content creation generally: writers do not trust any part of the production process and insist on manual verification of every output and every relationship.

This is a huge problem for anyone trying to reduce costs, shorten timelines, and improve quality through structured writing and automation. We need systems to be trustworthy, and we need writers to trust them. Without that, what Sarah O’Keefe refers to as the “velocity of publishing” will remain a crippling problem.

Trustworthy systems are particularly essential if you want to engage a wide range of collaborators. You can’t have every collaborator’s contribution made subject to manual verification in every iteration, or you will never get anything done.

To achieve this, we have to begin at the information model. If the information model is not designed to support reliable processes, then nothing you can do downstream will make the process really reliable. Much of the functionality of a modern CMS exists not to make the production process reliable, but to support human oversight and tinkering with an inherently unreliable process.

You talk about the cases “where the relationship between the topics isn’t amenable to rules”, but in a structured writing system, there should be no such cases. Relationships in a structured writing system should always be amenable to rules. But to achieve this, you have to look past the links. You have to get the topic types correct, and make sure that topics are created such that relationships can be reliably formed based on metadata. That is what structured writing is supposed to do for us — and what SPFE is really designed to facilitate.

@Steve, I certainly agree that the questions of whether you should link and what you should link to are prior to, and thus more important than, the question of where the link should be located. But I still think that question is important, particularly given the difference in context between a link in the text and a link in the margins.

That said, a lot of the criticism of inline linking is, implicitly or explicitly, criticism of linking at all. The real question at stake is often: should the author create a hypertext — something that is part of the Web — or should they create a closed document — something that is on the Web, but not of the Web?

The answer to that question is, doubtless, it depends. But I think there is a huge prejudice among writers in favor of closed documents, and an equally marked preference among readers for hypertext. Thus I think the prejudice against hypertext needs to be challenged on many fronts, including, in this case, the location of links.

You say, “In my experience, the rapidity of a user leaving a topic is directly related to the number of links in the topic, not how much they understand in the body of the topic.” I wonder if you could comment on how you made these observations. It’s not that I have contradictory data; it’s just that these things seem very difficult to measure. How do you know how much people understand when they leave a topic? How do you know why they followed a link?

Beyond the quantitative difficulties, though, I have to ask if it is preferable to have a reader finish a topic despite not understanding the concepts it refers to, as opposed to providing the user with the means to gain the necessary understanding before continuing. If they read to the end, but remain baffled, how are either their goals or the writer’s goals being met?

True, if you send them off somewhere they can get the background they need, they may not come back (which is why, ideally, you would send them somewhere on your own site, to improve your chances of retaining them). But they might come back, and then read the topic to the end with understanding, achieving both their goal and the writer’s goal.

It seems to me that writers often set up a false goal of keeping the reader in the content at any cost. As I have said elsewhere, the real goal of all communication is to change behavior. Reading all the way to the end does not necessarily produce the behavior change you are looking for. The baffled reader is likely to leave with a bad taste in their mouth, never to return.

Business content exists either to sell more soap or to make people more satisfied with their soap-using experience (and thus buy more soap and recommend your soap to their friends). The linking strategy for your content should be driven by the goal of selling more soap, not of forcing people to read all the way to the end of whatever piece of content they have stumbled upon.

@Mark, authors can certainly become comfortable with not checking every output or every link destination, once they understand the benefits. What they do need, though, is a way to reliably input the thing that has the potential to generate a link. However linking is done, values need to match up somewhere, which means they have to be spelled or entered the same way each time. That’s what I meant by “whatever linking techniques are used, authors always need a way to be sure their links will work OK”. A CMS can provide one way to get this right at authoring time, by enabling authors to share a canonical list of values and use them easily. (Just as a CMS, used sensibly, can make content development and localization quite a lot easier, quicker, and cheaper. If people are doing excessive manual tweaking in the way you describe, perhaps they’re using an unsuitable CMS or have a potentially suitable one poorly configured?)

Regarding rules, I’m sure you’re right that a rule could be written to describe any reasonable inter-topic relationship. But I’m yet to be convinced that this is the best solution for every team in every situation. For some teams, an all rule-based linking system could be just right. For others whose needs are different, it might introduce another technical bias: the tail wagging the dog again. In this case perhaps a hybrid system would be better. I’ve seen metadata-based linking work well alongside ID-based linking.

What a thoughtful and thought-provoking analysis. I love the way you always question accepted “wisdom,” combining research with common sense and an open, inquiring mind. (“Before information consumption, comes information seeking.” Etc., etc.) Clear, deep thinking–and beautiful writing to boot. Keep it coming!

@Joe — Many of the things you might want to link on already have canonical names, so you don’t necessarily have to create a canonical list yourself. There is certainly an upside to validating and enforcing canonical names at write time, but there is a lot of complexity involved in doing that, and for canonical names that authors know and will usually get right, a simple system that checks the names at compile time can be much, much simpler to implement (no CMS needed) and pretty much equivalent in terms of productivity.

The other cases are things with no canonical names, and things with canonical names that authors are not familiar with (like topic ids, for instance). My preference is to avoid asking authors for canonical names that they are not otherwise familiar with. This is almost always possible, and a huge boost to productivity.

For things that have no canonical names of their own, the options are to invent canonical names (which, since they are invented, authors would not be familiar with) or to use a table of equivalents to match the various possible terms that authors might use. The table of equivalents does have to be maintained, of course, but this can be done outside the writing flow. To me, one of the keys to achieving writing quality and productivity is to avoid breaking the writer’s flow with system tasks, so I consider this a big win, despite the need to manage the table of equivalents.
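A table of equivalents can be as simple as a lookup that normalizes whatever term the author happened to use to a single canonical name, with unknown terms passing through untouched. All the names here are invented for illustration:

```python
# Hypothetical table of equivalents, maintained outside the writing flow.
equivalents = {
    "admin console": "Administration Console",
    "admin panel": "Administration Console",
    "the console": "Administration Console",
}

def canonical_name(term):
    """Normalize an author's term; fall back to the term itself if unknown."""
    return equivalents.get(term.lower(), term)

print(canonical_name("Admin Panel"))  # Administration Console
print(canonical_name("dashboard"))    # dashboard
```

Because the normalization happens at build time, the writer never has to stop and look up the canonical form while writing.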

And yes, I agree that not every piece of content can be written this way. There is no one universal solution to the authoring challenge. Indeed, the majority of content will always be unstructured and unmanaged. We will always have to rely on content intelligence as well as intelligent content.

Boy, this discussion has drifted from the proper place of links in a topic! It does sort of illustrate how we move from topic to topic, which I guess says something about the role of links in content, though what it says may depend on how you feel about discussions that wander off topic. I love them.

@Marcia — What is conventional wisdom for except to question? Seriously, one of the problems with received wisdom is that it tends to get generalized far beyond the results of the original research, and then repeated and applied without much question. It really is necessary to ask from time to time if the emperor is fully dressed.

Credit for the title, however, belongs entirely to Scott. My original title was the much more prosaic “The Case for Inline Linking”. I rather fear that Scott may have been trying to get me into trouble with the DITA community, but I do have to agree that “rethinking inline linking” does trip off the tongue.