A DITA users group in the Portland, Oregon, metro area.


Category Archives: instructional

At our October meeting last week, Roger Hadley and the technical communications team at Fiserv, a global financial services company with an office in Hillsboro, presented to a full house of PDX DITA users from such companies as Cisco, NetApp, InfoParse, and Harmonic.

Roger walked us through Fiserv’s implementation of faceted search, which relies on DITA metadata and a SQL search server to search their 5,000 help topics. You enter search terms in a single search box, and results are listed in a center pane as links to help topics. You can further refine the results using filters in the left sidebar, which are based on the metadata labels Feature, Function, and Role.

The Stats

This awesome project took:

A team of 8 (5 writers, 1 lead, 1 architect, and 1 manager)

One year of planning

One year to implement

And some quick stats about their documentation ecosystem:

5,000 xml topics

6 product categories (each built from their own map)

Hundreds of conrefs and keyrefs embedded in the xml files

Filters based on the following:

Feature — 34 keywords

Function — 90 keywords

Role — 20 keywords

Planning and Implementation

To complete this project, the team:

Researched what their users needed from a better search experience (over the years, they had been hearing that search kept getting worse as their content grew)

Determined that a DITA metadata/SQL Search Server approach could yield a simpler and more accurate search experience

DITA Work

Decided that faceted search based on their existing tags for Feature, Function, and Role would best optimize search results for their particular content set

Created a homegrown utility that showed their Feature, Function, and Role tags so writers could easily add them to the top of each xml file

Manually edited all 5,000 help topics and tagged them with the appropriate Feature, Function, and Role metadata attributes
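This summary doesn’t show Fiserv’s actual markup. As a rough sketch only, assuming the three labels are stored as othermeta entries in each topic’s prolog (one common DITA convention, not necessarily the one the team used), a tagged topic might look like the following. The topic ID, title, and keyword values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Hypothetical topic: the ID, title, and keyword values are invented for illustration -->
<concept id="wire_transfer_limits">
  <title>Wire transfer limits</title>
  <prolog>
    <metadata>
      <!-- One entry per facet that the search filters use -->
      <othermeta name="Feature" content="Wire Transfers"/>
      <othermeta name="Function" content="Configure"/>
      <othermeta name="Role" content="Administrator"/>
    </metadata>
  </prolog>
  <conbody>
    <p>Topic content goes here.</p>
  </conbody>
</concept>
```

Keeping the labels in the prolog puts them at the top of the file, which matches the description above of writers adding tags to the top of each topic.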

SQL/MS Search Server Work

The team used Microsoft Search Server Express to index pages based on their metadata. A SQL index file, built during the XHTML transformation for each map/product, is used to populate a database of pages for each product. PHP then adds breadcrumbs, navigation, and metadata-based search links to each content page, based on the information in the SQL database as well as the metadata in each page.

Microsoft Search Server Express offered some built-in features that they used out-of-the-box or with relatively straightforward customization, such as:

Provides a single box on the search home page to enter search terms (which then searches all 5,000 topics)

Shows a list of search results in the form of links to help topics below the search box

Builds a TOC-like hierarchy in the left sidebar, based on Feature, Function, and Role, that you can use to refine results

Join Us!

We hope you’ll join us December 2nd when we host a joint meeting with our local chapter of Write the Docs.

At the June meetup of PDX DITA I presented a brief talk about the <shortdesc> element, which had been puzzling me for a long time. Due to tragic audio failure, several remote attendees couldn’t tune in, so I promised a blog rendition of the talk. The post below is not a strict transcription, but it attempts to capture most of the main points, including some that emerged during the Q&A.

As the quotations above suggest, the humble <shortdesc> element has inspired a surprising amount of passion in writers about DITA. Pretty grandiose for a piece of text that’s often ignored and, when it isn’t, takes up at most a couple of sentences per topic. Why is the shortdesc such an object of mystery?

I’m not sure I have an answer, so let’s start with a few non-mysterious facts.

What is <shortdesc>?

The shortdesc:

Is an optional element that precedes the topic body in a topic

Provides clues about topic content (enabling the “progressive disclosure model”–readers can scan it to see if they want to read on)

Shows up in search results and link previews as well as the body of the topic

Here are a few things the shortdesc isn’t (or shouldn’t be):

A lead-in or introduction

A promise about the contents of the topic

A sentence fragment

Why use <shortdesc>?

None of these facts really gets at what the shortdesc is for. Here is the OASIS reference’s summation of the purpose, which goes a little farther in explaining where the shortdesc shows up, but doesn’t quite get to the heart of things.

What is implied here, but not quite said, is that the shortdesc should answer the “so what?” question. In other words, what is the value in this topic, and why should I care about it? The shortdesc should either try to remove the need to actually read the topic, by extracting the key and most actionable piece of information (especially effective with tasks), or help a reader decide whether this topic will be useful enough to be worth reading. The best case is that you can mouse over a shortdesc or find it in a mini-TOC and get the key detail you need without reading further; there are topics that might well consist entirely of the shortdesc. (Imagine a topic of which the shortdesc is “You should install Service Pack 3 before attempting to install the latest version; otherwise your installation will fail.”) The second-best case is that you can tell whether or not this is the topic YOU need to read. For example, a shortdesc for a longer topic with several paragraphs might specify where the information is applicable, such as “Users who need to use this technology in a distributed environment should understand the flow of information between servers.” You can achieve some of these goals by writing clear titles, but shortdescs give you a lot more space.

Because the content of shortdesc is promoted in search and shows up in a number of places as described below, making the shortdesc valuable in itself or a clear indicator of where to find value is very useful.
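To make the placement concrete, here is a minimal sketch of a task topic that uses the Service Pack sentence above as its shortdesc. The topic ID, title, and steps are hypothetical and exist only to show where the element sits.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical task topic: ID, title, and steps are invented for illustration -->
<task id="install_latest_version">
  <title>Installing the latest version</title>
  <!-- The shortdesc follows the title and precedes the topic body -->
  <shortdesc>You should install Service Pack 3 before attempting to install the
    latest version; otherwise your installation will fail.</shortdesc>
  <taskbody>
    <steps>
      <step><cmd>Install Service Pack 3.</cmd></step>
      <step><cmd>Run the installer for the latest version.</cmd></step>
    </steps>
  </taskbody>
</task>
```

A reader who sees only the shortdesc, in a hover or a search result, already has the one detail that prevents a failed installation.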

Where Does <shortdesc> Show Up?

<shortdesc> shows up helpfully in many places:

At the top of the topic in output

When you mouse over a cross-ref to the topic in HTML

In a mini-TOC inside a top-level topic with several topics nested underneath it

In search results (internal to a Help system, or in a search engine)

The Need for Consistency

This multi-usefulness, however, creates consistency problems. If you don’t use a shortdesc in EVERY topic, you’ll see mini-TOCs in your output with gaps in them. If some of your short descriptions are sentence fragments, or if some are very long and some are very short, or if they use very different sentence structure, you can find yourself inadvertently creating TOCs with faulty parallelism. This isn’t the end of the world in terms of usability, but it looks sloppy and will annoy people who should be paying attention to your fantastic content.

Some Technical Limitations

<shortdesc> can’t contain any of the following items, so you can’t get too fancy with them. It’s best to think of them as a text-only element since they can’t have:

Conditional formatting (use <abstract> if you need to do this)

x-refs

codeblocks, lists, tables, or other fancy formatting
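As a hedged illustration of the <abstract> route mentioned in the list, the sketch below wraps two audience-specific short descriptions in one <abstract>. The topic ID, audience values, and wording are hypothetical, and how the result renders depends on your toolkit and filter files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Hypothetical topic: the audience values are examples, not a standard set -->
<concept id="replication_overview">
  <title>Replication overview</title>
  <abstract>
    <!-- A filter that excludes one audience leaves only the other shortdesc -->
    <shortdesc audience="sys_admin">Replication copies data between servers, so
      plan network capacity before you enable it.</shortdesc>
    <shortdesc audience="end_user">Replication keeps your data available if one
      server goes offline.</shortdesc>
  </abstract>
  <conbody>
    <p>Topic content goes here.</p>
  </conbody>
</concept>
```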

Challenges of <shortdesc>, Summarized

<shortdesc> is going to work best if your DITA implementation is already working well. If your content isn’t well-structured and concise with a clear purpose for each topic (concept, task, and reference) and a modular structure, it will be hard to write a clear shortdesc–in fact, difficulty in writing a shortdesc may be an early warning about content problems. If your team isn’t working with a clear idea of how to write a shortdesc, you’ll end up with consistency problems, so you need to communicate about what you’re doing. And if you have a lot of legacy content, you may find yourself writing this element in bulk which is probably not most people’s idea of fun.

Most of all, the <shortdesc> element needs to function in multiple contexts and be used consistently. Otherwise you’ll end up with confusing search results or hover text, and mini-TOCs that are gappy or not especially helpful. As we learned in our meetup this June, some people just decide not to deal with this tricky element.

Tips for Success

Nevertheless, <shortdesc> can have a lot of utility in highlighting valuable content. Here are some tips to make yours effective.

Use complete sentences.

Make sure your content can either stand alone (in which case consider formatting it distinctively in the output) or that it works as the opening of the topic.

Don’t be long-winded.

Use a consistent sentence structure. Statements work best.

Try to offer a takeaway that relieves someone from reading.

Don’t promise (e.g., “the following methods work:”).

Be systematic in getting them done.

If a topic contains only one sentence, just make it the shortdesc.

If you have them, make shortdescs for “container topics” cover the nested topics succinctly.

How About You?

In our meetup, we learned that most people in our group are using <shortdesc> fairly traditionally, as described above, or else not using it at all. However, we keep hearing rumors about creative uses including special formatting and tool tips. If you have some ingenious ideas about how to use them, we’d love to hear from you in comments or at our next meetup.

I led a project earlier this year to convert my organization’s legacy Eclipse Help Center builds to HTML5 builds using Oxygen’s new WebHelp plugin for the DITA Open Toolkit. By changing the build output, we achieved the following:

Completely eliminated the costly Eclipse server maintenance

Improved our analytics results and potentially our SEO

Brought our output into HTML5 compliance

Unfortunately, actually implementing the plugin wasn’t just a matter of plopping the plugin into the Plugins directory. I had to upgrade the toolkit, rework scripts, and restructure directories to create a less-confusing, easily upgradeable build. Read on to understand my choices and learn from my experience.

Upgraded the Toolkit

First, I upgraded our toolkit, which I was happy to do because it’s good practice, as well as nice to take advantage of the latest that DITA has to offer. Unfortunately, I couldn’t upgrade it too far, because the plugin only supports DITA-OT 1.7.5, which is still better than the 1.5.4 version we’d been using.

This time, I customized using the Customization directory only. My first experience with DITA was a highly customized DITA-OT 1.4.2.1 that still makes me twitch when I look at it. It worked fine and it had some decent customizations, but upgrading a hacked-up toolkit is about as simple as string theory.

Reworked the Scripts

I reworked the .sh script and our ant scripts. The WebHelp plugin includes a dita.sh file that you need to edit in order to set environment variables for the build. This replaces the toolkit’s out-of-the-box startcmd.sh and was actually a little tricky to figure out because I was retrofitting it to an existing build layout.

The instructions from Oxygen tell you how to use the dita.sh, but don’t mention anything about ant builds, which is how we build here (via Jenkins). Oxygen’s support team was also less familiar with this approach. The reason I didn’t want to use only the dita.sh was that I didn’t want to have every writer set TRANSTYPE or DITAVAL_FILE or DITA_DIR as environment variables every time she ran it from the command line.

For example, here’s what it would look like if the writer had to set all of the variables from the command line each time she ran a build:

Instead, I was able to revise the script so the build command now looks like this:

ant -Dargs.filter=docs/filters/on_prem_sys_admin_7_0.ditaval

My Goals for Reworking the Scripts

Simplify the build script so more than one user can build from their own environment

Automate the build

Produce PDFs along with HTML5 output

Make it possible to build using numerous maps, filters, and help sets

My Solutions for the Scripts

To reach these goals, I set the variables at build time using a series of build files that are triggered in one simple call on the command line that passes only the DITAVAL_FILE filter. Any writer, or build automation tool, can now use this command because the build scripts are no longer dependent on absolute paths to the DITA-OT or to an SVN repo.

On the command line or in Jenkins, any writer can run the new command in any help set directory, and the local build files reach out to a shared build file for all help sets. I put all of the targets in that one shared file, single-sourcing it like a good writer would.
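The build files themselves aren’t shown in this post. As a sketch only, a per-help-set Ant file in that spirit could set its own map and output locations with relative paths and then import the shared targets. The project name, file names, directory layout, and the “webhelp” target name are all hypothetical; args.filter is the one property taken from the command shown earlier, and it is passed on the command line.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical local build file for one help set -->
<project name="on_prem_sys_admin" default="webhelp" basedir=".">
  <!-- Locations resolve relative to this file, so no absolute paths are needed -->
  <property name="args.input" location="dita/sys_admin.ditamap"/>
  <property name="output.dir" location="output"/>
  <property name="dita.temp.dir" location="temp"/>

  <!-- All targets, including the assumed "webhelp" target, live in one shared file -->
  <import file="../../shared/build_common.xml"/>
</project>
```

Because each local file holds only properties, a change to a target happens once in the shared file and applies to every help set.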

I also wanted to preserve a previously-used script in our new build. It starts with a simple ant target that calls some .jars for creating PDFs. This target and its supporting .jars comb the main ditamap for submaps and create PDFs for each submap. This Chapter PDF output has been popular with our users, so I made sure we kept it.

Restructured the Directories

Because I was rewriting build files and creating new Jenkins scripts anyway, I figured it was a good time to flatten our directory structure a bit. I removed some unused directories, including some old build cruft and unnecessary branches (in the listing below, the feature/ directory and the trunk/ and src/ layers).

Our former structure with extra layers:

productA/feature/ (used for Eclipse output)

productA/version1/trunk/src/dita/

productA/version1/trunk/src/build/

productA/version1/trunk/output/

productA/version1/trunk/src/temp/

Here’s our new simpler way:

productA/version1/dita/

productA/version1/build/

productA/version1/output/

productA/version1/temp/

A next step is to research best practices for directory structures and DITA to further simplify our structure, but I’m happy to work in this slightly less cluttered place.

Customized WebHelp CSS and JavaScript

Knowing the hell of a customized toolkit made me extra concerned about customizing the WebHelp plugin. Although the plugin offers a lot, it does not provide a Customization directory. In order to change the look and feel of our HTML5 output, we had to edit the CSS and some JavaScript directly. This means that when/if we upgrade, it will take some work to transfer the changes to the new plugin. I already saw a huge reorg of files and directories between the pre-release and January versions of the same release.

Hold out for the most stable version of the plugin, no matter how excited you are about it.

The Search and CSS are way better than in any other product we evaluated, but still clunky.

My last gripe about customizing this plugin is the utter lack of documentation. The paragraph on customizing the CSS cracked me up because they make it sound like you can use any ol’ CSS file, but you still need to point to the HTML elements that the plugin uses, such as 5 nested UL elements!

Was It Worth It?

Hell, yeah! I’m known for jumping head first into complicated projects, and this was no exception. (Someday, I’ll blog about improving PDF output using XSLT.) But, should YOU use this plugin? The answer is yes if you:

Want HTML5 output for an okay price.

Like the out-of-the-box look, or you want to learn (or already know) JavaScript and CSS.

Are comfortable using Support forums or Technical Support for help.

Have any of you had to implement the Oxygen WebHelp plugin? If yes, I’d love to hear your challenges and how you overcame them.

In my last post about filtering, I tried to explain the theory of filtering, so you’d understand which kinds of problems filtering might solve. In this post, I’ll attempt to show how to get started on an actual project. This post is intended for folks who are just getting started with filtering.

Overall Process for a Filtering Project

Generally speaking, I find my process to be something like this:

Create a matrix of the product’s variations and the associated publications so that I have a reference to consult for my sanity. Basically, I’m trying to sketch out which topics I’ll include and how to filter them in or out (in other words, show them or not show them in the final publication). This might be organized by features, audience, revision number, platform, or a combo of any of these.

Use the matrix to define the values I’m going to need in my filter files. A “value” is the definition behind the features, audience, platform, etc. For example, the values might be “beginner,” “intermediate,” or “advanced,” or “Mac,” “Windows,” or “Linux”. Defining the values is basically figuring out who the audience is and the content they’ll need in the publication.

Create the actual filter files (based on the values in the matrix) so that the computer knows how to show or not show tagged topics and inline text appropriately at build/publish time.

Tag the content in the source files for the different features.

Create the final publications (.doc, html, pdf, etc.) using each filter.

Create a Matrix

In my last post, I used the robot product as an example. The robot came in different models, so the matrix I created for that was based on features. But I find that changing the example sometimes produces a different “aha” moment for readers. So this time around, I’m going to use a cookbook example that’s based on audience values (beginner and advanced).

I need to create a cookbook for two culinary school courses: Beginning Baking and Advanced Baking. I’ve determined the chapters I’m going to need and which ones I’ll be able to share between the two different cookbooks. Here’s how it looks:

Define the Values

So in this case, we’re going to need two audience values:

Beginner

Advanced

Create the Filters

A filter is the actual file in which the values are defined. This may vary depending on which tool you’re using (Frame, Word, DITA, whatever), but the approach is generally the same between tools. As I mentioned in my last post, filtering is largely a mind-shift and less about which tool you’re using to accomplish the goal.

In my shop, we use DITA, so I would create a ditaval file for each of the cookbooks. (The filter files are called “ditavals” because they have a .ditaval filename extension.)

Beginner

Advanced

Include/Exclude

Note that each of the ditavals includes/excludes the different audience values, like so:

Beginner — includes beginner, but excludes advanced.

Advanced — includes advanced, but excludes beginner.
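As a minimal sketch of the Beginner filter described above, a ditaval file holds one prop entry per value. The file name beginning_baking.ditaval is the one used later in this post; the audience values match the two defined earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- beginning_baking.ditaval: builds the Beginner cookbook -->
<val>
  <!-- Keep content tagged for beginners -->
  <prop att="audience" val="beginner" action="include"/>
  <!-- Drop content tagged for the advanced course -->
  <prop att="audience" val="advanced" action="exclude"/>
</val>
```

The Advanced filter would be the same file with the two actions swapped. Content that carries no audience tag is unaffected by either file.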

The include/exclude action tells the computer which content to include and which to exclude when you publish your guide. So, for example, when you create the Beginner Guide, you want to see the following chapters:

Introduction

Cookies

Cakes and Pies

Basic Breads

Conclusion

You do not want to see:

Advanced Breads

French Pastry

Croissants

And when you create the Advanced Guide, you want to see:

Introduction

Cakes and Pies

Basic Breads

Advanced Breads

French Pastry

Croissants

Conclusion

You do not want to see:

Cookies

Tag at the Topic or Chapter Level

So, to accomplish this, you would “tag” the chapters so the computer knows what to show and what not to show at build/publish time. Again, in my shop, we use DITA, so my parent Cookbook file (called a “ditamap”) would look like this:
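A minimal sketch of such a map, assuming one topic file per chapter (the file names are hypothetical), might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>Cookbook</title>
  <!-- Untagged chapters appear in both guides -->
  <topicref href="introduction.dita"/>
  <!-- Beginner Guide only -->
  <topicref href="cookies.dita" audience="beginner"/>
  <topicref href="cakes_and_pies.dita"/>
  <topicref href="basic_breads.dita"/>
  <!-- Advanced Guide only -->
  <topicref href="advanced_breads.dita" audience="advanced"/>
  <topicref href="french_pastry.dita" audience="advanced"/>
  <topicref href="croissants.dita" audience="advanced"/>
  <topicref href="conclusion.dita"/>
</map>
```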

Breaking that Down

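Introduction, Cakes and Pies, Basic Breads, and Conclusion are not tagged. That is because you want these to show (include) in both guides. Cookies is tagged for the beginner audience, so it shows only in the Beginner Guide. Advanced Breads, French Pastry, and Croissants are tagged for the advanced audience, so they show only in the Advanced Guide.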

Here’s how the guides will look based on the include/exclude information in their associated filter file (the .ditaval):

If you don’t fully grasp this right now, try not to freak out. You’ll test it! You WILL get this to work. It just takes some time for your mind to adapt to this way of thinking.

Tag at the Inline Level

In addition to the topic-level tagging that you’ve done, you may have some text that you need to tag inline (e.g., by adding “audience=(value)” to specific elements). For example, in your Introduction chapter, you could do something like the following:

For the xml-phobic, here’s a different view of that same content with some nice red arrows to explain what’s going on:

Refer back to the “Breaking that Down” section above to understand how this tagged content will show or not show in your different cookbook versions.
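As a hedged sketch of inline tagging in the Introduction chapter, the topic below tags whole paragraphs and one phrase inside a sentence. The topic ID and the wording of every paragraph are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Hypothetical Introduction topic: all text is invented for illustration -->
<concept id="introduction">
  <title>Introduction</title>
  <conbody>
    <!-- Untagged: appears in both cookbooks -->
    <p>Welcome to the baking course.</p>
    <!-- Paragraph-level tags: each appears in one cookbook only -->
    <p audience="beginner">Start with the cookie chapter to practice measuring
      and mixing.</p>
    <p audience="advanced">This guide assumes that you can already make basic
      doughs.</p>
    <!-- Phrase-level tag: only the parenthetical is filtered -->
    <p>Preheat the oven<ph audience="beginner"> (allow about 15 minutes)</ph>
      before you begin.</p>
  </conbody>
</concept>
```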

QA the Tagging

Here are a few ideas for QAing your tagging:

Your tool may have a handy way to validate your tagging. In other words, if you have tagged something incorrectly, your tool may have a way to let you know that and show you exactly where you messed up. In my shop, we use Oxygen, which has a great validator.

You can also design in your own quick QA checks. For example, one thing I’ve done in the past is to create a topic that lists the PDF guides. Here’s an example where the table rows are tagged for either “sys_admin” or “end_user” audience value. In the final publication, I can quickly glance at this topic and see if the correct row is showing (in this case, my ditaval filters are such that only the end user row or only the system administrator row should be showing). Here’s how the XML looks:

And again for the xml-phobic, here is the same file in the WYSIWYG-ish view. The green indicates content that is tagged:
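A sketch of that kind of QA table, using the two audience values named above, could look like the fragment below. The table title, guide names, and file names are hypothetical.

```xml
<!-- Fragment for the body of a QA topic; titles and file names are invented -->
<table>
  <title>PDF guides</title>
  <tgroup cols="2">
    <thead>
      <row>
        <entry>Guide</entry>
        <entry>File</entry>
      </row>
    </thead>
    <tbody>
      <!-- Exactly one of these two rows should survive each filtered build -->
      <row audience="sys_admin">
        <entry>System Administrator Guide</entry>
        <entry>sys_admin_guide.pdf</entry>
      </row>
      <row audience="end_user">
        <entry>End User Guide</entry>
        <entry>end_user_guide.pdf</entry>
      </row>
    </tbody>
  </tgroup>
</table>
```

If both rows, or neither, show up in the published topic, the filter file or the tagging needs another look.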

Compare the TOC of the final output against the filter. Is anything showing that shouldn’t be, or vice versa?

Look at your build output for any error messages. We’ll explore filtering challenges in another post, but for now, here’s a quick example of a filtering-related error message:
[FATAL] Failed to parse the input file 'AdministeringIdeation.ditamap' because all of its content has been filtered out. Please check the input file 'AdministeringIdeation.ditamap' and the ditaval file, and ensure that the input is valid.

Create the Cookbooks

Now that you have tagged everything and you’re ready to create the guides, you’ll push the magic button in whatever tool you’re using (for example, running the build on the command line, in Oxygen, or via your CMS). In Oxygen, I would choose my filter (e.g., beginning_baking.ditaval) and my desired outputs (HTML, PDF, .doc, whatever), and then run the build. I would expect the result to be the Beginner Guide.

Start Small! But Start!

I tried to pick an example that was complicated enough to describe the power of filtering, but still accessible enough that it wasn’t completely overwhelming. My advice would be to pick a small project and practice on it until you get the hang of filtering.

If you have any questions, feel free to ask in the comments. Thanks for reading!

It can be difficult to understand the power of filtering if you’ve never done it before. I’ll attempt here to explain filtering for beginners with plans to write another post about how to actually get started with filtering.

Problem

Many times when you’re writing and publishing large amounts of content as a technical, sales, or marketing writer, you need the same snippet or large section of content in several different publications. For example, you use the same Terms & Conditions or Executive Statement in all of your publications (your company website, your brochures, your online help center, and so on).

Solution

Rather than writing content over and over again, or keeping it somewhere on your network in a bunch of slightly different versions, filtering allows you to easily reuse content in many different contexts (your company website, your brochures, your online help center). Write it once, use it often. You might also hear filtering called “single-sourcing” or “conditional text” or some other variation on those themes. Same thing. It means that you, the author, manually “tag” the content for the different versions of the content that you publish. You maintain one version but publish many versions.

To my way of thinking, filtering is really a methodology or way of approaching content and authoring content. Once you grasp the concept of filtering, you can accomplish the actual tasks in many different tools: any number of XML editors using DITA, FrameMaker, or even Microsoft Word.

Generally speaking, filtering is easy to implement. But, we’ll explore more about that in another post. For now, an example!

Example

Let’s say you make a robot product with the following features:

Pushes you out of bed

Makes your coffee

Turns on your computer

Checks your email

Asks Google what you should do today

Makes you a breakfast burrito

Drives the kids to school

Makes lunch

Makes dinner

Tells you a bedtime story

And you sell the robot in the following models:

The 24/7 Robot

Includes all features except Tells you a bedtime story

The Morning Robot

Pushes you out of bed

Makes your coffee

Turns on your computer

Checks your email

Asks Google what you should do today

Makes you a breakfast burrito

Drives the kids to school

Makes lunch

The Mid-day Robot

Makes your coffee

Checks your email

Makes lunch

The Night Owl Plus Robot

Makes lunch

Makes dinner

Tells you a bedtime story

Pushes you out of bed

You need to create long, complex sales proposals, web content, and user guides for all of these versions. Wowsa. You’ll make that happen with content tagging. Read on.

Tagging

Tagging is the magic behind filtering. Using our robot example, let’s say you have an introductory section that lists all the features of your robot. Rather than maintain four different versions of the introductory section (24/7, Morning Person, Mid-day Person, and Night Owl Plus), you would have ONE version in which the content is tagged for the different models. Like I said, magic.

Once again using our robot example, you would tag your introductory section as follows:

Our fantastic robot will revolutionize your entire life. Here's a list of its features: [this content not tagged because you want this text to show up in all versions]

Pushes you out of bed [tagged 24/7, Morning Person, and Night Owl Plus]

Makes your coffee [tagged 24/7, Morning Person, Mid-day Person]

Turns on your computer [tagged 24/7 and Morning Person]

Checks your email [tagged 24/7, Morning Person, Mid-day Person]

Asks Google what you should do today [tagged 24/7 and Morning Person]

Makes you a breakfast burrito [tagged 24/7 and Morning Person]

Drives the kids to school [tagged 24/7 and Morning Person]

Makes lunch [not tagged because you want this text to show up in all versions]

Makes dinner [tagged 24/7 and Night Owl Plus]

Tells you a bedtime story [tagged Night Owl Plus]

After you have tagged all the content, you would push a button in whatever tool you use to publish things (again, could be an XML editor, FrameMaker, Word, etc.), and that tool would spit out the version you want to see. More on that in another post.
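In DITA, for instance, the tagging listed above might be expressed with the product attribute, which accepts several space-separated values. This is only a sketch: the value names (allday for the 24/7 model, morning, midday, nightowlplus) are hypothetical tokens chosen for the example.

```xml
<!-- Fragment for the body of the introductory topic; value names are invented -->
<p>Our fantastic robot will revolutionize your entire life. Here's a list of
  its features:</p>
<ul>
  <li product="allday morning nightowlplus">Pushes you out of bed</li>
  <li product="allday morning midday">Makes your coffee</li>
  <li product="allday morning">Turns on your computer</li>
  <li product="allday morning midday">Checks your email</li>
  <li product="allday morning">Asks Google what you should do today</li>
  <li product="allday morning">Makes you a breakfast burrito</li>
  <li product="allday morning">Drives the kids to school</li>
  <!-- Untagged: appears in every model's documentation -->
  <li>Makes lunch</li>
  <li product="allday nightowlplus">Makes dinner</li>
  <li product="nightowlplus">Tells you a bedtime story</li>
</ul>
```

With multiple values on one element, DITA filtering drops the element only when every listed value is excluded, so a Morning filter that excludes the other three models still keeps any item that lists morning.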

Version Matrix

I’ve found it helpful to maintain a version matrix for my sanity. The matrix can also be helpful as a reference for other team members. Continuing with our example, the robot’s version matrix would look like this:

Benefits

There are many benefits of filtering, but off the top of my head, here are some:

Produces reliable content every time. No more “Is this the version I put that improvement in a few weeks ago?”

Optimizes your content library. No more maintenance of ten slightly different versions.

You can show or not show whole topics, paragraphs, sentences, or a single letter (by tagging it).

You can produce many individual publications from one set of content files.

Bottom line is actually the bottom line: it saves time/money.

To be clear, the use case or need for filtering is only warranted in an environment in which you are already keeping multiple versions of the same content and using them in a variety of publications.

Start Slowly

As I mentioned earlier, filtering requires a mindset change. It takes some time to fully grasp and implement filtering. Start with one small project. In my next post, I’ll explain how to do that.

Feedback?

Are there posts out there that do a better job of explaining filtering? Please let me know in the comments. Also, any other comments and/or corrections are very welcome.