November 1, 2012

I like Portable Network Graphics for bitmap images. PNGs improve on the GIF format and are presentable in either PDF or web-based formats.

This week I was dealing with a client’s data that included PNG graphics. The output quality was really poor, but when I opened each graphic in a program like GIMP, the image quality was fine. It turns out the source PNG files were corrupt. GIMP was fixing them before rendering them to the screen.

I found the corruption by using a Linux-based command-line utility called pngcheck.

Once I discovered the problem, I used another command-line utility called pngcrush, with a shell script to loop through all the graphics and fix the errors.
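For a sense of what this kind of check involves, here is a minimal Python sketch. It is not pngcheck itself and does far less: it only verifies the PNG signature and the CRC stored with each chunk, then loops over a folder. The function names (check_png, find_corrupt, build_minimal_png) are my own for illustration.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png(data: bytes) -> list:
    """Return a list of problems found in PNG bytes (empty if none)."""
    if not data.startswith(PNG_SIGNATURE):
        return ["missing PNG signature"]
    problems = []
    pos = len(PNG_SIGNATURE)
    seen_iend = False
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            problems.append(f"{ctype!r}: chunk truncated")
            break
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the chunk type and body, not the length field.
        actual_crc = zlib.crc32(data[pos + 4:body_end]) & 0xFFFFFFFF
        if stored_crc != actual_crc:
            problems.append(f"{ctype!r}: CRC mismatch")
        pos = body_end + 4
        if ctype == b"IEND":
            seen_iend = True
            break
    if not seen_iend:
        problems.append("no IEND chunk")
    return problems


def find_corrupt(folder: str) -> dict:
    """Map each problem file in `folder` to its list of problems."""
    report = {}
    for path in sorted(Path(folder).glob("*.png")):
        problems = check_png(path.read_bytes())
        if problems:
            report[path.name] = problems
    return report


def _chunk(ctype: bytes, body: bytes) -> bytes:
    """Assemble one chunk with a correct CRC (used only for the demo)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def build_minimal_png() -> bytes:
    """Build a tiny 1x1 grayscale PNG in memory for trying out check_png."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))
```

Calling find_corrupt on a folder of graphics gives a quick list of suspect files. Repairing them is a separate job that I would still leave to a dedicated tool.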

March 18, 2010

Like most people, I’m frequently asked what I do for a living. I tell people that I work with companies that have a large quantity of mission-critical information that they need to be able to find in an instant, and that information had better be right every time.

The Economist magazine just ran a really good special report on “the data deluge” that individuals, companies and governments face.

Some things that resonated with me from the report:

In 1971, the economist Herbert Simon wrote:

“What information consumes is rather obvious: it consumes the attention of its recipients.”

I like his conclusion:

“Hence a wealth of information creates a poverty of attention.”

The term “data exhaust” was used to describe the trail of clicks that Internet users leave behind from a transaction. This exhaust can be mined and put to use. Google refining its search engine to take into account the number of clicks on an item to help determine search relevance is one example of using data exhaust. I really like the term and believe it fits well with trying to make sense of large data sets. Smoggy areas could indicate that instructions in service documentation are not clear enough or, properly mined, could point to an impending issue with a particular component in a product.

“Delete,” by Viktor Mayer-Schönberger, argues that systems should “forget” portions of a digital record over time. Systems could be designed so that parts of digital files degrade over time to protect privacy, yet the items that remain could possibly benefit all of humankind. Donating your digital corpse (medical reports, test results etc.) to science comes to mind as a good example of this concept. While I might not want people to be able to link my name to my medical records, the records themselves with no name attached would provide a lifetime of data that could be used to advance lots of different fields.

Being able to consistently create the right set of rules for the ethical use of various types of data exhaust will be tricky. The article in the Economist mentions six broad principles for an age of big data sets that I liked:

September 29, 2008

As a consultant, I’m often parachuted into complex projects and need to be able to appear intelligent in a short period of time to both technical staff and business people. In setting up a publishing system based on the Darwin Information Typing Architecture (DITA), I’m faced with trying to communicate the specialized structures and element names needed by an organization.

One of the most important principles of DITA specialization is the concept of inheritance. Inheritance allows you to use the structures and semantics previously defined by others as a starting point for your specializations. But how do you communicate this idea in a simple, clear, concise and repeatable manner during the design process? If a picture is worth a thousand words, a UML model is invaluable.

The Unified Modeling Language (UML) is a graphical notation that is particularly good at expressing object-oriented designs. UML went through a standardization process and is now an Object Management Group (OMG) standard.

I like using UML because it allows me to communicate certain concepts more clearly than alternatives like natural language. Natural language is too imprecise and subject to interpretation by the reader. A DITA DTD or Schema is very precise but not something that should be created during the design process. So I use UML to communicate and keep track of the important details like inheritance.

A lot of people talk about the learning curve associated with UML. I’m not advocating that team members be exposed to everything UML can provide, only the parts necessary to convey the important details of the moment.

Let’s say that we want to build a new type of DITA topic for creating slide presentations. The following diagram conveys a lot of information:

My conversation with the assembled audience of technical staff and business users would go something like this:

Each green ellipse represents an element already in existence. Each yellow ellipse represents a proposed element. The double angle brackets in the ellipse are used in UML to define stereotypes. A stereotype is the vocabulary extension mechanism built into UML. In our case we are using stereotypes to indicate which organization (DITA or SGMLXML) an element reports to. The dotted lines represent which elements are included in others and which elements extend a base type (inheritance).

Once the audience agrees that the model is correct, it can be included in the specifications and developers can create the new elements and the properly-formed class attributes required.

In our example the definition of the class attributes in the resulting DTD would look like this:
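As a rough illustration of the pattern only: a DITA class attribute lists each ancestor as a module/element pair, ordered from the base type to the most specialized type, with a leading "-" for structural types and a trailing space. The Python sketch below generates such declarations. The module and element names (slides, slide) and the choice of base elements are hypothetical and are not taken from the model discussed here.

```python
def class_attribute(ancestry):
    """Build a structural DITA class value from (module, element) pairs.

    `ancestry` runs from the base type to the most specialized type.
    """
    tokens = " ".join(f"{module}/{element}" for module, element in ancestry)
    # Structural specializations start with "-"; the trailing space is
    # deliberate so that every token can be matched as a whole word.
    return f"- {tokens} "


def attlist(element, ancestry):
    """Render a DTD ATTLIST declaration that defaults the class attribute."""
    return f'<!ATTLIST {element} class CDATA "{class_attribute(ancestry)}">'


# Hypothetical specialization: a "slides" topic type and a "slide" element.
declarations = [
    attlist("slides", [("topic", "topic"), ("slides", "slides")]),
    attlist("slide", [("topic", "section"), ("slides", "slide")]),
]

for line in declarations:
    print(line)
```

Reading a value from left to right gives the inheritance chain, which is the same information the dotted inheritance lines carry in the UML model.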

Besides communicating DITA specialization information, I also use UML for other aspects of my deployments such as use case design, system deployment and various activity diagrams. No single model is sufficient to build the system but by using the same modeling language for all the models, it is easy to impart all of the important analysis, design and implementation decisions that must be made by an organization before deployment.

-Charles Angione

July 30, 2008

The Arbortext crew in Ann Arbor is getting a new workspace. As I write this, many hardworking and dedicated employees at 1000 Victors Way are packing up their offices and preparing to move down the street to a new building.

Victors Way: what a noble address! I first came to the four-story building 11 years ago as a potential customer. It was the middle of December, and I was meeting with a sales rep and some lead engineers to show them a proposal featuring Arbortext. I remember Ivan, my sales rep, later telling me that everyone was eager to meet someone willing to come to Ann Arbor in December!

I came back to Victors Way the following year for the Annual Users Group Conference (AUGI) held in Ann Arbor. What a fun and memorable event! Arbortext made sure that just about anyone a customer might want to speak with was at the conference. I remember watching the trepidation on the faces of some engineers as they walked across the street, their eyes full of dread and looks that said, “Oh no! I have to talk to a customer!”

But that kind of openness and innovation is what gave Arbortext the reputation it maintains today — fiercely dedicated to standards and open at all levels of the organization, from the CEO to the engineer that built a feature.

Eventually, I joined Arbortext as a consultant, and it was one of the best decisions I’ve ever made. In addition to an amazing career, I’ve gained a wonderful extended family. When I came in for my job interview, it took my interviewers 20 minutes to get me to the conference room because we kept running into people I knew along the way!

Ever since I became part of the Arbortext team, returning to Ann Arbor (no matter what the season) has always felt like coming home. I see friends that I stay in touch with but don’t get to visit with in person very often. I make sure to stop by every office, no matter what floor, and chat with people that I’m genuinely happy to see.

One of my earliest major assignments at Arbortext involved creating one of the first linkbases in existence. At that time, the XLink standard was brand new. During those early days, I often found myself standing outside in the middle of the night next to one of the walkway lamps that light the entranceway. I’d look up at the ARBORTEXT sign on the building while I smoked like a chimney and prayed that the application would actually work! Eventually it did work, but only after some pain and suffering.

The four floors of 1000 Victors Way represent more than a decade of my life. Although the facilities in the new building are far superior for the Ann Arbor crew of today, I will always look back at the Victors Way building with fondness, and I will stop by when I’m in town. The Ann Arbor crew and Arbortext customers should be proud of what was accomplished within those walls, and I hope they are as excited as I am about what will come out of the new facility.

October 30, 2007

I have recently been experimenting with various mind-mapping software packages. A mind map is a diagram used to represent ideas, tasks and other things that are linked, arranged and then rearranged as more information becomes available. In the ’80s and ’90s, the closest equivalent was sticking Post-it notes on a conference room wall and connecting them with yarn. One of the advantages of mind maps is that in many instances you can put more detailed information behind the topics. That way any notes pertaining to a particular topic stay with that topic no matter where it ends up in the map.

In some ways mind maps remind me a bit of use case diagrams in UML, and I think that we will see more mind maps and use cases being used in topic-oriented document design. Many of the mind map tools store or can export the maps as XML documents, enabling a developer to write transforms that, for example, might create a DITA topic for each item in the map.

I’ve been experimenting with three applications:

FreeMind

FreeMind is an open source project written in Java. I like the fact that the default storage format is XML based. The schema is simple to understand and allows you to create effective mind maps. It falls short when it comes to associating additional notes behind topics.
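Because the format is so simple, the transform idea mentioned earlier is easy to sketch. The Python below is a minimal illustration, not a production transform: it assumes a FreeMind-style file in which a map element contains nested node elements with a TEXT attribute, and it emits a bare-bones DITA topic per node. The sample map, the topic_N id scheme and the function name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the FreeMind style: <map> with nested <node TEXT="...">.
SAMPLE_MAP = """<map version="0.9.0">
  <node TEXT="Slide deck">
    <node TEXT="Introduction"/>
    <node TEXT="Demo">
      <node TEXT="Setup"/>
    </node>
  </node>
</map>"""


def topics_from_map(map_xml):
    """Return one minimal DITA topic (as an XML string) per map node."""
    root = ET.fromstring(map_xml)
    topics = []
    # iter() walks the nodes in document order, parents before children.
    for index, node in enumerate(root.iter("node"), start=1):
        topic = ET.Element("topic", id=f"topic_{index}")
        ET.SubElement(topic, "title").text = node.get("TEXT", "")
        ET.SubElement(topic, "body")
        # A real DITA file would also need an XML declaration and DOCTYPE.
        topics.append(ET.tostring(topic, encoding="unicode"))
    return topics


for topic_xml in topics_from_map(SAMPLE_MAP):
    print(topic_xml)
```

The sketch flattens the hierarchy. Keeping the parent and child relationships would mean also writing a DITA map that references the generated topics.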

Semantik

Semantik, formerly known as Kdissert, is my favorite if you are running Linux. This is a must-have application. Unfortunately there is no Windows equivalent, which limits its appeal in most business situations. Semantik allows you not only to create complex maps with lots of topics but also to store additional information behind the items, as well as links to other files. The topics and information can then be exported as a single document. Currently there is an export template for DocBook, which implies that it would be fairly easy to make one for DITA.

Mindjet

Mindjet is commercial software and not cheap. I’d say this is a company that Microsoft should acquire like they did Visio back in the 1990s. It’s good at creating maps and placing content behind them, and I think it would fit in perfectly with their Office suite. The XML export isn’t the cleanest thing in the world (namespaces) and the file order needs to be studied carefully. On the positive side, once I got past those challenges I didn’t have any problems creating XSLT transforms to either DocBook or DITA.

July 26, 2007

It’s two in the morning. I’ve been done with the content of the document I’m working on for two hours and I just finished selecting the fonts I’m going to use. This is kinda geeky even by my standards.

I blame “The Non-Designer’s Collection” by Robin Williams and John Tollett. After reading the three books, I guarantee you will never look at serif and sans serif fonts the same way again. After a couple of arduous hours, I’ve finally selected “Franklin Gothic Medium” as my sans serif font and “Book Antiqua” as my serif font. I would have preferred to use “RotisSemiSerif” but didn’t feel like paying 200 bucks for the font.

September 6, 2006

Adobe announced that it will end support for its SVG Viewer, arguably the best free, robust SVG viewer currently available, as of January 1, 2007. This is disappointing for companies that have used the plug-in within their own applications, trusting that they had support from a major player in the industry. January of 2007 does not leave much time to consider alternatives! The claim is that there is now enough support in browsers and other free plug-ins that Adobe does not need to continue supporting its own. I’m not sure I buy this. I suspect it has more to do with the viewer competing with other technologies that Adobe recently bought.

August 7, 2006

Just finished reading Rick Jelliffe’s presentation at Open Publish 06. I like his definition of XML governance: ensuring that the correct schemas, skills, personnel, procedures, practices, politics and feedback are in place for well-managed XML.

He goes on to relate the 10 Guiding Principles for Effective XML Development in the Extensibility Manifesto to their governance issues.

June 8, 2006

A significant amount of time and energy has been spent by CMS vendors talking about the benefits of bursting or chunking monolithic XML documents into individual objects. The benefits that are touted are object reuse, lower translation costs, individual routing and approval workflows and the ability to enhance content instead of re-creating the bare minimum necessary.

For this to work, the organization or individual creating and managing the content must have consistently applied the concepts of reuse at each of the bursting or chunking levels. In practical terms, does each topic or section actually make sense as a reusable object? Very rarely. More often there are a few sections, topics or paragraphs in a document that scream to be reused. These reusable objects can then be consciously put in a location for others to find and reuse. Thus in many instances, it seems more practical to provide templates that can be used to create reusable objects where appropriate and leave the rest of a document as a standalone monolithic object.
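To make the idea concrete, here is a minimal Python sketch of selective bursting. It assumes authors flag the few sections worth sharing with a hypothetical reuse="yes" attribute. Only those sections are pulled out as separate objects, and the rest of the document stays in one piece. The attribute, the include placeholder element and the function name are all invented for the example and do not reflect any particular CMS.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: only the safety section is flagged for reuse.
SAMPLE_DOC = (
    '<manual>'
    '<section id="intro"><p>Welcome.</p></section>'
    '<section id="safety" reuse="yes"><p>Unplug before servicing.</p></section>'
    '</manual>'
)


def burst_reusable(doc_xml, marker="reuse"):
    """Extract flagged top-level children; leave everything else in place.

    Returns (remaining_document, {id: extracted_xml}).
    """
    root = ET.fromstring(doc_xml)
    extracted = {}
    for index, child in enumerate(list(root)):
        object_id = child.get("id")
        if child.get(marker) == "yes" and object_id:
            # Leave a hypothetical placeholder pointing at the new object.
            ref = ET.Element("include", href=f"{object_id}.xml")
            # Move trailing text to the placeholder so it stays in the
            # document instead of being serialized with the extracted object.
            ref.tail, child.tail = child.tail, None
            extracted[object_id] = ET.tostring(child, encoding="unicode")
            root[index] = ref
    return ET.tostring(root, encoding="unicode"), extracted


remaining, objects = burst_reusable(SAMPLE_DOC)
print(remaining)
print(objects)
```

The design choice is the point: the decision about what becomes a reusable object is made by the author who marks the section, not by a blanket rule applied at every section boundary.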