This Gilbane Group white paper, sponsored by EMC, provides an excellent
survey of technical issues, standards, and use cases relevant to the
adoption of XML-based technologies for industrial-strength applications
supporting enterprise content management. The paper observes that "XML presents a number
of interesting challenges and opportunities for data storage. Relational
databases and full-text search mechanisms that have been the backbone
of many applications are not designed to manage XML content effectively.
A new class of databases has emerged that is designed specifically to
manage XML content. Typically called 'XML Native Databases' or just
'XML databases,' they incorporate functionality that greatly improves
the management, searching, and manipulation of XML to produce the most
effective XML data management solution. The World Wide Web Consortium
(W3C), the standards organization that developed XML, has also developed
many standards that can be used to access, search, process, and store
XML data. XML databases take advantage of these standards to provide
efficient and precise access, query, storage, and processing capabilities
not found in traditional database technology. The result is that
applications using XML databases are more efficient and better suited
for managing XML data. These W3C standards, including XML Schemas, XSLT,
DOM, XLink, and XQuery, are well established and tested in real world
applications. The XML databases that take advantage of them provide the
platform for industrial strength applications to manage XML content.
Like any new technology, adoption is slow at first. Then as the technology
matures and understanding on how to best deploy increases, applications
emerge that demonstrate the advantages of the approach. Today, we can
find many applications to manage XML content that demonstrate the power
and flexibility that can only be achieved through XML-native databases.
Information intensive companies such as the airline and manufacturer
described in this paper have achieved significant technical and business
benefits from their use of XML standards and database technology over
alternative approaches."
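
The contrast the paper draws between full-text search and structure-aware
querying can be illustrated with a minimal sketch. This is not an XML
database and not XQuery: it uses Python's standard ElementTree module and
its limited XPath subset, and the sample document and element names are
invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are invented for illustration.
MANUAL = """<manual>
  <part id="p1">
    <name>Fuel pump</name>
    <status>active</status>
    <note>Replaces a recalled unit.</note>
  </part>
  <part id="p2">
    <name>Relief valve</name>
    <status>recalled</status>
  </part>
</manual>"""

root = ET.fromstring(MANUAL)

# Full-text style search: any part whose text mentions "recalled".
text_hits = [p.get("id") for p in root.iter("part")
             if "recalled" in "".join(p.itertext())]

# Structure-aware search: only parts whose <status> child is "recalled".
structural_hits = [p.get("id")
                   for p in root.findall(".//part[status='recalled']")]

print(text_hits)        # ['p1', 'p2']
print(structural_hits)  # ['p2']
```

The text search also returns the part that merely mentions a recall in a
note, while the structural query returns only the part whose status is
actually "recalled".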

Sun's new rich Internet application framework should be a hit with Java
developers, but the promising preview trails Adobe Flex/AIR and Microsoft
Silverlight. Sun Microsystems recently unveiled the first public beta
of its JavaFX framework for RIAs (rich Internet applications). There's
a lot to like about the new SDK. It's rich in capabilities, and its
Java-like syntax makes it a good springboard to RIAs for Java developers.
But even in Java shops, Sun and JavaFX are behind not just one eight
ball but two. Heavyweight competitors Adobe and Microsoft, with Flex/AIR
and Silverlight, respectively, offer RIA toolsets that are not only far
more mature but also include tools that bridge the all-important gap
between designers and coders. The freely downloadable JavaFX Preview SDK
bundles the JavaFX compiler and runtime, the NetBeans IDE, and a NetBeans
plug-in for coding and debugging in the new JavaFX Script language. Sun
has also thoughtfully included a good number of coding samples and
templates... Java developers will no doubt find the declarative syntax
to make for speedier UI development and, ultimately, more appealing
interfaces than flat Swing calls. Interestingly, Sun has eschewed the
XML-based abstraction favored by, well, every other major RIA vendor.
Although I prefer XML for its clean interface declaration, there is
something to be said for the less-verbose, code-centric approach taken
in JavaFX... The JavaFX SDK is only a preview edition—and a good one,
at that. With Version 1 not set to launch until the fall, Sun still
has some time to shine up this project. Easy integration with existing
Java apps should make JavaFX an immediately attractive option for
creating enterprise dashboards or bringing a modern look to Java relics.
How far Sun's mature technology stack and the long reach of Java can
take JavaFX against Adobe and Microsoft remains to be seen, but the
Java camp finally has a heavyweight in the RIA game. It's long overdue.

This article introduces the EDI functionality within BizTalk Server
2006 R2, illustrating schema creation, document mapping, EDI delivery
and transmission, and exception handling. Electronic Data Interchange
(EDI) is a technology standard that has been around for decades. So
mixing it with a modern service-oriented architecture (SOA) and the
latest release of BizTalk Server may seem an unlikely combination.
Yet EDI encompasses the largest share of real-world business-to-business
commerce—nearly 90 percent of the current market—and is growing
rapidly year over year. As companies relying on EDI evolve their IT
architectures, the capabilities of BizTalk Server 2006 R2 are proving
to be a reliable, robust, extensible, supportable, and intuitive way
to solve both SOA and EDI infrastructure needs... BizTalk Server now
provides the same level of service that many value-added networks (VANs)
provide, with the additional benefit of the underlying BizTalk components
that have been essential to enterprise integration solutions and SOAs.
These include the development of business workflows through
orchestrations, access to a business rule engine, extensive
document-tracking capabilities, state management, and other similar
functions. EDI implementations in BizTalk Server 2006 R2 begin with
developing the schemas that relate to the documents being traded. Once
the documents have been defined, trading partners are created as BizTalk
parties and their specifications are configured to ensure the proper
processing and routing of EDI documents. Next, the specifics around how
documents will be delivered are implemented through a combination of
party configurations and BizTalk adapters. When solutions are in place,
document flow can be monitored in real time through the use of EDI
reports. All of these capabilities ride on top of the BizTalk
infrastructure and benefit from all the standard components such as the
MessageBox, orchestrations, ports, and pipelines... BizTalk Server
2006 R2 ships with thousands of predefined EDI schemas that function
as starting points for all documents exchanged by trading partners.
Generally these schemas are altered to reflect specific expected formats.
Though EDI has document standards, the reality is that two trading
partners who both exchange 810 Invoice documents may still have two
different representations of the 810 and therefore require two different
schemas. These schemas will be very closely related and may only differ
in one or two segments. For example, one may truncate a street address
at 50 characters while another allows for 100. But even this small
difference requires that the default 810 XML Schema Definition (XSD)
be modified and implemented separately for both parties...
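
The reason one default document definition ends up as two partner-specific
ones can be sketched as follows. This is an illustration in Python, not
BizTalk code or an XSD: the partner names, the field name, and the helper
functions are all hypothetical, and only the 50 and 100 character limits
come from the article's example.

```python
# Hypothetical shared defaults for an 810 Invoice; field names are invented.
DEFAULT_810_LIMITS = {"street_address": 100}

# Hypothetical per-partner deviations from the default.
PARTNER_OVERRIDES = {
    "PartnerA": {"street_address": 50},
    "PartnerB": {},
}


def limits_for(partner):
    """Merge the partner's overrides over the shared defaults."""
    return {**DEFAULT_810_LIMITS, **PARTNER_OVERRIDES.get(partner, {})}


def fit_field(partner, field, value):
    """Truncate a value to the maximum length this partner accepts."""
    return value[:limits_for(partner)[field]]


address = "x" * 80
print(len(fit_field("PartnerA", "street_address", address)))  # 50
print(len(fit_field("PartnerB", "street_address", address)))  # 80
```

The same 80-character address is valid for one partner and must be
truncated for the other, so a single shared definition cannot serve both.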

That the textual phenomena of interest for markup are not always
hierarchically arranged is well known and widely discussed. Less
frequently discussed is the fact that they are also not always
contiguous, so that the units of our analysis cannot always correspond
to single elements in the document. Various notations for discontinuous
elements exist, but the mapping from those notations to data structures
has not been well analysed or understood. And as far as we know, there
are no standard mechanisms for validating discontinuous elements. We
propose a data structure (a modification of the Goddag structure) to
better handle discontinuous elements: we relax the rule that every pair
of elements where one contains the other be related by a path of
parent/child links. Parent/child links are then not an automatic result
of containment. We conclude with a brief sketch of the issues involved
in extending current validation mechanisms to handle discontinuity... A
number of questions remain open and will require further work: (1) Can
a principled set of criteria be found for assigning parent/child
relations to node pairs? What are they? Do the criteria apply at the
meta-language level, or are they a function of how document type designers
specify the document types they are working with? (2) Can discontinuous
elements be integrated into the notion of validity associated with
rabbit/duck grammars? (3) Can the algorithms for validation with
rabbit/duck grammars be extended to handle discontinuous elements?
(4) Can the ideas of multi-colored trees be applied successfully to
Goddag structures?
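
The idea of separating containment from parent/child links can be sketched
in a few lines. This is not the authors' data structure and not a Goddag
implementation: it is a simplified model, under the assumption that each
element is described by one or more character ranges, in which containment
is computed from the ranges and parent/child links are stored explicitly.
The class, its methods, and the sample text are invented for illustration.

```python
from dataclasses import dataclass, field

TEXT = '"Yes," she said, "I will."'


def span_of(fragment):
    """Half-open (start, end) character range of the first match in TEXT."""
    start = TEXT.index(fragment)
    return (start, start + len(fragment))


@dataclass(eq=False)
class Element:
    name: str
    spans: list                                   # several spans = discontinuous
    children: list = field(default_factory=list)  # explicit links only

    def is_discontinuous(self):
        return len(self.spans) > 1

    def contains(self, other):
        """True if every span of other lies within some span of self."""
        return all(
            any(s <= o_start and o_end <= e for s, e in self.spans)
            for o_start, o_end in other.spans
        )

    def descendants(self):
        """Elements reachable by following parent/child links."""
        found = []
        for child in self.children:
            found.append(child)
            found.extend(child.descendants())
        return found


sentence = Element("s", [(0, len(TEXT))])
quote = Element("q", [span_of('"Yes,"'), span_of('"I will."')])
verb = Element("w", [span_of("said")])

# Links are assigned by choice, not derived from containment.
sentence.children.append(quote)

print(quote.is_discontinuous())        # True
print(sentence.contains(quote))        # True
print(quote.contains(verb))            # False: "said" falls in the gap
print(sentence.contains(verb))         # True
print(verb in sentence.descendants())  # False: contained, but not linked
```

The last line shows the relaxed rule in miniature: the sentence contains
the word, yet no path of parent/child links connects them unless one is
added deliberately.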

In his Extreme [Balisage 2008] "first person" talk, Patrick Durusau
asked some of the right questions about the recent explosive battles
over standardizing XML generated by Microsoft Office and OpenOffice.org.
I can't share his conviction, though, that getting through this
firefight is actually worthwhile. Durusau connected the Office Suite
battles to XML's long-standing quest for platform independence,
freedom from vendor lock-in, reliable longevity of data, and many of
the other ideals that have motivated the core community over the years.
He articulated the basic problem that has kept users from a deep
interest in XML for documents over the years: they don't care about
technical issues like XML validity, but rather about "you had it, I
got it, and now I can look at it." Durusau, though an editor of ODF,
has been remarkably even-handed in his discussions of the office format
scrum, seeing the battle as more of a distraction from the value of
the underlying project than a sign that the underlying project has a
deep flaw. I applaud his generosity, but find his even-handed
disposition all too positive about the value of what's actually been
accomplished... The underlying formats are both improvements in
openness, yes, relative to the previous pain of interpreting obscure
binary file formats for which interchange was an afterthought. Those
improvements, however, aren't particularly the reason for the
standardization battle. While I've perceived ODF as having a more open
process than OOXML, both formats had lots of software before they
arrived at the question of how best to share their data. The battle
over standardization is less a battle over formats and more a battle
over who gets to label their products as "open" to various markets...
It's not at all clear to me, however, that the "carrot" Durusau
mentioned, of 400 million users, really exists, at least from a markup
perspective. That may well be the market that the vendors are fighting
over, but it's hard for me to see any great benefit coming to those
400 million users.

Amazon's Web Services (AWS) are based on a simple concept: Amazon has
built a globe-spanning hardware and software infrastructure that
supports the company's Internet business, so why not modularize
components of that infrastructure and rent them? It is akin to a large
construction company in the business of building interstate highways
hiring out its equipment and expertise for jobs such as putting in a
side road, paving a supermarket parking lot, repairing a culvert, or
just digging a backyard swimming pool. More specifically, AWS makes
various chunks of Amazon's business machinery accessible and usable via
REST or SOAP-based Web service calls. Those chunks can be virtual
computer systems with X2GHz processors and 2GB of RAM, storage systems
capable of holding terabytes of data, databases, payment management
systems, order tracking systems, virtual storefront systems, combinations
of all the above... The AWS services fall into three categories:
infrastructure services, e-commerce services, and Web information services.
The infrastructure services are composed of the Elastic Compute Cloud
(EC2); Simple Storage Service (S3), a persistent storage system; the
Simple Database (SimpleDB), which implements a remotely accessible
database; and Amazon's Simple Queue Service (SQS), a message queue
service and the agent for binding distributed applications formed from
the combination of EC2, S3, and SimpleDB... While Amazon S3 is designed
for large, unstructured blocks of data, SimpleDB is built for complex,
structured data. As with the other services, the name says it all.
SimpleDB implements a database that sits behind a lightweight, easily
mastered query language that nonetheless supports most of the database
operations (searching, fetching, inserting, and deleting) you'll likely
need. In keeping SimpleDB simple, Amazon has followed the principle
that the best APIs are those with minimal entry points: I count seven
for SimpleDB... Amazon SQS is a message queuing service in the vein
of JMS or MQSeries—only simpler... The Alexa Web Information Service
lets you dip into traffic data gathered by various Alexa tools deployed
about the Internet. You can query information for a specific URL, such
as site contact information, traffic statistics (going back five years),
and more. You can also discover how many links are on a given page,
how many URLs are embedded in JavaScript, or the more interesting
statistic of how many other sites link to the target. Some of the
important AWS components are still in beta. SimpleDB, in fact, was in
limited beta and not accepting new users at the time of this writing.

W3C's Web Applications Working Group invites implementation of the
Candidate Recommendation of the "Element Traversal Specification."
This specification defines the ElementTraversal interface, intended
to provide a more convenient alternative to existing Document Object
Model (DOM) navigation interfaces, with a low implementation footprint.
It does so by allowing script navigation of the elements of a DOM
tree, excluding all other nodes in the DOM, such as text nodes. It
also provides an attribute to expose the number of child elements of
an element. The DOM Level 1 Node interface defines 12 node types, but
most commonly authors wish to operate solely on nodeType 1, the Element
node. Other node types include the Document element and Text nodes,
which include whitespace and line breaks. DOM 1 node traversal includes
all of these node types, which is often a source of confusion for
authors and which requires an extra step for authors to confirm that
the expected Element node interfaces are available. This introduces an
additional performance constraint. ElementTraversal is an interface
which allows the author to restrict navigation to Element nodes. It
permits navigation from an element to its first element child, its last
element child, and to its next or previous element siblings. Because
the implementation exposes only the element nodes, the memory and
computational footprint of the DOM representation can be optimized for
constrained devices. The DOM Level 1 Node interface also defines the
childNodes attribute, which is a live list of all child nodes of the
node; the childNodes list has a length attribute to expose the total
number of child nodes of all nodeTypes, useful for preprocessing
operations and calculations before, or instead of, looping through the
child nodes. The ElementTraversal interface has a similar attribute,
childElementCount, that reports only the number of Element nodes,
which is often what is desired for such operations.
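
The difference between generic node traversal and element-only traversal
can be sketched with Python's standard minidom module. minidom is not
claimed here to implement the ElementTraversal interface; the helper
functions below are hypothetical stand-ins, named after the attributes the
specification describes, that filter ordinary DOM nodes by nodeType. The
sample markup is invented for illustration.

```python
from xml.dom import Node, minidom

DOC = """<list>
  <item>one</item>
  <!-- a comment -->
  <item>two</item>
</list>"""


def element_children(node):
    """Child nodes whose nodeType is ELEMENT_NODE (1), in document order."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def first_element_child(node):
    kids = element_children(node)
    return kids[0] if kids else None


def last_element_child(node):
    kids = element_children(node)
    return kids[-1] if kids else None


def next_element_sibling(node):
    """Skip text, comment, and other non-element siblings."""
    sibling = node.nextSibling
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling


def child_element_count(node):
    return len(element_children(node))


root = minidom.parseString(DOC).documentElement
first = first_element_child(root)

# childNodes also counts whitespace text nodes and the comment.
print(len(root.childNodes) > child_element_count(root))  # True
print(child_element_count(root))                         # 2
print(first.firstChild.data)                             # one
print(next_element_sibling(first).firstChild.data)       # two
```

Without such helpers, a script walking nextSibling from the first item
would land on a whitespace text node and then a comment before reaching
the second item, which is the extra checking step the specification aims
to remove.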

Sun Microsystems is offering a beta release of the open source NetBeans
6.5 IDE, which supports development in multiple languages. The beta,
available for download, has features such as an IDE-wide "QuickSearch"
shortcut, a more user-friendly interface, and an automatic Compile on
Save feature. Developers can build applications for Java, PHP, C/C++,
Groovy, Grails, Ruby, Ruby on Rails, and AJAX. Frameworks such as
Hibernate, Spring, and JavaServer Faces are supported, as are the
GlassFish application server and databases. The beta release
serves as an update to a Milestone 1 release offered for NetBeans 6.5.
The new version is officially called NetBeans 6.5 Milestone 2. The
general release of NetBeans 6.5 has previously been set for October 2,
2008. Sun has also expressed its intention to eventually add Python
language support to NetBeans. Also on Wednesday, Jaspersoft is
announcing the availability of business intelligence development
capabilities for NetBeans and upgraded business intelligence support
for Sun's MySQL database. The Jaspersoft iReport plug-in for NetBeans
is a graphical report and dashboard tool for JasperReports, which is
an open source reporting product.