In a world where the average person will work 10 different jobs in a lifetime, the key to winning is knowing when to quit. If you give up too quickly you'll never win anything, but if you never give up you'll go bankrupt. It's hard to give up on a pet project you're passionate about, but you must always be realistic about the market, and cut your losses when it's prudent.

How do you know when to quit? Well, Forbes magazine has a good article this month for entrepreneurs: When To Dump That Great Idea. It's only OK, but I agree with its top red flags:

Paying Customers Never Show Up

You Can't Sustain a Competitive Advantage

You're Not Ready To Quit Your Day Job

Decent advice for startups...

UPDATE: Money Magazine added this concept to their Myth #3 of What it Takes To Be Rich. Sure, Southwest Airlines and FedEx stuck to their guns... that's because they were right. But FedEx ditched their high-profile Zap-Mail service once fax machines became popular...

Some of these are obvious... like using GZIP, reducing DNS lookups and HTTP requests, and leveraging the Expires HTTP header. Some are really innovative, such as CSS Sprites -- which sound really cool but seem difficult to maintain.

Others seem like bad ideas to me, like using the data: URL scheme to embed base64-encoded images directly in the HTML. I'd only do that as a last resort to squeeze out every last drop of performance... it's much cheaper and easier to buy new hardware or become an Akamai client...
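For anyone who hasn't tried these, here's a rough Python sketch of three of the techniques: building a data: URL, computing Expires and Cache-Control headers, and gzipping a response body. The helper names (data_uri, caching_headers, gzip_body) are made up for illustration, and a real site would normally let the web server or CDN handle all of this. It does show one hidden cost of data: URLs: base64 turns every 3 bytes into 4 characters, so inlined images get about a third bigger.

```python
import base64
import gzip
from email.utils import formatdate


def data_uri(payload: bytes, mime_type: str) -> str:
    # Inline the bytes as base64 so the browser needs no separate HTTP request.
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def caching_headers(now: float, max_age: int) -> dict:
    # A far-future Expires (plus the newer Cache-Control equivalent) lets
    # browsers reuse the file without asking the server again.
    return {
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
    }


def gzip_body(body: bytes):
    # Compress the response and return the headers that must accompany it.
    compressed = gzip.compress(body)
    headers = {"Content-Encoding": "gzip", "Content-Length": str(len(compressed))}
    return compressed, headers


if __name__ == "__main__":
    print(data_uri(b"hi", "text/plain"))
    print(caching_headers(0, 3600))
```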

In all, it's a good checklist to run through on every page you have, to make each one load as quickly as possible.

The second edition of this book contains the original New York Times Magazine article by Dubner about Levitt. Save your time: read the article online instead of this book. It's 5% the size, yet contains 80% of the same material.

There is a bit more info about Sumo wrestlers throwing games... and a good overview of cheating teachers. The book also contains info -- of questionable validity -- about Stetson Kennedy and the KKK.

However, what's missing is a good grounding in regression analysis, or an in-depth analysis of any of the subjects. Cheating, crime, incentives, information asymmetry: any of these would make a great book on its own... but the ADD style of this book left me feeling that something big was missing, so I couldn't trust that all the arguments had been presented.

The section on information asymmetry was so shallow that they didn't even mention The Market for Lemons by Akerlof. The coverage of cheating real-estate agents was so shallow that they didn't even consider that their book might create a self-defeating prophecy. Many sellers I know use the threat of firing the agent, and thus create the negative incentive of zero payment to a lazy realtor.

I was also shocked that nowhere in the book did he cover statistical significance or margin of error... He runs a few numbers, spits out a percentage, and we're expected to swoon. So what if his data says that realtors sell their own homes for 2% more than their clients' homes? What's the frigging margin of error?
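Here's what I mean, as a rough Python sketch. It computes the difference between two sample means plus a normal-approximation margin of error at roughly 95% confidence (z = 1.96). The sample numbers are invented for illustration and are not Levitt's data; with samples this small you'd really want a t-distribution instead of z.

```python
import math
import statistics


def diff_with_margin(a, b, z=1.96):
    """Difference of two sample means, with a normal-approximation margin of error.

    z=1.96 gives roughly 95% confidence for large samples.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of a difference of independent means (unpooled variances).
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, z * se


if __name__ == "__main__":
    # Invented sale-price ratios, NOT real data: own homes vs. clients' homes.
    own = [1.04, 1.01, 1.03, 0.99, 1.05]
    clients = [1.00, 0.98, 1.02, 1.01, 0.99]
    diff, margin = diff_with_margin(own, clients)
    print(f"difference {diff:+.3f} +/- {margin:.3f}")
    print("distinguishable from zero" if abs(diff) > margin else "could just be noise")
```

If the margin comes out larger than the difference itself, a headline like "2% more" can't be told apart from noise.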

Throughout the book, the authors joke that it has no overriding theme. Quite true: it rambles on about disjointed things and leaves out a great deal of detail... perhaps that's a bad thing, and not something to laugh about.

Drupal hit a minor snag... the permissions data got a bit whacked on my blog yesterday, and anonymous users were getting an access denied message. So I reset them, and things should be back to normal now. That must have happened during the last automated database backup...

I believe that mashups are the most powerful and underrated piece of the Web 2.0 puzzle. Mashups are lightweight applications, made mostly with JavaScript, that combine data from multiple sources to create innovative applications.

Blogs and wikis get coverage because they turn everybody into a web contributor. RSS feeds get coverage because they turn everyone into a massive consumer of web content. But mashups? Not as much coverage.

WebWare recently did a good overview of the three (sort of) available mashup editors.

After Yahoo released Pipes, which allowed people with little programming experience to create web mashup applications, Microsoft and Google had to follow suit... with Popfly and Google Mashup Editor respectively.

Neither is ready for prime time yet. You need an invite, and they are very stingy with them. However, early reviews lean towards Popfly... which I believe may be the first killer app built with Silverlight.

I personally believe that enterprise mashups will be huge business in the near term... not just for cool web widgets, but also for enterprise apps. Naturally, they need to be backed with SOA and some kind of distributed single sign on system, but those are good ideas regardless of whether enterprise mashups take the world by storm...

If you use XPath as the query language for your repository (instead of SQL or something else), hopefully you are aware of a little problem called XPath injection attacks.

Anybody who knows web apps and security knows of the dangers of SQL injection attacks... many web apps are vulnerable to this. If you have a web form, and generate a SQL query with the data on that form -- without validating the data -- then you're open to attacks. People can inject whatever SQL they want into the web form, and trick your application into running their SQL instead of your SQL. This could cause data leaks, or even data deletion.

You can fix this problem simply by escaping all quotes in the data from the web form... as well as type checking dates and numbers. You may also need to count the parentheses and remove SQL comments, depending on your application...
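A minimal Python sketch using the standard sqlite3 module, with an in-memory table invented for the example. It shows the vulnerable string-pasting version, the quote-escaping fix described above, and bound parameters (the ? placeholder). Bound parameters are a different technique than escaping: the driver passes the value separately from the SQL text, so there's nothing to escape. Quote-doubling as shown is the standard SQL rule; some databases (MySQL, for example) also treat backslashes specially in strings, which is one more reason to prefer bound parameters where your driver supports them.

```python
import sqlite3


def make_db():
    # In-memory table invented for this example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn


def find_user_unsafe(conn, name):
    # VULNERABLE: form data is pasted straight into the SQL text.
    sql = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_escaped(conn, name):
    # The fix described above: double every single quote so it stays inside the literal.
    sql = "SELECT name, secret FROM users WHERE name = '" + name.replace("'", "''") + "'"
    return conn.execute(sql).fetchall()


def find_user_bound(conn, name):
    # Bound parameter: the value never becomes part of the SQL text.
    return conn.execute("SELECT name, secret FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    attack = "' OR '1'='1"
    print(find_user_unsafe(conn, attack))   # leaks every row
    print(find_user_escaped(conn, attack))  # no rows
    print(find_user_bound(conn, attack))    # no rows
```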

XPath injection works in an analogous way to SQL injection... the only difference is that the injected attack has a different syntax. If your repository allows XPath query syntax, you'll need to do a lot more data validation to protect yourself...
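XPath 1.0 has no escape character inside string literals, so the usual trick is to pick whichever quote type the value doesn't contain, and fall back to concat() when it contains both. Here's a rough Python sketch; xpath_literal, user_query, and the //user[@name=...] path are hypothetical. If your XPath engine supports variable binding (lxml's $name variables, or XPathVariableResolver in Java's javax.xml.xpath), that's the cleaner fix, much like bound parameters in SQL.

```python
def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to exactly `value`."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: split on single quotes and rebuild with concat().
    parts = value.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if part:
            pieces.append(f"'{part}'")
        if i < len(parts) - 1:
            pieces.append('"\'"')  # a lone single quote, wrapped in double quotes
    return "concat(" + ", ".join(pieces) + ")"


def user_query_unsafe(name: str) -> str:
    # VULNERABLE: form data is pasted between single quotes.
    return "//user[@name='" + name + "']"


def user_query(name: str) -> str:
    # The name is always turned into a proper literal first.
    return f"//user[@name={xpath_literal(name)}]"


if __name__ == "__main__":
    attack = "' or '1'='1"
    print(user_query_unsafe(attack))  # predicate is now true for every user
    print(user_query(attack))         # just a strange-looking name
```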

Now, XPath is typically used to query single XML files. Very few people use it as a full query language for a database or content repository. In my humble opinion, that's for a damn good reason: XPath syntax is awkward and weird. It's totally a step backwards in both usability and performance... however, because of trendy new XML standards, XPath injection may be a bigger problem than you realize. You might be using XPath all over, or your repositories might allow XPath query syntax even if you don't use it.

Case in point: I'd highly recommend that anybody who uses that rotten JSR 170 API for content management PLEASE look long and hard at how secure their system actually is... You know who you are... I'd start by reading the XPath Injection Attacks article from IBM.

Seriously... I'm kind of excited about this. Partially because I got the title, but also because Oracle thinks that people who do what I do deserve official recognition. Oracle started this program for people who are something of a developer's advocate: somebody who helps out the Oracle community with tips, tricks, articles, or by working closely with local user groups. There are about 40 worldwide, probably growing to 60 eventually.

I also get the chance to chat with people on the product team to hopefully steer product direction based on developer needs. So that means whenever I go out drinking with Alec, Andy, or my wife, it's a business expense. Ha!

Anyway... Oracle is -- like Stellent was -- primarily a software company. Sure, they do consulting and training, but that's not their main focus. Therefore, a strong developer community is essential for the success of their business. Developers hate paying for consulting and training... so a strong community always means giving away great information for free. That's the only way to convince excellent developers to love the product. Trust me. When it comes to paying for software or training, the smarter the developer, the stingier the developer.

Oracle decided that it made lots of sense to bring me on board, since I already do the kinds of things a director should do:

The main disagreement is about SAML. I didn't see its value, and detailed Oracle/Stellent's architecture to explain why. James mostly agreed, except for one interesting use case:

If ECM vendors simply leveraged Active Directory not solely for authentication but also as a user store and mapped to it at runtime then the need for SAML disappears within most scenarios within the enterprise. It still ignores a potential scenario where your users aren't stored in any repository that the enterprise owns.

Bingo... the one situation where something like SAML comes in handy. Somebody has totally valid credentials to access the repository. However, the authentication and authorization of that user must be done by connecting to a server that is not owned by the enterprise. Stellent/Oracle can handle multiple user repositories, but typically only if they're within the enterprise.

For example, assume the person trying to access your ECM system is a business partner, prospect, or customer... They already have passwords and credentials stored behind their organization's firewall, but if you can't access it, you need to duplicate all that info, and make them log in again. Until fairly recently, you were forced to do it this way: you could have SSO across an enterprise, but not easily between enterprises. Things like SXIP and SAML fix this, so you can have federated (or distributed) single sign on.

Imagine: one password to connect to the entire internet... The developers at Stellent knew a while back that something like this was the ultimate endpoint, but the question was which protocol was going to win out. SSL certificates are a management nightmare... Should we follow SAML/XACML because it's a standard, or OpenID/SXIP because they are (fairly) open source, simple, and usable right now?

Which is better? Without a clear contender, or any specific market demand, it's very risky to take the lead... the safe bet is to be knowledgeable and reactive. If somebody asks for SAML, it's no problem to add it to Oracle. However, at present my money is against SAML/XACML for the long term.

I've never deployed either enterprise wide, so I cannot speak about the maintenance problems... perhaps SAML is easy to maintain, but given its complexity, I'd find that surprising.

I'm also very nervous about SAML because it is endorsed by Microsoft, whose first attempt to solve this problem was the god-awful Microsoft Passport. Also, Microsoft has a long history of ruining open standards that threaten them. Active Directory is huge money, as is the enterprise search market, not to mention SharePoint. I don't expect Microsoft to play nice for long...

Don't think so? Remember their proprietary Kerberos extensions? Or how they ruined SOAP with the ungodly complex WS-* stack? If Google tries to press harder into the ECM space -- and not just enterprise search -- then the other shoe will certainly drop, and decent SAML implementations without Active Directory may be impossible.

I sense danger...

And now I'm also nervous that SAML might be catching on in the ECM zeitgeist... one recent proposal included the terrible, rotten, just plain awful idea of integrating XACML, internet search, and ECM together. I challenge Guy Huntington to put his money where his mouth is, and implement something like that himself. I defy him to get his pet project to scale well or perform without millions in hardware for every ECM on the planet.

I was initially a bit bummed out by Adobe... in their announcement email to me it looked like I would miss the Adobe On Air bus tour. However, after checking their site, it appears they will be in Minneapolis on Sept 27th instead. Looks like I'll get to hack rich internet apps (RIAs) with the experts after all.

Anyway, in honor of that, here are some other links I dredged up about Adobe Flash/Flex/AIR:

I first got interested in RIAs at an O'Reilly Emerging Technology conference 4 years back... I had three initial reservations about writing apps in Flash: stability, performance, and accessibility for the visually impaired. I browbeat one of the main developers about it in a well-attended general session, and later felt that I was perhaps a tad harsh... Some people do that. Anyway, thus far they have really worked on the performance and stability problems. I'm sad to see they've made little headway on the accessibility problems...

I'm sure they feel that blind people probably don't need flashy interfaces, and would instead greatly benefit from a non-Flash based interface... like what I talked about in The Future Of Accessibility. However, this misses an important point:

If blind people cannot access your Flash content, then neither can Google! All that great data embedded in your flashy interface is locked up... either in JavaScript, ActionScript, or whatever... and Google can't spider it with the GoogleBot.

Microsoft's Silverlight claims to be Googleable... plus there are other technologies (like Faust from the Minneapolis-based Flash hackers at Space150) that work around this gap in Flash... But Adobe really needs to get on the ball about this one if they really expect to take on both AJAX and Microsoft.

A new product, named Taberu Me, is taking business cards to an extreme. Instead of printing on something as mundane as paper, this one opts for printing on peanuts. Or beans. Or cashews. Why be ordinary when you can go for the gusto and be incredibly weird?

Pink Tentacle says: "Taberu means “eat” and Me could either be an abbreviation of meishi (“business card”) or “me” in English, in which case Taberu Me would be saying “Eat me” — a message you probably don’t want to convey to your new business partner at the first meeting."

Innovative idea, and it will certainly make you memorable... but in Japan -- where trading business cards borders upon a sacred ritual -- I don't see this catching on.

Karl Fogel from the Subversion project had an interesting idea about government... perhaps it's about time that we mandate our laws be created inside a version control system. This could be an ECM system, or a source control system like Subversion, or something specific to government. In any case, it would help us track who made which change to which law, and when.

What I love most about Subversion is the blame feature... when something crazy goes awry, you bring up the source code and run blame. This will show you which user (or lawmaker) is responsible for which changes to the code (or law). If somebody snuck in some untested code (or a $100 million kickback to a lobbyist), they won't be able to hide it too well...

In the screenshot of blame in action, we see that there are four revisions of the file (8, 12, 13, 14), and the user "padma" is responsible for every change. This is similar to the Track Changes feature in Microsoft Word... but since laws are high in content and low in formatting and photos, a plain-text specific tool might be a better match.
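To show how little magic is involved, here's a toy blame in Python built on the standard difflib module. It is not Subversion's actual algorithm, just a sketch: for each revision it diffs the previous text against the new one, keeps the old attribution for unchanged lines, and credits inserted or replaced lines to the new revision's author. The bill text and lawmaker names are invented.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Attribute each line of the latest text to the revision that last changed it.

    revisions: list of (revision number, author, full text), oldest first.
    Returns a list of (revision, author, line) for the newest text.
    """
    annotated = []  # (revision, author, line) for the current text
    for rev, author, text in revisions:
        old_lines = [line for _, _, line in annotated]
        new_lines = text.splitlines()
        matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
        result = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier attribution.
                result.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision;
                # for pure deletions new_lines[j1:j2] is empty.
                result.extend((rev, author, line) for line in new_lines[j1:j2])
        annotated = result
    return annotated


if __name__ == "__main__":
    history = [
        (8, "smith", "Sec. 1. Funds for roads.\nSec. 2. Funds for schools."),
        (12, "jones", "Sec. 1. Funds for roads.\nSec. 2. Funds for schools.\n"
                      "Sec. 3. $100 million to Acme Lobbying."),
    ]
    for rev, author, line in blame(history):
        print(f"{rev:>4} {author:<8} {line}")
```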

Of course, them Congressmen are tricksy... you'll need specific laws requiring that only a Congressman can access the system to create a bill, that bills get daily commits, and that all amendments are branches off a bill, to be merged in when voted on. I believe it will also be essential to make all this accessible over the web, so everybody can see not only who made what change, but also what changes are being considered.

It would also be good to mandate a minimum time delay between when a bill is written and when it can be voted on... not to mention syndication feeds and subscriptions so watchdog groups can instantly monitor proposed changes and proposed bills.
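The syndication part is the easy bit. Here's a rough Python sketch that turns a bill's commit history into an RSS 2.0 feed using only the standard library; the function name, the change-record fields, and the example URL are all hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def proposed_changes_feed(bill_title, bill_url, changes):
    """Build an RSS 2.0 feed of commits to one bill, newest first.

    changes: list of dicts with 'revision', 'author', 'summary', and
    'timestamp' (seconds since the epoch). These field names are made up.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Proposed changes: {bill_title}"
    ET.SubElement(channel, "link").text = bill_url
    ET.SubElement(channel, "description").text = "Every commit to this bill"
    for change in sorted(changes, key=lambda c: c["timestamp"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"r{change['revision']} by {change['author']}"
        ET.SubElement(item, "description").text = change["summary"]
        # A stable id per revision lets feed readers spot new items.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{bill_url}#r{change['revision']}"
        ET.SubElement(item, "pubDate").text = formatdate(change["timestamp"], usegmt=True)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(proposed_changes_feed("HR 1234", "https://example.org/bills/hr1234", [
        {"revision": 8, "author": "smith", "summary": "Initial draft", "timestamp": 0},
        {"revision": 12, "author": "jones", "summary": "Adds Sec. 3", "timestamp": 86400},
    ]))
```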

Heck, if Congress is making Wall Street follow Sarbanes Oxley, then it's only fair to have some level of accountability in Washington as well...

It's unbelievably clunky and awful at the moment... but perhaps soon you can race through the real streets of San Francisco like Steve McQueen. Or perhaps it will always be clunky and awful.

I'm rather curious why they didn't do this in Flash... perhaps because they couldn't get clearance, since Microsoft makes a competing product that nobody uses yet: Silverlight. However, that raises the question: why didn't they use Silverlight? And where are the dang mashup APIs, dudes?

OK, I think I've figured out the disconnect between me and James McGovern regarding SAML... When he asked if Oracle's ECM supported SAML, I was about as puzzled as if he had asked if it supported client connections via JDBC. Well... I suppose you could make that happen, but why not just connect directly to the database? It just made no sense...

Here's why: James has apparently never used Oracle's ECM solution, and is commenting on the poor architecture of other enterprise applications. I believe if he took a peek at chapter 2 of my book, he'd recognize that SAML support is unnecessary in this case... (psst, bug Billy for a free one ;-)

Here's the deal... back in version 3 of the product (we're now at version 10), the dev team saw the emergence of LDAP and Active Directory. We knew it made no sense for an ECM product to be both a user repository and a content repository. That just made things overly complicated. Plus, we could never keep up with the feature requirements. Instead, we recommend integrations that "slave" the content server to an existing user repository.

Put your users in a user repository, put your content in a content repository. It just makes sense.

Here's how a basic request operates. First, the content server asks the external system to authenticate the user's password (or token) and to return a "blob" of info about him. Every user repository has a different API, but this "blob" usually contains group memberships and attributes. Next, the user data is mapped to content server specific security groups and security accounts. This mapping can be done in many, many ways, from zero configuration to a few dozen lines of custom Java (or C++); again, it depends on the system. Finally, the security check determines if this user is allowed to execute the specific service (like GET_FILE) on the specific document, based on the security groups of the document, the security level of the service, and the user's roles & accounts.

It can get a little more complex with ACLs, personalization, and workflows, but you get the picture.

This happens on the fly: no authorization data is replicated, it's only cached for a few minutes for performance reasons. Thus, all user management stays where it should be: in the user repository. The content server does a mapping to a content-specific security model, no more.
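Here's a stripped-down Python sketch of those three steps. Everything in it is hypothetical: the permission bits, the service table, the group mapping, and the FakeDirectory standing in for LDAP or Active Directory. It is not Oracle's actual API, and it ignores accounts, ACLs, and workflow. One design choice worth noting: the sketch checks the password against the directory on every request and caches only the mapped rights, so a cached entry can never let a bad password through.

```python
import time
from dataclasses import dataclass

# Hypothetical permission bits and service table (NOT Oracle's actual API).
READ, WRITE, DELETE = 1, 2, 4
SERVICE_ACCESS = {"GET_FILE": READ, "CHECKIN_NEW": WRITE, "DELETE_REV": DELETE}

# Hypothetical mapping: external directory group -> {security group: rights}.
GROUP_MAP = {
    "cn=legal": {"Legal": READ | WRITE, "Public": READ},
    "cn=staff": {"Public": READ},
}


@dataclass
class Document:
    name: str
    security_group: str


class FakeDirectory:
    """Stand-in for LDAP or Active Directory: users live here, not in the content server."""

    def __init__(self, users):
        self._users = users  # username -> (password, [external groups])

    def authenticate(self, username, password):
        entry = self._users.get(username)
        return entry is not None and entry[0] == password

    def user_blob(self, username):
        # The "blob" of info: here, just the external group memberships.
        return {"groups": self._users[username][1]}


class ContentServer:
    def __init__(self, directory, cache_seconds=300):
        self.directory = directory
        self.cache_seconds = cache_seconds
        self._cache = {}  # username -> (expiry time, mapped rights); never persisted

    def _mapped_rights(self, username):
        hit = self._cache.get(username)
        if hit and hit[0] > time.monotonic():
            return hit[1]
        rights = {}
        for group in self.directory.user_blob(username)["groups"]:
            for sec_group, bits in GROUP_MAP.get(group.lower(), {}).items():
                rights[sec_group] = rights.get(sec_group, 0) | bits
        self._cache[username] = (time.monotonic() + self.cache_seconds, rights)
        return rights

    def is_allowed(self, username, password, service, doc):
        # Step 1: the external repository checks the password on every request.
        if not self.directory.authenticate(username, password):
            return False
        # Step 2: map external groups to content-server security groups (cached briefly).
        rights = self._mapped_rights(username)
        # Step 3: does the user hold every bit this service needs on the document's group?
        needed = SERVICE_ACCESS[service]
        return (rights.get(doc.security_group, 0) & needed) == needed
```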

This is called an External user. People also set up Local users, which are just stored in the database. Local users are discouraged in production systems, thus they are typically only used for testing and superusers. A small handful of customers use exclusively Local users, but they typically don't need, have, or want an enterprise user repository... thus, the only people who could possibly benefit from a SAML interface to Oracle's ECM would never use it.

But what if the Active Directory domain controller is on the other side of the planet, and performance sucks? It appears that some ECM systems make the interesting choice of replicating the user repository... but we'd suggest instead using a product that is explicitly designed to replicate a user repository, and "slave" the content server to that... such as Active Directory Application Mode (ADAM). Some customers went so far as to create home-brewed LDAP spiders to cache data, and then integrate all their apps with the cache.

I feel that making every application on the planet support SAML is a silly duplication of effort... I think it's better that applications allow for loose slave-like integrations with dedicated user repositories. Use the right tool for the right job.

Now... SXIP and OpenID? Those are genuinely interesting... I'd bet that people will be willing to pay for an integration with them before they'll pay for SAML. Plenty of clients use SalesForce.com, and might be interested in a cleaner integration between content and customers.

I hate code patterns... they're a one-way street towards mediocre and uninspired software. But I love anti-patterns! Sites like Worse Than Failure love to chronicle the bad bad decisions that developers make, and the comments are a gold mine of how to do it right.

They started off by discussing the CRUD model for writing database applications -- short for Create, Read, Update, and Delete. This is used very often by REST fans, and it always gave me the willies. I'm happy to see that these guys dubbed it their #1 anti-pattern for good SOA design. The article made many points that I argued long ago on my blog and in my book... but it had them in a concise list format (emphasis mine):

The interface design encourages RPC-like behavior, calling Create, MoveNext, and so on, instead of sending a well-defined message that dictates the action to be taken. This is a violation of the first (Well Defined Boundaries) and third (Share only Schema) tenets.

Interface is likely to be overly chatty, since consumers may need to call two or three methods to accomplish their work.

Using a Sub for Create means that the consumer will have no idea if the operation succeeds or fails. When designing a service always keep the consumer's expectation in mind -- what does the consumer need to know?

CRUD operations are the wrong level of factoring for a Web service. CRUD operations may be implemented within or across services, but should not be exposed to consumers in such a fashion. This is an example of a service that allowed internal (private) capabilities to bleed into the service's public interface.

The interface implies stateful interactions such as enumeration (see the MoveNext and Current functions).

Abstract types (such as the Object returned by the Current function) result in a weak contract. This is another example of violating the third tenet (Share only Schema).

This is a very dangerous service since it could leave the underlying data in an inconsistent state. What would happen if a consumer added a new Contact (or updated an existing Contact) and never called the CommitChanges function? As stated earlier, service providers cannot trust consumers to "do the right thing."

yep... that pretty much sums it up...

I still prefer the model Stellent used for their SOA: a service is a sequence of steps, each of which either executes some code or runs a CRUD operation against the database. Some code gets executed for all services -- like security or page rendering -- whereas other code is limited to specific types of services, or to specific services. The end user doesn't need to know anything about the back-end database schema, so it's kind of silly to expose it.
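A rough Python sketch of that style, for contrast with the CRUD interface above. The registry, the step functions, and the ADD_CONTACT service are hypothetical and not Stellent's actual API: a service is just an ordered list of steps, common steps like security and response rendering wrap every service, and the consumer sends one self-contained message and gets one back. There's no Create/MoveNext/CommitChanges sequence to get wrong, and the table layout never leaks into the interface.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], None]


def require_login(data: dict) -> None:
    # Common step: runs before every service, so no service can forget it.
    if not data.get("user"):
        raise PermissionError("login required")


def render_response(data: dict) -> None:
    # Common step: build the outgoing message from the service's result only.
    data["response"] = {"status": "ok", **data.get("result", {})}


class ServiceRegistry:
    def __init__(self, pre: List[Step], post: List[Step]):
        self.pre = pre
        self.post = post
        self.services: Dict[str, List[Step]] = {}

    def define(self, name: str, *steps: Step) -> None:
        self.services[name] = list(steps)

    def call(self, name: str, message: dict) -> dict:
        data = dict(message)  # one self-contained message in...
        for step in self.pre + self.services[name] + self.post:
            step(data)
        return data["response"]  # ...one self-contained message out


CONTACTS: List[dict] = []  # stand-in for a database table


def validate_contact(data: dict) -> None:
    # Service-specific code step.
    if "@" not in data.get("email", ""):
        raise ValueError("invalid email")


def insert_contact(data: dict) -> None:
    # Service-specific CRUD step; nothing is left half-done for the consumer to commit.
    CONTACTS.append({"name": data["name"], "email": data["email"]})
    data["result"] = {"contact_id": len(CONTACTS)}


registry = ServiceRegistry(pre=[require_login], post=[render_response])
registry.define("ADD_CONTACT", validate_contact, insert_contact)
```

On a fresh registry, calling registry.call("ADD_CONTACT", {"user": "bex", "name": "Ann", "email": "ann@example.com"}) returns {"status": "ok", "contact_id": 1}; a message without a user or with a bad email fails before anything touches the table.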

I'm going to have to be more clear in my rants... my anti-ECM-standards rant is getting some people so hopped up they can't see straight. The latest is from Craig Randall:

Bex Huff left a comment on Mark’s post, which referenced his reply-via-post. Bex makes several good points, but at the same time what I perceive is that if an ECM standard isn’t reasonably or capably an end-all-be-all standard for the domain, then why bother. (Bex, if I misunderstood your post, please leave a comment to set me straight.)

huh... I actually said almost exactly the opposite.

In previous posts on my blog, I said that an end-all-be-all ECM standard is impossible. ECM is a marketing term, not a technical term, thus over 100 apps can claim to support "ECM", but can deliver whatever the hell they want. Good luck creating a standard interface to a marketing buzzword.

If you want some modest ECM standards, and a simple interface, fine. There are 4 such standards already, just pick a damn horse. Stellent/Oracle supports 3, and will probably support all 4 soon... just in time for the 5th to be finalized. Joy.

Not that it matters... nobody uses the standards that already exist, yet they keep asking for more. I understand why: every ECM standard is far far too simple to be useful. Why should somebody shell out thousands of dollars for an ECM system, and access it with a "standard API" that hides 90% of what they paid for? At the same time, an enterprise can have several ECM systems at once... and it would be nice if a middleware layer could have a single API to access them all. Nice, but not nice enough that they will willingly sacrifice important features...

I'm tired of wasting cycles on the pipe dream of a useful ECM standard, until the market changes enough for one to be feasible. That will happen after Microsoft fixes SharePoint, more consolidation happens in the market, and the vendors who merely claim to have ECM either shut up or go away. Like I said, probably not before 2009.