Ponderings on Life, the Universe, and Information

Everything here represents my own opinion unless clearly stated otherwise. I do this on my personal time for my own satisfaction. Nothing should be construed as specific advice as you have to pay for advice that goes beyond generalizations.

Folders, A Nutritious Part of Your Content Management Diet

Sean Hederman of Signate Document Management wrote a rebuttal of my AIIM article defending folders. He admits a bias as he works for a Document Management vendor built around search. I thought of writing a short response saying that he missed the point of my article and that he had nothing to rebut, but where is the fun in that?

My article was itself a rebuttal of an AIIM article by Chris Riley saying we should get rid of folders. I never meant to imply that we should get rid of search, only that we shouldn’t get rid of folders. I like search, and metadata for that matter. I complain vocally when search doesn’t work well.

With that in mind, here are his points. Each heading is a direct quote of a heading from his post.

People are used to folders, Rebuttal: They are used to search as well

This one is a solid point.

By that logic we shouldn’t use search engines to access the Web, we should organise it into folders instead, just like Yahoo used to do. You see, people are also used to search engines, they use them every day of the week.

Agreed. Thing is, people use search when they don’t know where something is located. They DON’T use search when deciding where to store things. They like to place things where they know how to find them later. People feel more comfortable storing something in a folder than spending time adding multiple attributes and hoping that search can find it later, sometimes MUCH later.

Folders didn’t work for Yahoo because people weren’t browsing for content they had placed in the folders.

Think of it this way: folders are great for storing, retrieving the known, and discovering related content (thanks to Shane for the latter point). You need search to discover the unknown. You can use it to retrieve the known, but that is based upon user preference.

Search Engines fail, Rebuttal: So does everything else

This is one of my favorite points that he raises because it uses the word “fallacy” and allows me to illustrate a key point.

Well, if your technology is based around an unreliable bolt-on search engine (looks meaningfully at SharePoint), then yes, this is a valid concern. If your entire system is designed around search, then the search engine is the core and any folder-based view would be the bolt-on, and thus would be the one more likely to fail.

He then accuses me of using a fallacy of the excluded middle, also known as a false dichotomy. He is wrong as I don’t propose an either-or situation. In any good system design, you need at least two methods for users to accomplish a task (which is why we still have menus in the age of right-click). In CMS, search would be one option for retrieval, folders another.

That aside, almost every Content Management system OEMs their search engine (I am not focusing upon WCM solutions here). As such, it is usually a locked-down version that does not have the full capabilities of the underlying product. I had a FAST expert on one Documentum project and he constantly complained about what had been done to the FAST engine provided with Documentum. It was done with good reason (e.g. simplify support), but it wasn’t all that the original product was.

In most systems, search is a separate process on the server. If it goes down, the system is still accessible, even if it isn’t fully functional. If the core process goes down, it doesn’t matter if search is up.

Folders help you organise, Rebuttal: Why manually organise?

Going to keep this one simple. Most of what he said is covered in my original post as Sean clearly comes across as someone who thinks folders aren’t necessary at all.

Not one person I’ve ever spoken to about their requirements from a document management system has ever mentioned the word taxonomy. Not one. Ever.

Sean needs to get out more or work with larger organizations or those further down the maturity path for Information Management. Not saying that taxonomy is always the right answer, but if he’s never run across it, then he hasn’t talked to enough of a breadth of organizations.

The simple answer: some people like to manually organize. That process helps them weed out unneeded content, and the resulting organization can reveal a lot about a business.

Not using folders cripples systems, Rebuttal: Only if the developers were idiots

This one made me laugh. Mostly because of the “developers were idiots” phrase.

He starts talking about operating system limits, which I have never dealt with in this debate. The whole point of using Content Management is to transcend the file structure of the operating system. Most of his arguments revolve around the folder structure of the CMS equating to the one on the OS. I advise steering clear of any CMS that is like that.

Essentially put, a well-designed system never has a “dump” location for content. Everything has a logical place that makes business sense. I think Sean is limited here by his lack of exposure to the working of the larger Content Management marketplace.

Search Engines can’t read your mind reliably, Rebuttal: nothing can

Yeah, but when I put something into a folder, I know where it is. If the folders are logically organized, someone else can find it as well.

The real point I was trying to make is this: search engines have some limitations. One is that their algorithms are still evolving. Actually that is minor. The real issue is that not everyone knows how to ask a Search Engine the question correctly. You can train and use tricks behind the scenes, but it will be years before it always works.

Remember, it isn’t enough for one system to work; all search engines, and those using them, have to work correctly.

My Conclusion

Sean concludes by saying that search should be the center of your Content Management system. Personally, I think neither search nor folders should be the center.

In an ideal world, I would tell a system what I wanted and it would always return the correct file every time. This isn’t an ideal world. We are dependent on full-text search because people will only do minimal tagging unless forced and if forced, they will find a way around it.

Oh and for the record, the largest CMS I ever implemented did not expose any folders/hierarchy to the users.

20 thoughts on “Folders, A Nutritious Part of Your Content Management Diet”

Steve Bickle says:

Folders, folders! Don’t get me started on folders… Ok I didn’t read the AIIM article but here we go…

Folders are at best a metaphor to describe a limited set of attributes, forced into a ‘meaningful order’. Unfortunately what is a meaningful order for one group of users is often not for other groups of users (insert your favorite finance vs. engineering example here). In reality a large ECM implementation will not ultimately be focused on a single group of users and the folder structure will ultimately not best serve a large proportion of the user base. Folders are just legacy from file systems that happened to be convenient for the early generations of ECM solutions. We need to wean ourselves off this now inadequate metaphor.

Ideally the focus should be on the capture of the correct minimum necessary set of attributes for the document and its related business process. Once these are defined then the various groups of people who need to contribute and retrieve documents can have the attributes that are important to them surfaced in a virtual folder structure.

Folder structures have been a useful mechanism to apply some of the initial attributes to a piece of content. Walking a folder structure to place a new item can enforce a consistent taxonomy on the attribute data (which is applied according to the chosen location), and is a metaphor that’s easily understood by file system users, whose experience has been gained on PCs where folders have been the only tool (aside from those who register all their documents in a spreadsheet, but we all know how well that works ;).

It can be seen that walking a folder structure to place a piece of content is the logical equivalent to sequentially selecting a set of dependent attributes from an attribute dialog. Unfortunately the folder structure quite often forces unnecessary dependencies between attribute values due to its rigidly enforced hierarchy, whereas an attribute dialogue would (if correctly configured) only enforce required dependencies and be able to enforce a more complex set of interdependent values when required.
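To make that equivalence concrete, here is a minimal sketch in Python. The three-level hierarchy (`department`, `doc_type`, `year`) and the helper name `attributes_from_path` are invented for illustration and do not come from any particular ECM product.

```python
# Hypothetical hierarchy: each folder level stands for one attribute.
LEVELS = ("department", "doc_type", "year")

def attributes_from_path(path, levels=LEVELS):
    """Translate a folder path into the attribute values it implies."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) > len(levels):
        raise ValueError("path is deeper than the defined hierarchy")
    # Level 1 sets the first attribute, level 2 the second, and so on.
    return dict(zip(levels, parts))

print(attributes_from_path("/Finance/Invoices/2010"))
print(attributes_from_path("/Finance"))
```

Filing into `/Finance/Invoices/2010` is the same act as choosing three attribute values in a fixed order, which is also where the forced dependency comes from: in this sketch you cannot set `year` without first choosing a `department`.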

Having contributed documents/content to a system, users need a convenient mechanism to retrieve them. Folder structures really only work well for key daily tasks where the location of the content and the local taxonomy is well understood. Even then this is not satisfactory without a capable search solution alongside for when it doesn’t deliver.

Both search and folder structures (or some logical equivalent) are required and must be able to work well together.

In many, if not most, current systems the underlying mechanism to provide a folder structure is driven by a rigid data model that’s separate from the content’s attribute data model rather than driven by it. Additionally search is often provided as an adjunct to the product via a separate indexing engine. The outcome is that the means of discovering content is often provided by two completely separate mechanisms which are melded together in a compromised manner that doesn’t really serve the users well. This is further complicated by the whole question of full text indexing and user confusion between search results derived from indexed content versus values from their taxonomy.

The solution for much of this would be to remove the dependence upon a rigid underlying data model for folders, and to provide an alternative to rigid folder structures where structured access to content is presented dynamically as required, using incremental or faceted search results based on the content’s attribute model. This will require a well integrated, more sophisticated search solution, capable of providing structured presentation of results (but not too folder like).
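As a rough illustration of structured access driven purely by the attribute model, the sketch below filters a small in-memory list of documents and counts the values of one attribute in what remains. The sample documents and the helper names `refine` and `facet_counts` are assumptions made up for this example; a real system would push this work into its search engine.

```python
from collections import Counter

# Hypothetical documents: attributes only, no folder path stored anywhere.
DOCS = [
    {"title": "Invoice 1001", "type": "invoice", "customer": "Acme", "year": 2010},
    {"title": "Invoice 1002", "type": "invoice", "customer": "Globex", "year": 2010},
    {"title": "Contract A", "type": "contract", "customer": "Acme", "year": 2009},
    {"title": "Contract B", "type": "contract", "customer": "Acme", "year": 2010},
]

def refine(docs, **selected):
    """Keep only documents that match every selected attribute value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]

def facet_counts(docs, attribute):
    """Count how many documents carry each value of one attribute."""
    return Counter(d[attribute] for d in docs if attribute in d)

# Incremental narrowing: pick a customer, see what is left, narrow again.
acme = refine(DOCS, customer="Acme")
print(facet_counts(acme, "type"))
print([d["title"] for d in refine(acme, type="contract", year=2010)])
```

Each facet value behaves like a folder the user can open, but nothing about the order of the facets is fixed in the data.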

This may be what Sean is striving for; it’s certainly the way that I’d like to see things move.

Thanks for your comment. Faceted search is a pretty world, but the technology isn’t there yet from a scale, response, accuracy, and stability perspective (I have high standards). Besides, the best use of folders is for storing and subsequent retrieval, not for finding new things. It is a balance, which you mention. Folders may be legacy, but they are a reality and are still useful.

Oh, something I forgot to mention, this is almost a religious argument. Nobody is going to change sides. The best you can hope for is a peaceful coexistence and a shared hope that the technology will continue to evolve so that our children don’t have to fight the same battles. 🙂

I think it is in our best interest to lead people towards the proper solutions and not just towards taxonomies and folder structures. I’m not saying that we throw the baby out with the bathwater here. I am saying that folder structures are not the end-all solution. For the record, I am not implying that is what you are saying either.

The issue I have with folder structures and taxonomies in general stems from the evolution of folder structures (thank you Mr. Gates) on an individual’s computer or network drive. We have all seen the sheer chaos that follows this unstructured paradigm. At every client, the first order of business, and one that takes a tremendous amount of effort, is the normalization of several departments’ “taxonomies”. The folder structure evolves, as stated, from the individual’s PC, where each folder structure is only navigable with tacit information from the user’s head. You see this same phenomenon at the department level: tacit information is needed to navigate the custom folder structure, which makes this such a difficult transition and in some cases undermines the effectiveness of the taxonomy when moving to an ECM system. The change from the old tacit storage system to a new folder structure leaves the end user leaning on a robust search engine.

I think the reality is that folder structures are still a viable way of organizing information in a traditional document management paradigm, or in a records management paradigm, but they do lose their effectiveness as you move away from those paradigms into the business process paradigm. Further, the use of robust metadata enhances any retrieval of information and compensates for the shortcomings of the folder structure itself.

You mention the FAST paradigm with Documentum, and that lack of “usability” was one of the major reasons Documentum was working on xPlore even before the EOL announcement from Microsoft. The inability of people to navigate folder structures has always necessitated a robust search engine; fortunately, now with xPlore, Documentum has one.

In essence, you get two kinds of searches: the search on attributes and the search on text or content. A folder is just another attribute linked to the document, so you do a type of search anyway.

From working with Documentum Desktop and Webtop, both have a place. The folder structure was definitely faster on Desktop, but slower in Webtop, especially large folders with lots of content in them. The text search works better in Webtop but is slow and cumbersome in Desktop. This might also have to do with the way that it was implemented.

So for now I agree that you need both, and have to find a balance between them. My problem though is that we are still stuck with only these two ways of accessing content.

In my view, tweets and status updates are also content, just smaller and therefore easier to manage. But what I would like to see is that CMS follows the same “big data” principles where content is stored and summarized in different ways. I’m only now starting to explore Hadoop, but from what I saw, it creates extra indexes and summaries of the data the whole time, thus making the lookup of data faster and easier. It also does not adhere to a strict relational data model, but rather the data describes itself in the database (that is my understanding anyway).

But just think, using these principles in a CMS environment, users will have the ability to have content pushed to them. They will be able to subscribe to lists or topics, and will see the content as it appears. I have been in companies where the same type of investigation is done in at least 3 departments, and they don’t know about one another. This would help people to discover keywords and phrases used in the company and will make working together easier.

So at this stage, we need both to get to content in different circumstances. I just wish that content management could now also evolve into something new and more useful. Maybe then it will also make more sense to customers and they will see the real benefit, not only the regulatory benefits that a lot of them are using it for now.

I LOVE the dismissive shots at my supposed lack of experience. So classy.

I’ve seen taxonomy-based systems fail and fail hard again and again and again. We wrote Signate as a search-centered system as a response to the lack of flexibility and sheer administrative burden of such systems.

If an organisation has a canonical, detailed and established taxonomy, a folder-based idiom would definitely work, and probably would even work better than a great search engine. As I pointed out in my post I think that such organisations are few and far between. In my experience most organisations are very vague in what they need from a DMS/CMS and need something that changes with them. I’ve yet to see a folder-based system do this without immense pain.

“When I put something in a folder, I know where it is”. Umm, yeah, and when I upload an invoice for customer X, I know how to find it. The difference is that when someone changes your taxonomy, you now DON’T know where to find it. I still just type something like “Invoice X”.

“I think Sean is limited here by his lack of exposure to the working of the larger Content Management marketplace.” – Clearly not nearly enough exposure. I’ve yet to see a folder-based system WITHOUT a “dump” location after a year or two of operation, but then my exposure is minimal since I’ve only been consulting in DM and workflow for a trifling 13 years, and across, oh call it 30 odd major organisations including Coca-Cola, an airline, IT houses, medical aids, insurance companies, financial houses, and a stock exchange.

I’ve had to clean up the mess left by taxonomy focused consultants again and again. I’m not saying that you are the same. I have no doubt that when you design systems you do so with professionalism and you design them for change. The thing is in my experience people like that are few and far between.

“Not using folders cripples systems” – I was responding to a point you made and explaining why it was wrong. I clearly misunderstood what you were trying to say. Please explain more clearly what you mean by “One of the problems that you get when you don’t use folders is that you can cripple most systems. While few systems claim a limit to the number of documents that can reside in one location, there is a practical limit”

You claim an immaturity for search engines and their users that I simply don’t believe exists. I’ve yet to see the search technology we use in Signate fail, and I’ve yet to see it return dud results, and I’ve yet to see it fail to find something. When we originally designed Signate, search was going to be done via database queries with this technology as a backup. It was so fast and reliable that we turfed the database queries and made it the primary mechanism.

Again, if you HAVE to have a rigid taxonomy, use folders by all means. But I really think that the costs and effort involved in maintaining such a system are too high for the benefits it brings. Search brings the documents to you with minimal fuss, and virtually zero administration or upfront design.

Sean, few things. The first is that I didn’t knock your experience, just the breadth of your exposure within the Content Management industry. I don’t doubt that you know what you are doing and likely do it well. Keep in mind I could only judge by your statements. Saying you never heard them talk about taxonomy typically indicates limited exposure in the industry.

As for search engine immaturity, it is immature if there are only 2 choices, but I think of it as a gradient. It is MUCH better than 10 or even 5 years ago. There is just more road to travel: better algorithms, and Moore’s law to handle computational constraints.

I’ve also had to clean up taxonomy messes. I’ve also had to install some taxonomic order when there was nothing present. I’ve even had to clean up my own messes as things change.

Finally, if the word rigid comes anywhere near an IA, then it is done wrong. If the taxonomy changes and you can’t find something, then both the process and taxonomy are broken.

So then essentially, we largely agree. I just happen to think search is ready to pick up the baton right now, and you don’t.

As for taxonomy, keep in mind my quote was “Not one person I’ve ever spoken to about their REQUIREMENTS from a document management system has ever mentioned the word taxonomy”, which is absolutely true. I’ve heard numerous consultants mention it, but it has never started from the sponsor’s lips.

Nice post Lawrence, interesting stuff. I oversee both our ECM and Search research at Real Story Group and I agree 100% with you. A good ECM deployment utilizes both folders and search; it’s not one or the other. Particularly in very large deployments of high millions or (more and more common) billions of documents, if there is no filing structure there is complete chaos.
A good filing structure can also be exploited by a good search engine, they complement each other. Taxonomies are indeed common enough in the world of document management and something that comes up in nearly every customer discussion (though they are not always called taxonomies). We recommend that they are kept simple (big bucket approach), but most importantly and to your point, I find that the vast majority of users do actually file things away in places (folders) where they can easily retrieve them in future, and when needed search for things they have either lost or have no idea if the document even exists or what its location is. Search engines are great and getting better, but they are not and never will be IMHO a replacement for a sound ECM system – nor should they be, they are something quite different.

Why not put another wrinkle in the conversation… The largest system I had implemented had a base folder structure, that most users ended up not using, but some did use it. Most users built their own taxonomies using Dynamic Views from Glemser Technologies (http://www.glemser.com/default.aspx?pageid=202). First the users would perform a search, save the results and “Group By” to create folders. They would create taxonomies by Region, Country, LC State etc. Each time they would drill through their Dynamic View, it was performing search to show them what was in their “folder”.

ECM systems shouldn’t put any stock in folders – they should be totally arbitrary and as free and easy to create as they are on your local PC. You should be able to nest empty folders in empty folders as much as you like in whatever whacked out taxonomical disaster you like to call your own filing system.

But the content that goes in them needs to be all managed centrally, and has to have its place in a meaningful enterprise-wide management framework. We need to be able to do both, if we want these horrifically expensive systems we build and sell to ever really work properly.

Dirk; I think the problem is with the implementers. I have yet to see a folder-based DM system with anything approaching a proper search engine in place. Take SharePoint as an example. Up until SharePoint 2010 and the optional extra FAST search engine; the search was so poor and unreliable that it was all but unusable.

Having the option of folder access makes one lazy and inclined to use it for everything in the system. I guess the corollary is also true; we search-centered implementers dismiss folder access and see it as an unnecessary bolt-on. Since search indexes can be changed so easily, folder paths in such an implementation are not easy to make reliable and canonical.

Microsoft’s best practice advice has changed from folders (SP2001 and SP2003) to definitely NOT folders (SP2007) and even though metadata management is massively improved in SP2010, they seem to have stopped suggesting folders are bad (again…..).

In my experience users soon get used to alternative browsing and searching mechanisms when they are looking for content – when they get confused and seem to want to return to the folder paradigm is when they are storing / filing / saving the content they have created – there seems to be something largely psychological about having a definite place to put something, rather than tossing it into a big bucket even if it’s appropriately tagged with metadata.

It would be very interesting to get into the learned journals and see what in depth research has been done on this front.

Finally, I have made the SP2007 search work just fine over a large corpus of documents. You have to ‘manage’ the search engine, and of course you absolutely must have decent metadata on your content, appropriate search scopes, etc. However, I have also seen it in the same pretty much unusable state that you mention – just proof that this is not a bi-polar, search OR folder argument, and that different solutions work better in different contexts.

Oh how I wish we could live in a world without folders! Until of course I want one.

I think there is little chance folders will ever go away and that there is little chance computers will totally be reduced to the “Google window”, a box with a search input and button. There simply are going to be many different needs and ways to do things.

Though I hate folders myself for many reasons there sometimes is just a need for them. And most users just get how to use them.

Back in my Stellent days they were experimenting with a folder generator. So you had meta tags, and the users could select the parent and child axis from the metadata. So one person may choose to have “project” as the parent folder and another “company”. So maybe that’s a compromise. Metadata-driven folders.
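A minimal sketch of what metadata-driven folders could look like, assuming documents are plain dictionaries of tags. The sample data and the function name `build_view` are hypothetical; this is not how the Stellent experiment was actually implemented.

```python
# Hypothetical tagged documents; no folder location is stored anywhere.
DOCS = [
    {"name": "spec.doc", "project": "Apollo", "company": "Acme"},
    {"name": "quote.doc", "project": "Apollo", "company": "Globex"},
    {"name": "plan.doc", "project": "Zeus", "company": "Acme"},
]

def build_view(docs, axes):
    """Group documents into nested 'folders', one level per attribute in axes."""
    if not axes:
        return [d["name"] for d in docs]
    first, rest = axes[0], axes[1:]
    groups = {}
    for d in docs:
        # Documents missing the tag land in an explicit "(unset)" folder.
        groups.setdefault(d.get(first, "(unset)"), []).append(d)
    return {value: build_view(members, rest) for value, members in groups.items()}

# One user prefers project as the parent axis, another prefers company.
print(build_view(DOCS, ["project", "company"]))
print(build_view(DOCS, ["company", "project"]))
```

Both users see the same three documents; only the order of the axes differs, so changing someone's "taxonomy" is a matter of reordering a list rather than moving content.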

Interestingly enough this approach is exactly what we’re working on for our WebDAV support. WebDAV requires a folder hierarchy, but doesn’t require that each resource be stored in only one place, so a flexible approach like this is ideal.
