Look and learn

By Editorial Content on September 6, 2017

We are on the cusp of a new wave in enterprise content management as automatic content analysis and machine learning provide better access to stored data. James Goulding speaks to Greg Milliken about what the future holds and why M-Files is viewed as ‘an innovator’.

Greg Milliken, Vice President of Marketing, M-Files Corporation

For decades, document and then content management systems have been promising an end to document chaos. Yet, according to a recent survey by M-Files Corporation, 95% of UK organisations still face challenges when trying to find, access and edit documents.

63% sometimes have difficulty finding information

64% find that documents are often saved in incorrect folders or systems

Half of workers complain about ‘version creep’, with multiple versions of documents saved in different places

Four in 10 encounter problems caused by the incorrect naming of documents

29% have problems accessing documents from different devices

63% say they have had to recreate documents that already existed because they were unable to find them.

Clearly, there has been progress in content management – flexible working, digitisation, mobility and the cloud attest to that. Even so, a list from 10, 20 or 30 years ago might have looked very similar. So, why do these problems persist?

Information silos
One reason, claims Greg Milliken, vice president of marketing at M-Files Corporation, is the proliferation of information silos, including network folders, SharePoint, traditional ECM systems like OpenText and Documentum, emerging file sharing systems like Box and Dropbox and core business systems like CRM and ERP. Research by AIIM shows that fewer than 40% of the ECM systems in use are integrated with another core business system.

“There’s a lot of fragmentation out there. Even a small to medium-sized business might have SharePoint and some file shares and maybe Salesforce. Just that presents challenges. Even with these systems, it is difficult to find stuff. A given system might be great at finding what’s in it, but what if something you need that’s related to that customer is off in the file share or in SharePoint? How do you get to it when you’re in Salesforce or any other flavour of CRM or ERP?” he said.

This, says Milliken, creates the problem of ‘dark data’.

“What we mean by ‘dark data’ is when somebody creates something that they store in some folder that nobody ever finds again. It goes dark. Being able to overcome that so you can always find the most relevant and valuable information when you need it is what’s driving interest from companies – finding and harnessing what they have, eliminating duplications and unifying access any time, anywhere,” he said.

What customers don’t need, he says, is another repository. “The message we hear is ‘Don’t come in here and tell us that you can just give us another system that is going to fix everything, because that’s how we got multiple silos to begin with.’ We think what’s needed is the ability to get more value out of existing assets through integration.”

Ease of use
Central to this is improved ease of use.

“Traditionally, ECM systems have been really complicated; they’ve required lots of services and customisation, which have created barriers to the idea of unified access to information and ensured that legacy systems retain their position – every company we talk to still uses network folders, for example. Users have resisted ECM systems not only because they’re complex to implement but also because they can be complicated to use. People will even resist using a tool like SharePoint, which is in almost every company, saying ‘I’m not going to put it up in SharePoint until I’m done with it’ or ‘If I put it up there and change a copy here things will get out of sync’. Day-to-day challenges like these have been heavily influenced by the architectures of these systems: they’ve been static and they’ve been heavy around services, so hard to adapt, which has held back adoption.”

Milliken added: “The rise of Box and Dropbox is an immediate indicator that usability has been lacking. Granted, they don’t do a whole lot – they’re just a folder structure up in the cloud – but they’re simple and they’re easy. So we think that’s a fundamental part of the future.”

Integration with Sage means documents can be stored and managed in M-Files

Stumbling blocks
The other two really big stumbling blocks with traditional ECM, claims Milliken, are the need to migrate data from a file share or legacy system to the new system and the need to train up and overcome the resistance of people who might have been perfectly happy with the old system.

“If you could truly integrate and unify information you would lessen the need to migrate data and maybe eliminate it entirely. You might ultimately want to migrate the data, because you want to get rid of a legacy system and you don’t want to pay for two systems, but the idea that the first step doesn’t have to be migration, which is often expensive, is a really key point,” he said.

“Then, once you’ve chosen to use a new system and you begin to migrate your data, you have to train up all those people who are happy with the old system in how to use the new system. That’s very often even bigger than the migration problem and where a new project gets derailed, because people are resistant to change and just aren’t going to shift.

“What we think’s really interesting is that innovation in companies usually comes from smaller groups – someone in legal decides they need to handle their contracts better, someone in HR wants a better system for managing employee information, someone in accounting has to deal with invoice processing and accounts payable in a different way. Enabling one small group to innovate on a process without forcing everyone else in the company to change enables faster innovation and productivity.

“We think the future will encompass the idea that one can do that innovation while the content remains in other systems, undisturbed. This is the idea that one group could utilise that data in a wholly different way to how others are using the same data in another system, allowing different groups to be doing that simultaneously based on their needs, without incurring all that migration and change management on a large scale.”

Metadata layer
M-Files eliminates many of the problems highlighted above through a metadata layer. Most ECM systems use a location-based paradigm for storing documents – the idea that you put something in a folder to classify it, a ‘customer’ folder or a ‘project’ folder or a ‘contracts waiting for review’ folder.

Milliken points out that systems of this nature are flawed because the organisation of folders and files is so subjective. “Do you have marketing, sales, administration and then under those North America, Europe and Asia? Or do you have North America and then marketing and sales under that? It’s a very subjective choice and each company really does things differently, each individual even. Then, you’ve got to teach people that subjective thing and that’s what we believe leads to imprecision and dark data. If I think this should be in the customer folder but somebody else thinks it should be in the project folder, where is it? And what if it’s in different systems? Then, what about if it needs to be in more than one place, if it needs to be in both the project folder and the customer folder?”

Milliken says that this is where context and M-Files’ metadata-driven approach bring benefits. By adding tags, in this case ‘customer’ and ‘project’, the document can show up in more than one place. “We often use the analogy of the iPhone. When you put music on your iPhone, it shows up by genre or artist or album or date, but it is still only one piece of music,” he said.

If, over time, the document becomes associated with another project or customer you just add their name as a tag. It is completely dynamic and completely objective.
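The difference between filing by location and filing by metadata can be shown in a minimal sketch. This is not M-Files code: the `Document` class, the `view_by` function and the sample names are all hypothetical, chosen only to show one stored document appearing in several views, much like the single song that shows up by genre, artist or album.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """One stored document; `properties` holds its metadata tags (hypothetical model)."""
    name: str
    properties: dict[str, set[str]] = field(default_factory=dict)

    def tag(self, prop: str, value: str) -> None:
        # Adding a tag never moves or copies the document itself.
        self.properties.setdefault(prop, set()).add(value)


def view_by(documents: list[Document], prop: str) -> dict[str, list[str]]:
    """Group document names by one metadata property, like a virtual folder."""
    view: dict[str, list[str]] = defaultdict(list)
    for doc in documents:
        for value in sorted(doc.properties.get(prop, ())):
            view[value].append(doc.name)
    return dict(view)


# Illustrative data only.
contract = Document("supply-contract.docx")
contract.tag("customer", "Acme Ltd")
contract.tag("project", "Warehouse refit")
docs = [contract]

by_customer = view_by(docs, "customer")
by_project = view_by(docs, "project")

# A later association with a second project is just one more tag.
contract.tag("project", "Depot extension")
by_project_later = view_by(docs, "project")
```

In this sketch the contract is listed under its customer and under both projects, yet `docs` still holds a single object, so there is no second copy to fall out of sync.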

No silver bullet
Milliken admits that M-Files’ approach is not a silver bullet. There are still aspects of it that people might find fault with, such as the need to add metadata. “The area where there might be some overhead is adding the metadata. How does the metadata get defined? You could argue that some people might think ‘I don’t want to tag things’, which is why in the past they would just put things into a network shared drive without going into the ECM system – because they could just throw it in there. Then you don’t remember where you put it and nobody else can find it.”

How, then, do you address potential resistance around tagging things with metadata?

Traditionally, creating the metadata has been done by manually tagging a document or using semi-automated methods like scanning and OCRing content and identifying a part number within a document or reading a barcode and classifying it on that basis.

Milliken says that in the future this will be done automatically, using analytics and emerging technologies like natural language processing and machine learning. He describes this as the Holy Grail and says that with tools like IBM Watson and Alchemy from HP it is now within reach.

Repository neutral
M-Files is not alone in this thinking. Analysts like Gartner and Forrester also recognise that changing customer requirements and advances in technology have created the need for a more dynamic, flexible content management platform that offers:

1. Access to content wherever it might reside:
A system will have its own repository but must also be repository-neutral and able to connect to external repositories via connectors.

2. On premises, cloud and hybrid deployment:
In the past, a system tended to be either on premises or cloud-based. As the popularity of the cloud increases, users should be able to switch between the two. “When you’re archiving content, you could move from a cloud-based implementation to an on-premises one where storage might be less expensive. Or, a highly regulated business that’s very concerned about its compliance might want to retain data on premises but share and collaborate with partners and vendors via a cloud-based repository,” explained Milliken.

3. Intelligent metadata layer and federated access across multiple sources:
“This,” explained Milliken, “is the idea of getting access to content based on context rather than just what repository it resides in. When we talk about repositories we don’t just mean content repositories but other business systems like CRM and ERP as well. If I’m in the CRM and I’m working on a given customer, it’s obviously important to find documents and other information related to that customer. That’s where you begin to see the opportunity to span outside of one system. It won’t be where content is stored that’s important but how it’s contextually relevant to you.”

4. Automatic tagging and content analysis:
Advances in analytics, machine learning and natural language processing mean that tagging and content classification can be done automatically rather than manually or semi-automatically through barcodes and OCR. Milliken points out that with natural language processing, things can be inferred about a document that may not be directly stated in its content. For example, certain characteristics might associate it with a particular project, even if the relationship is never stated. Machine learning might also lead to improved results. It might decide ‘Everybody else on the sales team is using this document, maybe you’d be interested in it too’ or ‘If you’re searching for these kinds of things with the term agreement, maybe we should tag this with agreement too’.
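The first and third requirements, connectors to external repositories and access by context rather than by location, can be pictured with a short sketch. Everything here is an assumption made for illustration: the `Connector` interface, the `InMemoryConnector` stand-in and the `federated_find` function are hypothetical and do not describe any vendor's actual API.

```python
from typing import Protocol


class Connector(Protocol):
    """Hypothetical interface that each repository adapter would implement."""
    name: str

    def find(self, prop: str, value: str) -> list[str]: ...


class InMemoryConnector:
    """Stand-in for a real repository (file share, CRM, ECM) holding tagged items."""

    def __init__(self, name: str, items: dict[str, dict[str, str]]):
        self.name = name
        self._items = items  # item title -> {property: value}

    def find(self, prop: str, value: str) -> list[str]:
        return [title for title, meta in self._items.items() if meta.get(prop) == value]


def federated_find(connectors: list[Connector], prop: str, value: str) -> list[tuple[str, str]]:
    """Ask every repository the same contextual question and merge the answers."""
    return [(c.name, title) for c in connectors for title in c.find(prop, value)]


# Illustrative data only.
file_share = InMemoryConnector(
    "file share",
    {"quote.pdf": {"customer": "Acme Ltd"}, "memo.txt": {"customer": "Other Co"}},
)
crm = InMemoryConnector("CRM", {"Acme account record": {"customer": "Acme Ltd"}})

results = federated_find([file_share, crm], "customer", "Acme Ltd")
```

The caller asks about a customer, not about a repository. Where each item happens to live comes back as part of the answer instead of being something the user has to know in advance.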

A visionary
Gartner predicts that by 2020, 20% of ECM vendors will be morphing their systems to provide these capabilities. M-Files, the only visionary in Gartner’s 2016 Magic Quadrant for ECM (Enterprise Content Management), is already well down this road.

“Where we think we’ve got a big head start is that we have done this metadata thing from the outset and have been honing it through thousands of customer deployments. For us, it’s always been a question not of where but what. In the past, we were thinking more about data within M-Files, but now we are extending that to connectors so that we can be repository-neutral. It’s a very natural extension. Now it’s not just unstructured content – documents, contracts, proposals, presentations, invoices, whatever it might be – it’s the structured data too, the customers in the CRM and the vendors and projects in the ERP.

“Unifying these two environments will lead to better user adoption because people can find what they need right when they’re in the CRM. We call it a 360-degree view. It really doesn’t matter where you start, you will find what you need. If you’re looking at a document and you see it’s related to a certain customer and then you look at that customer and you see that that customer’s now related to a bunch of other documents, that leads you to information that you might not have found with a search. You’re creating a unified, really intelligent environment in which information finds you almost as much as you find it.”

M-Files’ new solution, when it is launched later this year, will take this to another level.

“All we had to do was generalise our metadata-driven approach to be repository-neutral, open up the architecture to plug in the analytics and boom,” said Milliken.

“Imagine you have a fileshare with a ton of files. You now automatically start scanning this fileshare with intelligent analytics, something like IBM Watson, and suddenly you infer the customer relationships for those documents and you tag all those documents with a customer. You’re not just putting a text string in, you’re literally linking it to the object in the CRM. At that point, just by adding that context you’ve dramatically changed the relevance of that information and that is absolutely within reach,” he said.
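The distinction Milliken draws between adding a text string and linking to the CRM object can be sketched as follows. This is a deliberately simplified illustration: plain case-insensitive substring matching stands in for the machine-learning analytics he describes, and `CrmCustomer`, `infer_customer_links` and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrmCustomer:
    """Hypothetical CRM record; `crm_id` is the stable identifier to link against."""
    crm_id: str
    name: str


def infer_customer_links(
    files: dict[str, str], customers: list[CrmCustomer]
) -> dict[str, list[str]]:
    """Map each file path to the CRM ids of customers mentioned in its text.

    Substring matching is a stand-in for real content analytics.
    """
    links: dict[str, list[str]] = {}
    for path, text in files.items():
        lowered = text.lower()
        links[path] = [c.crm_id for c in customers if c.name.lower() in lowered]
    return links


# Illustrative data only: file path -> extracted text.
files = {
    "share/quote.txt": "Quotation prepared for Acme Ltd, valid for 30 days.",
    "share/notes.txt": "Reminder: order more printer paper.",
}
customers = [CrmCustomer("CRM-001", "Acme Ltd"), CrmCustomer("CRM-002", "Globex")]

links = infer_customer_links(files, customers)
```

Because each file is linked to a record identifier and not to a copied name, a later change to the customer's name in the CRM would not break the association in this sketch.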