JOPX on SharePoint, Big Data and the future of work

Occasional rantings about SharePoint, Office 365 and Yammer. Intrigued about how people collaborate and the data driven company. Taking the first small steps in big data, predictive analytics and data mining

Monday, March 02, 2015

When you are developing against SharePoint Server 2013 search, you might be forced to reset the search index. You can do this using the SharePoint user interface through the screen shown below or using PowerShell. I prefer PowerShell since resetting through the user interface seems to give me timeouts, especially when the index is quite large. One reason to reset your content index is that your Search Service Application got into an unhealthy state because of insufficient disk space (see Fixing the Search Service after the Index Drive fills). I also noticed that when you are working on your development machine and making lots of changes to the search schema, it can be useful to reset the search index so that your changes are picked up. If you want to reset it using the user interface, go to the Search Administration screen of the Search Service Application and select the “Index Reset” option underneath the Crawling section of the left menu.

Don’t just reset your search index in a production environment, since this will also impact the analytics processing component (read Reset the index in SharePoint Server 2013). Listed below is the syntax for the PowerShell command (the snippet assumes that you only have one SearchServiceApplication):

(Get-SPEnterpriseSearchServiceApplication).Reset($true,$true)

The SearchServiceApplication.Reset method takes two parameters: public void Reset(bool disableAlerts, bool ignoreUnreachableServer). I would recommend always setting disableAlerts to true. The value for the second parameter will depend on your specific case. If you also get a timeout when using the PowerShell command, you can use the steps outlined in SharePoint 2013 Content Index Reset Timeout; they worked for me.

Friday, February 13, 2015

When preparing for my session “The future of business process apps – a Microsoft perspective” last year, I got inspired by the great article “The future of enterprise apps: moving beyond workflows to mindflows”, which introduced the concept of mindful apps. The core message is that if we want to automate the last mile, we have to analyze how people work day in and day out and start our system/application design with people at the center. One of the quotes in the article is from Bill Murphy (CTO of Blackstone, one of the largest investment funds worldwide): “We aim to take away as much of the stress as possible from easy stuff, by automating the routine and mundane actions, and give users more time to focus on the higher-end pieces of what they need to do.”

Most of the characteristics outlined in the comparison between traditional and mindful apps are not revolutionary (see table above), but there is one important key message. Mindful apps will allow us to assess and compare options in a decision context, they will allow us to quickly respond to events and make the best decision given a specific context, and they will provide us with “extended intelligence” by understanding and recognizing patterns within the data at hand. We as humans are good at problem solving, pattern recognition, identifying outliers, making creative leaps and incorporating new information when making decisions. We should be able to focus on these high-end tasks by being freed from laborious and menial tasks which can be automated.

There are 4 different trends which will impact how these mindful apps will be shaped:

User context matters – make it personal. When we make decisions or work within the context of specific processes, there are a lot of parameters which determine how we react or how we make decisions. These parameters should be integrated into the decision framework driving mindful apps. Our calendar, the availability of colleagues to reach out to, input from communications (e-mail, messaging or other formats), information that we capture from blogs, social networks such as LinkedIn or open data sources, together with the information available within your organization, should be filtered and at your fingertips. Machine learning and cognitive algorithms will drive the second machine age (a term coined by Brynjolfsson and McAfee from MIT), but we are only at the start of how these algorithms can drive the future workplace for information workers.

Mobile shapes our expectations. Mobile apps and the user experience they provide are shaping how we see an ideal enterprise application as well. Mindful apps should strive to combine beauty, simplicity and purpose to create an experience that delights us and that is effortless to use. Mobile apps are easy to understand: when people use a good app for the first time, they intuitively grasp the most important features. Why can’t we do the same for enterprise apps? Simplicity rules. The apps should also incorporate the necessary logic to evolve as the user grows more comfortable with them and starts exploring more advanced functionality. Apps should learn people’s preferences over time and show the interface which is best suited for the task at hand.

(Big) data and advanced analytics are the driving force. There is a lot of hype and confusion around the term Big Data, but one thing is for sure: storage and processing costs have dropped significantly in the last decade. When you combine this with the rise of new storage platforms such as Hadoop, NoSQL datastores such as HBase, Cassandra, etc. and new data processing frameworks such as Apache Drill, Dremel, Spark, etc., new opportunities arise to support users in their decision making processes. While there is a lot of emphasis on the 4 Vs (Volume, Velocity, Variety and Veracity), there is one more V that you have to think about: Value (also see Big Data beyond the hype, getting to the V that really matters).

Cloud will lead the way. A lot of the innovation which will enable this next generation of apps is coming out of the datacenters of Google, Amazon, LinkedIn, Microsoft, Yahoo, etc., but most organizations don’t have the same capacity (nor the same financial resources) as these internet giants. Luckily, the economies of scale offered by the cloud allow solution providers to provide you with a data infrastructure which can scale from prototype size to production environments able to handle huge amounts of data. The major cloud players (IBM, Microsoft, Amazon and Google) all seem to be making big bets on building out the data analytics platform of the future, and this competition will drive prices further down. It will also force them to focus on more innovative solutions which allow them to differentiate themselves from each other.

The best example of where we, as consumers, see the power of Big Data, analytics, machine learning and the cloud is mobile. The three major players (Microsoft, Apple and Google) rely quite heavily on cloud computing power and huge data stores to provide the experience of digital assistants. Microsoft is currently working on Cortana (which has been released in a number of countries worldwide), Apple was definitely the trendsetter with Siri and Google has Google Now.

The future is already here — it's just not very evenly distributed. (William Gibson)

Thursday, February 05, 2015

Microsoft Azure Machine Learning provides Machine Learning as a Service (on Microsoft Azure) and allows you to make your own applications more intelligent. Microsoft Azure Machine Learning was initially started as an incubation project in Microsoft Research (codename Passau) and is part of the overall Microsoft Data Platform.
The best definition of Machine Learning, in my opinion, comes from the excellent book “Introduction to Machine Learning” (Ethem Alpaydin, MIT Press 2014; use it as a reference, this is not an easy “how to” book): the goal of machine learning is to program computers to use example data or past experience to solve a given problem.
In general, when we want to solve a problem on a computer, we need an algorithm: a set of instructions that transforms an input into an output. Unfortunately, for some problems we do not know how to program such an algorithm, for example e-mail spam detection or predicting customer behavior. In most of these cases we do have examples of input and output available, e.g. a set of e-mails of which some are marked as spam. Based on this data, we would like a computer (or machine) to automatically extract the algorithm necessary to perform the classification. The algorithm does not need to be perfect but needs to be a good and useful approximation.
The term machine learning is tightly coupled to the domain of analytics (or data science, see Data Scientist: the sexiest job of the 21st century). Analytics is concerned with the discovery and extraction of useful business patterns or mathematical decision models from a specific data set. A number of techniques can be used for this; depending on their background, practitioners will probably favor a technique from their respective domain:

Machine learning algorithms such as support vector machines, neural networks, Bayesian methods, … (originated out of the computer science domain)

If we focus specifically on machine learning, we make a distinction between supervised learning and unsupervised learning. In supervised learning we try to find a mapping between a set of input variables and a specific output variable, using a set of known values to train a specific model. In unsupervised learning we try to find patterns in the input data.
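To make the distinction concrete, here is a minimal sketch in plain Python (it does not use Azure Machine Learning). Everything in it is an assumption made for illustration: the numbers stand for a made-up count of suspicious words per e-mail, the function names are hypothetical, and nearest centroid and k-means are just two simple stand-ins for a supervised and an unsupervised technique.

```python
from statistics import mean


def train_nearest_centroid(samples, labels):
    """Supervised: learn one centroid (average value) per label from labelled examples."""
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(samples, iterations=10):
    """Unsupervised: split unlabelled 1-D values into two clusters (k-means with k=2).

    Assumes the samples contain at least two distinct values.
    """
    low, high = min(samples), max(samples)
    for _ in range(iterations):
        a = [x for x in samples if abs(x - low) <= abs(x - high)]
        b = [x for x in samples if abs(x - low) > abs(x - high)]
        low, high = mean(a), mean(b)
    return sorted(a), sorted(b)


if __name__ == "__main__":
    # Made-up data: number of suspicious words found in six e-mails.
    counts = [0, 1, 2, 8, 9, 10]
    labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

    # Supervised: the labels are part of the training data.
    model = train_nearest_centroid(counts, labels)
    print(predict(model, 7))

    # Unsupervised: only the counts are given, the groups are discovered.
    print(two_means(counts))
```

The supervised part needs the spam/ham labels to learn from, while the unsupervised part only sees the raw counts and has to discover the two groups by itself. It cannot tell you which group is the spam one.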

But why should you care about machine learning? I think the picture below shows how the focus is shifting from traditional reporting (hindsight) to more advanced predictive and prescriptive analytics (foresight). This shift will provide businesses with more added value, but it also requires new competencies from business intelligence specialists, such as machine learning and data mining. Examples vary across industries, but in general predictive analytics has the potential to change the way businesses make decisions (if you want to take a more in-depth look, definitely pick up Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel).

Microsoft Azure Machine Learning distinguishes itself from other platforms and tools by a number of different characteristics:

Allows you to build predictive models using only a web browser, by making use of a visual composition canvas (called Machine Learning Studio) and modules, without requiring you to write code (although you can use R code snippets if you want). You can start quickly from existing sample experiments/models or you can share your own data experiments.

Collaborative: work together with anyone from anywhere using just your browser.

The different modules allow you to author an end-to-end machine learning workflow, from reading data to training and validating your predictive model.

Ability to deploy models as web services. You can quickly operationalize your models by converting them into web services, and you even have the ability to monetize your machine learning models using Azure Data Market.

Wednesday, January 21, 2015

On April 18th 2015 BIWUG (www.biwug.be) is organizing its fifth edition of SharePoint Saturday Belgium. We invite you to submit a session for this year's SharePoint Saturday Belgium using this link - http://www.spsevents.org/city/Antwerp/Antwerp2015 . It is possible to submit multiple sessions. We will close the call for speakers on February 18th EOD.

Monday, December 15, 2014

Azure Stream Analytics, which is currently in preview, is a fully managed real-time stream analytics service that aims to provide highly resilient, low-latency and scalable complex event processing of streaming data for scenarios such as Internet of Things, command and control (of devices) and real-time Business Intelligence on streaming data.

Although it might look similar to Amazon Kinesis, it seems to distinguish itself by aiming to increase developer productivity: you author streaming jobs in a SQL-like language to specify the necessary transformations. The language provides a range of operators which are quite useful for defining time-based operations such as windowed aggregations (check out the Stream Analytics Query Language Reference for more information). An example taken from the documentation finds all toll booths which have served more than 3 vehicles in the last 5 minutes, using a sliding window (see Sliding Window: it slides by an epsilon and produces output at the occurrence of an event).
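The idea behind the toll booth example can be approximated in plain Python. This is a rough sketch, not the Stream Analytics query itself and not how the service is implemented: the function name busy_toll_booths and the (toll_id, entry_time) event shape are assumptions made for illustration. It also only evaluates the window when a vehicle arrives, whereas a real sliding window reacts when events leave the window as well.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def busy_toll_booths(events, window=timedelta(minutes=5), threshold=3):
    """Yield (toll_id, window_end, count) whenever a toll booth has served
    more than `threshold` vehicles in the `window` ending at an arrival."""
    recent = defaultdict(deque)  # toll_id -> entry times still inside the window
    for toll_id, entry_time in sorted(events, key=lambda e: e[1]):
        times = recent[toll_id]
        times.append(entry_time)
        # Drop entries that are no longer within the last `window` of time.
        while times and times[0] <= entry_time - window:
            times.popleft()
        if len(times) > threshold:
            yield toll_id, entry_time, len(times)


if __name__ == "__main__":
    start = datetime(2015, 1, 1, 12, 0)
    # Made-up events: toll booth 1 gets four cars in four minutes, booth 2 only two.
    events = [(1, start + timedelta(minutes=m)) for m in (0, 1, 2, 3, 10)]
    events += [(2, start), (2, start + timedelta(minutes=4))]
    for toll_id, window_end, count in busy_toll_booths(events):
        print(toll_id, window_end.isoformat(), count)
```

In Stream Analytics you would express the grouping, the window and the threshold declaratively in the query, and the service would take care of the bookkeeping that the deque does here.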

This SQL-like language allows non-developers to build stream processing solutions through the Azure Portal. It makes it easy to filter, project, aggregate and join streams, combine static data (master data) with streaming data and detect patterns within the data streams without developer intervention.

Azure Stream Analytics leverages cloud elasticity to scale the number of resources up or down on demand, thereby providing a distributed, scale-out architecture with very low startup costs. You only pay for the resources you use and have the ability to add resources as needed. Pricing is calculated based on the volume of data processed by the streaming job (in GB) and the number of Streaming Units that you are using. Streaming Units provide the scale-out mechanism for Azure Stream Analytics and provide a maximum throughput of 1 MB/sec. Pricing starts as low as €0.0004/GB and €0.012/hr per streaming unit (roughly equivalent to less than 10€/month). It also integrates seamlessly with other services such as Azure Event Hub, Azure Machine Learning, Azure Storage and Azure SQL databases.
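The "less than 10€/month" figure can be checked with a little arithmetic. The sketch below uses the two preview prices quoted above; the 30-day month, the job running around the clock, the 1 MB/sec limit applying per streaming unit and 1 GB being counted as 1000 MB are all assumptions on my side, and the function names are made up for this illustration.

```python
# Prices quoted in the text above (preview pricing, in euro).
PRICE_PER_GB = 0.0004
PRICE_PER_UNIT_HOUR = 0.012

HOURS_PER_MONTH = 24 * 30        # assumption: 30-day month, job never stops
MAX_MB_PER_SECOND_PER_UNIT = 1   # assumption: the 1 MB/sec limit applies per unit


def monthly_cost_eur(streaming_units, gb_processed):
    """Estimated monthly cost: streaming unit hours plus data volume."""
    unit_cost = streaming_units * PRICE_PER_UNIT_HOUR * HOURS_PER_MONTH
    volume_cost = gb_processed * PRICE_PER_GB
    return unit_cost + volume_cost


def max_gb_per_month(streaming_units):
    """Upper bound on the monthly volume at full throughput (1 GB = 1000 MB)."""
    seconds = HOURS_PER_MONTH * 3600
    return streaming_units * MAX_MB_PER_SECOND_PER_UNIT * seconds / 1000


if __name__ == "__main__":
    light = monthly_cost_eur(streaming_units=1, gb_processed=100)
    saturated = monthly_cost_eur(1, max_gb_per_month(1))
    print(f"1 unit, 100 GB/month:        {light:.2f} EUR")
    print(f"1 unit, saturated all month: {saturated:.2f} EUR")
```

Under these assumptions a single streaming unit costs 8.64 euro per month in unit hours, and even when it is saturated for the whole month (2592 GB) the volume charge adds only about one euro, so the total stays under 10 euro.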