Category Archives : Big Data

12

Mar

Looking to transform your business by improving your on-premises environments, accelerating your move to the cloud, and gaining transformative insights from your data? Here’s your opportunity to learn from the experts and ask the questions that help your organization move forward.

Join us for one or all of these training sessions to take a deep dive into a variety of topics, including products like Azure Cosmos DB, along with Microsoft innovations in artificial intelligence, advanced analytics, and big data.

Azure Cosmos DB

Engineering experts are leading a seven-part training series on Azure Cosmos DB, complete with interactive Q&As. In addition to a high-level technical deep dive, this series covers a wide array of topics, including:

By the end of this series, you’ll be able to build serverless applications and conduct real-time analytics using Azure Cosmos DB, Azure Functions, and Spark. Register to attend the whole Azure Cosmos DB series, or register for the sessions that interest you.

Artificial Intelligence (AI)

Learn to create the next generation of applications spanning an intelligent cloud as well as an intelligent edge powered by AI. Microsoft offers a comprehensive set of flexible AI services for any

05

Mar

We are happy to announce that job monitoring and job view have been added to the Azure Data Lake Tools for Visual Studio Code. You can now monitor the jobs you submit in real time. You can also view the job summary and job details for historical jobs, and download any of the input data, output data, and resource files associated with a job.

Key Customer Benefits

Monitor job progress in real time within VSCode for both local and ADL jobs.
Display job summary and data details for historical jobs.
Resubmit a previously run job.
Download job inputs, outputs, and resource data files.
View the U-SQL script for a submitted job.

Summary of key new features

27

Feb

To provide more authentication options, HDInsight Tools for VSCode can now connect to an HDInsight cluster through Ambari for job submissions. You can easily link (HDInsight: Link a cluster) or unlink (HDInsight: Unlink a cluster) a normal cluster by using an Ambari-managed username and password, which is independent of your Azure sign-in process. The Ambari connection applies to Spark and Hive clusters in all the Azure environments that host HDInsight services.

To support the HDInsight Enterprise Security Package (in preview), you can also connect to a secured cluster with a domain username (e.g. user1@contoso.com). This connection works with both traditional blob storage (WASB) and Azure Data Lake Storage (ADLS) as the underlying storage. Once you connect to the secured HDInsight cluster, you can use the signed-in domain credentials for all your job submissions.

This addition gives you more flexibility to connect to your HDInsight clusters, beyond those in your Azure subscriptions, and greatly simplifies submitting your Hive and Spark jobs.

How to link a cluster

Open the command palette by pressing CTRL+SHIFT+P, and then enter HDInsight: Link a cluster.

27

Feb

We are excited to announce the general availability of the StorSimple Data Manager. This feature allows you to transform data from StorSimple format into the native format in Azure blobs or Azure Files. Once your data is transformed, you can use services like Azure Media Services, Azure Machine Learning, HDInsight, Azure Search, and more.

StorSimple devices use the cloud as a tier of storage and send data to the cloud in a highly efficient and secure manner. Data is stored in the cloud tier in a deduplicated, compressed, and encrypted format. A side effect is that this data is not readily consumable by the cloud services you might want to use. Azure offers a rich set of services, and our goal is to let you use the service of your choice on your data to unleash its potential.

Using this service, you can transform data stored in your 8000 series StorSimple devices into Azure blobs or Azure Files. All the file data that you store on-premises on your StorSimple device will show up as individual blobs or files in Azure. You can use the Azure portal, .NET applications, or Azure Automation to trigger these transformations. You can

13

Feb

Over the past decade, Microsoft has partnered with the National Science Foundation (NSF) on three separate programs, first in 2010, and more recently through a commitment of $6M in cloud credits across two NSF-supported data science programs: the Big Data Regional Innovation Hubs and the NSF BigData solicitation.

The engagement with NSF has helped Microsoft reach diverse research groups such as the Big Data Hubs, which bring together communities of data scientists to spark and nurture collaborations between domain experts, researchers, communities, state partners, nonprofits, and industry.

As of today, Microsoft has provided 17 cloud credit awards to Principal Investigators (PIs) who benefit from NSF-supported programs. These collaborations are already producing interesting breakthroughs in research on the human body, microbial diseases, and even everyday communication:

Franco Pestilli, Assistant Professor in Psychology, Neuroscience and Cognitive Science at Indiana University, is an Azure awardee and PI through the Midwest Big Data Hub. His group has built a platform called Brainlife using the Azure award, with the goal of fostering collaboration with sixty-six different global scientific communities such as developmental and learning sciences, network science, computer science, engineering, psychology, statistics, traumatic brain injury, and vision science. Chirag

12

Feb

Providing a rich GUI for managing Azure Data Lake Storage resources has been a top customer ask for a long time, and we are thrilled to announce the public preview of Azure Data Lake Storage (ADLS) support in Azure Storage Explorer (ASE). With this release, you can freely navigate ADLS resources, upload and download folders and files, copy and paste files across folders or ADLS accounts, and easily perform CRUD operations on your folders and files. Azure Storage Explorer not only offers a traditional desktop explorer GUI for dragging, uploading, downloading, copying, and moving your ADLS folders and files, but also provides a unified developer experience for displaying file properties, viewing folder statistics, and adding quick access. With this extension, you can now browse ADLS resources alongside existing experiences for Azure Blobs, tables, files, queues, and Cosmos DB in ASE.

In that repo, you will find data files and scripts in the Deployment folder. There are also lab manual folders for each lab module, as well as an overview presentation to walk you through the labs. Below you will find more details on each module.

The repo also includes a series of PowerShell and database scripts as well as Azure ARM templates that will generate resource groups that the labs need in order for you to successfully build out an end-to-end scenario, including some sample data that you can use for Power BI reports in the final Lab Module 9.

25

Jan

The ability to run Spark on a GPU-enabled cluster demonstrates a unique convergence of big data and high-performance computing (HPC) technologies. In the past several years, we’ve seen the GPU market explode as companies all over the world integrate AI and other HPC workflows into their businesses. TensorFlow, a framework designed to utilize GPUs for numerical computation and neural networks, has skyrocketed in popularity, a testament to the rise of AI and, consequently, the demand for GPUs. Simultaneously, the need for big data and powerful data processing engines has never been greater as hundreds of companies start to collect data in the petabyte range.

Infrastructure that pairs high-performance hardware such as GPUs with big data engines such as Spark lets data scientists and data engineers enable many scenarios that would otherwise be difficult to achieve.

Along with the recent release of our latest GPU SKUs, I’m excited to share that we now support running Spark on a GPU-enabled cluster using the Azure Distributed Data Engineering Toolkit (AZTK). In a single command, AZTK allows you to provision on-demand GPU-enabled Spark clusters on top of Azure Batch’s infrastructure, helping you take your high-performance implementations that are usually

16

Jan

The ADF v2 public preview was announced at Microsoft Ignite on Sep 25, 2017. With ADF v2, we added flexibility to the ADF app model and enabled control flow constructs that facilitate looping, branching, conditional constructs, on-demand executions, and flexible scheduling in various programmatic interfaces such as Python, .NET, PowerShell, REST APIs, and ARM templates. One consistent piece of customer feedback we received was a request for a rich, interactive visual authoring and monitoring experience that allows users to create, configure, test, deploy, and monitor data integration pipelines without any friction. We listened to your feedback and are happy to announce the release of visual tools for ADF v2. The main goal of the ADF visual tools is to make you productive with ADF by getting pipelines up and running quickly without writing a single line of code. You can use a simple and intuitive code-free interface to drag and drop activities onto a pipeline canvas, perform test runs, debug iteratively, and deploy and monitor your pipeline runs. With this release, we are also providing guided tours on how to use the visual authoring and monitoring features, as well as a way to give us valuable feedback.
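To make the looping and branching constructs mentioned above more concrete, here is a minimal sketch of what a pipeline definition with a ForEach activity and an If Condition activity might look like when built programmatically. It only assembles the JSON document in Python and does not call any Azure service. The pipeline name, activity names, and parameters (`tableNames`, `runCleanup`) are hypothetical and chosen purely for illustration.

```python
import json


def wait_activity(name, seconds=1):
    """A trivial placeholder activity; stands in for real work such as a copy."""
    return {
        "name": name,
        "type": "Wait",
        "typeProperties": {"waitTimeInSeconds": seconds},
    }


def build_pipeline(name):
    """Return an ADF v2-style pipeline definition as a plain dict.

    All names and parameters below are hypothetical examples.
    """
    # Looping: run the inner activities once per item in a pipeline parameter.
    loop = {
        "name": "ForEachTable",
        "type": "ForEach",
        "typeProperties": {
            "isSequential": True,
            "items": {
                "value": "@pipeline().parameters.tableNames",
                "type": "Expression",
            },
            "activities": [wait_activity("ProcessOneTable")],
        },
    }
    # Branching: after the loop succeeds, pick a branch from a boolean parameter.
    branch = {
        "name": "CheckCleanupFlag",
        "type": "IfCondition",
        "dependsOn": [
            {"activity": "ForEachTable", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "expression": {
                "value": "@bool(pipeline().parameters.runCleanup)",
                "type": "Expression",
            },
            "ifTrueActivities": [wait_activity("Cleanup")],
            "ifFalseActivities": [wait_activity("SkipCleanup")],
        },
    }
    return {
        "name": name,
        "properties": {
            "parameters": {
                "tableNames": {"type": "Array"},
                "runCleanup": {"type": "Bool"},
            },
            "activities": [loop, branch],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline("SamplePipeline"), indent=2))
```

The visual tools described in this post produce definitions of this general shape for you, so a sketch like this is mainly useful for understanding what the canvas generates or for scripted deployments.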

04

Jan

It has been a busy season for many retailers. During this time, retailers are using Azure to analyze various types of data to help accelerate purchasing decisions. The Azure cloud not only gives retailers the compute capacity to handle peak times, but also the data analytic tools to better understand their customers.

Many retailers have a treasure trove of information in the thousands, or millions, of product reviews provided by their customers. Often, it takes time for particular reviews to show their value because customers “vote” for helpful or not helpful reviews over time. Using machine learning, retailers can automate identifying useful reviews in near real-time and leverage that insight quickly to build additional business value.

But how might a retailer without deep big data and machine learning expertise even begin to conduct this type of advanced analytics on such a large quantity of unstructured data? We will be holding a workshop in January to show you how easy that can be through the use of Azure and Qubole’s big data service.

Using these technologies, anyone can quickly spin up a data platform and train a machine learning model utilizing Natural Language Processing (NLP) to identify the most useful reviews.
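As a rough illustration of the idea, and not the method used in the workshop or by Qubole’s service, the sketch below trains a tiny bag-of-words Naive Bayes classifier in plain Python to separate "helpful" from "unhelpful" reviews. The six training reviews and their labels are invented for this example. A real system would train on historical reviews and their customer votes, at far larger scale, on a distributed engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesReviews:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training reviews
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: detailed reviews are labeled helpful, terse ones unhelpful.
TRAIN = [
    ("The battery lasts ten hours and the screen is sharp and bright", "helpful"),
    ("Sizing runs small so order one size up, the fabric is thick", "helpful"),
    ("Setup took five minutes and the battery charges quickly", "helpful"),
    ("great", "unhelpful"),
    ("bad product do not buy", "unhelpful"),
    ("love it love it", "unhelpful"),
]

model = NaiveBayesReviews().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    print(model.predict("the battery is sharp and lasts"))
    print(model.predict("love it"))
```

Because the model scores a review as soon as it is written, it does not have to wait for votes to accumulate, which is the "near real-time" benefit described above.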