The new ADLS Gen2 service combines scalability, cost-effectiveness, and a security model with rich analytics capabilities using the Hadoop Distributed File System (HDFS). With HDFS, customers can store both structured and unstructured data. The service also includes an Azure Blob File System (ABFS) driver that allows files and folders to be distinctly addressed on the server side, which eliminates the need for a complex client-side driver and ensures high-fidelity file system transactions.
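As a rough illustration of what "distinctly addressed" means in practice, the ABFS driver refers to files and folders with URIs of the form `abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>`. The sketch below only assembles such a URI as a string. The helper name `abfs_uri`, the file system `raw`, and the account `contosodata` are made up for this example and do not come from the announcement.

```python
def abfs_uri(file_system: str, account: str, path: str = "", secure: bool = True) -> str:
    """Assemble an ABFS URI string (hypothetical helper, illustration only).

    Format: abfs[s]://<file_system>@<account>.dfs.core.windows.net/<path>
    """
    # "abfss" is the TLS-secured variant of the scheme.
    scheme = "abfss" if secure else "abfs"
    # Normalize the path so the result never contains a doubled slash.
    clean_path = path.strip("/")
    return f"{scheme}://{file_system}@{account}.dfs.core.windows.net/{clean_path}"


if __name__ == "__main__":
    # Hypothetical file system and account names.
    print(abfs_uri("raw", "contosodata", "sales/2019/02/orders.csv"))
```

A Hadoop-compatible engine such as Spark would typically be handed a URI like this as an input or output location. Authentication and driver configuration are out of scope for this sketch.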

We implemented a hierarchical namespace (HNS) which supports atomic file and folder operations. This is important because it reduces the overhead associated with processing big data on blob storage. This speeds up job execution and lowers cost because fewer compute operations are required. The ABFS driver and HNS significantly improve ADLS’ performance, removing scale and performance bottlenecks.
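The cost difference behind this claim can be sketched with a toy model. In a flat namespace a "folder" is only a shared key prefix, so renaming it touches every object under that prefix. With a hierarchical namespace the folder is a real node, and a rename is a single metadata update. The classes below are hypothetical and only count operations. They are not a description of how ADLS is implemented.

```python
class FlatStore:
    """Toy flat namespace: 'folders' are just shared key prefixes."""

    def __init__(self):
        self.blobs = {}
        self.ops = 0  # number of per-object operations performed

    def put(self, key, data):
        self.blobs[key] = data

    def rename_folder(self, old, new):
        # Every object under the prefix must be rewritten under its new key.
        for key in [k for k in self.blobs if k.startswith(old + "/")]:
            self.blobs[new + key[len(old):]] = self.blobs.pop(key)
            self.ops += 1


class HierarchicalStore:
    """Toy hierarchical namespace: folders are nodes that own their children."""

    def __init__(self):
        self.folders = {}
        self.ops = 0  # number of metadata operations performed

    def put(self, folder, name, data):
        self.folders.setdefault(folder, {})[name] = data

    def rename_folder(self, old, new):
        # A single metadata update moves the folder and everything in it.
        self.folders[new] = self.folders.pop(old)
        self.ops += 1


if __name__ == "__main__":
    flat, tree = FlatStore(), HierarchicalStore()
    for i in range(1000):
        flat.put(f"tmp/part-{i}", b"x")
        tree.put("tmp", f"part-{i}", b"x")
    flat.rename_folder("tmp", "final")
    tree.rename_folder("tmp", "final")
    print(flat.ops, tree.ops)
```

Big data jobs often write output to a temporary folder and rename it on commit, which is one plausible reason an atomic, constant-cost folder rename reduces job time and compute cost.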

In addition to the performance boost, Microsoft offers the same robust data security capabilities built into Azure Blob Storage, such as:

Currently, ADLS is available in almost all Azure regions except for US DOD Central and US DOD East. Furthermore, the pricing details for ADLS are available on the pricing page.

With the new ADX, customers can leverage a fully managed data analytics service for real-time analysis on large volumes of streaming data. This service is, according to the blog post by Willis, capable of querying 1 billion records in under a second with no modification of the data or metadata required. Furthermore, ADX includes native connectors to Azure Data Lake Storage, Azure SQL Data Warehouse, and Power BI and comes with an intuitive query language allowing customers to obtain insights in minutes.

Microsoft designed ADX with speed and simplicity in mind. It combines two distinct services that work in tandem:

The Engine, a service responsible for processing the incoming raw data and serving user queries, and

A Data Management (DM) service, which allows the ingestion of various types of raw data. Furthermore, the DM is also responsible for managing failures, backpressure, and data grooming tasks when necessary.

Note that both services are deployed as clusters of compute nodes (virtual machines) in Azure.
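The division of labor between the two services can be pictured with a small toy model: an ingestion component buffers raw records and pushes back when it is full, while an engine component drains the buffer and answers queries. This is a sketch of the general pattern only. The class names, the bounded queue, and the capacity value are assumptions made for illustration and say nothing about how ADX is built internally.

```python
import queue


class DataManagement:
    """Toy ingestion front end with a bounded buffer (hypothetical)."""

    def __init__(self, capacity):
        self.pending = queue.Queue(maxsize=capacity)

    def ingest(self, record):
        # Returning False is the backpressure signal: the caller should slow down or retry.
        try:
            self.pending.put_nowait(record)
            return True
        except queue.Full:
            return False


class Engine:
    """Toy engine that processes buffered records and serves queries (hypothetical)."""

    def __init__(self):
        self.rows = []

    def drain(self, dm, limit):
        # Pull at most `limit` buffered records into queryable storage.
        taken = 0
        while taken < limit:
            try:
                self.rows.append(dm.pending.get_nowait())
            except queue.Empty:
                break
            taken += 1
        return taken

    def query(self, predicate):
        return [row for row in self.rows if predicate(row)]


if __name__ == "__main__":
    dm, engine = DataManagement(capacity=2), Engine()
    print([dm.ingest({"temp": t}) for t in (20, 25, 30)])  # third record is pushed back
    engine.drain(dm, limit=10)
    print(engine.query(lambda row: row["temp"] > 22))
```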

ADX is currently available in 41 Azure regions, and pricing details are available on the pricing page.

With the two new services, customers gain greater flexibility in managing unstructured data or data generated from interactions on the web, software-as-a-service apps, social media, mobile apps, and Internet of Things devices. According to John Chirapurath, general manager of Azure data, blockchain, and AI at Microsoft, in a VentureBeat article:

We always strive to make it very easy for IT staff to adopt analytics and for line-of-business people to utilize and deliver powerful insights using beautiful products.

Lastly, Microsoft also released a preview of a new Mapping Data Flow capability in Azure Data Factory (ADF), a hybrid cloud-based data integration service for orchestrating and automating data movement and transformation. With the new capability, customers can visually design, build, and manage data transformation processes without learning Spark or having a deep understanding of their distributed infrastructure. Currently, ADF is available in 21 regions, and pricing details are available on the pricing page.