For me, the main point of the write up is that the Oracle database is coming. There’s nothing like an announcement to keep the Oracle faithful in the fold.
If the write up is accurate, Oracle is embracing buzzy trends, storage that eliminates the guesswork, and security. (Remember Secure Enterprise Search, the Security Server, and the nifty credential verification procedures? I do.)
The new version of Oracle, according to the write up, will deliver self-driving capability. Cars don’t do this too well, but the Oracle database will, and darned soon.

The 18c Autonomous Database or 18cad will:

Fix itself

Cost less than Amazon’s cloud

Go faster

Be online 99.995 percent of the time

And more, of course.
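That 99.995 percent figure translates into a concrete downtime budget. A quick back-of-the-envelope check (my arithmetic, not an Oracle specification):

```python
# Annual downtime implied by a 99.995 percent availability claim.
# This is an illustrative calculation, not a vendor figure.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # about 525,960

def downtime_minutes(availability_pct: float) -> float:
    """Minutes of permitted downtime per year at the given availability."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

print(round(downtime_minutes(99.995), 1))  # roughly 26.3 minutes a year
```

In other words, a shade under half an hour of outage per year, total.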

Let’s assume that Oracle 18cad works as described. (Words are usually easier to do than software, I remind myself.)

The customers look to be big winners. Better, faster, cheaper. Oracle believes its revenues will soar because happy customers just buy more Oracle goodies.

Will there be a downside?

What about database administrators? Some organizations may assume that 18cad will allow some expensive database administrator (DBA) heads to roll.
What about the competition? I anticipate more marketing fireworks or at least some open source “sparks” and competitive flames to heat up the cold autumn days.

If you want to get a sense of the time and computational cost under the covers of Big Data processing, please read “Cost in the Land of Databases.” Two takeaways for me were [a] real time is different from what some individuals believe, and [b] if you want to crunch Big Data, bring money and technical expertise, not assumptions that data are easy.

The new frontier in analytics might just be pictures. Known to baffle even the most advanced AI systems, the ability to break pictures into recognizable parts and then use them to derive meaning has long been a quest for many. It appears that Disney Research, in cahoots with UC Davis, believes it is near a breakthrough.

We’ve seen tremendous progress in the ability of computers to detect and categorize objects, to understand scenes and even to write basic captions, but these capabilities have been developed largely by training computer programs with huge numbers of images that have been carefully and laboriously labeled as to their content. As computer vision applications tackle increasingly complex problems, creating these large training data sets has become a serious bottleneck.

A perfect example of this is MIT’s attempt to use AI to share recipes and nutritional information just by viewing a picture of food. The sky is the limit when it comes to possibilities if Disney and MIT can help AI over the current hump of limitations.

as you added more data — not just a bit more data but orders of magnitude more data — and kept the algorithms the same, then the error rates kept going down, by a lot. By the time the datasets were three orders of magnitude larger, error was less than 5%. In many domains, there’s a world of difference between 18% and 5%, because only the latter is good enough for real-world application. Moreover, the best-performing algorithms were the simplest; and the worst algorithm was the fanciest. Boring old perceptrons from the 1950s were beating state-of-the-art techniques.

Bayesian methods date from the 18th century and work well. Despite Laplacian and Markovian bolt-ons, the drift problem bedevils some implementations. The solution? Pump in more training data, and the centuries-old techniques work like a jazzed millennial with a bundle of venture money.
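For the curious, the “boring old perceptron” in the quoted passage is only a few lines of code. A minimal 1950s-style sketch on a toy, linearly separable data set (all numbers here are illustrative):

```python
# A bare-bones perceptron: given enough clean training data, even this
# simple learner drives error rates down, per the quoted passage.
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn weights and bias for binary labels in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy set: points above the line y = x are labeled +1
data = [(0, 1), (1, 2), (2, 3), (1, 0), (2, 1), (3, 2)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(data, labels)
print([predict(w, b, x) for x in data])  # matches labels on this toy set
```

No deep learning framework required, which is rather the point.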

Care to name a large online outfit which may find this an idea worth nudging forward? I don’t think it will be Verizon Oath or Tronc.

We are making progress in training AI systems through the neural net approach, but exactly how those systems make their decisions remains difficult to discern. Now, Tech Crunch reveals, “MIT CSAIL Research Offers a Fully Automated Way to Peer Inside Neural Nets.” Writer Darrell Etherington recalls that, a couple years ago, the same team of researchers described a way to understand these decisions using human reviewers. A fully automated process will be much more efficient and lead to greater understanding of what works and what doesn’t. Etherington explains:

Current deep learning techniques leave a lot of questions around how systems actually arrive at their results – the networks employ successive layers of signal processing to classify objects, translate text, or perform other functions, but we have very little means of gaining insight into how each layer of the network is doing its actual decision-making. The MIT CSAIL team’s system uses doctored neural nets that report back the strength with which every individual node responds to a given input image, and those images that generate the strongest response are then analyzed. This analysis was originally performed by Mechanical Turk workers, who would catalogue each based on specific visual concepts found in the images, but now that work has been automated, so that the classification is machine-generated. Already, the research is providing interesting insight into how neural nets operate, for example showing that a network trained to add color to black and white images ends up concentrating a significant portion of its nodes to identifying textures in the pictures.
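The activation-ranking idea in the quote reduces to a simple operation: record how strongly each node responds to each input, then keep the strongest images per node for inspection. A sketch with synthetic numbers (the real system instruments a trained network):

```python
# For each node, rank input images by activation strength and keep
# the top k for human or automated inspection. Data is synthetic.
def top_images_per_node(activations, k=2):
    """activations[i][j] = response of node j to image i.
    Returns, per node, the indices of the k strongest images."""
    n_nodes = len(activations[0])
    ranked = []
    for j in range(n_nodes):
        by_strength = sorted(range(len(activations)),
                             key=lambda i: activations[i][j],
                             reverse=True)
        ranked.append(by_strength[:k])
    return ranked

acts = [
    [0.1, 0.9],  # image 0
    [0.8, 0.2],  # image 1
    [0.5, 0.7],  # image 2
]
print(top_images_per_node(acts))  # [[1, 2], [0, 2]]
```

The hard part, of course, is interpreting what the top images for a node have in common, which is the step MIT has now automated.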

The write-up points us to MIT’s own article on the subject for more information. We’re reminded that, because the human thought process is still largely a mystery to us, AI neural nets are based on hypothetical models that attempt to mimic ourselves. Perhaps, the piece suggests, a better understanding of such systems could inform the field of neuroscience. Sounds fair.

Remaining relevant means making money in technology, and Google and Apple are not about to be outdone by Amazon, even if it appears that may be the case. In an effort to stem the potential loss of revenue, both Apple and Google are re-engineering their search capabilities to “buttress the value of traditional search.”

According to GeoMarketing, the two tech giants are approaching the same problem from different angles:

In a sense, the battle between the mobile web and apps is a proxy war between Google and Apple.

For Google,

The (Q&A box) fits right in with the current idea of getting direct, personalized responses to queries as opposed to the traditional method of showing infinite hypertext listings based on general popularity. It follows a path that Google has already taken with its search functions, including the automatic addition of the term “near me” into the search box as well as providing searchable menu listings for restaurants and direct bookings to salons and spas.

Apple is focusing on apps rather than search, but with the same end in mind.

As consumers are demanding local results and more organic answers to their search questions, search giants have to continually find ways to accommodate. As long as it results in more revenue, the infinite chase is worth it, we suppose.

The issue arose because, as a government contractor, Palantir must report its diversity statistics to the government. The Labor Department sifted through these reports and concluded that even though Palantir received a huge number of qualified Asian applicants for certain roles, it was hiring only small numbers of them. Palantir, being the big data company that it is, did its own sifting and produced a data-filled response that it said refuted the allegations and showed that in some tech titles 25%-38% of its employees were Asians. Apparently, Palantir’s protestations weren’t enough to satisfy government regulators, so the company agreed to settle.

For its part, Palantir insists on its innocence but says it settled in order to put the matter behind it. Bort notes the unusual nature of this case—according to the Equal Employment Opportunity Commission, African-Americans, Latin-Americans, and women are more underrepresented in tech fields than Asians. Is the Department of Labor making it a rule to analyze the hiring patterns of companies required to report diversity statistics? If the department is consistent, there should soon be a number of similar lawsuits regarding discrimination against other groups. We shall see.

What does one do to make modern data-centric activities sort of work? The answers are ones that, I have found, many of the more youthful wizards elect to ignore.

Here they are:

Do data preparation. Yikes. Normalization of data. I have fielded this question in the past, “Who has time for that?” Answer: Too few, gentle reader. Too few.

Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent election.

Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen-like.

Have rules which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.

Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz-bang modern systems are confused by differences like I.B.M and International Business Machines or numbers with decimal points in the incorrect place.

Get colleagues in the game. This is a good idea, but in many organizations in which I have worked “team” is spelled “my bonus.”
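The normalization and data-quality steps above can be sketched in a few lines. A minimal illustration (the alias table and parsing rules are mine, not a standard):

```python
# Map name variants to a canonical form and repair an obvious numeric
# slip (comma used as a decimal separator). Illustrative rules only.
ALIASES = {
    "i.b.m": "International Business Machines",
    "i.b.m.": "International Business Machines",
    "ibm": "International Business Machines",
}

def normalize_name(raw: str) -> str:
    """Return the canonical company name for a known variant."""
    key = raw.strip().lower()
    return ALIASES.get(key, raw.strip())

def parse_amount(raw: str) -> float:
    """Tolerate stray whitespace and a comma decimal separator."""
    return float(raw.strip().replace(",", "."))

print(normalize_name("I.B.M"))  # International Business Machines
print(parse_amount(" 3,14 "))   # 3.14
```

Tedious? Yes. But a system that thinks I.B.M and International Business Machines are two companies will cheerfully double-count them.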

Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.

ScyllaDB’s biggest differentiator is that it is compatible with the Apache Cassandra database APIs. As such, its creators claim that ScyllaDB can be used as a drop-in replacement for Cassandra itself, offering users the benefit of improved performance and scale that comes from integration with a lightweight key/value store.
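In practice the drop-in claim means standard CQL should run unchanged against a Scylla node. A sketch (keyspace and table names are hypothetical):

```cql
-- Ordinary Cassandra CQL, pointed at a Scylla cluster without edits.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo.events (
  id uuid PRIMARY KEY,
  payload text
);

INSERT INTO demo.events (id, payload) VALUES (uuid(), 'hello');
SELECT * FROM demo.events;
```

Existing Cassandra client drivers are, per the compatibility claim, supposed to work the same way.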

The company says the new release is geared toward development teams that have struggled with Big Data projects, and claims a number of performance advantages over more traditional development approaches, including:

10X the throughput of baseline Cassandra – more than 1,000,000 CQL operations per second per node

Wheatley cites Scylla’s CTO when he points to better integration with graph databases and improved support for Thrift, Date Tiered Compaction Strategy, Large Partitions, Docker, and CQL tracing. I notice the company is hiring as of this writing. Don’t let the Tel Aviv location of Scylla’s headquarters stop you from applying if you don’t happen to live nearby—they note that their developers can work from anywhere in the world.

IBM left private keys to the Docker host environment in its Data Science Experience service inside freely available containers. This potentially granted the cloud service’s users root access to the underlying container-hosting machines – and potentially to other machines in Big Blue’s Spark computing cluster. Effectively, Big Blue handed its cloud users the secrets needed to potentially commandeer and control its service’s computers.

IBM hopped to it. Two weeks after the stumble was discovered, IBM fixed the problem.
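One pragmatic takeaway: scan anything destined for a container image for private-key material before it ships. A hedged sketch of such a check (marker list and approach are illustrative, not IBM's process):

```python
# Walk a directory tree and flag files whose contents look like
# PEM-style private keys. Marker list is illustrative, not exhaustive.
import os

KEY_MARKERS = (
    b"BEGIN RSA PRIVATE KEY",
    b"BEGIN EC PRIVATE KEY",
    b"BEGIN OPENSSH PRIVATE KEY",
)

def find_leaked_keys(root: str):
    """Return paths under `root` whose contents resemble private keys."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    blob = fh.read(64 * 1024)  # PEM header sits near the top
            except OSError:
                continue
            if any(marker in blob for marker in KEY_MARKERS):
                hits.append(path)
    return hits
```

Running a check like this against an image's filesystem before publishing it is cheap insurance against exactly the stumble described above.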

The write up includes this upbeat statement, attributed to the person whose demo account exposed the glitch:

I think that IBM already has some amazing infosec people and a genuine commitment to protecting their services, and it’s a matter of instilling security culture and processes across their entire organization. That said, any company that has products allowing users to run untrusted code should think long and hard about their system architecture. This is not to imply that containers were poorly designed (because I don’t think they were), but more that they’re so new that best practices in their use are still being actively developed. Compare a newer-model table saw to one decades old: The new one comes stock with an abundance of safety features including emergency stopping, a riving knife, push sticks, etc, as a result of evolving culture and standards through time and understanding.

Bad news. Good news.

Let’s ask Watson about IBM security. Hold that thought, please. Watson is working on health care information. And don’t forget the March 2017 security conference sponsored by those security pros at IBM.


Stephen E. Arnold monitors search, content processing, text mining
and related topics from his high-tech nerve center in rural Kentucky.
He tries to winnow the goose feathers from the giblets. He works with colleagues
worldwide to make this Web log useful to those who want to go
"beyond search". Contact him at sa [at] arnoldit.com. His Web site
with additional information about search is arnoldit.com.