
Technologies

The Technologies we love,

The Technologies we used to build Smart Data

Hadoop

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The Hadoop 2 environment provides scalable services including HDFS, YARN, ZooKeeper, HBase, Pig, Sqoop, and Apache Drill.

Hadoop fuels Smart Data: its powerful components provide scalability and standards support for large deployments at a limited software cost. Smart Data takes advantage of the following Hadoop components.

Spark

Spark is a fast processing engine compatible with Hadoop data. It can run in Hadoop clusters in YARN mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing and newer workloads such as streaming, interactive queries, and machine learning.

Spark is the de facto standard for in-memory computation, and is paving the way for a modern approach to data analysis. Both Vanilla Hub and Vanilla Air can take advantage of Spark nodes inside a data science architecture, reading data stored in Spark and processing it with Spark's in-memory engine.
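
The in-memory batch model described here can be sketched in plain Python, with no Spark cluster required; the data and the transformation names below are illustrative, mirroring Spark's flatMap/filter/reduceByKey operations rather than using the PySpark API.

```python
from collections import Counter

# Illustrative word count in the Spark style: data stays in memory and
# flows through chained transformations (plain Python, not PySpark).
lines = [
    "spark runs in hadoop clusters",
    "spark processes data in memory",
]

# "flatMap": split every line into words
words = [w for line in lines for w in line.split()]
# "filter": drop very short tokens
words = [w for w in words if len(w) > 2]
# "reduceByKey": count occurrences of each word
counts = Counter(words)

print(counts["spark"])  # "spark" appears in both lines -> 2
```

In a real deployment the same pipeline would run over partitions spread across YARN-managed executors, which is where the scalability comes from.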

Apache

The mission of the Apache Software Foundation (ASF) is to provide software for the public good. We do this by providing services and support for many like-minded software project communities of individuals who choose to join the ASF.

Apache is the heart of Open Source, and the foundation for Hadoop. In addition to Hadoop, Smart Data integrates these Apache modules:

Nutch: Apache Nutch is a highly extensible and scalable open-source web crawler software project. Nutch is available in Vanilla Hub to provide a crawling function.

Drill: Apache Drill enables analysts, business users, data scientists, and developers to explore and analyze data in non-relational datastores without sacrificing the flexibility and agility those datastores offer. Drill processes data in situ, without requiring users to define schemas or transform the data first.

Phoenix: Apache Phoenix is a relational database layer over HBase, delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets.

Tika: The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF); it is used together with Solr/SolrCloud to extract information from documents.
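
Of the modules above, Phoenix names the most concrete mechanism: compiling a SQL predicate into key-range scans over a sorted store. A minimal sketch of that idea, in plain Python over a dictionary standing in for an HBase table (the keys and data are hypothetical, not Phoenix internals):

```python
# Hypothetical sorted-by-key store standing in for an HBase table.
store = {
    "user#001": {"name": "ada"},
    "user#002": {"name": "bob"},
    "user#003": {"name": "eve"},
    "item#001": {"name": "disk"},
}

def scan(start, stop):
    """Return rows whose key falls in [start, stop) -- one 'HBase scan'."""
    return {k: v for k, v in sorted(store.items()) if start <= k < stop}

# SELECT * FROM t WHERE pk LIKE 'user#%' compiles to a single range scan;
# rows outside the range are never touched.
rows = scan("user#", "user#\xff")
print(len(rows))  # 3 user rows; the item row is skipped
```

The real driver does far more (statistics, parallel scans, server-side coprocessors), but the key insight is the same: SQL over HBase becomes a series of ordered key-range scans.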

Solr

Solr is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features, and rich document handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance.
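
Two of the features listed here, full-text search and faceted search, can be sketched in a few lines of plain Python (this is an illustration of the concepts, not the Solr/Lucene code; the documents are invented):

```python
from collections import defaultdict, Counter

# Illustrative document set with a free-text field and a facet field.
docs = {
    1: {"text": "hadoop cluster storage", "type": "guide"},
    2: {"text": "spark cluster memory", "type": "guide"},
    3: {"text": "solr search index", "type": "reference"},
}

# Build the inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for term in doc["text"].split():
        index[term].add(doc_id)

hits = index["cluster"]                            # full-text lookup
facets = Counter(docs[d]["type"] for d in hits)    # facet counts on "type"
print(sorted(hits), dict(facets))
```

Solr layers analysis, scoring, replication, and distributed sharding on top, but the inverted index plus per-field counting is the core of full-text and faceted search.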

Smart Data takes advantage of the SolrCloud indexing server, and can be deployed together with the leading frameworks Elasticsearch and Lucidworks.

Elasticsearch provides a growing platform of open source projects and commercial products designed to search, analyze, and visualize your data, allowing you to get actionable insight in real time. Its products are architected to work together seamlessly, as a standalone solution or integrated into your existing infrastructure.

Lucidworks Fusion is the most advanced search and data analysis platform on the planet. It provides the enterprise-grade capabilities needed to design, develop, and deploy powerful search apps at any scale. Use it for enterprise search, e-commerce, real-time data analytics, and practically anything else requiring blazing-fast data retrieval.

R Project

R is a free software environment for statistical computing and graphics. R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. R is easily extensible through functions and packages, and the R community is noted for its active package contributions.

R is the heart of Vanilla Air, as it provides the engine to run standard and custom R programs in a clustered environment. Together with Vanilla Air, you can build complex predictive and forecasting models to take control of your data strategy.
R provides powerful visualizations such as 3D plots and maps … whatever you need, R can run it for you!

Vanilla ETL

The Vanilla ETL platform provides an integrated suite of components to help enterprises extract value from their data. It addresses some of the key challenges in the data ETL value chain and its processes.

When it comes to supporting more complex transformations, such as standardizing a column's values against a Master Data repository, Vanilla ETL provides reliable components for writing them. Vanilla ETL provides the following components.
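
The standardization step described here, replacing free-form column values with their canonical Master Data form, can be sketched as follows. This is a hypothetical illustration: the function name, table, and mappings are invented, not Vanilla ETL's actual API.

```python
# Hypothetical Master Data repository: variant spellings -> canonical codes.
master_data = {
    "france": "FR", "FRANCE": "FR", "fr": "FR",
    "germany": "DE", "DEU": "DE",
}

# Invented source rows with inconsistent country values.
rows = [
    {"customer": "acme", "country": "france"},
    {"customer": "globex", "country": "DEU"},
    {"customer": "initech", "country": "fr"},
]

def standardize(rows, column, mapping):
    """Replace each value in `column` by its canonical Master Data form,
    leaving unmapped values untouched."""
    return [
        {**row, column: mapping.get(row[column], row[column])}
        for row in rows
    ]

clean = standardize(rows, "country", master_data)
print([r["country"] for r in clean])  # ['FR', 'DE', 'FR']
```

In an ETL flow this transformation typically sits between extraction and load, so that downstream KPI and reporting layers only ever see canonical values.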

Vanilla KPI

Vanilla KPI comes with a set of full-featured modules to handle the development and deployment of KPI applications, using data stored in various databases, including relational databases and NoSQL databases such as HBase.
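
A KPI of the kind described here is, at its core, a measure aggregated per dimension over rows that could equally come from a relational query or an HBase scan. A minimal sketch, with invented data and function names (not Vanilla KPI's API):

```python
# Invented order records; in practice these would be fetched from a
# relational table or a NoSQL store such as HBase.
orders = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "EMEA", "amount": 80.0},
    {"region": "APAC", "amount": 200.0},
]

def kpi_total_by(rows, key, measure):
    """Aggregate a numeric measure per dimension value -- a basic KPI."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0.0) + row[measure]
    return totals

result = kpi_total_by(orders, "region", "amount")
print(result)  # {'EMEA': 200.0, 'APAC': 200.0}
```

The same shape generalizes to counts, averages, and ratios; a KPI module's job is to define, schedule, and render such aggregates against whichever store holds the data.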