Big Data is one of the most in-demand technologies. Startups, big tech vendors, and companies large and small are all jumping on the Big Data bandwagon, largely because the amount of data in the world roughly doubles every two years.

Big Data technologies handle massive amounts of information in all sorts of formats: tweets, posts, e-mails, documents, audio, video, feeds, and more.

A growing number of technologies make up the Big Data world, including all sorts of analytics, in-memory databases, NoSQL databases, and Hadoop, to name a few. Below is a brief look at the tools and terms making inroads into this technology stack.

Hadoop

Hadoop is a crucial technology at the center of the whole Big Data movement.

It is open source software used to gather, store, and analyze vast amounts of data on low-cost commodity hardware. For instance, banks may use Hadoop for fraud detection, and online shopping services could use it to analyze customers' buying patterns, analysis that can have a huge impact once integrated into a CRM system.

Cassandra

Cassandra is a free and open source NoSQL database.

It's a kind of database that can handle data of many different types and sizes, and it's increasingly the go-to database for mobile and cloud applications. Several companies, including Apple and Netflix, run Cassandra at very large scale.

MapReduce

MapReduce has been called "the heart of Hadoop."

MapReduce is the programming model that lets Hadoop process all kinds of data spread across many low-cost computer servers. To get meaningful results out of Hadoop, a programmer writes MapReduce programs, often in the popular language Java.
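To make the map and reduce steps concrete, here is a minimal sketch of the classic word-count example in plain Python. This only simulates, on one machine, the map, shuffle, and reduce phases that Hadoop would run across a cluster; the function names are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

On a real cluster, Hadoop runs many mappers and reducers in parallel and handles the shuffle over the network, but the data flow is exactly this shape.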

Cloudera

Cloudera is a company that makes a commercial version of Hadoop.

Although Hadoop is a free and open-source project for storing large amounts of data on inexpensive computer servers, the free version of Hadoop is not easy to use. Several companies have created friendlier distributions of Hadoop, and Cloudera's is arguably the most popular.

HBase

HBase is yet another project built on the popular Hadoop technology.

Hadoop is a way to store all kinds of data across many low-cost computer servers. Once that data is stored in the Hadoop Distributed File System (HDFS), HBase can sort through it and group related bits together, somewhat the way a traditional database organizes data.

Pig

Pig is another hot skill, thanks to the demand for Big Data technologies like Hadoop.

Pig is a high-level programming language for extracting information from data stored in Hadoop, for example to answer specific questions or otherwise transform the data.

Flume

Flume is yet another skill spawned by the Big Data craze and the popularity of Hadoop.

Hadoop is a way to store all kinds of data across many low-cost computer servers. Flume is a method to move massive amounts of data from the place it was created into a Hadoop system.
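Flume agents are wired together in a plain properties file that names a source (where data comes from), a channel (where it is buffered), and a sink (where it goes). A minimal sketch, modeled on the single-agent example in Flume's documentation, might look like this; the agent and component names (a1, r1, c1, k1) and the HDFS path are illustrative:

```properties
# One agent (a1) with a netcat source, an in-memory channel, and an HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events

# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Events written to the netcat port flow through the memory channel and land in HDFS, which is exactly the "move data from where it was created into Hadoop" job described above.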

Hive

Hive is yet another hot, in-demand skill, courtesy of Big Data and the popularity of Hadoop.

Hadoop is a way to store all kinds of data across many low-cost computer servers. Hive provides a way to extract information from Hadoop using the same kind of traditional methods used by regular databases. (In geek speak: it gives Hadoop a database query interface).

NoSQL

NoSQL is a new kind of database that is part of the big data phenomenon.

NoSQL has sometimes been called the cloud database. Regular relational databases need data to be organized up front: names and account numbers must be structured and labeled. NoSQL databases don't require that; they can work with all kinds of loosely structured documents.

There are a number of popular NoSQL databases, including MongoDB, Couchbase, and Cassandra.
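To illustrate the schema-less idea, here is a toy in-memory document store in Python. It is not the API of any real NoSQL product, just a sketch of the core behavior: every record is a free-form document, and two documents need not share any fields.

```python
import uuid

class TinyDocumentStore:
    """A toy, in-memory document store: every record is just a dict,
    and different documents may have completely different fields."""

    def __init__(self):
        self._docs = {}

    def insert(self, document):
        """Store a document under a generated id; no schema is enforced."""
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = dict(document)
        return doc_id

    def find(self, **criteria):
        """Return all documents whose fields match the given criteria."""
        return [doc for doc in self._docs.values()
                if all(doc.get(k) == v for k, v in criteria.items())]

store = TinyDocumentStore()
store.insert({"name": "Alice", "account": 42})       # a structured record
store.insert({"tweet": "Big Data!", "retweets": 7})  # a different shape entirely
print(store.find(name="Alice"))  # [{'name': 'Alice', 'account': 42}]
```

A relational database would reject the second record for not fitting the table's columns; a document store happily keeps both.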

ZooKeeper

ZooKeeper is a free and open-source project that also came out of the Big Data craze, particularly the hugely popular Hadoop.

Hadoop is a way to store all kinds of data across many low-cost computer servers. ZooKeeper is a coordination service, a bit like a shared file system, used to name, locate, and synchronize the pieces of a Hadoop cluster. It is now also used with big-data technologies beyond Hadoop.

Arista

Arista makes a computer network switch used in big data centers.

Its claim to fame is its operating system software, which users can program to add features, write apps, or make changes to the network.

R

At the center of the much-in-demand Big Data technology is something called "analytics": the ability to sift through humongous amounts of data and extract business intelligence from it.

R is the language of choice for this. It is used for statistical analysis and graphics/visualization.

Sqoop

Sqoop is one of those skills that has zoomed into popularity, thanks to the Big Data craze.

It's a free and open source tool that transfers data between the popular Big Data storage system, Hadoop, and classic relational databases like the ones made by Oracle, IBM, and Microsoft.

It's a command-line interface tool, meaning you have to know the commands and type them directly into the system, rather than click on them with a mouse.

While Big Data options like Hadoop are the new-age way of dealing with data, Documentum (EMC Documentum is an "enterprise content management" system) remains a popular tool in industries that still handle a lot of paper or electronic forms, such as legal, medical, and insurance: major sectors where Big Data could bring about a revolution.

While NoSQL databases are increasingly popular for new applications, many companies still rely on RDBMS-based systems.
RDBMS stands for Relational Database Management System, the traditional kind of database that uses the Structured Query Language (SQL) and is exemplified by Oracle, Microsoft SQL Server, and IBM DB2.
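For contrast with the schema-less NoSQL style, here is a minimal SQL example using Python's built-in sqlite3 module (standing in for Oracle, SQL Server, or DB2, which all speak the same core SQL). The table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # an in-memory relational database

# Relational data is structured up front: columns, types, constraints
conn.execute("""
    CREATE TABLE accounts (
        account_no INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        balance    REAL
    )
""")
conn.execute("INSERT INTO accounts VALUES (1, 'Alice', 100.0)")
conn.execute("INSERT INTO accounts VALUES (2, 'Bob', 250.0)")

# SQL: the structured query language shared by all relational databases
rows = conn.execute(
    "SELECT name, balance FROM accounts WHERE balance > ?", (150,)
).fetchall()
print(rows)  # [('Bob', 250.0)]
```

Every record must fit the declared schema, which is exactly the organization requirement that NoSQL databases relax.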

There are data scientists working on the tech side, the marketing side, and just about every other area of business, in enterprise systems and in companies of just about every size. They figure out how to get meaningful numbers and information from large volumes of data, and Big Data is the most magical word for them.

API Testing

An internal API can be tested through unit tests and through use of the product that consumes it.

If it is an externally consumable API, then you need to be much more thorough, because people could use it in ways you might not expect and send data in very different formats. An externally consumable API also usually needs to make sense, be intuitive, and be well documented.

Testing an API nearly always requires you to create some sort of consumer for testing purposes: an application that interacts with the API. This application is usually very simple, driven by automated test cases with little or no manual interaction.

If the API has dependencies, you may choose to mock those dependencies out so you can more thoroughly test all of those interactions and hit all of the positive and negative code paths.
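Here is a small sketch of that mocking approach using Python's standard unittest.mock module. The PaymentAPI class and its gateway dependency are hypothetical stand-ins for whatever API and external dependency you are testing.

```python
from unittest.mock import Mock

# Hypothetical API under test: it depends on an external gateway client.
class PaymentAPI:
    def __init__(self, gateway):
        self.gateway = gateway

    def charge(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.submit(amount)  # call into the dependency

# Mock the dependency so positive and negative paths can both be tested
# without touching a real external service.
gateway = Mock()
gateway.submit.return_value = {"status": "ok"}

api = PaymentAPI(gateway)

# Positive path: the API delegates to the dependency and returns its result
assert api.charge(10) == {"status": "ok"}
gateway.submit.assert_called_once_with(10)

# Negative path: invalid input is rejected before the dependency is hit
try:
    api.charge(-5)
except ValueError:
    pass
gateway.submit.assert_called_once()  # still only the one earlier call
```

Because the mock records every call, the test can verify not just return values but exactly how, and how often, the dependency was invoked.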

Suppose an API interacts with a database and performs CRUD (create, read, update, delete) operations:

Create: insert an invalid record that triggers a business-rule or constraint violation, such as a foreign-key violation, a unique-key violation, a NOT NULL violation, or even a precision violation, and analyze how the system behaves in each case.

Read: query records when a table's row count is huge, assess response times, and check that the query does not time out on the server, a typical bug in almost every newly developed system.

Update: modify sets of records first with valid business rules, then with invalid rules and constraint violations; also exercise simultaneous updates involving pessimistic and optimistic locks.

Delete: remove a set of records that triggers a cascade delete, and try deleting non-existent records. Many more scenarios arising from a system's use cases will become criteria for API testing.
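The create and delete scenarios above can be sketched with Python's built-in sqlite3 module; the customers/orders schema is invented for illustration, and SQLite stands in for whatever database the API actually uses. (Note that SQLite enforces foreign keys only when the pragma is switched on.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id) ON DELETE CASCADE
)""")
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1)")

# Create: unique-key violation (duplicate email)
try:
    conn.execute("INSERT INTO customers VALUES (2, 'a@example.com')")
except sqlite3.IntegrityError as e:
    print("unique violation:", e)

# Create: foreign-key violation (no customer 999)
try:
    conn.execute("INSERT INTO orders VALUES (11, 999)")
except sqlite3.IntegrityError as e:
    print("fk violation:", e)

# Create: NOT NULL violation (missing email)
try:
    conn.execute("INSERT INTO customers (id) VALUES (3)")
except sqlite3.IntegrityError as e:
    print("not-null violation:", e)

# Delete: cascade removes the customer's orders along with the customer
conn.execute("DELETE FROM customers WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
```

An API test suite would drive the same scenarios through the API's endpoints rather than raw SQL, asserting that each violation surfaces as a well-defined error rather than a crash.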