Saturday, November 21, 2015

If you are a Java developer, Java garbage collection (GC) pops up from time to time in Javadoc, online articles, and online discussions. It is such a hot and tough topic because it is an entirely different paradigm from what programmers usually do, which is writing code. The Java GC frees heap memory, in the background, for the objects your classes create. In the past I have covered a few articles related to Java GC, and today I am going to go through several blogs/articles which I found online, learn the basics, and share what I've learned. Hopefully, for Java programmers, Java GC will become clearer.

When you start a Java application, the operating system reserves some memory for it, known as the heap, according to the parameters assigned to the java command. The heap is further divided into several regions known as eden, the survivor spaces, the old gen and the perm gen. In the Oracle Java 8 HotSpot JVM, the perm gen has been removed, so be sure to always check the official garbage collector documentation for changes. Below are a few links on the HotSpot implementation of Java GC.

The survivor space is divided into two: survivor 0 and survivor 1. Eden and the survivor spaces are collectively known as the young generation or new generation, whilst the old gen is also known as the tenured generation. Garbage collections happen on both the young and the old generation. Below are two diagrams showing how the heap regions are divided.

While the concept of garbage collection is the same across JVMs, the implementations are not, and neither are the default settings or how to tune them. The well-known JVMs include Oracle Sun HotSpot, Oracle JRockit and IBM J9. You can find a list of other JVMs here. Essentially, garbage collection is performed on the young and old generations to remove objects on the heap that have no valid reference.

Below are common Java parameter settings. For the full list, issue the command java -X.
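For example, a minimal sketch of common heap-related settings might look like this (the sizes and myapp.jar are illustrative, not recommendations):

# initial heap of 512MB, maximum heap of 1GB
java -Xms512m -Xmx1g -jar myapp.jar

# print the final values of all HotSpot flags for inspection
java -XX:+PrintFlagsFinal -version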

Friday, November 20, 2015

There are many articles online that articulate how fast a solid state drive is in comparison to a spinning hard disk drive. But one wonders, really, just how fast is an SSD? I mean, if your earnings are limited and the current spinning disk is working fine, there is no compelling reason to make the change, and at the same time it is difficult to afford an SSD considering the cost per GB. As of this writing, a Samsung 850 Pro 512GB is selling at Compuzone Malaysia at a staggering price of 1378 MYR! But a kind soul generously donated an SSD to me, and as a gesture of goodwill back to him, herein I will write a blog post describing my experience migrating from a Hitachi/HGST Travelstar Z7K500 (HGST HTS725050A7E630) to a Samsung SSD 850 PRO 512GB.

One thing is for sure: the SSD is light. It weighs only 66 grams, compared with the HDD at 95 grams. For a first-timer, the SSD is so light that it confused me; does it actually hold 512GB?

For the spinning HDD, I took 5 samples: 1min 41sec, 1min 34sec, 1min 34sec, 1min 34sec and 1min 31sec. All measurements are from the moment I pushed the power button until I saw the login screen. Of course, there are many services running during bootup. On average, the bootup time is around 1min 35sec.

I performed 3 tests each for cached and non-cached reads. On average, the non-cached read speed is about 107.12MB/sec and the cached read speed is 2867.80MB/sec. The relevant hdparm parameters are described below.

-t Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl.

-T Perform timings of cache reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading directly from the Linux buffer cache without disk access. This measurement is essentially an indication of the throughput of the processor, cache, and memory of the system under test.
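For reference, both timings can be taken in a single invocation, assuming the drive under test is /dev/sda (adjust to your device):

# -t measures buffered (non-cached) disk reads, -T measures cached reads
sudo hdparm -tT /dev/sda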

Next, I used the dd command to do a higher-layer benchmark on the disk. See below.
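A typical dd benchmark of this kind, as a sketch (the file name and sizes are illustrative):

# write a 1GB test file, flushing to disk before dd reports the timing
dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync

# drop the kernel page cache, then time reading the file back
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
dd if=tempfile of=/dev/null bs=1M count=1024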

On average, the non-cached read speed is 253.67MB/sec and the cached read speed is 2926.93MB/sec. As for the dd tests, the average is 236.10MB/sec. On average, the bootup time is around 30sec! The most significant change is after grub: the login screen shows up in around 3 seconds or less. Although this SSD is rated at 550/520MB/sec for sequential read/write, I did not reach that; maybe my old system's bus bandwidth is maxing out.

The significant reduction in bootup time and disk read time is clearly seen from the statistics above. As for the user experience, everything becomes so fast! To put it in perspective, going from HDD to SSD is like going from a Proton car to an F1 car. I think in the future it will also help with programming tasks like code grepping and code searching.

UPDATE: with an adjustment to the grub timeout and a change to the POST settings for fast boot, my cold boot to login screen has now improved to 20 seconds!

Sunday, November 8, 2015

Apache CouchDB, commonly referred to as CouchDB, is an open source database that focuses on ease of use and on being "a database that completely embraces the web".[1] It is a document-oriented NoSQL database that uses JSON to store data, JavaScript as its query language using MapReduce, and HTTP for an API.[1] CouchDB was first released in 2005 and later became an Apache project in 2008.

Actually, Couch is an acronym for Cluster Of Unreliable Commodity Hardware. If you are using a Debian-based Linux distribution, installation is very easy: just apt-get install couchdb. If not, you can check out this link on how to install it on other Linux distributions. Once installed, make sure couchdb is running.
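By default CouchDB listens on port 5984, so a quick check that it is up (assuming a stock local install) is:

# should return a JSON welcome message with the couchdb version
curl http://127.0.0.1:5984/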

As you can read above, we updated this document twice. First, we increased rating to 6, appending _rev with the unique revision id generated by CouchDB. If this update is a success, notice that rev increases by 1 (note the 2- prefix on rev). On the second update, we set views to 10, but essentially CouchDB wipes everything in this doc and inserts a new key called views. So note that if you want to update a document, you should send in all existing values when doing the update.
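To make that flow concrete, here is a sketch of this kind of update, assuming a database named test, a document id doc1, and placeholder _rev values standing in for whatever CouchDB actually returns:

# create the document; the response carries the first revision (1-...)
curl -X PUT -H "Content-Type: application/json" http://127.0.0.1:5984/test/doc1 -d '{"rating":5}'

# first update: send the whole document plus the current _rev; rev becomes 2-...
curl -X PUT -H "Content-Type: application/json" http://127.0.0.1:5984/test/doc1 -d '{"_rev":"1-xxx","rating":6}'

# second update: sending only "views" replaces the whole document,
# so "rating" is lost unless it is sent again
curl -X PUT -H "Content-Type: application/json" http://127.0.0.1:5984/test/doc1 -d '{"_rev":"2-yyy","views":10}'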

CouchDB is very good and efficient at handling documents; it certainly earns its spot as a database for beginners to delve deeper into. If you think so too, take a look at this link. That's it for today, have fun learning!

Saturday, November 7, 2015

Today we will take a look into another big data technology: Apache Accumulo. First, what is Accumulo?

Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.

Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, Hypertable, and Cassandra. Accumulo began its development in 2008 and joined the Apache community in 2011.

In this article, as always, we will set up the infrastructure. I referenced this article, with the following environment:

64bit arch

open jdk 1.7/1.8

zookeeper-3.4.6

hadoop-2.6.1

accumulo-1.7.0

openssh

rsync

debian sid

As Accumulo is a Java-based project, you must have Java installed and configured. Get the latest Java 1.7 or 1.8 as of this writing. After Java is installed, you need to export JAVA_HOME in your bash configuration file, .bashrc, with this line: export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55

Then you need to source the new .bashrc; . .bashrc is sufficient. For ssh and rsync, you can use the apt-get package manager as it is easy. What's important is that you set up a public and private key pair in your user's ssh configuration directory, as sketched below.
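A common way to set up passwordless ssh to localhost, which the Hadoop start scripts rely on, is along these lines:

# generate a key pair with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# authorize the public key for the local user
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# verify that ssh localhost no longer prompts for a password
ssh localhost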

You can create two directories, $HOME/Downloads and $HOME/Installs. It's pretty intuitive: the Downloads directory is for the downloaded packages, and Installs is the working directory where the compressed packages are extracted.

As you can read above, we specify the Java home for Hadoop and then configure Hadoop to run on port 9000, so make sure this port is free for Hadoop to use. Then we format the Hadoop namenode and start Hadoop.
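For reference, a sketch of those steps, run from the Hadoop install directory (the port setting normally lives in etc/hadoop/core-site.xml):

# etc/hadoop/core-site.xml should contain the fs.defaultFS property:
#   <property>
#     <name>fs.defaultFS</name>
#     <value>hdfs://localhost:9000</value>
#   </property>

# format the namenode and start hdfs
bin/hdfs namenode -format
sbin/start-dfs.sh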

So we have configured the settings for Accumulo in .bashrc and some property settings in accumulo-env.sh and accumulo-site.xml. Next, we will initialize Accumulo and start it using the password we specified previously.
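A sketch of that initialize-and-start step for accumulo-1.7.0, run from the Accumulo install directory:

# interactive: prompts for the instance name and the root password
bin/accumulo init

# start all accumulo processes
bin/start-all.sh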

Apache Karaf is a small OSGi based runtime which provides a lightweight container onto which various components and applications can be deployed.

If you have no idea what that means, maybe a quick, simple how-to will give you some idea. First, let's download a copy of Apache Karaf; you can do that just here. At the time of this learning experience, I'm using Apache Karaf version 4.0.1. Then extract it to a path; that will be the karaf home directory.

Let's launch karaf; see the terminal screenshot below. Then let's add the Apache Camel feature repository into Apache Karaf and install from it. We will use this as the sample for this learning experience.
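Inside the karaf shell, the repository-add and install steps look roughly like this (the camel version shown is illustrative; match it to the release you want):

karaf@root()> feature:repo-add camel 2.15.2
karaf@root()> feature:install camel-core
karaf@root()> feature:list | grep camel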

In this article we have just scratched the surface of what Apache Karaf can do, and it certainly delivers! If you are looking for an alternative to docker containers, Apache Karaf is certainly worth the time to look into. Henceforth, I think these links will provide you with further learning material.

Sunday, October 25, 2015

If you are a Java developer, you will have come across Java garbage collection, which frees the objects created by your application so they stop occupying the Java heap. In today's article, we will look into the Java heap and in particular into the Java eden space. First, let's look at the general Java heap.

The heap memory is the runtime data area from which the Java VM allocates memory for all class instances and arrays. The heap may be of a fixed or variable size. The garbage collector is an automatic memory management system that reclaims heap memory for objects.

Eden Space: The pool from which memory is initially allocated for most objects.

Survivor Space: The pool containing objects that have survived the garbage collection of the Eden space.

Tenured Generation: The pool containing objects that have existed for some time in the survivor space.

When you create a new object, the JVM allocates a part of the heap for your object. Visually, it is something like the following.

When the eden space is filled with objects and a minor GC is performed, surviving objects are copied to one of the survivor spaces, s0 or s1. At any one time, one of the survivor spaces is empty. Because the eden space is relatively small in comparison to the tenured generation, the GC that happens in eden space is quick. Eden and both survivor spaces are also known as the young or new generation.

The Sun/Oracle HotSpot JVM further divides the young generation into three sub-areas: one large area named "Eden" and two smaller "survivor spaces" named "From" and "To". As a rule, new objects are allocated in "Eden" (with the exception that if a new object is too large to fit into "Eden" space, it will be directly allocated in the old generation). During a GC, the live objects in "Eden" first move into the survivor spaces and stay there until they have reached a certain age (in terms of numbers of GCs passed since their creation), and only then they are transferred to the old generation. Thus, the role of the survivor spaces is to keep young objects in the young generation for a little longer than just their first GC, in order to be able to still collect them quickly should they die soon afterwards.

Based on the assumption that most of the young objects may be deleted during a GC, a copying strategy ("copy collection") is being used for young generation GC. At the beginning of a GC, the survivor space "To" is empty and objects can only exist in "Eden" or "From". Then, during the GC, all objects in "Eden" that are still being referenced are moved into "To". Regarding "From", the still referenced objects in this space are handled depending on their age. If they have not reached a certain age ("tenuring threshold"), they are also moved into "To". Otherwise they are moved into the old generation. At the end of this copying procedure, "Eden" and "From" can be considered empty (because they only contain dead objects), and all live objects in the young generation are located in "To". Should "To" fill up at some point during the GC, all remaining objects are moved into the old generation instead (and will never return). As a final step, "From" and "To" swap their roles (or, more precisely, their names) so that "To" is empty again for the next GC and "From" contains all remaining young objects.

As you can observe in the visual diagram above, you can set the amount of heap for eden and the survivor spaces using -Xmn in the java parameters. There is also -XX:SurvivorRatio=ratio, and you can find further information here for Java 8. Note that in the diagram above, the perm gen has been removed in Java 8; hence, always find out which Java runs your application and refer to the right version of the Java documentation.
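For example, a sketch of sizing the young generation (the sizes and myapp.jar are illustrative):

# 2GB heap with a 512MB young generation; SurvivorRatio=8 gives
# eden:s0:s1 = 8:1:1 within the young generation
java -Xms2g -Xmx2g -Xmn512m -XX:SurvivorRatio=8 -jar myapp.jar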

If you want to monitor the statistics of eden, you can use jstat. Previously I wrote an article about jstat, and you can read here what jstat is and how to use it. You can also enable GC logging so the JVM will write GC statistics into a file; you can read more here.
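For example, assuming your application's pid is 12345 (find it with jps):

# sample gc utilization every 1000 ms; the E column is eden usage in %
jstat -gcutil 12345 1000

# enable gc logging to a file (HotSpot Java 7/8 flags)
java -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log -jar myapp.jar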

Till then, we will meet again in the next article. Please consider donating, thank you!

Saturday, October 24, 2015

It's been a while since my last learning session on MongoDB; the last one was on administration. Today, we will learn another topic of MongoDB: MongoDB security. In the general MongoDB security context, it means

Maintaining a secure MongoDB deployment requires administrators to implement controls to ensure that users and applications have access to only the data that they require. MongoDB provides features that allow administrators to implement these controls and restrictions for any MongoDB deployment.

This article references the official documentation, which can be found here. As the security context is pretty huge, in this short article we will focus on how to set up the MongoDB server to use SSL and how a client can access the database resource securely.

First, make sure you have installed the server and client packages. If you are on a deb-based Linux distribution, it is as easy as sudo apt-get install mongodb-clients mongodb-server. Once both packages are installed, you can check the log file at /var/log/mongodb/mongodb.log for output similar to the following. So our MongoDB version is 2.6.3 and it is built with OpenSSL library support.

Next, let's generate a public and private key pair and a self-signed certificate.

user@localhost:~/test1$ openssl req -newkey rsa:2048 -new -x509 -days 365 -nodes -out mongodb-cert.crt -keyout mongodb-cert.key
Generating a 2048 bit RSA private key
.............................+++
..................................................................................................................................................................................................................+++
writing new private key to 'mongodb-cert.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:MY
State or Province Name (full name) [Some-State]:KL
Locality Name (eg, city) []:Kuala Lumpur
Organization Name (eg, company) [Internet Widgits Pty Ltd]:example.com
Organizational Unit Name (eg, section) []:Engineering
Common Name (e.g. server FQDN or YOUR name) []:Jason Wee
Email Address []:jason@example.com
user@localhost:~/test1$ ls
mongodb-cert.crt mongodb-cert.key
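From here, a sketch of the remaining steps, using MongoDB 2.6 era SSL options (the paths are illustrative):

# combine the key and the certificate into a single .pem file
cat mongodb-cert.key mongodb-cert.crt > mongodb.pem

# start mongod requiring ssl (or put the equivalent options in /etc/mongodb.conf)
mongod --sslMode requireSSL --sslPEMKeyFile /path/to/mongodb.pem

# connect the client over ssl; the cert is self-signed, hence allow invalid certs
mongo --ssl --sslAllowInvalidCertificates --host localhost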