Sunday, December 11, 2016

About a week ago, I ordered and received the System76 Kudu laptop. Here is my review after six days of use.

The System76 Kudu, a 17.3 inch laptop

System76 is a relatively young computer brand that differentiates itself by making high-quality laptops that ship with Ubuntu, one of the most popular Linux distros available. After Apple's underwhelming 2016 MacBook announcements, System76 reported that its website crashed under the surge in traffic and orders. While Linux is still relatively obscure to the consumer public, System76 has taken advantage of the sudden interest from Apple users.

I have never been a MacBook person. I used a MacBook to prototype an iOS app in Objective-C back in 2011, but I was not terribly thrilled with the developer tools at the time, so I never felt compelled to switch from my Windows/Linux dual-boot setup. After I bought a Surface Pro 3, I switched to Windows almost exclusively for a time. While the Surface is a great piece of hardware, Windows 10 was slow and installed updates at the most inopportune times. There are few things more annoying than getting ready to project my Surface in a business meeting, only to sit there hoping the update would finish in time. After this happened on a business trip a few weeks ago, I made up my mind to switch back to Linux.

As I became more proficient as a developer, I had already been gravitating back to Linux on my desktop computer, and I became particularly fond of Linux Mint, which is based on Ubuntu. In my experience, Linux Mint is snappier, faster, less buggy, and prettier than Ubuntu. I tried to put Linux Mint on my Surface Pro 3, but the UEFI made it an absolute pain to use. On top of that, the drivers hacked together in the linux-surface package felt fine at first, but they broke down and froze my machine over time.

I decided to jump ship on Windows devices and wanted a UEFI/SecureBoot-free laptop. Although it is not fashionable, I wanted a 17.3" screen because I need a large amount of screen real estate for the projects I am doing. This would make me less mobile, but I did not plan on using it on planes anyway. Initially I looked at Dell's Ubuntu machines, but then a colleague from an open-source project suggested I check out System76, and I was immediately drawn to the Kudu.

I use my laptops for work, primarily for coding, technical writing, and recording instructional videos for O'Reilly. I do not need high-end graphics capability or a 4K display; otherwise I would have gone with the popular System76 Oryx Pro. The Kudu seemed to be what I needed. When I bought it, it featured a 6th-gen Intel i7 and a 1080p display. I configured it with 8GB of RAM and a 256GB PCIe M.2 SSD, and I paid around $1150 out the door. I'm glad to say it was worth every penny.

The System76 Kudu feels solid but sleek for a 17.3" laptop. It does not have the "hollow", cheap feel that many PC laptops in this category have.

Installing Linux Mint

I tried the stock installation of Ubuntu, but after a few hours I wiped it and installed Linux Mint, which runs phenomenally well on the Kudu. What is great about System76 is that you can take vanilla Ubuntu, Linux Mint, or most other Linux distro images and install them with no drama. System76 makes all of its drivers available through a PPA, so you can optimize Ubuntu or Linux Mint easily. It is nice not to be stuck with a customized proprietary OS (or, even worse, a Windows license key). You can nuke and reinstall the Linux OS at any time with no hassle. This is a huge feature, especially as Microsoft is now locking down Windows devices with UEFI and SecureBoot.

I installed Linux Mint 18 Cinnamon Edition on my Kudu, which I prefer over Ubuntu.

Appearance and Mobility

When I started using the Kudu, the first thing I noticed is that it does not have the cheap "hollow" feel that many large-screen work laptops have. It feels fairly solid but is not too heavy at 6.8 lbs. Granted, you will not want to walk long distances carrying it in one arm, but it is fairly mobile for a 17.3 inch laptop. I am able to work with it comfortably on my lap, but it would probably be difficult to use on a plane or in other tight spaces.

It has a fairly low profile as well. It is certainly not an ultrabook, but when closed it does not feel like a brick.

The Kudu has a relatively compact profile

The Keyboard and Track Pad

The keyboard and track pad are comfortable and reliable. The keys are backlit and slightly concave, which feels great to type on. The track pad is responsive but not over-sensitive, and it supports gestures nicely. I have used many Windows laptops (including the Surface Pro 4 Touch Keyboard), and none of them approach this level of quality in a track pad. I have not used a MacBook in a while, so I cannot compare it to Apple's track pad, which I understand sets the standard.

This laptop is fast. I paid an extra $190 or so for the PCIe M.2 SSD, but it was well worth it. Everything from IntelliJ IDEA to Atom Editor opens almost instantly. When I booted Linux Mint off a USB stick, I recall the OS loaded in a few seconds.

I was never a big fan of Ubuntu after discovering Linux Mint. Although Linux Mint is based on Ubuntu, in my experience it does a much better job of "just working." It is faster and more intuitive to navigate. The Cinnamon version of Linux Mint feels like a modernized Windows XP with better aesthetics and lean resource usage. I even put my parents' desktop on Linux Mint after Windows crashed, and they have used it daily without complaints.

That being said, you can use the stock Ubuntu installation or put Linux Mint on easily. Both are compatible with the same Debian/Ubuntu-based software which I'll discuss later.

The screen is beautiful and big, with no backlight bleeds or dead pixels. It is easy to multitask and have multiple windows open. Being able to review and edit code with the large workspace is also a plus. One task I especially am happy with is doing Markdown editing for books. Having enough screen real estate to have the editor on the left and the rendering on the right makes a huge difference to productivity. I also had some annoying lag on my Surface Pro 3 working with Atom Editor in Markdown Preview, but it is pretty snappy on the Kudu.

Writing books with Atom Editor on the Kudu's large HD screen is an absolute joy

Overall, Linux Mint feels like it was made for the System76 Kudu. Everything is snappy, fast, and instantaneous. The Ubuntu key brings up the home menu, and everything works optimally out-of-the-box.

The Software

I have been using Ubuntu and Linux Mint off and on for about 4 years now. Since Linux Mint is based on Ubuntu (which is based on Debian), you can easily install software built for any of those distros. LibreOffice comes pre-installed on both Ubuntu and Linux Mint, although you can download it for free at any time anyway. As a Microsoft Office alternative, LibreOffice works pretty well. That said, I have had many issues moving presentations from Impress to PowerPoint and vice versa. The cross-compatibility is somewhat overstated, because slide content can end up misaligned and scattered. I learned to stick with PowerPoint if I am going to give my presentations to Office users.

Speaking of Microsoft Office, you can run Windows 10 inside VirtualBox for free. VirtualBox is an open-source virtual machine software that allows you to run an operating system inside an operating system. In other words, you can run Windows 10 inside Ubuntu or Linux Mint.

The System76 Kudu is probably the most productive laptop I have ever had, outperforming my Surface Pro 3 and every other device I have owned. The hardware feels great and runs snappy. Using a System76 has a "premium" quality that most PCs now lack. If you are interested in graphics-intensive gaming and multimedia applications, you might want to consider the System76 Oryx Pro instead. But as a workhorse laptop, the Kudu is great.

Wednesday, November 30, 2016

About a month ago I posted an article proposing Kotlin as another programming language for data science. It is a pragmatic, readable language created by JetBrains, the creator of IntelliJ IDEA and PyCharm. It has been gaining popularity on Android, and it focuses on industrial use rather than experimental functionality. Like Java and Scala, Kotlin compiles to bytecode and runs on the Java Virtual Machine. It also works with Java libraries out-of-the-box with no hiccups, and in this article I’m going to show how to use it with Apache Spark.

Officially, you can use Apache Spark with Scala, Java, Python, and R. If you are happy using any of these languages with Spark, you likely will not need Kotlin. But if you tried to learn Scala or Java and found it was not for you, you might want to give Kotlin a look. It is a legitimate fifth option that works out-of-the-box with Spark.

I recommend using IntelliJ IDEA because it natively includes Kotlin support. It is an excellent IDE that you can also use with Java and Scala. I also recommend using Gradle for your build automation.

Kotlin is replacing Groovy as the official scripting language for Gradle builds. You can read more about it in the article Kotlin Meets Gradle.

Setting Up

Gradle - Build automation system; download the Binary Only distribution and unzip it to a location of your choice

You will need to configure IntelliJ IDEA to use your Gradle location. Launch IntelliJ IDEA and set this up in Settings -> Build, Execution, Deployment -> Gradle. If you have trouble, there are plenty of walkthroughs online.

Let’s create our Kotlin project. Using your operating system, create a folder with the following structure:

kotlin_spark_project
└────src
     └────main
          └────kotlin

Your project folder needs to contain the folder structure /src/main/kotlin/. This is important because it is how Gradle will recognize this as a Kotlin project.

Next, create a text file named build.gradle in the project folder and use a text editor to put in the following contents. This is the script that will configure your project as a Kotlin project. You can read more about Kotlin Gradle configurations here.

Finally, launch IntelliJ IDEA, click Import Project, and navigate to the Kotlin project folder you just created. In the wizard, check Import project from external model and select the Gradle option. Click Next, then select Use Local Gradle Distribution and point it to the Gradle copy you downloaded. Then click Finish.

Your workspace should now be set up with a Kotlin project as shown below. If you do not see the project explorer on the left press ALT + 1. Then double-click on the project folder and navigate down to the kotlin folder.

Right click the kotlin folder and select New -> Kotlin File/Class.

Name the file “SparkApp” and press OK. You will now see a SparkApp.kt file added to your kotlin folder. An editor will open on the right.

Using Spark with Kotlin

Let’s put our Spark code in the SparkApp.kt file. Spark was written in Scala. While Kotlin does not work smoothly with Scala APIs directly, it does have 100% interoperability with Java. Thankfully, Spark provides a Java API through JavaSparkContext, which we can leverage to use Spark out-of-the-box with Kotlin.

Create a main() function, which will be the entry point for our Kotlin application. Be sure to import the needed Spark dependencies as well. In your main() function, configure your SparkConf and create a new JavaSparkContext from it.

The JavaSparkContext provides a Java API to create RDDs and transform them. Thankfully, we can use Kotlin's excellent lambda syntax, which the Kotlin compiler will translate into the needed Java functional interfaces.

Let’s take a List of Strings containing alphanumeric values separated by / characters. We will break these values up, keep only the numbers, and then find their sum.
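The main() code this section describes is not shown in the post, so here is a minimal sketch of the same pipeline. It assumes Spark 2.x (where a flatMap function returns an Iterator; Spark 1.x expected an Iterable) with spark-core on the classpath. To make the example easy to call and test, the SparkConf/JavaSparkContext setup and the split/filter/sum chain live in a hypothetical sumOfNumbers() helper that main() can call. The app name and the local[*] master URL are my own choices.

```kotlin
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

// Hypothetical helper (not from the original post): runs the split/filter/sum
// pipeline on a local Spark master and returns the total.
// Assumes Spark 2.x, where FlatMapFunction.call returns an Iterator.
fun sumOfNumbers(values: List<String>): Int {
    val conf = SparkConf()
        .setAppName("KotlinSparkApp")
        .setMaster("local[*]")          // run locally on all available cores
    val sc = JavaSparkContext(conf)
    try {
        return sc.parallelize(values)
            // Kotlin lambdas are SAM-converted into Spark's Java function interfaces
            .flatMap { it.split("/").iterator() }              // "a/1/b" -> "a", "1", "b"
            .filter { it.isNotEmpty() && it.all(Char::isDigit) } // keep digit-only pieces
            .map { it.toInt() }
            .reduce { a, b -> a + b }   // throws on an empty RDD
    } finally {
        sc.stop()                       // release the local Spark context
    }
}
```

For example, calling println(sumOfNumbers(listOf("alpha/1/beta/2", "3/gamma"))) from main() should print 6 among Spark's log lines. Because reduce() throws on an empty RDD, the input needs to contain at least one number.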

If you click the Kotlin logo right next to your main() function in the gutter, you can run this Spark application.

A console should pop up below and start logging Spark’s events. I did not turn off logging so it will be a bit noisy. But ultimately you should see the value of sumOfNumbers printed.

Conclusion

In the coming weeks, I will show a few more examples of how to use Kotlin with Spark (you can also check out my GitHub project). Kotlin is a pragmatic, readable language that I believe has potential for adoption in Spark. It just needs more documentation for this purpose. But if you want to learn more about Kotlin, you can read the Kotlin Reference as well as check out a few of the books that are out there. I have heard great things about the O’Reilly video series on Kotlin, which I understand is helpful for folks who do not have knowledge of Java, Scala, or other JVM languages.

If you learn Kotlin you can likely translate existing books and documentation on Spark into Kotlin usage. I’ll do my best to share my discoveries and any nuances I may encounter. For now, I do recommend giving it a look if you are not satisfied with your current languages.

Friday, October 28, 2016

Can Kotlin be an Effective Alternative for Python and Scala?

As I started diving formally into data science, I could not help but notice a large gap between data science and software engineering. It is good to come up with prototypes, ideas, and models out of data analysis. However, executing those ideas is another animal. You can outsource the execution to a software engineering team, and that can go well or badly depending on a number of factors. In my experience, it is often helpful to do the execution yourself, or at least to offer assistance by building models with production in mind.

Although Python can be used to build production software, I find its lack of static typing makes it difficult to scale. It does not easily plug into large corporate infrastructures built on the Java platform either. Scala, although an undeniably powerful JVM language, is somewhat esoteric and does not click with everyone, especially those who lack a software engineering background or a love of expressing code with mathematical flair. But Kotlin, a new JVM language by JetBrains (the creator of IntelliJ IDEA, PyCharm, and dozens of other developer tools), has an active community and rapid growth and adoption, and it might serve as a pragmatic alternative to Scala. Although Kotlin is unlikely to replace Python's numerical efficiency and data science libraries, it might make a practical addition to your toolbelt, especially since it works with Spark out-of-the-box.

I'll cover Spark with Kotlin another time, but you can look at this simple GitHub project if you like. Apache Spark is definitely a step in the right direction toward closing the gap between data science and software engineering, or more specifically, toward turning an idea into immediate execution. You can use SparkR and PySpark to interface R and Python with Spark. But if you want to use a production-grade JVM language, the only mainstream options seem to be Scala and Java. As stated, though, Kotlin works with Spark too, because it is fully interoperable with Java libraries.

A Comparison Between Python and Kotlin

Let's take a look at a somewhat simple data analysis case study. For now, we will leave Scala and Spark out and only compare Kotlin with Python. What I want to highlight is how Kotlin has the tactical conciseness of Python, and maybe in some ways brings a little more to the table as a language for data analysis. Granted, there are not a lot of mainstream JVM data science libraries other than Apache Spark. But they do exist, there may be room for growth, and the language may be worth keeping your eye on (and even exploring).

This comparison was inspired by the social media example from the first chapter of Data Science from Scratch (Grus, O'Reilly). Let's start by declaring two sets of data, users and friendships. Using simple dicts, lists, and tuples without any classes, this is how it could be done in Python.

Python

users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},
    {"id": 3, "name": "Chi"},
    {"id": 4, "name": "Thor"},
    {"id": 5, "name": "Clive"},
    {"id": 6, "name": "Hicks"},
    {"id": 7, "name": "Devin"},
    {"id": 8, "name": "Kate"},
    {"id": 9, "name": "Klein"},
]

friendships = [
    (0, 1),
    (0, 2),
    (1, 2),
    (1, 3),
    (2, 3),
    (3, 4),
    (4, 5),
    (5, 6),
    (5, 7),
    (6, 8),
    (7, 8),
    (8, 9)
]

Here users is a list of dict items, and friendships is a list of tuples. A feature of dynamic typing is that you can be fast-and-loose when creating data structures that keep a raw, data-like nature. Nothing forces you to use classes or explicit types.

For the friendships in Kotlin, we can create Pair<Int,Int> items. Kotlin does not really encourage tuples (or any collection mixing different types), and we will see what it offers instead later with the data class. For this example, though, let's use Pairs, created with the to infix function.

Kotlin

val friendships = listOf(
    0 to 1, 0 to 2, 1 to 2, 1 to 3, 2 to 3, 3 to 4,
    4 to 5, 5 to 6, 5 to 7, 6 to 8, 7 to 8, 8 to 9
)
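The Kotlin users declaration that the next paragraph discusses is not shown in the post. Here is a sketch of what it might look like, assuming the Python dicts are translated one-to-one into mapOf() calls. I added an explicit List<Map<String, Any>> annotation so the element type is exactly the one discussed below. Without it, the IDE may report a more specific common supertype of Int and String, depending on the compiler version. The last line shows the checked as? cast that the text alludes to.

```kotlin
// Sketch of a map-based users list mirroring the Python dicts.
// The explicit type pins each element to Map<String, Any>.
val users: List<Map<String, Any>> = listOf(
    mapOf("id" to 0, "name" to "Hero"),
    mapOf("id" to 1, "name" to "Dunn"),
    mapOf("id" to 2, "name" to "Sue"),
    mapOf("id" to 3, "name" to "Chi"),
    mapOf("id" to 4, "name" to "Thor"),
    mapOf("id" to 5, "name" to "Clive"),
    mapOf("id" to 6, "name" to "Hicks"),
    mapOf("id" to 7, "name" to "Devin"),
    mapOf("id" to 8, "name" to "Kate"),
    mapOf("id" to 9, "name" to "Klein")
)

// Unchecked cast: a lookup yields Any? (the key might be missing),
// so a missing key or a non-Int value makes this line throw at runtime.
val herosId = users[0]["id"] as Int

// Checked cast: as? yields null instead of throwing when the value is not an Int.
val herosIdOrNull: Int? = users[0]["id"] as? Int
```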

This may look effective at first glance, and it is still statically typed. Kotlin infers the types of users and friendships as List<Map<String,Any>> and List<Pair<Int,Int>> respectively. Notice how users is a List containing Map<String,Any> items, meaning each item has a String for a key and an Any for the value. The reason for the Any is that some values are Strings and others are Ints (due to "id" and "name"), and because the value type is not consistent, the compiler generalizes them to the common supertype Any. If we want to use Hero's "id", we need to cast it back down to an Int so it is treated as an Int rather than a raw Any.

val herosId = users[0]["id"] as Int

Of course, if an "id" value is accidentally slipped in as a String, this cast would throw a ClassCastException at runtime. You can check whether it is an Int first, but at this point we are just fighting the statically-typed nature of Kotlin (just like Java and Scala). In Kotlin, we are much better off creating a class and doing things the statically-typed way. While this may make dynamic-typing advocates groan, check this out: Kotlin has a concise, readable way of declaring a class quickly and easily, even exceeding Python's standards.

Python

class User(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __str__(self):
        return "{0}-{1}".format(self.id, self.name)

users = [
    User(0, "Hero"),
    User(1, "Dunn"),
    User(2, "Sue"),
    User(3, "Chi"),
    User(4, "Thor"),
    User(5, "Clive"),
    User(6, "Hicks"),
    User(7, "Devin"),
    User(8, "Kate"),
    User(9, "Klein"),
]

Kotlin

data class User(val id: Int, val name: String)

val users = listOf(
    User(0, "Hero"),
    User(1, "Dunn"),
    User(2, "Sue"),
    User(3, "Chi"),
    User(4, "Thor"),
    User(5, "Clive"),
    User(6, "Hicks"),
    User(7, "Devin"),
    User(8, "Kate"),
    User(9, "Klein")
)

Not too shabby, right? Technically, we did less typing (as in keyboard typing) than in Python: nearly 80 fewer characters, excluding spaces. And we achieved static typing in the process.

Kotlin is certainly a progressive language compared to Java, and it even has practical features like data classes. We made our User a data class, which automatically implements functionality typically needed by classes holding plain data. It implements toString(), hashCode(), and equals() based on the properties, and it adds a nifty "copy-and-modify" builder through a copy() function. (This aids flexibility while maintaining immutability, which is valued in software engineering.)

Kotlin

data class User(val id: Int, val name: String)

fun main(args: Array<String>) {
    val user = User(10, "Tom")
    val changedUser = user.copy(name = "Thomas")

    println("Old user: $user")
    println("New user: $changedUser")
}

OUTPUT:

Old user: User(id=10, name=Tom)
New user: User(id=10, name=Thomas)

NOTE: In Kotlin, val precedes the declaration of an immutable variable. var precedes a mutable one.

Data classes are a valuable tool especially for working with data. And yes, Kotlin supports named arguments for constructors and functions as shown in the copy() function above.
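Because the generated equals() and hashCode() compare property values, data class instances behave well as set elements and map keys. That matters when you deduplicate or group records. Here is a small sketch, reusing the User data class from above.

```kotlin
data class User(val id: Int, val name: String)

// Two separately constructed users with identical properties
val a = User(0, "Hero")
val b = User(0, "Hero")

// == calls the generated equals(), which compares id and name;
// === checks reference identity, so a and b are still distinct objects
val sameData = a == b
val sameObject = a === b

// The duplicate collapses in a set because hashCode() and equals() agree
val uniqueUsers = setOf(a, b, User(1, "Dunn"))
```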

Let's return to our example. Say we wanted to find the mutual friends between two Users. Traditionally in Python, you would create a series of helper functions to assist in this task.

Python

class User(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __str__(self):
        return "{0}-{1}".format(self.id, self.name)

users = [
    User(0, "Hero"),
    User(1, "Dunn"),
    User(2, "Sue"),
    User(3, "Chi"),
    User(4, "Thor"),
    User(5, "Clive"),
    User(6, "Hicks"),
    User(7, "Devin"),
    User(8, "Kate"),
    User(9, "Klein"),
]

friendships = [
    (0, 1),
    (0, 2),
    (1, 2),
    (1, 3),
    (2, 3),
    (3, 4),
    (4, 5),
    (5, 6),
    (5, 7),
    (6, 8),
    (7, 8),
    (8, 9)
]

def user_for_id(user_id):
    for user in users:
        if user.id == user_id:
            return user

def friends_of(user):
    for friendship in friendships:
        if friendship[0] == user.id or friendship[1] == user.id:
            for other_user_id in friendship:
                if other_user_id != user.id:
                    yield user_for_id(other_user_id)

def mutual_friends_of(user, otherUser):
    for friend in friends_of(user):
        for other_friend in friends_of(otherUser):
            if friend.id == other_friend.id:
                yield friend

# print mutual friends between Hero and Chi
for friend in mutual_friends_of(users[0], users[3]):
    print(friend)

OUTPUT:

1-Dunn
2-Sue

But we can do something similar in Kotlin. This is our first pass, so I'll show a better way in a moment.

Kotlin

fun main(args: Array<String>) {

    data class User(val id: Int, val name: String)

    val users = listOf(
        User(0, "Hero"),
        User(1, "Dunn"),
        User(2, "Sue"),
        User(3, "Chi"),
        User(4, "Thor"),
        User(5, "Clive"),
        User(6, "Hicks"),
        User(7, "Devin"),
        User(8, "Kate"),
        User(9, "Klein")
    )

    val friendships = listOf(
        0 to 1, 0 to 2, 1 to 2, 1 to 3, 2 to 3, 3 to 4,
        4 to 5, 5 to 6, 5 to 7, 6 to 8, 7 to 8, 8 to 9
    )

    fun userForId(id: Int): User {
        for (user in users)
            if (user.id == id)
                return user
        throw Exception("User not found!")
    }

    fun friendsOf(user: User): List<User> {
        val list = mutableListOf<User>()
        for (friendship in friendships) {
            if (friendship.first == user.id)
                list += userForId(friendship.second)
            if (friendship.second == user.id)
                list += userForId(friendship.first)
        }
        return list
    }

    fun mutualFriendsOf(user: User, otherUser: User): List<User> {
        val list = mutableListOf<User>()
        for (friend in friendsOf(user))
            for (otherFriend in friendsOf(otherUser))
                if (friend.id == otherFriend.id)
                    list += friend
        return list
    }

    for (friend in mutualFriendsOf(users[0], users[3]))
        println(friend)
}

OUTPUT:

User(id=1, name=Dunn)
User(id=2, name=Sue)

Although Kotlin seems to have lost in this example by being wordier and resorting to Lists, hold on. Kotlin has no direct concept of generators and yield keywords. However, we can accomplish something that fulfills the same purpose (and is arguably stylistically better) through Sequence.

We can use a Sequence to compose a series of operators as a chain, like filter(), map(), flatMap(), and many others. This style of functional programming has gained a lot of traction over the years thanks to LINQ, primarily because it breaks logic up into simple pieces and increases maintainability. Nearly all the time, I avoid for loops and use a Kotlin Sequence, a Java 8 Stream, or an RxKotlin/RxJava Observable instead. This chain-operator syntax is becoming less alien in Python as well (look at PySpark and RxPY). What is great about this style of programming is that you can read what is happening left-to-right, top-to-bottom, rather than jumping through several loops and helper functions.
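The "better way" promised earlier is not shown in this post. Below is a minimal sketch of how the mutual-friends example might be rewritten with Sequence operators, using the same users and friendships data. This is my own reconstruction rather than the author's code. Here friendsOf() returns a lazy Sequence, which fills the role of the Python generator. Collecting the other user's friend ids into a set is also my own choice; it avoids rescanning the friendships for every candidate.

```kotlin
data class User(val id: Int, val name: String)

val users = listOf(
    User(0, "Hero"), User(1, "Dunn"), User(2, "Sue"), User(3, "Chi"), User(4, "Thor"),
    User(5, "Clive"), User(6, "Hicks"), User(7, "Devin"), User(8, "Kate"), User(9, "Klein")
)

val friendships = listOf(
    0 to 1, 0 to 2, 1 to 2, 1 to 3, 2 to 3, 3 to 4,
    4 to 5, 5 to 6, 5 to 7, 6 to 8, 7 to 8, 8 to 9
)

// Throws NoSuchElementException if no user has the given id
fun userForId(id: Int): User = users.first { it.id == id }

// Lazily yields the friends of a user, in the order the friendships are listed
fun friendsOf(user: User): Sequence<User> =
    friendships.asSequence()
        .filter { it.first == user.id || it.second == user.id }  // friendships involving user
        .map { if (it.first == user.id) it.second else it.first } // the other side's id
        .map(::userForId)                                         // id -> User

// Friends of user who are also friends of otherUser
fun mutualFriendsOf(user: User, otherUser: User): Sequence<User> {
    // Collect otherUser's friend ids once, then filter user's friends lazily
    val otherFriendIds = friendsOf(otherUser).map { it.id }.toSet()
    return friendsOf(user).filter { it.id in otherFriendIds }
}
```

For example, mutualFriendsOf(users[0], users[3]).forEach(::println) should print User(id=1, name=Dunn) and User(id=2, name=Sue), matching the loop-based version.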

Conclusions

In the coming months, I am going to blog about my experiences using Kotlin for data science, and I will continue to share what I learn. I may occasionally throw in an article covering ReactiveX for data science as well (for both Kotlin and Python). I acknowledge that the JVM platform, which Kotlin runs on, does not handle numbers as efficiently as Python or R (maybe Project Valhalla will change that?). But successful models inevitably need to turn into execution, and the Java platform increasingly seems to be where that happens.

Kotlin simply provides a pragmatic abstraction layer with a tactical, concise syntax that seems excellent not just for data analysis, but also for executing software. Outside of data science, Kotlin has spurred many successful open-source libraries less than a year after its 1.0 release. One library, TornadoFX, allows rapid turnaround of complex business user interfaces using Kotlin (as a disclaimer, I have helped with that project). The Kotlin community is active, growing, and engaged on the Slack channel. Kotlin continues to be adopted on Android as well as on backends, and JetBrains is using Kotlin to build its own tools (including PyCharm and IntelliJ IDEA). It is also replacing Groovy as the official language for Gradle. Because of these facts, I do not see Kotlin's momentum slowing down anytime soon.

I believe Kotlin could make a great first JVM language, more so than Java or Scala (I struggle to make Jython count). If you are already happy with Scala or Java, you will likely not need Kotlin. But for folks wanting to break into JVM languages, there is a new O'Reilly video series that covers Kotlin from scratch. Its instructor, Hadi Hariri (one of the JetBrains folks behind Kotlin), believes Pythonistas should be able to follow along. He said anybody familiar with classes, functions, properties, etc., should be able to learn Kotlin in a day with this video series. Unfortunately, the existing Kotlin documentation and books assume prior Java knowledge, and hopefully more resources beyond this video series will appear in the future.

There are a lot of exciting Kotlin features I have not covered in this article. Features like nullable types, extension properties and functions, and boilerplate-free delegates make the language pleasant and productive to use. So check out Kotlin if you are using Python for data science and want to learn a JVM language. Again, this is not a proposal to drop your current tools, but rather to consider exploring an additional one that may help you tackle new problems. I will continue blogging about my experiences with Kotlin, and I will showcase it in deeper data science topics as well as with Spark.