
That’s right, as of August 1st, there are 330 active open-source projects hosted at the Eclipse Foundation and if you look across the 1120 Git repositories that this represents, you will find over 162 million physical source lines of code. But beyond this number, let’s look at how it was obtained, and what it really means.

I’ve blogged several times about the importance of using metrics to monitor the health (and hopefully, growth!) of an open source project/community, and lines of code are just one such metric. You should always have other metrics on your radar, like the number of contributors, diversity, etc.

There are many ways, and many tools available out there, to count source lines of code. Open Hub (previously known as Ohloh) used to be a really good tool, but it doesn’t seem to be actively maintained. For a few years now, I’ve been relying on a home-made script to analyze Eclipse IoT projects, and it’s only recently that I realized I should probably run it against the entire eclipse.org codebase!

In this blog post, I will briefly talk about how the aforementioned script works, why you should make sure to take these metrics with a pinch of salt and finally, go through some noteworthy findings.

Line counting process

The script used to count the number of lines of code is available on GitHub. It takes a list of Eclipse project identifiers (e.g. ‘iot.paho’) and a time range as input, and outputs a consolidated CSV file.

The main script (main.js) uses the Eclipse Project Management Infrastructure (PMI) API to retrieve the list of Git repositories for the requested projects and then proceeds to clone the repos and run the cloc command-line tool against each repo. The script also allows computing the statistics for a given time period, in which case it looks at the state of each repository at the beginning of each month for that period.
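To make the monthly snapshot idea more concrete, here is a minimal sketch in Python (not the actual main.js). It lists the first day of each month in a period and builds the commands one could run for each snapshot. The helper names and the `repos/paho.mqtt.java` path are hypothetical, and the real script may well proceed differently; the only things taken as given are `git rev-list` with `--before` and cloc's `--csv` and `--quiet` options.

```python
from datetime import date


def month_starts(start: date, end: date) -> list[date]:
    """Return the first day of every month from start's month through end's month."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        # Advance by one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def snapshot_commands(repo_dir: str, snapshot: date) -> list[list[str]]:
    """Build (but do not run) the commands for one repository snapshot.

    Hypothetical helper, for illustration only.
    """
    return [
        # Find the last commit reachable from HEAD made before the snapshot date.
        # That commit would need to be checked out before counting lines.
        ["git", "-C", repo_dir, "rev-list", "-n", "1",
         f"--before={snapshot.isoformat()}", "HEAD"],
        # Count lines per language; cloc can emit CSV directly.
        ["cloc", "--csv", "--quiet", repo_dir],
    ]


if __name__ == "__main__":
    # Hypothetical local clone path and time range.
    for day in month_starts(date(2018, 6, 1), date(2018, 8, 1)):
        print(day.isoformat(), snapshot_commands("repos/paho.mqtt.java", day)[0])
```

The commands are only built, not executed, so the sketch can be read and tried without cloning anything.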

Once the main script has completed (and it can obviously take quite some time), the csv-concat.js script can be used to consolidate all the produced metric files into a single CSV file that contains the detailed breakdown of lines of code per project and per programming language, the top-level project each project belongs to, the number of blank or comment lines, etc. It is then pretty easy to feed this CSV into Excel or Google Spreadsheets, and use it as the source for building pivot tables for specific breakdowns.
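If you would rather not go through a spreadsheet, the consolidation and pivot steps can be sketched in a few lines of Python. This is an illustration only, not what csv-concat.js does internally: the column names (`language`, `blank`, `comment`, `code`), the sample numbers, and the assumption that the top-level project is the first segment of the project identifier are all mine.

```python
import csv
import io
from collections import defaultdict


def consolidate(reports: dict[str, str]) -> list[dict[str, str]]:
    """Merge per-project CSV reports into one list of rows tagged with the project id."""
    rows = []
    for project_id, text in reports.items():
        for row in csv.DictReader(io.StringIO(text)):
            # Assumption: the top-level project is the first segment of the id,
            # e.g. "iot" in "iot.paho".
            rows.append({"top_level": project_id.split(".")[0],
                         "project": project_id, **row})
    return rows


def pivot(rows, key: str) -> dict[str, int]:
    """Sum the 'code' column grouped by the given column, like a simple pivot table."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += int(row["code"])
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample reports, for illustration only.
    sample = {
        "iot.paho": "language,blank,comment,code\nJava,10,20,100\nC,5,5,50\n",
        "iot.mosquitto": "language,blank,comment,code\nC,8,12,200\n",
    }
    merged = consolidate(sample)
    print(pivot(merged, "language"))
    print(pivot(merged, "top_level"))
```

Grouping by `language` or by `top_level` gives the same kind of breakdown a pivot table would.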

Caveats

Just like virtually any KPI, you want to take the number of lines of code in your project with a grain of salt. Here are a few things to keep in mind:

Not all lines of code are created equal

There is an incredible diversity of projects at Eclipse, and while the majority use Java as their main programming language, there’s also a lot of C, C++, Python, JavaScript, and more. 10M lines of Java code probably don’t carry the same value (i.e. how much effort was needed to produce them) as 10M lines of C code.

Trends are more important than snapshots

It is nice to know that as of today there are 162 million lines of code in the Eclipse repositories, but it is, in my opinion, more important to look at trends over time. Is a particular programming language becoming more popular? Are all the top-level projects equally active?

I didn’t have a chance to run the scripts for a longer time period yet, but I will make sure to share the results when I get a chance!

Generated code, should it count?

There is a fair amount of generated code in some projects (in the Modeling top-level project in particular, of course), which certainly accounts for a few million lines of code. However, generated code is often customized, so I think it doesn’t necessarily skew the numbers as much as one would think.

Development does not always happen in a single branch

My script just looks at the code stored in the main (HEAD) branch of the Git repository. Some projects may have more than one development stream and may e.g. have a “develop” branch that is ahead of the main stable branch. Therefore, there is very likely more code in our repositories than what this quick analysis shows.

Additional findings

As my script outputs pretty detailed statistics, it is interesting to have a quick look at e.g. how the different top-level projects and programming languages compare.

Top 3 top-level projects: Runtime, Technology & Modeling

| Top-level project | Physical SLOC |
|---|---:|
| rt | 54,961,728 |
| technology | 28,887,621 |
| modeling | 27,140,344 |
| tools | 14,214,182 |
| webtools | 9,651,900 |
| eclipse | 6,401,518 |
| ee4j | 5,809,126 |
| ecd | 3,114,768 |
| polarsys | 3,105,229 |
| iot | 2,930,217 |
| birt | 2,235,624 |
| science | 1,670,051 |
| datatools | 939,424 |
| mylyn | 767,652 |
| soa | 752,774 |
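As a quick sanity check, the per-project figures above can be summed and turned into shares. The snippet below is just a small sketch using the numbers from the table; the `shares` helper is a name I made up for the example.

```python
# Physical SLOC per top-level project, copied from the table above.
SLOC = {
    "rt": 54_961_728,
    "technology": 28_887_621,
    "modeling": 27_140_344,
    "tools": 14_214_182,
    "webtools": 9_651_900,
    "eclipse": 6_401_518,
    "ee4j": 5_809_126,
    "ecd": 3_114_768,
    "polarsys": 3_105_229,
    "iot": 2_930_217,
    "birt": 2_235_624,
    "science": 1_670_051,
    "datatools": 939_424,
    "mylyn": 767_652,
    "soa": 752_774,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each entry's fraction of the overall total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


if __name__ == "__main__":
    print(f"total: {sum(SLOC.values()):,}")
    # Show the three largest top-level projects and their share of the total.
    top3 = sorted(shares(SLOC).items(), key=lambda item: item[1], reverse=True)[:3]
    for name, share in top3:
        print(f"{name}: {share:.1%}")
```

The total comes to 162,582,158, which is where the "over 162 million" figure comes from.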

Top programming language: Java

| Programming language | Physical SLOC |
|---|---:|
| Java | 72,349,870 |
| HTML | 61,119,106 |
| XML | 7,543,689 |
| ANTLR Grammar | 3,161,339 |
| JSON | 2,313,556 |
| JavaScript | 2,251,418 |
| C++ | 2,245,759 |
| C | 1,446,013 |
| XMI | 1,355,914 |
| C/C++ Header | 1,019,368 |
| TTCN | 923,098 |
| Maven | 884,271 |
| CSS | 805,073 |
| Assembly | 717,771 |
| XSD | 688,764 |
| PHP | 459,237 |
| Python | 316,553 |
| Markdown | 304,421 |
| XSLT | 256,857 |
| Scala | 229,560 |
| Bourne Shell | 214,142 |
| Go | 184,306 |
| SWIG | 152,062 |
| JSP | 142,190 |
| Gencat NLS | 125,251 |
| Ant | 113,133 |
| TypeScript | 108,217 |
| AsciiDoc | 105,552 |
| Windows Module Definition | 64,843 |
| TITAN Project File Information | 64,014 |
| Groovy | 55,261 |
| Sass | 53,915 |
| XQuery | 51,432 |
| XHTML | 51,166 |
| DTD | 51,052 |
| make | 48,021 |
| Perl | 43,643 |
| DITA | 42,526 |
| yacc | 39,876 |
| TeX | 36,400 |
| m4 | 34,438 |
| AspectJ | 33,717 |
| Ruby | 28,355 |
| Scheme | 27,484 |
| YAML | 26,348 |
| CMake | 25,182 |
| Lua | 23,646 |
| LESS | 18,712 |
| SQL | 16,070 |
| Cucumber | 15,454 |
| IDL | 12,564 |
| INI | 12,171 |
| Bourne Again Shell | 11,978 |
| Pascal | 11,915 |
| lex | 11,795 |
| DOS Batch | 11,675 |
| Windows Resource File | 10,278 |
| Blade | 8,295 |
| C# | 7,983 |
| Tcl/Tk | 7,611 |
| Stylus | 7,477 |
| Fortran 90 | 7,211 |
| ERB | 7,048 |
| Vuejs Component | 6,281 |
| Visualforce Component | 5,047 |
| MSBuild script | 4,538 |
| Freemarker Template | 4,077 |
| Dockerfile | 3,696 |
| Velocity Template Language | 3,649 |
| awk | 3,068 |
| Rust | 2,903 |
| Qt | 2,772 |
| CUDA | 2,533 |
| Puppet | 2,084 |
| diff | 1,880 |
| Haml | 1,819 |
| Oracle PL/SQL | 1,778 |
| ProGuard | 1,739 |
| Objective C | 1,469 |
| ActionScript | 1,459 |
| Visual Basic | 1,365 |
| Mathematica | 1,247 |
| RobotFramework | 1,074 |
| Korn Shell | 1,023 |
| D | 1,007 |
| Smalltalk | 911 |
| R | 887 |
| TOML | 826 |
| Ada | 668 |
| Lisp | 618 |
| Objective C++ | 589 |
| Fortran 77 | 588 |
| Arduino Sketch | 480 |
| MATLAB | 476 |
| sed | 461 |
| Protocol Buffers | 454 |
| WiX source | 446 |
| JavaServer Faces | 440 |
| PowerShell | 284 |
| Qt Project | 176 |
| Windows Message File | 139 |
| Expect | 120 |
| NAnt script | 110 |
| Smarty | 109 |
| HCL | 78 |
| CoffeeScript | 78 |
| Skylark | 74 |
| Forth | 69 |
| Qt Linguist | 61 |
| WiX include | 52 |
| XAML | 49 |
| QML | 48 |
| Handlebars | 46 |
| Clojure | 38 |
| Prolog | 37 |
| Razor | 32 |
| PO File | 29 |
| Haskell | 27 |
| JSX | 24 |
| ASP.NET | 21 |
| HLSL | 15 |
| F# | 11 |
| Swift | 10 |
| GLSL | 8 |
| Kotlin | 7 |
| C Shell | 7 |
| Mustache | 1 |

If you end up using my script and have any questions, please let me know in the comments or directly on GitHub!

Did you know that the Eclipse Foundation is home to many open source implementations of industry standards?

From IETF to ISO to oneM2M or OASIS, we have many open source projects that provide industrial-grade implementations that anyone can use to evaluate a given standard, or to effectively use it in their commercial solution.

We do believe that open source is key to the adoption of standards, and in a presentation I gave last week at an Open Source Think Tank organized by IEEE, I shared some thoughts on what makes a standard successful, as well as how Eclipse has proved with recent success stories that open source and open communities are a key factor.

The two examples I used in my presentation (see the slides at the end of this post) originate from the Eclipse IoT community.

OMA (Open Mobile Alliance) LWM2M is a standard for device management of IoT devices (i.e. remotely monitoring a device’s health, upgrading its firmware over-the-air, etc.). The first drafts of the standard were published less than four years ago, and today LWM2M is already used in commercial products and has a thriving community of developers and contributors gathered around two Eclipse open source projects: Eclipse Wakaama and Eclipse Leshan. I think you will agree that this is the kind of timeline you would like to see for all standards!

The other example is MQTT, a very popular IoT protocol that I’m sure you’ve heard about! 🙂 In just a few years, it went from a de facto standard to an actual OASIS and ISO/IEC standard. Having a rich ecosystem of open source MQTT implementations (including Eclipse Paho clients, and the Eclipse Mosquitto server) certainly helped the standards organizations pin down much faster the issues that needed to be fixed in the spec. What’s more, open source projects will also fuel the future of the MQTT specification, as they allow for new ideas to be explored (see e.g. this recent work on MQTT-SN).

My hope is that Standards Developing Organizations (SDOs) will start embracing open source initiatives more and more. Open source communities are a great place for innovation, and can host standard implementations that sometimes actually become reference implementations. They also complement the role of the SDOs very well, which are here to enforce some needed processes when it comes to evolving a standard, anticipating incompatibilities or corner cases, etc.

As mentioned above, here are the slides I used during my presentation. I am looking forward to hearing your comments and feedback.

The IoT industry is slowly but steadily moving from a world of siloed, proprietary solutions, to embracing more and more open standards and open source technologies.
What’s more, the open source projects for IoT are becoming more and more integrated, and you can now find one-stop-shop open source solutions for things like programming your IoT microcontroller, or deploying a scalable IoT broker in a cloud environment.

Here are the Top 5 Open Source IoT projects that you should really be watching this year.

#1 – The Things Network

LP-WAN technologies are going to be a hot topic for 2016. It's unclear who will win, but the availability of an open-source ecosystem around them is going to be key. The Things Network is a crowdsourced worldwide community for bringing LoRaWAN to the masses. Most of their backend is open-source and on GitHub.

#2 – VerneMQ

MQTT just got approved as an ISO standard. What else do you need to demonstrate that it's one of the key protocols for IoT? More open-source implementations!
VerneMQ is a highly scalable MQTT broker written in Erlang that is getting lots of interest, judging by its 500 stars on GitHub!

#3 – RIOT OS

RIOT is a very impressive real-time operating system for IoT, with a very active community. For the first time this year, they are organizing a RIOT Summit – that certainly says something about the maturity of the project!

#4 – Eclipse IoT

I could not not include Eclipse IoT in the list! 😉 The thing is, there really is a lot of cool stuff happening right now, and I think 2016 will be exciting to watch for Eclipse IoT. In particular, we're moving to the cloud, and projects like Eclipse Hono will provide a great foundation for building OSS-based IoT backends.

#5 – RHIOT

It's not a typo: both RIOT and... RHIOT are in the same Top 5! Red Hat is already contributing to several open-source projects that are very relevant in an IoT context (e.g. Apache Camel), and RHIOT is an interesting approach for implementing end-to-end IoT messaging.


What about you? What are the projects you think are going to make a difference in the months to come?

In case you missed it, the upcoming IoT Summit, co-located with EclipseCon North America, is a great opportunity for you to learn about some of the projects mentioned above, so make sure to check it out!