Software engineering will always be an uncomfortable fit with the traditional engineering disciplines. One of the key issues is the fact that all the other engineering disciplines create physical artifacts but software engineering does not. This difference means that the basis in physics and chemistry shared by all the other engineering disciplines is simply not relevant to software engineering.

This week I had a graphic reminder of this gap when attending the annual conference for the American Society for Engineering Education, where I presented several papers related to software engineering education. The exhibit hall was filled with vendors selling engineering education products, many of which involved equipment or scale models of large artifacts like bridges. Reflective of the relatively minor presence of software engineering at the conference, there were no vendors in the large exhibit hall who were positioned to support software engineering education.

This minor representation for software engineering reflects a national problem. Federal projections indicate that we should be graduating about five to seven times the number of computing majors that we are now graduating. Software engineering majors should be a key part of that group. “Software engineer” has even topped the list repeatedly in recent years as the best career opportunity available. And yet the number of undergraduate programs in software engineering nationally is in the low 30s and most of those programs have small numbers of students.

The lack of software engineering majors is a looming national economic problem. It’s a problem for the other engineering disciplines too. While browsing the exhibit hall at ASEE, I couldn’t help but note the extensive integration of software with all those displays of engineering equipment. With almost every exhibit, like the one shown to the right, there was a laptop or tablet that was used to provide controls or models or processing. In a profession where concern for attributes like reliability and performance is typical, the software engineer in me was inclined to guess that all that software was likely to be a weak link in many of these products. Until we start to take the challenges of software engineering more seriously, software will remain a weak link in engineering artifacts and beyond.

Data science, data analytics, and big data are all topics that have had a rising buzz in the last few years. As with many “new” tech topics, much of what these terms encompass is not new at all. There clearly are ties to existing activity in areas like data mining, decision support systems, business intelligence, visualization, etc. So what’s new and why the new terms and growing buzz?

One key to the shift in discussion clearly is the data itself. There are several categories of data that are simply exploding in size and importance. In trying to get your head around data science, it seems useful to categorize the types of data involved. My current mental model is that there are three broad categories of data that seem relevant to the discussions of data science. They are:

Human Generated Data

The volume of data published on the Web by individuals is truly one of the amazing features of our time. And the publication rate and variety of this data continue to accelerate. For anyone interested in what people are doing and thinking, this is a total game changer. Some examples of data in this category are:

Clickstreams and navigation histories of Web activity

Tweets – person to person message interactions

Facebook, LinkedIn, and semi-public records of people’s lives and interactions

Citizen science – data gathering in support of science by interested non-scientists

Device Generated Data

There have been devices that generate massive amounts of data for decades, with areas like medicine, lab science, and aerospace providing ready examples. But the number and type of devices that create large data streams accessible via the Web is rising sharply. Projecting forward to fully instrumented intelligent infrastructure implies that the history of device generated data is barely a trickle compared to the future. Some examples of data in this category are:

Scientific devices – e.g., medical and molecular imaging

Sensors – intelligent infrastructure

Video and audio capture – traffic cams; security cams

Newly Accessible Data

As more and more of the world’s data shifts online, there are legacy data sources that take on new meaning. Much of this is data that was previously paper or computerized but off the Net. It includes data that may have been previously available, but that was prohibitively expensive and time consuming to access and aggregate. Examples of data in this category are:

Real estate transactions

Legal filings

Price data

——–

The iSchool at Drexel has active research efforts that address a variety of topics related to data science. Our degree programs increasingly address these topics too. And clearly the development of education for data science has just begun.

The recent announcement by Blackboard (Bb) that it was acquiring two Moodle service providers was quite interesting to anyone who follows open source in higher education. Over the years, Blackboard has emerged as a market leader in the Learning Management System (LMS) arena, through both product development and acquisition. At the same time, Blackboard has attracted considerable heat and a large dose of scorn for a patent the company filed and tried to enforce. That patent was viewed by many to be an attempt to corner the LMS market and to claim invention of many LMS features in use well before Blackboard’s supposed date of invention. Coverage of the long story and eventual Blackboard loss in the courts can be found here. Particularly for fans of open source, this sort of behavior does not make Blackboard an admired company, and acquisitions in the Moodle niche are much more likely to raise eyebrows than cheers.

It’s interesting, however, to see how Blackboard explains this latest move. It’s also important to note that Blackboard recently returned to being a private company after trading publicly for some years. That switch may have provided increased flexibility in strategy formation.

Blackboard’s strategy already includes multiple learning platforms due to acquisitions. The company has also broadened its scope beyond the LMS niche to address a range of educational institution application needs, including a push into areas like student services. Finally, Blackboard also grows by providing services, not just software. Taken together this means that accommodating the open source world makes sense for Blackboard in two ways:

Enterprise sales – In the push to cover the education enterprise, Bb will sometimes be sole provider for an institution across the whole Bb product line. But much more often, like any enterprise vendor, Bb will sell some applications and need to co-exist with products from other vendors in other applications. Open source is just another flavor with which to co-exist.

Services – To the extent that Bb is a service provider, large open source projects like Moodle and Sakai create a business opportunity. Blackboard clearly is moving to be a service player for both of these open source communities.

So, in spite of the history that seems to make Blackboard an unlikely candidate for good citizenship in open source communities, it’s not hard to see a business case for moving in that direction. And this step in the evolution of Blackboard makes an interesting case study for the continuing evolution of open source as a significant, not to be ignored, part of the software industry. Of course, the case study is still being written. And open source advocates who have followed Blackboard over the years will be excused if they want to wait to see how this plays out!

Over the few years that I’ve been exploring the open source world, I’ve come to realize that there is quite a bit about open source that most people, including most technical people, don’t understand. Since I’m a faculty type, I got beyond some of this early on by looking at research literature. As with many technical topics, the growth of open source means that it has attracted a good bit of researcher interest. See for example, Deek and McHugh or FLOSShub. Most people don’t have much tolerance for wading through research papers though, so many of the things known about open source are not widely known.

One of the misconceptions has to do with the number of developers on most projects. People seem to expect that projects have lots of developers, when just the opposite is true for most projects. Research studies show that the average number of developers across the broad sweep of FOSS projects is one per project. That’s right, most projects have a single developer!

The community team at SourceForge recently blogged about this and published a nice graph showing the distribution of developers by project. The steep drop-off in that curve tells the tale. SourceForge “About” currently indicates that there are 324,000 projects on the site. 269,000 of them have only one developer. Yes, the large and popular projects mostly have quite a few developers, but only 21 have over 100, and of those 21, only 7 are over 200.
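To put those counts in proportion, here is a minimal sketch of the arithmetic. It uses only the figures quoted above; the variable and function names are my own, chosen for illustration, and the "remaining" count is simply the total minus the single-developer projects.

```python
# Figures as quoted in this post (SourceForge "About" page and community blog).
TOTAL_PROJECTS = 324_000
SINGLE_DEVELOPER_PROJECTS = 269_000
OVER_100_DEVELOPERS = 21


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of all projects that have exactly one developer.
single_dev_pct = share(SINGLE_DEVELOPER_PROJECTS, TOTAL_PROJECTS)

# Projects outside the single-developer group (total minus single-developer).
remaining_projects = TOTAL_PROJECTS - SINGLE_DEVELOPER_PROJECTS

# Share of all projects with more than 100 developers.
over_100_pct = share(OVER_100_DEVELOPERS, TOTAL_PROJECTS)

print(f"Single-developer projects: {single_dev_pct:.1f}%")
print(f"Projects outside that group: {remaining_projects:,}")
print(f"Projects with over 100 developers: {over_100_pct:.4f}%")
```

By these numbers, roughly 83 percent of projects have a single developer, and only about 55,000 projects fall outside that group.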

This preponderance of single developer projects and overwhelming majority of projects with no more than a small development team presents a very different picture of the FOSS ecosystem. Clearly, one reason for this picture is that forges contain many projects that have been started but never really gone anywhere. But, in terms of student participation in open source, it implies lots of opportunity. Given the large number of projects, there clearly are going to be quite a few that could use some additional developers.

Finding the sweet spot on that curve of development team size is one of the challenges for getting students involved in FOSS. I don’t think there is a magic team size, but rather that team size is one of the factors that should be considered in project selection. We’ve been working on a framework to help faculty with this problem of selecting projects for student participation. This will need additional development, but we recently presented initial ideas at the ACM SIGCSE annual symposium. The paper is:

I’ve been increasingly involved in the world of Free and Open Source Software (FOSS) in recent years, and that involvement has made me re-think the role of blogs. Blogs always seemed like an interesting development in the evolution of the Web, but didn’t have much appeal to me personally. As I came to understand the FOSS world, however, I had to re-consider blogging. If you follow FOSS, it becomes clear fairly quickly that blogs are a key communication vehicle, and also a key mechanism to establish presence and credibility in the FOSS community. So I decided to blog as part of joining the FOSS world.

That was almost a year and a half ago. As you can see, my initial blogging effort consisted of exactly one post. I’m sure that there are many blogs that are started with a single post and stop right there, so this isn’t a surprising result. But it is interesting to consider why this might be so. In particular, it seems that professionally oriented blogs are an uneasy fit (at best) with professional life.

In my case, the profession is being a faculty member at a research university. Writing is part of the job, but not the sort of writing that appears in a blog. Academic culture is very much more about publication of polished, finished products. And publication also includes a filtering process (and stamp of approval) provided by the peer review and editing process typical of academic publications. Publications that have not been through that filtering and approval process are not valued much, and faculty have little incentive (or have actual disincentive) to spend time on other writing, like blogging.

While academic culture is relevant to my failure to blog, I’m also struck that similar cultural biases exist in the commercial world. In academia, the writing issue is primarily related to reputation of an individual. In the commercial world, the concern is much more about reputation, intellectual property, and liability of the organization that employs the blogger. But the effect is much the same in creating no incentive and some disincentive to blog.

So yes, on one level I was just “too busy” to blog. But I managed to get to a whole bunch of other things during this time of being “too busy”. Blogging never got the priority in part because the openness that a professionally oriented blog implies just doesn’t fit the culture that surrounds me. It seems that this issue applies to all attempts to marry openness principles with existing organizational cultures and personal work habits. That doesn’t seem insurmountable, but it’s an issue to remember when encouraging openness in the workplace and among students.

Last week I attended the Grace Hopper Celebration of Women in Computing held in Atlanta. As you might expect, the attendees are predominantly women, and roughly half are current students. I noted to several of my fellow faculty members that the experience was something like seeing a mirage. Many of us with interest in computing education would like to see many more women among our students. The number of students in computing majors remains far too low to meet the projected demand for computing graduates. And current representation of women in computing majors is dismal. Given that women represent roughly 50% of the population, they are by far the largest under-represented group among our majors. So if computing were more successful in attracting women as majors, the potential to really solve the overall shortage in majors is excellent. But thus far, this goal has been elusive.

Seeing so many women students in one place is refreshing and a sharp contrast to the everyday experience in our classes. A bit like a mirage in that we don’t have that sort of concentration of women computing students at any one institution. But also a bit like an oasis because the gathering was a very concrete reminder that a substantial population of women in computing does exist.

Another bright spot in the conference was a chance to catch up with some of the people who share my interest in having students participate in communities that develop open source software. The picture below shows a gathering of old friends and new sharing ideas for Teaching Open Source over lunch. My thanks to Mel Chua of Red Hat for supporting our gathering!