A collection of thoughts, ideas, and opinions independently written by members of the MSU community and curated by MSU Libraries

Tag Archives: coding

I bumped up against the following problem while doing some coding in Java 8 (and using streams where possible). Given a vector of objects \(x_1, \dots, x_N\) that come from some domain having an ordering \(\le\), find the vector of indices \(i_1, \dots, i_N\) that sorts the original values into ascending order, i.e., such that …
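
For readers who want something concrete, here is a minimal sketch of one way to compute such an index vector with Java 8 streams. It is not necessarily the approach the post settles on; the class name `ArgSort` and the method name `argsort` are made up for this illustration, and the indices it returns are zero-based rather than the one-based \(i_1, \dots, i_N\) used above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical helper, named for this sketch only.
final class ArgSort {
    // Returns the zero-based indices that put `values` into ascending order.
    static <T extends Comparable<? super T>> int[] argsort(List<T> values) {
        return IntStream.range(0, values.size())
                .boxed()
                // Compare two indices by comparing the values they point at.
                .sorted((i, j) -> values.get(i).compareTo(values.get(j)))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        int[] idx = argsort(Arrays.asList("pear", "apple", "fig"));
        System.out.println(Arrays.toString(idx)); // prints [1, 2, 0]
    }
}
```

Sorting boxed indices rather than the values themselves leaves the original list untouched, at the cost of some boxing overhead.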

I struggled a bit this afternoon creating a new MIME type and associating it with a particular application, so I’m going to archive the solution here for future reference. This was on a Linux Mint system, but I found the key information in a GNOME documentation page, so I suspect it works for Ubuntu and …

In a previous post, “Half of all jobs (> $60k/y) coding related?”, I wrote: “In the future there will be two kinds of jobs. Workers will either tell computers what to do or be told by computers what to do.” I’ve been pushing Michigan State University to offer a coding bootcamp experience to all undergraduates who want …

As I noted in yesterday’s post, one of the major changes associated with the new “generic” callback structure in CPLEX is that users now bear the responsibility of making their callbacks thread-safe. As I also noted yesterday, this is pretty new stuff for me. So I’m going to try to share what I know about thread …

The topic of reproducible research is garnering a lot of attention these days. I’m not sure there is a 100% agreed upon, specific, detailed definition of it, and I do think it’s likely to be somewhat dependent on the type of research, but for purposes of this post the Wikipedia definition (previous link) works for …

Since I use the Linux Mint operating system, the obvious (if not only) choice for a LaTeX distribution is TeX Live. (If you are not familiar with, or are not interested in, the LaTeX typesetting system, you have already read too far in this post.) On Mint, Ubuntu and other Debian-type operating systems, you typically …

As of version 12.7, CPLEX now has built-in support for Benders decomposition. For details on that (and other changes to CPLEX), I suggest you look at this post on J-F Puget’s blog and Xavier Nodet’s related slide show. [Update 12/7/16: There is additional information about the Benders support in a presentation by IBM’s Andrea Tramontani …

I refactored a recent Shiny project, using Hadley Wickham’s ggplot2 library to produce high quality plots. One particular feature the project requires is the ability to hover over a plot and get information about the nearest point (generally referred to as “hover text” or a “tool tip”). There are multiple ways to turn static ggplots …

Scenario: I’m running Linux Mint 17.3 Rebecca (based on Ubuntu 14.04) on a PC with a GeForce 6150SE nForce 430 graphics card. My desktop environment is Cinnamon. The graphics card is a bit long in the tooth, but it’s been running fine with the supported nVidia proprietary driver for quite some time. Unfortunately, having no …

I’m following up on yesterday’s post, “Formatting in a Shiny App”. One of the features I added to my Shiny application was the ability to identify a point in a plot by hovering over it. Since I wanted to do this in several different plots, and did not want to reproduce the logic each time, …

I’ve been updating a Shiny (web-based interactive R) application, during the course of which I needed to make a couple of cosmetic fixes. Both proved to be oddly difficult. Extensive use of Google (I think I melted one of their cloud servers) eventually turned up enough clues to get both done. I’m going to record …

(Should I have spelled the last word in the title “ResouRces” or “resouRces”? The R community has a bit of a fascination with capitalizing the letter “r” as often as possible.) Anyway, getting down to business, I thought I’d post links to a few resources related to the R statistical language/system/ecology that I think may …

I recently picked up a pair of Bluetooth headphones (Mixcder ShareMe 7) for use with my laptop (which runs Linux Mint). Getting them to connect properly was a bit of an adventure. After I had things (mostly) sorted out, I decided to script the steps necessary to get them working so that I could just …

I’m working on an optimization problem (coding in Java) in which, should various celestial bodies align the wrong way, I may need to compute the rank of a real matrix and, if it’s less than full rank, a basis for its kernel. (Actually, I could get by with just one nonzero vector in the kernel, …

Technology is continuously evolving, and for the most part that’s good. Every now and then, though, the evolution starts to look like a random mutation … the kind that results in an apocalyptic virus, or mutants with superpowers, or something else that is much more appealing as a plot device in a movie or TV …

My first introduction to Twitter was in a class on the intersection of technology and Mormonism that I took from David Wiley at Brigham Young University. During the class, David encouraged us to try experiencing the sessions of the upcoming semi-annual LDS General Conference in a new way: by following the #ldsconf hashtag. My very …

My MythTV setup recently broke in an “interesting” way, and the fix I came up with is a bit involved … so be warned, what follows is a major kludge. Setup: Before explaining the pathology (and fix), I need to describe the setup. I have Mythbuntu installed on a PC whose only function is to …

A couple of weeks ago, I spent some time discussing how to use R and Web scraping to retrieve information on Twitter users’ locations, as stored in their profiles. I’ve since updated the code to scrape not only locations, but also names, descriptions, personal websites, join dates, number of tweets, number of users following, number …

I just wrapped up (knock on wood!) a coding project using R and Shiny. (Shiny, while way cool, is incidental to this post.) It was a favor for a friend, something she intends to use teaching an online course. Two of the tasks, while fairly mundane, generated code that was just barely obscure enough to …

Something along the following lines cropped up recently, regarding a discrete optimization model. Suppose that we have a collection of binary variables $x_i \in B, \, i \in 1,\dots,N$ in an optimization model, where $B=\{0, 1\}$. The values of the $x_i$ will of course be dictated by the combination of the constraints and objective function. …

Fair warning: most of this post is specific to Linux users, and in fact to users of Debian-based distributions (e.g., Debian, Ubuntu or Mint). The first section, however, may be of interest to R users on any platform. An alternative to “official” R: By “official” R, I mean the version of R issued by the …

The Monty Hall problem is very famous (Wikipedia, NYT). It is so famous because it so easily fools almost everyone the first time they hear about it, including people with doctorate degrees in various STEM fields. There are three doors. Behind one is a big prize, a car, and behind the two others are goats. …

I just finished adding a feature to a utility library I use in Java projects that employ either CPLEX or CP Optimizer. In addition, I moved the files to a new home. The library is free to use under the Eclipse Public License 1.0. The code is mentioned in previous posts, so I’ll just quickly …

Something weird happened with SSH today, and I’m documenting it here in case it happens again. I was minding my own business, doing some coding, on a project that is under version control using Git. After committing some changes, I was ready to push them up to the remote (a GitLab server here at Michigan …

I’ve been trying to get my kids interested in coding. I found this nice game called Lightbot, in which one writes simple programs that control the discrete movements of a bot. It’s very intuitive and in just one morning my kids learned quite a bit about the idea of an algorithm and the notion of …

My laptop is not exactly a screamer, but it’s adequate for my purposes. I run Linux Mint 17 on it (Xfce desktop), which uses Thunar as its file manager. Not too long ago, I installed the RabbitVCS version control tools, including several plugins for Thunar needed to integrate the two. Lately, Thunar has been incredibly …

As part of a recent analytics project, I needed to convert strings containing (English) names of months to the corresponding cardinal values (1 for January, …, 12 for December). The strings came from a CSV file, and were translated by R to a factor when the file was read. The factor had more than 12 …
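
The post itself works in R, and the excerpt stops before the solution. Purely as an illustration of the name-to-number mapping, here is a minimal Java sketch that leans on the standard `java.time.Month` enum. The class name `MonthNumbers` and the method name `monthNumber` are invented for this example, and it assumes full English month names with arbitrary capitalization and stray surrounding whitespace.

```java
import java.time.Month;
import java.util.Locale;

// Hypothetical helper, named for this sketch only.
final class MonthNumbers {
    // Maps a full English month name to its number (1 for January, ..., 12 for December).
    // Throws IllegalArgumentException if the name is not a month.
    static int monthNumber(String name) {
        String key = name.trim().toUpperCase(Locale.ENGLISH);
        return Month.valueOf(key).getValue();
    }

    public static void main(String[] args) {
        System.out.println(monthNumber("January"));    // prints 1
        System.out.println(monthNumber(" december ")); // prints 12
    }
}
```

Anything that is not a month name fails loudly with an exception, which may be preferable to a silent missing value when the input is dirty.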

Remember that Python usage survey that went around the interwebs late last year? Well, the results are finally out and I’ve visualized them below for your perusal. This survey has been running for two years now (2013-2014), so where we have data for both years, I’ve charted the results so we can see the changes …

One of the assignments in the R Programming MOOC (offered by Johns Hopkins University on Coursera) requires the student to set up and utilize a (free) Git version control repository on GitHub. I use Git (on other sites) for other things, so I thought this would be no big deal. I created an account on …

This recipe provides a time-efficient way to determine whether you’ve saturated your sequencing depth, i.e. how much new information is likely to arrive with your next set of sequencing reads. It does so by using digital normalization to generate a “collector’s curve” of information collection. Uses for this recipe include evaluating whether or not you …

This is a recipe that provides a time- and memory-efficient way to loosely estimate the likely size of your assembled genome or metagenome from the raw reads alone. It does so by using digital normalization to assess the size of the coverage-saturated de Bruijn assembly graph given the reads provided by you. It does …

Below is a recipe for subsetting a high-coverage data set to a given average coverage. This differs from digital normalization because the relative abundances of reads should be maintained; what changes is the average coverage across all the reads. Uses for this recipe include subsampling reads from a super-high coverage data set for …

In recent days, we’ve gotten several requests, including two or three on the khmer mailing list, for ways to extract shotgun reads based on their coverage with respect to the reference. This is fairly easy if you have an assembled genome, but what if you want to avoid doing an assembly? khmer can do this …

Two years ago, I posted an example of how to implement Benders decomposition in CPLEX using the Java API. At the time, I believe the current version of CPLEX was 12.4; as of this writing, it is 12.6.0.1. Around version 12.5, IBM refactored the Java API for CPLEX and, in the process, made one or …

Nik Sultana, a postdoc in Cambridge, asked me some questions via e-mail, and I asked him if it would be OK for me to publish them on my blog. He said yes, so here you go! How is the quality of scientific software measured? Is there a “bug index”, where software loses points if it’s …

It’s been well over a year since I wrote my last tutorial, so I figure I’m overdue. This time, I’m going to focus on how you can make beautiful data visualizations in Python with matplotlib. There are already tons of tutorials on how to make basic plots in matplotlib. There’s even a huge example plot …

I had to delve into the CPLEX documentation today, and found something I had not seen before. As part of a (Java) program I’m writing, I need to use the conflict refiner to track down which upper and lower bounds on variables play a role in making a linear program infeasible. Of course, I could change the …

We just released khmer v1.1, a minor version update from khmer v1.0.1 (minor version update: 220 commits, 370 files changed). Cancel that: _I_ just released khmer, because I’m the release manager for v1.1! As part of an effort to find holes in our documentation, “surface” any problematic assumptions we’re making, and generally increase the bus factor of the khmer project, …

I have recently dedicated myself to learning R, a programming language and environment focused largely on statistical analysis and computing. The benefit of using R over other statistical computing packages is that it is free, open-source, and has a hugely active community around its use. R can be used cross-platform (PCs, Macs, and Linux) …

As part of the 2-day Mozilla Science Labs hackathon in late July, the khmer project will be providing a “mentored open source contributathon” experience. This will provide an opportunity for people interested in trying out our instance of the “github flow” model, in which contributions are submitted for review using a pull request. Since our …

tl;dr? I played around with building a CountMin Sketch that is dynamic in size, based on a scalable Bloom Filter approach. I’m not sure it worked. Thoughts, suggestions, help? Bloom Filters: In our research, we’ve made some hay using Bloom filters. They’re remarkably easy to implement; I’ve talked about them a couple of times on …

I’m on a European trip that involves several plane flights accompanied by long airport stays, and I just used some of that time to do a bit of tedious coding on khmer. The coding I did was to add proper exception handling to khmer’s internal file loading routines (see the pull request). The old behavior …

Late last year, the NY Times released an article quoting a specialist working on the HealthCare.gov web site: According to one specialist, the Web site contains about 500 million lines of software code. By comparison, a large bank’s computer system is typically about one-fifth that size. This astronomically large number became the subject of intense …

A few years back I was coding (in Java, of course) the <shudder>GUI</shudder> for a research program. I needed to provide controls that would let a user specify priorities (on a 0-100 scale) for various things. Two possibilities occurred to me, with pretty much diametrically opposed strengths and weaknesses. Sliders have a few virtues. Grabbing and yanking …

A bit more than a year and a half ago, I wrote some Java code to facilitate setting parameters for the CPLEX optimizer using their Concert API. Since then, I’ve added support for their CP Optimizer, and IBM has refactored the handling of parameters in CPLEX, necessitating an update to my code. This post (which …

After years of coding CPLEX applications in Java, I’ve just started working with CP Optimizer (the IBM/ILOG constraint programming solver) … and it did not take me long to run into problems. As with CPLEX, you access CP Optimizer from Java through the Concert API. As always, I am using the NetBeans IDE to do …