Recent postings on social media have argued over who should receive credit for this work. Some have credited Katie Bouman for the image, which others have contested as overstating her contributions to the overall project.

For example:

According to data provided publicly by GitHub, Bouman made 2,410 contributions to the over 900,000 lines of code required to create the first-of-its-kind black hole image, or 0.26 per cent. Bouman’s contributions also occurred toward the end of the work on the code. In contrast, contributor Andrew Chael wrote over 850,000 lines of code. While CNN attempted to give Bouman full credit, explaining “That’s where Bouman’s algorithm -- along with several others -- came in,” they slyly admitted that fellow researchers told CNN “‘(Bouman) was a major part of one of the imaging subteams,'” even after CNN incorrectly wrote on the previous line that she was on one of the “imaging teams,” not subteams.

This analysis seems to disregard the way collaborative scientific research actually works; are the metrics being discussed sufficient to measure this kind of contribution, or the impact someone has on a project of this nature?

Reminder to prospective contributors: 1) do not answer in comments, we will nuke them aggressively without notice; 2) original research is not allowed on this site, please base any answers on reputable sources and not on your personal opinion of what should count as a contribution.
– Sklivvz♦ Apr 12 at 9:48

@TimPost: your edit to the title has completely altered what is being asked, invalidating the existing answers. Please revert or give a solid explanation for your change?
– Jack Aidley Apr 13 at 6:12

The 5th revision completely changes the meaning of the question. The CNN quote is still complete nonsense, but the titular question is completely different. That is not a change that should have been made.
– Polygnome Apr 13 at 8:19

Please DO NOT change the claims or add new claims to a question that already has good answers.
– DJClayworth Apr 16 at 14:06

If it's a new question, please use the "New Question" button. There was a question about lines of code and it was answered. Asking about the use of the algorithm is a new question.
– DJClayworth Apr 16 at 14:21

3 Answers

The old title asked "Did researcher Katie Bouman only contribute 0.26% of code that created Black Hole image," and the existing answers do a good job explaining why it isn't true and why lines of code aren't a useful metric. The new title, however, asks "Was credit for the black hole image misappropriated?" and the correct answer should look rather different.

On the one hand, we know that Bouman deserves a large share of credit.

"(Bouman) was a major part of one of the imaging subteams," said Vincent Fish, a research scientist at MIT's Haystack Observatory.

For the past few years, Bouman directed the verification of images and selection of imaging parameters.

"We developed ways to generate synthetic data and used different algorithms and tested blindly to see if we can recover an image," she told CNN.
"We didn't want to just develop one algorithm. We wanted to develop many different algorithms that all have different assumptions built into them. If all of them recover the same general structure, then that builds your confidence."

That's where Bouman's algorithm -- along with several others -- came in. Using imaging algorithms like Bouman's, researchers created three scripted code pipelines to piece together the picture.

However, Bouman does not deserve all of the credit. So, for the claim to be true, we would need to see that journalists claimed that she deserves all of the credit.

OP has since added the text of a tweet that they claim supports a view that Dr. Bouman deserves zero credit. So it seems like OP is coming from the standpoint that, if she has been given any credit, it is therefore misappropriated or misapplied. You might want to address that as well.
– GalacticCowboy Apr 13 at 20:53

The phys.org piece seems to be a syndicated AFP text.
– E. P. Apr 15 at 7:28

The metric does not measure what it is claimed to measure, and even if it did, it would be meaningless for assessing the role of Dr. Katie Bouman in creating the image. I'll go on to why, but I first want to draw particular attention to the fact that Dr. Bouman has explicitly rejected the idea that she deserves sole credit:

But Dr Bouman, now an assistant professor of computing and mathematical sciences at the California Institute of Technology, insisted the team that helped her deserves equal credit.

The effort to capture the image, using telescopes in locations ranging from Antarctica to Chile, involved a team of more than 200 scientists.

"No one of us could've done it alone," she told CNN. "It came together because of lots of different people from many different backgrounds."

The primary reason the metric is meaningless is that Dr. Bouman is credited with developing an algorithm, not with typing lines of code, so any metric measuring code production is simply not measuring the thing she is credited with doing. She could have typed not a single character and still designed the algorithm that played a key role. It's like trying to measure the input of an architect by how many bricks they laid in a building.

Additionally, the project is broader in scope than simply implementing the algorithm credited to Dr. Bouman. Large amounts of code are involved in simply loading and co-ordinating files, displaying and saving images, and the like, all of which is necessary to the project at large but not specific to the algorithm used.

Finally, and least importantly, the statistics in GitHub are not even measuring lines of code written, as the source claims; they are measuring lines changed in submissions. Those lines can be code, a change to an existing line, a line copied between branches, or even blank lines. In fact (hat tip @Polygnome), the count includes lines that aren't code at all, as data and documentation are also included.
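
To make the difference between "lines changed" and "lines of code" concrete, here is a minimal Python sketch. It is not GitHub's actual implementation: the function names `lines_changed` and `code_lines` and the sample snippets are hypothetical, and the diff is computed with the standard library's `difflib` purely for illustration. It contrasts a diff-style count of added and removed lines with a count of lines a reader would call code.

```python
import difflib
from typing import Tuple


def lines_changed(old: str, new: str) -> Tuple[int, int]:
    """Return (added, removed) line counts between two versions of a file.

    Assumption: this only mimics the general idea of additions/deletions
    statistics; it is not how GitHub computes them.
    """
    added = removed = 0
    for entry in difflib.ndiff(old.splitlines(), new.splitlines()):
        if entry.startswith("+ "):
            added += 1
        elif entry.startswith("- "):
            removed += 1
    return added, removed


def code_lines(text: str) -> int:
    """Count lines that are neither blank nor pure '#' comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Hypothetical before/after contents of a single file.
old_version = "a = 1\n"
new_version = "a = 1\n\n# a note, not code\nb = 2\n"

added, removed = lines_changed(old_version, new_version)
print(added, removed)           # additions and deletions between versions
print(code_lines(new_version))  # lines of actual code in the new version
```

In this toy case the blank line and the comment both count as additions although neither is code; a data or documentation file committed next to the source would inflate a lines-changed count in the same way.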

Also, some of the best changes to a repository are when one can replace 100 lines of code with 10.
– Per Alexandersson Apr 14 at 16:57

This answer fails to mention CHIRP, the actual algorithm that Dr Bouman was credited with developing years ago, or how it relates to the black hole image, or how it relates to the code base in question. I believe the answer is "it was essential" and "it has next to nothing to do with that code base", but I am uncertain; if right, however, it means that we are measuring how much a sprinter who happens to be a licensed pilot helped win medals by how long he was in the cockpit of the plane that flew them to the Olympics.
– Yakk Apr 14 at 17:13

@Yakk: yeah, the question was changed completely after I answered. I was addressing the more specific claim highlighted in the original version.
– Jack Aidley Apr 14 at 17:30

The whiteboard is mightier than the github commit.
– mckenzm Apr 15 at 5:19

"While I wrote much of the code for one of these pipelines, Katie was a huge contributor to the software; it would have never worked without her contributions and the work of many others who wrote code, debugged, and figured out how to use the code on challenging EHT data.

"With a few others, Katie also developed the imaging framework that rigorously tested all three codes and shaped the entire paper.

"As a result, this is probably the most vetted image in the history of radio interferometry. I'm thrilled Katie is getting recognition for her work and that she's inspiring people as an example of women's leadership in STEM.

@JeromeViveiros can you edit the answer to reflect the new question?
– SSimon Apr 13 at 10:44

"that is the percentage of the code she contributed" (emphasis mine) - this isn't correct, since most of those lines are not actually "code" i.e. while at a quick glance it looks like achael wrote most of the python in the repo, they did not write 850,000 lines (unsurprisingly). find . -name '*.py' | xargs wc -l shows there are about 36K of what a reasonable person would call "lines of code"
– jberryman Apr 13 at 23:08

@jberryman They did write 850k lines of code. The fact that most of these lines were overwriting other lines does not mean they did not write them. A painter can easily say that they created a drawing with 850k strokes, even though the outermost layer of paint that is visible at the end of the process only contains 36k strokes.
– Bakuriu Apr 14 at 20:41

@Bakuriu: Chael said no. The majority of that 850k is data, not code anybody wrote. «Also I did not write "850,000 lines of code" -- many of those "lines" tracked by github are in model files. There are about 68,000 lines in the current software, and I don't care how many of those I personally authored.» See: twitter.com/thisgreyspirit/status/1116518550297096194
– Gábor Apr 14 at 20:57

@Bakuriu no, most of those lines are data of some sort, e.g. what look like trained model parameters. (I realize I wasn't clear about that in my comment). I spent about 3 minutes looking at the repo, so I can't give you a more detailed analysis, but just enough to know that the way the data was being interpreted is nonsense, and people are terrible
– jberryman Apr 14 at 20:57
