I am looking to write a new open source library (at github.com/me/my_lib, say). Rather than starting from scratch I would like to begin with a copy of the code from an existing project (github.com/them/their_lib). Although their_lib is an active project, my_lib needs to be independent because its goals are quite different.

I expect that by the time my_lib sees its first release, I will have removed about 1/3 of their_lib's code, and re-written another 1/3. The two projects will therefore have about 1/3 of the original code (on the order of 10,000 lines) in common.

How should I structure my git repository/ies to ease both the development of my project and, where possible, two-way sharing of code between the two libraries (e.g. bug fixes)?

Should I start me/my_lib as a fork/clone of them/their_lib (i.e. including the full history), or would I be better off just importing the latest release of them/their_lib (perhaps leading to a tree structure like that illustrated here)?

To share code in the their_lib => my_lib direction, how should I keep up to date with changes in them/their_lib? In the opposite direction, will I need a separate fork me/their_lib to use for pull requests?

2 Answers

If you want any kind of two-way sharing of code, you should, IMHO, start by discussing it with the original project.

If you were to fork a project of mine, then modify a third of it and remove another third, I would be quite unlikely to accept any changes back, as the effort to review them would be immense. I would likely politely decline or ignore them. So the problem is not how to structure your code, but rather how to engage in a discussion with the upstream library you are forking and whether any code sharing would be possible.

In any case, starting with a fork and documenting all your changes in small, coherent, bite-sized changesets maintained in separate branches is the better way to go.

First of all, are you sure you want to do this? I have had this experience, starting a project by forking another one whose objectives were different. I thought I would be saving time this way. It turned out I lost an incredible amount of time instead, and I was always constrained by an architecture that was not the one I would have built if I had started from scratch. Now, this is just my experience and yours could turn out differently, but think twice before taking this path.

Structure

Now let's say there are good reasons for doing this anyway (for instance, the 1/3 of the code you won't touch is not something you would be able to build yourself). Then try to isolate what you change. It would be good if, for instance, you could extract the files you are interested in keeping into a library that your new code depends on, and if you could keep the original structure of these files as much as possible. This way, if you fix a bug in the part that you didn't touch, it will be easier to apply the same patch to the original code.

git history

I honestly don't know about this one. My gut feeling would be to keep the history.

Contributing back

You will need either a special fork for that, or at least a special branch that you keep synced with the original repository. This way, you will apply your patches (bug fixes) to their code, and test them on their code, before opening any pull request. It will make reviewing much easier for them.
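As a rough sketch of the "special branch" idea, the script below builds a throwaway sandbox in which a local repository stands in for github.com/them/their_lib. The remote name `upstream`, the branch name `fix-for-upstream`, and the file names are all assumptions made for illustration; in a real setup the branch would be pushed to your own fork before opening the pull request.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: a local repository stands in for them/their_lib.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q their_lib
git -C their_lib symbolic-ref HEAD refs/heads/main   # pin the branch name
printf 'shared v1\n' > their_lib/shared.txt          # hypothetical shared code
printf 'theirs only\n' > their_lib/theirs.txt        # hypothetical code my_lib drops
git -C their_lib add .
git -C their_lib commit -q -m "Initial import"

# my_lib starts as a full-history clone; the original becomes the "upstream" remote.
git clone -q their_lib my_lib
cd my_lib
git remote rename origin upstream

# my_lib diverges, then fixes a bug in the shared code as its own small commit.
git rm -q theirs.txt
git commit -q -m "Remove code my_lib does not need"
printf 'shared v1 fixed\n' > shared.txt
git commit -q -am "Fix bug in shared code"
fix=$(git rev-parse HEAD)

# Contribution branch: starts at upstream's current tip and carries only the fix.
git fetch -q upstream
git checkout -q -b fix-for-upstream upstream/main
git cherry-pick "$fix" >/dev/null

# Show what upstream would be asked to review.
git log --oneline upstream/main..fix-for-upstream
```

Because the branch is based on their code rather than on yours, the fix can be tested against their tree, and the reviewer sees one commit instead of your whole divergence. This only works cleanly when the fix was committed on its own, separate from unrelated changes.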

Importing updates

Depending on how different the two projects look, it might be very difficult to follow updates from the original repository. One way to make your life easier would be to convince the original project owners that the sub-library you extracted is worth maintaining separately (or at least as a well-identified sub-block of the original project). This way you could update this sub-library specifically much more easily (it would just be a dependency of your own project).
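Until such a split exists, one possible approach (not the only one) is to keep the shared code in its original directory and pull in only the upstream commits that touch it. The sketch below again uses a local stand-in for them/their_lib; the directory names `core` and `app`, the remote name `upstream`, and the file contents are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: a local repository stands in for them/their_lib.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q their_lib
git -C their_lib symbolic-ref HEAD refs/heads/main   # pin the branch name
mkdir their_lib/core their_lib/app
printf 'core v1\n' > their_lib/core/util.txt         # code both projects share
printf 'app v1\n' > their_lib/app/main.txt           # code only their_lib keeps
git -C their_lib add .
git -C their_lib commit -q -m "Initial import"

# my_lib: full-history clone that drops app/ but leaves core/ where it was.
git clone -q their_lib my_lib
git -C my_lib remote rename origin upstream
git -C my_lib rm -q -r app
git -C my_lib commit -q -m "Drop app/, which my_lib does not use"

# Upstream moves on: one commit touches core/, another touches app/.
printf 'core v2\n' > their_lib/core/util.txt
git -C their_lib commit -q -am "core: fix util"
printf 'app v2\n' > their_lib/app/main.txt
git -C their_lib commit -q -am "app: new feature"

cd my_lib
git fetch -q upstream

# Upstream commits not yet in my_lib that touch the shared directory, oldest first.
relevant=$(git rev-list --reverse main..upstream/main -- core)
for c in $relevant; do
    git cherry-pick "$c" >/dev/null
done

git log --oneline -n 3
```

The path filter only helps if the shared files stay where upstream keeps them, which is one practical reason to preserve the original structure. Upstream commits that mix shared and non-shared changes would still need manual conflict resolution.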