Henk has been doing a fairly bang-up job keeping the two repos in sync, but he has concerns. Since he's not here to share them, I'll paraphrase:

1) Our coding methods do not support the github model: we use long lines of tab-spaced coding, while github does better with multi-line objects.
2) PrettyLST usage pretty much foobars the order of objects and can cause havoc for any merging by other users.

That all said, the craziness with every single RC since I joined would be completely avoidable if we went with a more traditional Git workflow. Let me explain:
1) Our releases follow traditional program release steps:
a) Alpha phase, where new features are scoped out (roadmap) and then added by various teams
b) Beta phase, where new features are entered and tested for bugs (the primary focus in this phase is to correct any bugs that arise during testing)
c) RC phase, where all fixes are tested and we make sure the product is stable for a production release
d) Production release

Well, once we move to b), all work on new features goes on hold. This means that for a significant period of time, 6-12 weeks, no new development is worked on, because BRANCHING and MERGING in SVN is not pretty and is time-intensive.

As you can see, if we move to GITHUB, then no features go on hold, and we're not calling for "Commit" freezes. Instead, people can work on what they desire and our main repo pulls in the relevant items for a successful release, while other work is in progress and doesn't affect the release cycle.

My only concerns mirror Henk's.

Should LST files move to multi-line format for greater ease?

Should PrettyLST only be used for Releases and not normal development?

Then of course, our JAVA coders need to be on board and learn the new format and procedure, so we have a learning curve which would slow us down. But with James already immersed in GIT, and Henk assisting, I think moving towards GIT and GITHUB for our releases is a good thing.

Pretty much any developer (Java or no) should be familiar now with Git and GitHub - it's the lingua franca of collaborative software these days (rightly or wrongly). I think that with the LST files people can just continue to use local diff tools (GitHub's UI for wide files is poor).

Yes, but our developers aren't all GIT savvy or currently on GITHUB yet. That was one of the main reasons we didn't migrate to GIT or GITHUB initially.

Back then, only Henk was proficient with GIT, and he handled the mirroring on github for us. Since then, James and I have become GIT proficient, while Tom, Stefan and any other active monkeys would need to be brought up to speed.

I think Henk's concern was LST file merges. If PrettyLST is used, then it becomes harder for github to handle the PR merging for them, and would be another hurdle to overcome. I don't know, I haven't had to deal with that personally.

With the move to JAVA 7, I think now would be a great time to migrate to GITHUB. It's a fresh cycle, and with the traditional break after a release, a perfect time to get everyone familiar with the process, and set up before we swing into action.

Yeah, the thing that catches most people out is that GitHub != Git. Git has no concept of the Pull Request model; once you get people understanding that subtle split, it's actually fine (diagrams!). Once they install the hub command line tool, it gets even better.

I prefer the Tortoisexxx context menu tools, with the number of files I work with, having the commands integrated is handy, plus it integrates into my primary editor. Though I should start getting my git commands down by heart.

The biggest difference would be making a branch for each feature/issue. Cleaner method to know what all was addressed. Squashing and Rebasing are handy. The fact you can test the changes in the branch safely is even better.

I've been working intensively using the feature branch/pull-request model in git for a while now and it is a great approach. The ability in GitHub to fork and submit a pull request is great and makes it very easy to contribute to a project. So I'm quite comfortable now with a move to GitHub. We just have to make sure all our history is preserved.

Another consideration is our project structure. Git is geared around small contained projects. We have a main project but some other sibling folders (e.g. NFD). I think these should each be a repository under the pcgen project.

I have a shell account that does the sync from SVN to Github automatically, but the reverse needs to be done manually. I just got Andrew up to speed today and he's been merging PRs as if he's been feeding Oreos to a toddler (believe me, they go down well).

So if there are any Git experts out there that feel like chiming in, please step forward. There will be conflicts that need to be resolved. Especially because the sync entails git push --force to Github which can frag mergeability of PRs.

So the topic of having all objects described on a single line is actually not very much related to Git. It is related to merging.

Conflicts in edits on objects are hard to fix

For example, say I want to fix a couple of lines in the new Pathfinder Advanced Class Guide that Mark Means has been coding on a lot lately. I work on it for an hour, then try to commit to SVN. The commit fails, because Mark has changed the same line.

This is fine if the fix is small. This is not fine when the fix was 200 lines. So now I have the option of starting over or trying to resolve the conflicts. Well, you'll probably start over.

So trying to solve the problem entails scanning long lines for the delta, then interpreting it, etc. That is hard work, and it takes a lot of time from people who could have solved more problems in the same scarce time they have for PCGen.

Conflicts in edits on objects can disappear or be easy to solve

If we move to multi-line objects, that problem will 1) occur less and 2) be easier to solve:

1) It will occur less, because svn or git know how to automatically merge things that are problematic for us now, like a) adding two keys to the same object (just adds two lines, instead of modifying the same line twice), b) removing two keys (removes two lines instead of modifying the same line twice), c) changing two keys (changes two lines instead of modifying... you get the picture).
2) If there are conflicts, you will instantly see the changes and the conflicts on their own line instead of the whole object-line.
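To make point 1 a bit more concrete, here is a minimal sketch of a line-by-line three-way merge. It is not how svn or git actually merge (real tools work on diff hunks, handle added and removed lines, and may still flag edits on adjacent lines as a conflict). The function name `merge3` and the sample tags are illustrative assumptions, and the one-tag-per-line layout is only a stand-in for whatever multi-line format we would pick.

```python
def merge3(base, ours, theirs):
    """Naive three-way merge for line lists of equal length.

    Only handles in-place edits (no added or removed lines).
    Returns (merged_lines, indices_of_conflicting_lines).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only supports in-place line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree, or neither changed the line
            merged.append(o)
        elif o == b:      # only the other side changed the line
            merged.append(t)
        elif t == b:      # only our side changed the line
            merged.append(o)
        else:             # both sides changed the same line differently
            merged.append(b)
            conflicts.append(i)
    return merged, conflicts


# Single-line object: two people change different tags on the same line.
base_single = ["Fireball\tSCHOOL:Evocation\tCOMPS:V, S, M\tRANGE:Long"]
ours_single = ["Fireball\tSCHOOL:Evocation\tCOMPS:V, S, M\tRANGE:Medium"]
theirs_single = ["Fireball\tSCHOOL:Conjuration\tCOMPS:V, S, M\tRANGE:Long"]
single_merged, single_conflicts = merge3(base_single, ours_single, theirs_single)

# Same object with one tag per line: the two edits land on different lines.
base_multi = ["Fireball", "SCHOOL:Evocation", "COMPS:V, S, M", "RANGE:Long"]
ours_multi = ["Fireball", "SCHOOL:Evocation", "COMPS:V, S, M", "RANGE:Medium"]
theirs_multi = ["Fireball", "SCHOOL:Conjuration", "COMPS:V, S, M", "RANGE:Long"]
multi_merged, multi_conflicts = merge3(base_multi, ours_multi, theirs_multi)

print("single-line conflicts:", single_conflicts)
print("multi-line conflicts:", multi_conflicts)
print("multi-line result:", multi_merged)
```

In the single-line case the whole object is one conflicting line; in the multi-line case both edits are kept without anyone having to look at them.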

It is actually surprising to me that no one brings this problem up, ever. That means that either people are okay with it, or they never encounter this problem, maybe because different people always edit different files, etc.

What does it have to do with git?

Now why does this come up during the Git discussion?

1) Partly because I brought these up one after the other. I think that multi-line objects would be an improvement for our SVN workflow anyway.
2) We will be merging MUCH more often when using Git. Look at the release strategy Andrew has top-of-mind. We could do all sorts of cool stuff that are hard to do in SVN. Like just cherry-picking a patch to the 6.2 branch. Easy, but not with multi-line objects.

When I joined the project I did not understand why we would have a concept of a prettylst program. This was before I knew that prettylst does more than keeping keys in order and spacing them with tabs.

I understand that we have tools that act on objects to keep them up to the latest standards. When I added hyperlinks to the PRD for all Pathfinder objects, I automated that. I would never consider doing that by hand. I am lazy, so I understand the need to automate that. So my objections are not with the sanitizing of objects. In that respect I consider it a really valuable tool!

But my objections stem from the way it replaces ALL lines with all new lines. What happens after prettylst has come by?

All commit information is gone and all files were last edited by amaitland

Everyone working on files needs to reset all their work and start again because merging single-line object files is impossible (see earlier post).

It has just re-ordered keys and added tabs. Result? So many, many gratuitous changes resulting in merge conflicts.
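As a rough illustration of why those changes are gratuitous, the sketch below compares two versions of an object while ignoring tag order and tab padding. It assumes the first tab-separated field is the object name and that tag order carries no meaning, which is exactly the open question here. The helper names `normalize` and `same_object` and the sample tags are made up for this example; this is not what prettylst does internally.

```python
def normalize(line):
    """Split an LST-style line on tabs, drop empty fields caused by
    tab padding, keep the name first and sort the remaining tags."""
    fields = [f.strip() for f in line.split("\t") if f.strip()]
    if not fields:
        return []
    name, tags = fields[0], fields[1:]
    return [name] + sorted(tags)


def same_object(a, b):
    """True when two lines differ only in tag order or tab padding."""
    return normalize(a) == normalize(b)


before = "Dwarf\tSIZE:M\tMOVE:Walk,20"
after = "Dwarf\t\t\tMOVE:Walk,20\tSIZE:M"      # re-ordered and re-padded
edited = "Dwarf\t\t\tMOVE:Walk,30\tSIZE:M"     # a real content change

print(before != after, same_object(before, after))
print(same_object(before, edited))
```

A line-based diff sees `before` and `after` as a changed line, even though nothing about the object changed.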

But I never really understood why the tabs were important. I still do not really know, but I once asked and it seems that some people edit LST files in Excel or in a tab-based editor. I do not know who still does that, and maybe it is worth asking around or doing a poll.

A couple of questions about that:

Who actually depends on the tabs while editing?

Who actually prefers the single lines?

Who cares about the order of keys?

If it turns out that nobody cares about these things, then removing that functionality from prettylst would solve a LOT of my concerns about it.

henkslaaf wrote: So the topic of having all objects described on a single line is actually not very much related to Git. It is related to merging.

What does it have to do with git?

Now why does this come up during the Git discussion?

1) Partly because I brought these up one after the other. I think that multi-line objects would be an improvement for our SVN workflow anyway.
2) We will be merging MUCH more often when using Git. Look at the release strategy Andrew has top-of-mind. We could do all sorts of cool stuff that are hard to do in SVN. Like just cherry-picking a patch to the 6.2 branch. Easy, but not with multi-line objects.

Was that meant to be 'Easy, but not without multi-line objects'? Otherwise, it seems like you're arguing against multi-line...

Nylanfs wrote: When editing a large set like the rsrd or pathfinder, having an object on one line and then having all the same tags line up is fairly important to me.

Yeah, there are bound to be use-cases. Just wondering which ones. I have a vim plugin that transforms from single-line to multi-line when opening the file, I've never missed it.

Just out of curiosity: what file editor are you using?

As a Data Chimp: having each main item on the same line with all the objects organized allows for much faster review of sources. Everything lines up, and it gives a uniform look to the data (it's pretty!). For editing, grouped objects like KEYs are easier to manipulate, because I can use column editing. Mind you, most people don't see the uniform appearance, because their editor uses its default tab width. If you set the tab width to 6, it lines up nicely. Picking out an error in rows upon rows of uniform data is easy, especially if you have both syntax coloring and same-word match highlighting. In case you wondered, my tool of choice is Editplus: nice integrated tools, multi-tab file switching, project management, easy directory navigation and file selection. It also supports GIT and SVN in the editor, for an individual file or by directory.

Multi-line is much easier for first entry, and for fast bulk review of each individual group when first inputting. But big picture, multi-line as the standard makes it harder to spot group aberrations.

Pros and Cons to everything.

The merging hasn't been a big issue, since most developers don't work on the same files at the same time. There is a reason we have JIRA: first, it gives us an idea of who is working on what issue. And luckily, since I've been pushing for uniformity in coding and we got away from shared pool items across the board, most features and fixes are isolated to set-specific files. Obviously this isn't the case when doing a Game System-wide change. That is also why, at the end of the Beta cycle, an all-call for commits happens, so that when we do a prettylst run, we don't blow people's work out of the water.

The idea of prettylst was twofold:
1) Originally it handled conversions. This was only true up to version 5.14 of PCGen. From 5.16 forward, we use the in-system converter for all major conversion updates. Anything left behind is updated manually. (We lack any PrettyLST developers who know PERL, so we cannot feasibly maintain or upgrade it.)
2) Its focus is to make the objects line up in an organized fashion, with tags grouped in an ordered fashion. Remember, per the OGL license, we are required to make the files and code 'human readable'. By organizing the files, we make this requirement easier to justify. Human eyes are designed to read by group, and a jumbled organization makes reading the lines difficult. (I know, I do it often.)

It lines up and actually makes a matrix - Column A is Name w/row, cross referenced with tag in Column. X and Y coordinates if you like. And since ###Block designates a new section, you cut down on empty space.

I agree multi-line is easier on the merging side, but that same approach presents issues with group investigations.

As a real example: the B3 Race file has ~375 lines for races. Several lines are blank lines or # comments, so the entire file runs from line 1 to line 376. If I find/replace all double tabs with single tabs until there is one tab between items, and then put each tab-separated item on its own line, as would be done in multi-line, the file now runs from 1 to 7,104 lines of code. Having personally coded that, I can say scrolling up and down ~7k lines is not fun. In fact, when it was first made, the multi-lines were ~20k, since bulk entry is not one line per item but multiple lines. So the line count is multiplied by almost 19: 375 vs. 7,104.
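For anyone who wants to repeat that count on another file, the find/replace steps above could be sketched roughly like this. The function name `to_multiline` and the sample data are made up for illustration, and leaving blank lines and # comment lines untouched is my own assumption rather than part of the procedure described.

```python
import re


def to_multiline(text):
    """Collapse runs of tabs to a single tab, then put every
    tab-separated item on its own line. Blank lines and # comment
    lines are passed through unchanged (an assumption of this sketch)."""
    out = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            out.append(line)
            continue
        collapsed = re.sub(r"\t+", "\t", line.strip("\t"))
        out.extend(collapsed.split("\t"))
    return out


sample = (
    "# Races\n"
    "Dwarf\t\tSIZE:M\t\t\tMOVE:Walk,20\n"
    "Elf\t\t\tSIZE:M\tMOVE:Walk,30\n"
)
lines = to_multiline(sample)
print(len(sample.splitlines()), "lines become", len(lines))
```

Running the same function over a real data file and comparing the two lengths gives the kind of before and after numbers quoted in this post.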

Tom complains when the code files are larger than ~10k. Now my example was using a nice benign race file. Do you want to see the numbers for the Ability File? The starting point there is a whopping 1,737 lines. Luckily, plenty of dead space... We max out at 7,192 lines.

Core rule book Class Ability file - 1,810, not as many dead spaces. 11,721 is where we end up.
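Putting the three examples side by side, a quick calculation of the growth factor, using only the line counts quoted in this thread (the dictionary keys are just labels for this sketch):

```python
# (single-line count, multi-line count) as quoted in the posts above
counts = {
    "B3 race file": (375, 7104),
    "ability file": (1737, 7192),
    "core rulebook class ability file": (1810, 11721),
}

# Growth factor = multi-line count divided by single-line count
factors = {
    name: round(multi / single, 1)
    for name, (single, multi) in counts.items()
}

for name, factor in factors.items():
    print(f"{name}: x{factor}")
```

The race file grows the most (roughly 19 times), while the two ability files grow by roughly 4 and 6.5 times, so the penalty depends a lot on how many tags each object carries.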

That's a lot of scrolling. I'm sure we can institute shortcuts like tree folding, so the list condenses back to the 375, 1737 and 1810. But we remove the ability to spot errant mistakes if everything is folded.

Was that meant to be 'Easy, but not without multi-line objects'? Otherwise, it seems like you're arguing against multi-line...

Mmm, yes. Either "not without multi-line objects" or "not with single-line objects".

I'm pretty sure Tom Parker is hard set against making multi-line the norm. He was aghast when he discovered it was an added feature.

I don't know where James stands on the issue. Me, I understand the principles: it does make human review on github easier, but on the flip side our files go vertical. Also, which file types is this multi-line going to apply to? Class, Templates, Abilities/Feats, Kits, etc.?

I'm also not keen on having to convert back and forth. I use multi-line when I'm first setting up objects, especially things like Spells, so I can catch otherwise-missed issues in a find/replace system.

But for every day use, I'm not sure I want to be hunting for bugs in the multi-line environment.

Okay, sorry about that. Not trying to provoke any response. It is just that I'm trying to systematically discuss the concerns Andrew voiced for me. Since they were my concerns anyway, I thought I would conclude that they were sufficiently addressed. But reading the thread now, I see I've hijacked the thread a bit.

Let's remove that latest post and let me state that I think we do not need multi-line support and that I feel we can solve any problems when we encounter them.

The same for prettylst. We'll be fine.

A more on-topic subject though would be that the repository is currently 244 MB, which is considerable. We've had complaints that downloading PCGen through Git takes too long, especially from people new to git, who sometimes foobar their repo and need to download things over when they are unable to rebase.

A second problem with git related to size is that SVN allows a checkout of a subdir. Git does not allow this, forcing you to download the entire repo when you would only hack on NFD.

We could split the repositories, which would help, but there are some hard-to-solve problems:

I've tried to reduce the repository size in the past and it is hard. A few causes of repo size:

[list=]
[*] Git, being a distributed system, downloads all revisions from the first to the last, including all branches. SVN, being centralized, downloads only the latest revision and fetches earlier revisions on demand. This is not something we can solve.
[*] When renaming files, people have often (instead of using SVN move or git mv) used their file manager to move them, then added the new and removed the old. Especially with mass file renaming, this effectively doubles the repo size. With the next mass rename, it has tripled, etc.
[*] Subversion to Git conversion is not perfect. In a pure-git repository, all branches stem from other commits in the master branch. Such a branch then only contains its delta to master and stays relatively small. In the Git conversion, branches are not always well-detected and sometimes contain the entire content of the master branch again.
[*] We have years and years of awesome releases in the repo, which we want to keep. Fact of life.
[/list]

So if the repo size is a problem, there are a couple of hard-to-solve technical problems in reducing size.

Splitting the repo is, IMHO, a good idea. This also allows us to set up CI jobs that specifically test certain things for certain repositories. It will allow us to run those tests when specific repo triggers are fired. Jenkins can also trigger tests based on file patterns, but splitting the repos is much cleaner.

It is also safer: we could delegate merging and committing to master in groups. So the website team could manage its own members and adopt its own work- and QA-flow.

Things that come to mind are simply the directories in the root of the project

[list=]
[*] website
[*] NFD
[*] arch docs
[*] pcgen
[*] utilities
[/list]

The IDE dir could be moved to the new pcgen repo.

Where we would mostly gain size reduction by splitting would be inside the pcgen-proper repo. However, the best candidates are code and data. I would not recommend splitting these dirs, because they are so tightly linked in functionality (data depending on code, mostly).

Content - includes NFD, our publisher docs, and our OOC materials.
PCGen - is the main program and should not be split up.
Utilities - is our holding tank for practical aids to coding.

This actually makes the projects make sense, since now a whole check-out of pcgen is the main program. Which as you stated, we couldn't drill down with GIT anyways.

I think IDE should be part of Utilities in this model. Not sure about those Arch Docs.

I would also point out, whenever it's been possible, I have used the MOVE feature, and the RENAME feature. SVN has messed up and caused me to use the alternatives, but that has been the exception rather than the rule.

LegacyKing wrote: Only issue with the split proposed is this:
I would also point out, whenever it's been possible, I have used the MOVE feature, and the RENAME feature. SVN has messed up and caused me to use the alternatives, but that has been the exception rather than the rule.

Yeah, I know you do. I remember working on it with you when SVN messed up. Git will also mess up. Cannot be helped. I'm glad you and others did that work.

I was just splitting out some causes for the size increase and whether they can be addressed. This one cannot.

For active developers, we should get the proper mapping. Inactive ones, well, are by definition not participating. We can do an all-call request for svn > github user names, but that is the best we can do.

So I've started the process of mirroring content, website and utilities.

This brought up the question of branches.

Do we need to process the branches for these subrepositories? Looking at them, branches make the most sense for the pcgen/ dir; the website does not release alongside the main program, nor do the utilities or the NFD content.