PhD Tools

24/05/2015

It's now been 9 months since I successfully submitted my thesis and passed my viva. I'm glad to say that although I was given minor corrections, they were very minor, requiring only a few modifications to my thesis.

Although at this stage I cannot make the thesis itself publicly available, I think it's worth closing out this blog with some reflections on what worked well about the tools and techniques that I used, and also noting some areas that I might do differently a second time around.

What worked

Version control

This was really key for me, and helped me keep everything in good shape throughout the work. I wrote about the technique I used and a few tips and tricks for the software (Bazaar) that I used. I know that plenty of people get through by making manual backups and having clever naming conventions or similar, but I wouldn't want to do it any other way.

LaTeX

I'd used LaTeX before and knew it was pretty powerful, but I really enjoyed using it to write my thesis (and a few papers along the way). It is really versatile and, although it has plenty of quirks and idiosyncrasies, it can be moulded to do almost anything that you could want. There is also a wealth of information out there on the web to help with getting it to do what you want. Although it might seem like a steep learning curve at first, it's worth it in the end.

Referencing

I'm aware that there are more sophisticated web-based solutions around for reference management (Mendeley for example), but I found that my large folder of pdf files all referenced from a bibtex file and managed using cb2bib was a pretty good approach.

Figures

I developed an approach that allowed me to recompile everything in my thesis from the raw data. This was certainly a hassle at times, as writing a script to lay out a plot definitely takes more time than laying it out by hand, but overall I think it saved me time. Making changes late on in the process, I was able to do things like change colour schemes and add a grid to all my figures relatively easily, without having to manually re-edit everything. Having said this, I might consider changing the software that I used for the job in future - see below.

What I might do differently

Not use Matlab for figures

Don't get me wrong, I'm a long time Matlab user and think it's great, but in trying to produce figures that looked as good as possible I feel like I really pushed it to its limits. What started out as some relatively basic plotting scripts, using mostly built-in functionality, evolved over the course of the work to be really quite complicated beasts. In some instances I had multiple sub-plots per figure, all using modified locations, with stacks of up to 4 axes sets in each plot (in order to achieve both axis breaks and different colour grids). These turned out fine in the end, but were quite a frustration to get right.

Since completing the work I have been increasingly producing my visualisations in javascript. This retains the scripted nature of the source code and the ability to regenerate plots as the data changes; it produces/exports vector graphics in a standard format; and it is increasingly common and well supported (both in terms of software and community) on the web, with libraries such as D3 enabling some really awesome stuff. But additionally I think it has the following advantages over Matlab plots:

It is an open language that is supported in all major web browsers and is therefore about as portable as you can get.

It is less constrained than Matlab, as you have access to the raw line drawing commands.

It enables a much greater level of user interaction in the results, with tooltips, data selection, animation and linked plots all being achievable.

Although I would likely still use Matlab/Simulink for simulations, calculations and data analysis, I would write my results out in a format such as .csv and then plot the results using D3 in javascript. It should then be relatively straightforward to include the resulting .svg files in a LaTeX document.
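One caveat is that LaTeX cannot read .svg files directly, so some conversion step is needed. A minimal sketch of one way to do it, assuming the svg package (not mentioned above), an Inkscape install on the path and compilation with shell escape enabled; the file name figures/results.svg is hypothetical:

```latex
\documentclass{article}
% Assumption: the svg package converts the file via Inkscape at compile time,
% so this needs e.g. pdflatex --shell-escape thesis.tex
\usepackage{svg}

\begin{document}

\begin{figure}
  \centering
  % Hypothetical file: figures/results.svg, exported from the D3 plot
  \includesvg[width=0.8\linewidth]{figures/results}
  \caption{Simulation results plotted from the exported .csv data}
  \label{fig:results}
\end{figure}

\end{document}
```

An alternative that avoids shell escape is to convert each .svg to .pdf beforehand and use \includegraphics from graphicx as usual.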

Make my thesis more 'sectioned'

I struggled to get a consistent story that I wanted my thesis to tell until very late on in the process. Consequently the sections I wrote got juggled around and rewritten a few times. The techniques I used for writing it certainly helped in allowing me to play with the structure, but I feel like I could have done better in this regard. It's always easy to think like this in hindsight, but I would like to have had the content broken up in a more reusable manner.

Unfortunately having reusable sections of content is almost the opposite of having the consistent and coherent narrative that a good thesis (allegedly) requires. I don't have a solution to this at present and I'm not sure I ever will, but it's certainly something I would think more about a second time around (not that there will be one).

Make my thesis more engaging

It upsets me that I put a great deal of effort into something that perhaps only 4 other people will ever actually read through in its entirety. It also doesn't seem to me to be the best use of the funding that I received. Nearly 4 years ago I proselytised about future documentation methods, and I think we've since seen a lot of progress in this area, particularly on the web. I would love to have been able to present my thesis in a novel and engaging format, but unfortunately I would still be working on it now if I had tried to go down that route (and also still arguing with supervisors and examination boards about accepting it).

I think it needs someone who is well on top of their work to forge ahead in this area, and produce a truly dazzling thesis that sets a standard for others to copy. With the right approach I believe it should be possible to produce something with the technical depth and quality to merit the award of PhD, whilst still being accessible to the lay reader. Perhaps through staged levels of detail, or interactive facilities or similar. Once someone has proved that it can be done and it's "gone viral" or whatever, I feel like the overall approach will be replicable. Unfortunately changing the overall PhD thesis paradigm, whilst simultaneously studying for and obtaining a PhD, is probably a bit much for most students!

13/03/2014

I found out recently that I need a word count in my thesis and thought it would probably be a major PITA. However I then found TeXcount, and then found that it was included in my install by default. So typing "texcount thesis.tex" at a command line gave me a load of interesting output.

A bit more playing and Googling showed me that I could run it during compile to automatically update in the document. I was using "\write18" anyway to pull images from other folders so this was no problem. In order to include the wordcount in the document I made a new command to do this. Mine is slightly simpler than in that link as I only need a total, so here it is:
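A minimal sketch of such a command, not necessarily the one used for the thesis. It assumes pdfLaTeX with shell escape enabled, texcount on the path, and a main file called thesis.tex; the macro name \wordcount and the scratch file wordcount.txt are hypothetical:

```latex
% Preamble. Compile with: pdflatex --shell-escape thesis.tex
% texcount options used:
%   -1 -sum  print only the total as a single number
%   -merge   follow \input and \include files
%   -q       suppress warnings
\newcommand{\wordcount}{%
  \immediate\write18{texcount -1 -sum -merge -q thesis.tex > wordcount.txt}%
  \input{wordcount.txt}\unskip}% \unskip removes the space left by the file's newline

% In the text:
% This thesis contains approximately \wordcount{} words.
```

Because the count is taken from the source files as they stand at compile time, the number printed always reflects the current draft.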

11/10/2013

As I write my thesis I keep coming up with questions about the style, layout and formatting that I should use. In many cases I doubt there is a right or wrong answer to the questions I'm asking; however, I need to make a decision either way and be consistent throughout my document. Therefore I've decided to start noting down the questions I'm coming up with and the direction I'm taking, ideally along with reasoned arguments as to why. Hopefully this should serve as a reference for me down the line.

Double spacing after full-stops?

I vividly remember being taught at school to double space after a full stop and single space after a comma, colon or semi-colon. This is a throwback to the days of using typewriters and it is widely regarded as no longer needed. Although it sounds like a slightly larger space can help with readability, modern proportional fonts and processing tools allow for this anyway. The Wikipedia page gives a lot of details on this.

I believe LaTeX can (and does) adjust spacing after a period anyway.
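To illustrate, here is a small sketch of how standard LaTeX treats the space after a period. Only core LaTeX commands are used; the example sentences are made up:

```latex
\documentclass{article}
\begin{document}

% Default behaviour: the space after sentence-ending punctuation is
% allowed to be slightly wider than an ordinary inter-word space.
This is one sentence. This is another.

% A period after an abbreviation is not a sentence end, so follow it
% with a control space to get a normal inter-word space.
Several tools, e.g.\ Bazaar, were used.

% A period after a capital letter is not treated as a sentence end;
% \@ before the period restores the sentence spacing.
The controller was a PID\@. It was tuned by hand.

% \frenchspacing switches the extra sentence spacing off from here on.
\frenchspacing
This is one sentence. This is another.

\end{document}
```

So single-spacing the source is enough, and the choice between wider sentence spaces and uniform spacing becomes a one-line setting.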

Title case?

What words should be capitalised in a title (e.g. a chapter/section/subsection heading)? "Title case" refers to the practice of capitalising all words in the title except for some subset of short or unimportant words. However what the subset is and whether title case should be used at all does not seem to be clearly defined. Once again the Wikipedia page kicks off discussion on this. It seems that several major publications use sentence case for titles. The main motivation for title case seems to be as a form of emphasis; however, it is argued that with modern digital fonts emphasis is more easily added in other ways, and that sentence case is easier to read.

Some contributors suggest using title case only for main titles, such as the title of the book and possibly chapters, but using sentence case for lower level headings. I will use title case in this manner, for the title of my PhD and for the chapter titles, but not elsewhere.

Capitalising references?

Throughout my document I make references to other sections and chapters, as well as to figures and tables. Of course I link these properly in my LaTeX source so that hyperlinks are formed, but should I capitalise the word section/chapter/figure/equation? It seems that most, but not all, scholarly articles choose to capitalise in these cases; the argument being, I think, over whether it behaves as a proper noun or not. This raises the question of whether "Page" should be capitalised in cross-references. This would appear very strange, and the style does not appear to be in use anywhere. A possible explanation for this is that figures, chapters, etc. are 'intentionally' named, whereas pages are not, although this seems somewhat unsatisfactory.

I will capitalise all references except for pages (which I do not expect to use) to fit with the most common usage.
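One way to keep this consistent is to let a package generate the reference names. A minimal sketch, assuming the cleveref package (an assumption, it is not mentioned above); the labels and the placeholder figure are hypothetical:

```latex
\documentclass{report}
\usepackage[colorlinks]{hyperref}
% Assumption: cleveref, loaded after hyperref.
%   capitalise  gives "Figure", "Chapter", ... mid-sentence
%   noabbrev    writes "Figure" rather than "Fig."
\usepackage[capitalise,noabbrev]{cleveref}

\begin{document}

\chapter{Results}\label{chap:results}

\begin{figure}
  \centering
  \rule{4cm}{2cm}% placeholder box standing in for a real plot
  \caption{A placeholder figure}\label{fig:placeholder}
\end{figure}

% \cref typesets the name and the number together as one hyperlink
The response is shown in \cref{fig:placeholder} of \cref{chap:results}.

\end{document}
```

With this approach the capitalisation decision lives in one package option rather than in every sentence that makes a reference.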

Company, brand and product names?

There does not seem to be a common style in use for formatting these. Most commonly they are presented in standard body text with whatever capitalisation and punctuation the brand commonly uses. At most there may be some specific rules relating to capitalisation for certain publications. However, in reading heavily technical work I find that confusion can often arise between technical terminology and product details, so it would make sense to be able to show the difference in some way. For example consider the sentences:

"The PID controller is simulated using a Matlab model. The ABB controller is simulated using a parametric model."

Here "Matlab" and "ABB" are a product name and a company name respectively, whereas "PID" is a technical acronym and "parametric" is a technical term explaining the simulation type. I doubt this distinction would be obvious to anyone unfamiliar with the terminology.

In order to assist with this I would like to apply some formatting convention to highlight company, brand or product names in the text. I have seen one dissertation that used small caps to denote products. This seems to work well as I am not using them for anything else in my document, although it is slightly unfortunate that acronym names do not stand out in this way.

I am not fully decided on the best way to proceed with this at present.
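If the small caps route were taken, a semantic macro would keep the decision reversible. A minimal sketch, where the macro name \product is hypothetical:

```latex
\documentclass{article}

% Hypothetical macro: mark up every company, brand or product name once,
% then change the definition here if a different convention is chosen.
\newcommand{\product}[1]{\textsc{#1}}

\begin{document}

The PID controller is simulated using a \product{Matlab} model.
The \product{ABB} controller is simulated using a parametric model.

\end{document}
```

Small caps only alter lower-case letters, so an all-capitals name such as ABB is typeset exactly as before, which is the limitation noted above for acronym names.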

Indenting paragraphs?

Separating paragraphs in some way to prompt a slight pause whilst reading seems like a sensible writing style. The most common technique for this in printed material is to indent the start of paragraphs, although alternatively an increase in line spacing may be used. As a pause is not required at the start of a new section (or rather, that pause has already been prompted) an indent is typically omitted.

The standard in online material seems to be to use "block text", that is, no indents but an increased gap between paragraphs. This makes some sense as there is effectively unlimited space available and there is no confusion over paragraphs breaking across pages.

I will be indenting printed paragraphs other than the first and using block text in online materials.

Serif or sans-serif?

Although the jury seems to be out on whether serif fonts are actually any better for print reading, I went with convention and chose a serif font for my thesis.

Table format

As far as possible I followed the excellent recommendations in the booktabs usage guide.

Unit format

I'll be trying to stick with SI units wherever possible in the thesis, although there may be some deviations where other conventions are the industry standard. Where appropriate I'll be using prefixes to avoid long and ugly numbers. LaTeX provides the excellent siunitx package for typesetting units which I will be using heavily. When used in tables and figures to apply to an axis, row or column I will be using a slash to denote the units. This is recommended by the BIPM as the correct method of expressing values for multiple quantities. It effectively divides the title by the units, making the values dimensionless.
However, going against the BIPM advice, I will be including a space before and after the slash, as I feel that this separates it from the axis title - which may be multiple words, and it reduces confusion between the slash and the units themselves. With the slash present all the denominator parts of the units will be referred to as a power of -1.
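As an illustration of this convention, here is a sketch of a table header using siunitx with the spaced slash, laid out with booktabs rules. The quantities and values are made up purely for the example:

```latex
\documentclass{article}
\usepackage{booktabs}
\usepackage{siunitx}
% \si is the unit macro from siunitx version 2; version 3 prefers \unit
% but still accepts \si. By default \per is typeset as a power of -1.

\begin{document}

\begin{table}
  \centering
  \caption{Hypothetical operating points, for illustration only}
  \begin{tabular}{S[table-format=3] S[table-format=2.1] S[table-format=1.2]}
    \toprule
    % Braces stop the S columns from trying to parse the headers as numbers
    {Speed / \si{\radian\per\second}} & {Torque / \si{\newton\metre}} & {Power / \si{\kilo\watt}} \\
    \midrule
    100 & 10.0 & 1.00 \\
    200 &  7.5 & 1.50 \\
    \bottomrule
  \end{tabular}
\end{table}

\end{document}
```

Here the speed header is typeset with the unit as rad s^-1, so the only slash in the heading is the one separating the quantity from its unit.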

Scientific notation

Due to the mathematics of my work covering a large number of physical domains, including thermal, mechanical and electrical, I ended up with a relatively large notation section. I also began to run out of letters in the Latin and Greek alphabets. I therefore ended up using a cursive script to denote thermal parameters, allowing me to reuse some letters.

Figure design

As far as possible I followed the guidance of Tufte in presenting data in my thesis.

Punctuation

Acronyms

Line spacing

Margins

This seems to be the subject of as much debate as anything else in typography and formatting! Things I think are worth considering: binding, screen reading, ease of reading, general prettiness. By default LaTeX gives you a larger margin on the outside when you request a two-sided document. This is so that, across an open spread, the gap between the two text blocks in the centre is about the same as the margin at each outer edge. It does not make any allowance for binding.

"The size of the text body is intentionally shaped like it is. It supports both legibility and allows a reasonable amount of information to be on a page. And, no: the lines are not too short."

None of this addresses binding, which seems to be considered separately but must be allowed for in thesis printing. University guidelines have this to say: "To allow for binding the margin at the binding edge of any page must be not less than 40mm; other margins must be not less than 15mm."

A further consideration is the fact that the thesis will frequently be viewed in digital format. In this case margins are not necessarily required, and variation in margins between odd and even pages is likely to be distracting. This suggests that specific formatting for two sided printing should be avoided.

As a compromise I will be using equal margins either side of the text. When bound this will produce a narrower inside margin. I will also be placing page numbering at the centre of the page and titles always on the left, which should maintain uniform readability for screen reading. I will use 40mm for both margins, as this is the minimum allowed for the binding margin by the guidelines. This also achieves a 0.62 proportion of text width to page width, not far from the 2/3 often advocated.

The top and bottom margins are allowed to go as low as 15mm, and at this limit the document scrolls very nicely during screen reading. However, it looks unwieldy in printed form. Therefore I've opted for a 25mm margin at the top and a 35mm margin at the bottom, with page numbering placed in the centre of the bottom margin.
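These choices can be written down in one line. A minimal sketch, assuming the geometry package (not named above) and A4 paper, which is consistent with the 0.62 figure: an A4 page is 210mm wide, leaving 210 - 2 x 40 = 130mm of text, and 130 / 210 is roughly 0.62.

```latex
\documentclass[a4paper]{report}

% Assumption: geometry package, A4 paper.
% Equal 40mm side margins, 25mm at the top and 35mm at the bottom.
\usepackage[a4paper, left=40mm, right=40mm, top=25mm, bottom=35mm]{geometry}

\begin{document}
The text block is now 130mm wide on every page, whether odd or even.
\end{document}
```

Because the left and right values are equal, the layout is the same on odd and even pages, which suits screen reading.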

Punctuating equations

Personally I agree with this answer, therefore I shall punctuate equations that are displayed inline, or where the sentence continues below the equation, but I won't use periods to end a sentence within a numbered equation; in this instance I shall try to use a colon prior to the equation.

Related: should notation be introduced in the text flow or in brackets? I introduce them without brackets.
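A short sketch of how these rules look in the source. The equations are generic heat conduction relations chosen only for illustration, and the label name is hypothetical:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% The sentence continues after the equation, so the equation ends with
% a comma and the notation is introduced in the running text.
The heat flow $\dot{Q}$ through the wall is given by
\begin{equation}
  \dot{Q} = \frac{T_1 - T_2}{R},
  \label{equ:heatFlow}
\end{equation}
where $T_1$ and $T_2$ are the surface temperatures and $R$ is the
thermal resistance of the wall.

% The sentence ends at the equation: a colon before it, no period inside.
For a wall of thickness $L$, area $A$ and conductivity $k$ the
resistance is:
\begin{equation}
  R = \frac{L}{kA}
\end{equation}

\end{document}
```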

Font

I ended up using LaTeX's default Times font rather than anything too fancy. This was easy and familiar to me and most readers.

Justification

Most technical articles tend to use fully justified text and there is some evidence that this is of benefit to dyslexics. One downside is the increased tendency for justified text to create distracting "rivers" of white space. One improvement is the use of microtypography techniques. These are pretty easy to use in LaTeX and there don't seem to be any downsides, so I'll be using them.
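In practice this can be a single line in the preamble. A minimal sketch, assuming the microtype package and pdfLaTeX:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}

% Assumption: microtype under pdfLaTeX. With no options it enables
% character protrusion (punctuation hanging slightly into the margin)
% and font expansion (tiny changes in glyph width), both of which
% reduce the stretching of spaces needed to justify a line.
\usepackage{microtype}

\begin{document}
Justified body text goes here as normal; no other changes are needed.
\end{document}
```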

Tables and Figures

Tables and figures will appear throughout as soon as practically possible after they are referenced in the text. They will all be centred with a centred caption: tables will have the caption above, figures below. Captions will not be punctuated with a final period if they are only a sentence fragment; however, they will be if they are a complete sentence or there are multiple sentences. Figures will not have a title above them, the caption below providing all the required information.

This is obviously a work in progress post... but if anyone has comments then I'd love to hear them!

01/08/2013

Drafting a paper I've just got stuck trying to fit some lengthy equations into a two column document. For some I've gotten away with splitting lines using \begin{split} or \begin{align} environments, but for some it starts to look ridiculous. I also started to run into issues where my \left( and \right) bracket commands didn't like to be split over lines. So I decided that there was no option except spreading the equation over the full page width.
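For reference, the bracket problem arises because every \left must be matched by a \right on the same line of an aligned equation. A sketch of the two usual workarounds, assuming the amsmath package; the expression itself is arbitrary:

```latex
\documentclass[twocolumn]{article}
\usepackage{amsmath}
\begin{document}

% Option 1: close each line with an invisible delimiter (\right. or \left.)
% so that every line has a matched pair. The two visible brackets are
% sized independently, so they can come out at different heights.
\begin{equation}
\begin{split}
  y ={}& a \left( b + c + d \right. \\
       & \left. {} + e + f \right)
\end{split}
\end{equation}

% Option 2: use fixed-size delimiters, which need no matching at all.
\begin{equation}
\begin{split}
  y ={}& a \bigl( b + c + d \\
       & {} + e + f \bigr)
\end{split}
\end{equation}

\end{document}
```

Neither option helps once an equation is simply too wide to read comfortably in one column, which is the case dealt with below.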

This is actually not particularly straightforward in LaTeX. A reasonable forum discussion on options gives a few suggestions. It seems that it needs to be floated within a figure environment. This is also suggested in a very comprehensive document (p35) on typesetting equations. Unfortunately both of these require manual fiddling with equation numbers and quite a lot of extra code in the document. I've therefore modified their code to define my own environment for a floated equation and used counters to automate the equation numbering.

Preamble:

% for floated 2 column equations
\newcounter{tempEquationCounter}
\newcounter{thisEquationNumber}
\newenvironment{floatEq}
{\setcounter{thisEquationNumber}{\value{equation}}\addtocounter{equation}{1}% record equation as happened and remember number
\begin{figure*}[!t]% float following equation across columns
\normalsize\setcounter{tempEquationCounter}{\value{equation}}% record current equation number in floated location
\setcounter{equation}{\value{thisEquationNumber}}% use previous equation number
}
{\setcounter{equation}{\value{tempEquationCounter}}% set back to equation number in floated location
\hrulefill\vspace*{4pt}% add a horizontal rule separator
\end{figure*}% end float environment
}

In the text simply wrap the standard equation environment in the new environment:

\begin{floatEq}
\begin{equation}
a = b + c
\label{equ:floatedEquation}
\end{equation}
\end{floatEq}

This should float the equation across the whole page and allow LaTeX to fit it in the right place whilst still numbering it as if it appeared where it is defined in the text. Or at least that's what it's achieving for me at the moment!

30/05/2013

It's over a year since I put together some thoughts about how documents (PhD theses in particular) get written up. Having just started on the third year of my PhD I need to start compiling all the little bits I've written in the last two years together into something resembling a final thesis format. I can then build this up into a body of work that I can present at the end. This raises the question - how to best go about this process?

First a few conflicting thoughts about the final document I need to produce:

I'd like to write in short concise sections that will either stand on their own, or can be built up into a larger document. I'm concerned that this will not produce a coherent final document though.

I'd like to set up the individual sections in a hierarchy that reflects the hierarchy of the systems they are describing. Unfortunately this is likely to produce far too many levels in the document (when I drew up a draft layout I got to 12 levels!).

I'd like to describe the work down to a level that anyone (even a schoolchild) could understand, however this would probably end up being a massive document that would be too basic for most readers.

I think the problem I have, that lies behind all of this, is that the thesis is really designed to be a document for your examiners to read, cover to cover, in a format that they are familiar and happy with. It will appear printed on paper, be long and probably dull, and, as has been pointed out to me by other people and stated a number of times on the internet, it should "tell a story". A story that took 3 years to write and that a maximum of 3 people will ever actually read.

What I would prefer to write is a document that I could upload to the web and could be referenced and useful to anyone. I've spent some time thinking about how I might be able to cleverly combine the two, but I don't think I'll be able to write just one document to satisfy both aims. So my plan now is to write the thesis in a fairly conventional manner in order to satisfy the requirements for a PhD, then to later revise this into a format that I prefer. Obviously this second part will be easier if I know what I'm planning during the first part, so here is my initial outline...

Thesis mark I

I'll be using LaTeX for the writing, in whatever document template I'm given (or have to make to fulfil my department's guidelines). I'll try to keep it split into relatively small chunks, but without going to too many levels of subheading (4 max). I plan to keep each chapter in its own subdirectory and include it using the \include command. Within chapters I will break things down again using \input commands to allow me to keep sections and subsections in their own individual tex files.

I'll try to use hyperlinks where possible using \usepackage{hyperref}, using the colorlinks option to avoid the default ugly boxes. By using a subtle colour for the links they shouldn't be too jarring in a printed report (where they will be no use) but obvious enough when read on a computer.

As previously discussed in this blog, I'll try to include figures from Matlab and Simulink as vector images, stored wherever they are generated. I'll also use a single bibtex reference file for all my references.

As with other work, ongoing revision of the thesis will be controlled using Bazaar. I can include the version used to produce output files within them using vc.tex, which is pretty neat. That way when I have multiple versions printed out and sent to people for comments I hopefully shouldn't lose track of which version they're commenting on!
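The layout described above might be sketched as follows. All file and directory names are hypothetical, the link colour is just one possible "subtle" choice, and the report class stands in for whatever template the department requires:

```latex
% ---- thesis.tex (main file) ----
\documentclass[a4paper]{report}
\usepackage{xcolor}
% colorlinks replaces the default boxes with coloured text;
% a dark blue is visible on screen but close to black in print
\usepackage[colorlinks, allcolors=blue!50!black]{hyperref}

\begin{document}
\tableofcontents

% One subdirectory per chapter; \include starts each on a new page
\include{introduction/introduction}
\include{modelling/modelling}

% A single bibtex file (references.bib) for the whole thesis
\bibliographystyle{plain}
\bibliography{references}
\end{document}

% ---- modelling/modelling.tex (a chapter file) ----
% \chapter{Modelling}
% \input paths are given relative to the main file, not the chapter file
% \input{modelling/thermal}
% \input{modelling/mechanical}
```

A practical benefit of \include over \input at chapter level is that \includeonly{modelling/modelling} in the preamble recompiles just one chapter while keeping page and reference numbers from the others.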

Latex to Kindle?

Thesis mark II

Latex to html?

html to chd?

I never really finished writing this post, but it gives a flavour of what I was thinking a little while ago when I started writing in earnest!

02/08/2012

I recently described my referencing process where I simultaneously hold reference pdf files in a version control system and keep track of their details in a bibtex file. I have also described a problem I've been having with my version control system of choice, Bazaar.

Based on these I decided that there might be a better (more automated) way of adding my references. A very useful email exchange with the author of cb2bib confirmed that this should be possible. Now, after a day of messing around and learning about various command-line tools that I hadn't used before, I think I've got it working. Here's how...

First I created a new batch file that I called "bzrAddRef.bat". Here are its contents:

echo bzrAddRef:
echo cb2Bib script for adding BibTeX files to a Bazaar repository
echo.
echo Using sed and xargs utilities from:
echo http://gnuwin32.sourceforge.net
echo.
echo Path below may need changing to be within the repository
echo.

Only the last 4 lines are really important. The first one sets the working directory; I don't think it matters what you use as long as you have write access and it is within the repository you want to add to. The next line runs through the current bibtex file and extracts the location of all the references to a temporary file. The next line adds all these files to the Bazaar repository. The final line deletes the temporary file. (Maybe I could have used a pipe between commands so that the temporary file was not required?)

Some special commands are used that will need to be installed. What you need are: sed.exe (and its dependencies: regex2.dll, libintl3.dll, libiconv2.dll) and xargs.exe.

Within the settings of cb2bib this batch file can now be pointed at under the "Configure BibTeX" - "External BibTeX Postprocessing" - "Command:" section. Once that is done simply hitting "Alt"+"p" in the cb2bib window should run the batch file and add all the references to version control.

I hope this helps someone! (I presume it could be altered for other version control systems or be called from other applications.)