The main delay here was adding some clarity to the documentation for ols.m and gls.m. After letting this sit for a bit, I pulled back some changes I thought were too bulky. I aimed to add a bit more context about least squares so that cov(vec(e)) = kron(S,I) didn't seem so obscure and bewildering. That expression is still there, but I moved it to a third paragraph that describes the assumption on the error residuals. That third paragraph is what differs between ols and gls. So, in short, the structure of the documentation for these two entries is:

1) The least-squares formula, with a complete description of the matrix dimensions.

2) A description of the X and Y matrices, i.e., the input variables.

3) A description of the underlying E matrix, i.e., the implied error variables. It's difficult to be descriptive here without first giving detailed definitions, but I think I've conveyed the ideas.

That seems to make this digestible in reasonable hunks of text. The thing I like most is that I managed to use an upper-case S in the ols.m case to express a matrix and a lower-case s in gls.m to signify a scalar (in the help-based version, anyway). Basically, the difference is that gls() requires an input description of the correlation relationship. That's something hard to validate for one's data, let alone construct the large matrix without being meticulous... and if one has a large data set, a tp-by-tp matrix can be rather large. I suspect that gls.m doesn't see much use, evidenced by the error that is in the routine. ols.m is quite usable, though. I think there is an approach halfway between these two called "feasible" GLS. Also, it might be nice if there were an alternative version that just accepted S, i.e., gls(X,Y,S), rather than having to construct the Kronecker product O.
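For what it's worth, here is a sketch of what a gls() call looks like under the documented size rules, if I'm reading them right (variable names and sizes here are illustrative, not taken from the help text, and this assumes one gets past the error in the routine):

```octave
t = 50;  p = 2;  k = 3;           % sizes per the documented rules
x = randn (t, k);                 % regressors, t-by-k
b = randn (k, p);                 % true coefficients, k-by-p
e = randn (t, p);                 % iid unit-variance errors (homoscedastic case)
y = x*b + e;

% gls() wants the full (t*p)-by-(t*p) covariance of vec(e).  For iid
% unit-variance errors that is just an identity matrix, i.e.,
% kron (eye (p), eye (t)) == eye (t*p).
o = eye (t*p);
[beta, v] = gls (y, x, o);
```

With o proportional to the identity, the estimate should agree with ols (y, x); the entire point of the third argument is to let it differ.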

I'm tempted to change the Section 25.4 Linear Least Squares description. It says, "assuming zero-mean Gaussian noise", but Gaussian error variables are not a requirement. If the noise is Gaussian, then the estimate is the maximum-likelihood estimate, I think. Generally speaking, though, the requirement is only that the second-order moment (i.e., the variance) behaves reasonably nicely. Maybe instead it should be "assuming zero-mean noise with reasonably well-behaved variance".
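A quick way to convince oneself that Gaussianity isn't needed (just a sketch; the choice of uniform noise is arbitrary):

```octave
t = 20000;  k = 2;
x = randn (t, k);
b = [2; -1];                      % true coefficients
e = rand (t, 1) - 0.5;            % zero-mean *uniform* noise, decidedly non-Gaussian
y = x*b + e;
beta = ols (y, x);                % still a sensible estimate of b
max (abs (beta - b))              % small, and shrinking as t grows
```

The estimate stays consistent; all that changed relative to the Gaussian case is that it is no longer the maximum-likelihood estimate.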

As for the format in LaTeX versus the help-based version, eh, I really don't want to pursue that any further right now. I don't think the discussion list will produce much response, and it could be a heap of work in a project where people contribute things sort of hodge-podge. That is, we could make rules, but few would probably know they exist. This isn't a major concern at the moment, given a release is approaching.

There is one of these items, though, that I sort of wonder about. The @code{} format produces apostrophes around the code, e.g., 'y = x*b'. The problem there is the apostrophe possibly being mistaken for a transpose operator by the unobservant reader.

Well, take a look through the changes, and maybe we'll do one more pass for minor corrections. It might be easiest to just glance at the diff hunks, then read the help in both Octave and the octave.pdf file. None of these routines' documentation is very long.

No. I made changes to the ols.m documentation, then turned to the gls.m documentation. But an example I tried--for which I aimed to follow the input matrix size rules--resulted in an error:

So I set this aside for a day or two. If there's a bug, I will file a separate bug report to keep bugs and fixes out of the voluminous documentation changes. However, I want to understand what the issue is first. Another question I have is why the O matrix is so big. I understand it is accounting for the correlation among the error variables and across sample "time", i.e., heteroscedasticity and autocorrelation, but the external literature on generalized least squares typically considers random variables with homoscedasticity.

Seeing the gls.m description makes me understand why the original author wrote

The kron(s,I) is like the big matrix O, but of course it is degenerate in the sense that it is homoscedastic and reduces to a simplified form, i.e., p-by-p.

The terminology is a bit confusing, and I'm still trying to make some sense out of the gls.m routine. I suspect that gls.m might not get used much; ols.m could, though. In summary, another changeset will be posted soon.

Regarding OLS, I think there is confusion between s and S, where perhaps S = diag(s.^2), with s being the standard deviation and I being the identity matrix. The kron() could be confusing things.

Let's do a simple test using the variable/case naming convention that we find in the plaintext ols.m help, and make the r.v.s all Gaussian so we know what to expect when multiplying by s (the standard deviation). I'm not interested in BETA, so let's just multiply X by 3, i.e., BETA = 3*eye(t).
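Something along these lines (a reconstruction of the test from memory; the sizes are illustrative):

```octave
t = 100000;  p = 3;
s = [0.5 1.0 1.5];                % per-column standard deviations
X = randn (t, p);
E = randn (t, p) .* s;            % scale column j of the residuals by s(j)
Y = X * (3 * eye (p)) + E;        % i.e., BETA = 3*eye(p)
[BETA, SIGMA] = ols (Y, X);
```

BETA should come back close to 3*eye(3), and SIGMA close to diag(s.^2), i.e., diag([0.25 1.0 2.25]).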

OK, so BETA looks like 3*eye(t), and SIGMA does look like diag(s.^2), i.e., diag([0.5^2 1.0^2 1.5^2]).

I see now why the original programmer chose the vec(E) notation--because E is actually a matrix. And it is strange in the sense that if we really considered the correlation of all residual variables, of which there are t times p, our correlation matrix (if we were to stack the residuals as [e1; e2; e3; ...], where e1 is column one, etc.) would be a tp-by-tp matrix. But I think the example I gave above is more along the lines of what is meant to be expressed here, i.e., there are p random residuals with t observed outcomes per variable. The vec(E) and kron() stuff doesn't seem correct, or at least isn't the clearest way of describing things. I'll make an attempt at correcting that for ols.m and gls.m.
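To make the kron() claim concrete, here is the kind of check I have in mind (illustrative sizes, and a Monte Carlo approximation rather than anything exact): if the rows of E are independent with column covariance diag(s.^2), then stacking columns with vec() gives cov(vec(E)) = kron(diag(s.^2), eye(t)).

```octave
t = 4;  p = 2;  n = 100000;       % n = number of Monte Carlo draws of E
s = [0.5 2.0];                    % per-column standard deviations
% Each row of V below is one vec(E): the columns of E stacked, so the
% first t entries have std s(1) and the next t entries have std s(2).
V = randn (n, t*p) .* kron (s, ones (1, t));
C = cov (V);                      % empirical covariance of vec(E)
% C approximates the (t*p)-by-(t*p) block-diagonal matrix kron(diag(s.^2), eye(t)):
norm (C - kron (diag (s.^2), eye (t))) / norm (kron (diag (s.^2), eye (t)))
```

The relative error shrinks as n grows, which is exactly the "p random residuals with t outcomes each" picture above.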

The "by" symbol looks good in LaTeX. In plain text, I guess I'm sort of used to seeing m x n. However, I started using m-by-n in the patch, especially in cases where there were a lot of variables involved. It's more a visual digestion sort of thing. Without the hyphens, it starts to be: x is t x k, e is p x t, y is k x p, and c is kt x pt. It looks like one big strung-together set of foreign letters.

Upper case versus lower case matrix size parameter isn't too critical for me. I'm comfortable with either so long as it is consistent in a particular function doc. In signal processing, we typically use upper case limits, especially N. But in pure math, it's often lower case limits.

In the ols.m documentation, I think I'll change the use of "vec(e)" to just plain "e", i.e., @var{e}. I don't know what was meant--if anything--by the vec() function. kron(s,I), I think, is simply supposed to be "s I" in math speak, that is, the residuals are independent with the same variance. But right, one typically would write s^2 I, where s is the standard deviation (of a normal r.v.).
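Indeed, for a scalar s, kron(s, I) is literally just s*I, so the kron() adds nothing in that case:

```octave
s = 0.7;  n = 4;
% kron() with a scalar first argument is just elementwise scaling:
isequal (kron (s, eye (n)), s * eye (n))   % -> ans = 1
```

Which is why writing it as "s I" (or, with s the standard deviation, "s^2 I") in the help text would say the same thing with less machinery.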

Or maybe we just use spaces around the 'x'. I looked at the syntax at https://en.wikipedia.org/wiki/Matrix_(mathematics) and they use "m x n" where the 'x' is actually some nice mathematical multiplication symbol, but the letter 'x' would do okay.

That makes sense to me. One usually refers to an m-by-n matrix and not an M-by-N matrix. Although, all lower-case, "mxn" looks confusing. But so does "mXn". Checking the Matlab documentation for zeros, they have opted to use '-by-' between matrix dimensions, so it is an m-by-n matrix.

Maybe put that up for a vote too on the Maintainer's list. I have a Perl script that I use for checking the style of the DOCSTRINGS and I could add a regexp to change all dimensions to use the '-by-' syntax.

I suppose you could say it is a mess of inconsistency. Yes, my concern is mostly about the plaintext that is created when programmers use "help function_name". In LaTeX there are many more ways to get the output to appear exactly as you want, and I think it is fine to use @var{} in that context if it makes sense.

I'm starting to think that in general we should use real code segments in the @ifnottex branches so that @var{x}(i) is used rather than @var{x}_i, which doesn't make a lot of sense and requires interpreting the '_' as making the next letter a subscript.

Just to be certain: you're aware that @var{x} in plaintext is X, but in LaTeX it is a lower-case non-script x (i.e., something like normal math vector notation)? Also, sometimes the documentation splits into @tex, which uses x_i, whereas @ifnottex uses @var{x}(i).
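For anyone following along, the sort of split I mean looks like this (a made-up fragment for a mean-like function, not the actual docstring):

```texinfo
@tex
$$ \bar{x} = {1 \over N} \sum_{i=1}^{N} x_i $$
@end tex
@ifnottex
@example
(@var{x}(1) + @var{x}(2) + @dots{} + @var{x}(N)) / N
@end example
@end ifnottex
```

The @tex branch renders only in the printed/PDF manual; everything else (including "help") sees the @ifnottex branch, with @var{x} set as upper-case X in plaintext.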

By "plaintext" you are referring to "help somefunc" doc? Or are you referring to some plain text version of the manual, i.e., some counterpart to octave.pdf?

Anyway, the point you are raising is the exact conundrum I had. In the case of the octave.pdf manual, with LaTeX, it's as if we are trying to be in a more pure-math frame of mind, whereas in the plaintext we are trying to be more matrix-algebra/programming-language oriented. Yet we are trying to use the same or parallel code in the documentation to achieve both of those. I tried striking a balance by using vector notation in LaTeX whenever it would be a vector in Octave script.

You are correct that from a pure math definition we have a random variable (script) x, and the particular outcomes or observations are (script) x_i. But again, we have the input to the mean() routine being a vector (or matrix), and that input is expressed in the octave.pdf manual as well. To be technically correct, the math expression the formula represents is the "average", not the "mean". The "mean", from a math perspective, incorporates the underlying discrete or continuous PDF.

Capital vs. lower case: I started out using upper case to represent matrices, but this soon became confusing when trying to mix the two together in LaTeX and/or plaintext. So, I started using @var{} in places I thought were either vector or matrix (depending on the function input), content with a non-script variable meaning (typically) a vector (or a matrix, if one wants to think of it that way).

I'll look the OLS/GLS up in the link you gave, then define T in the patch as well.

@Dan: Could you write a note about the documentation philosophy on the Maintainer's List? I think some of the changes are worth discussing first.

In the plaintext version of the help, for example, using @var{} always produces a CAPITAL, generally sans-serif, font, which can be confusing.

I think there is a distinction between saying what the code does, for example, the mean of a vector is

and the mathematical definition of the mean

When defining a mathematical expression, y and x and other entries are general variables, NOT the variables of this particular function, and therefore I don't think they need the @var{} surrounding them.

In particular, once you go with @var{} you can't make the distinction between vectors (lower case Roman alphabet) and matrices (upper case Roman alphabet).

For example,

versus

I don't have a function QUARTILE() in my distribution so I can't comment on that. Maybe that is in the statistics package?

There does seem to be some confusion in ols.m between whether SIGMA or sigma^2 is being calculated. See Wikipedia (https://en.wikipedia.org/wiki/Ordinary_least_squares). The correction in the denominator is for the statistical degrees of freedom. The input x is a T-by-K matrix, meaning there are T rows (observations). If the matrix is square and of full column rank, then there is enough information to make the most accurate determination of the parameters. If x is not square and is rank deficient, then the estimate will not be as good. Still, for calculating the variance, the degrees of freedom are, at the best of times, N-1, which suggests that there should be an extra 1 or 2 in this equation someplace. In particular, I tried ols(), and for a full-rank, square set of measurements sigma is returned as Inf because (T - rank (x) == 0). This isn't right.
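As I understand it, the variance estimate amounts to something like the following (my paraphrase of the computation, not a quote from ols.m), which shows where the blow-up comes from:

```octave
t = 3;  k = 3;                    % square, full-column-rank x
x = randn (t, k);
y = randn (t, 1);
beta = x \ y;                     % exact fit: k parameters, t == k observations
r = y - x * beta;                 % residuals are numerically ~0
sigma = (r' * r) / (t - rank (x)) % denominator is 0, so this blows up (Inf or NaN)
```

With t == rank(x) there are no residual degrees of freedom left, so the estimator is undefined rather than Inf; the routine arguably ought to guard against that case.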

You could also ask on the Maintainer's list about this for someone with statistical expertise to weigh in on this.

As a general guideline (but not a hard rule, as it depends on the context), my logic is that only variables that are in the function argument list, e.g., myfunc(x), get the @var{x} treatment. In the Octave doc, the indexing is similar to coding style, i.e., it should be X(i). I felt it might be really confusing to any new user just getting accustomed to Octave--and not yet understanding that X and x are different variables--to have myfunc(X) and then refer in the documentation to x(i). For LaTeX it is slightly different, in that @var{} translates to italic sans-serif, which is typically a vector representation. In that case, I tried to make the individual elements, such as x(i) or x_i, italic serif, which is typically a scalar representation. Similarly, for LaTeX the mean value with a bar over the top is also italic serif, because it is a scalar. And then there are some instances, such as gls.m and ols.m, where the math and Octave both refer to matrices, in which case I made all the matrices upper case (some without @var{}, because they are not function variables).

There is a runlength.m and a run_count.m. I wonder why an underscore is used in one case and not the other.

Something doesn't make sense regarding QUARTILE(). There is a discussion in the documentation about how the probabilities P are determined. (The various methods are likely meant to change the sampling intervals, because distributions can vary, e.g., thin tails, etc.) The variable N shows up in all these methods, but I'm wondering where that comes from, because if we are generating P, how do we know what its length is?

What is the following doing in stat.txi?
It really looks out of place in the octave.pdf manual where it is located.