You are here

Regression: Improper use of GROUP BY statement produces ambiguous column error

I am currently getting the error message below geneated when I enter teh 'My Workbench' section of the site ... in order to make sure that I am using the latest, I've upgraded today to Views 3.x-DEV + Workbench 1.1 ... still happens:

but, I can't figure out *where* or how that code is being generated ... since it only happens in My Workbench, I'm assuming it is related to this module, and not the views module ... the fix is really simple ... the 'vid' in GROUP BY has to be changed to 'node_revision.vid' ... but, where?

Unfortunately, that is a somewhat large commit and git --revert 3b779f6cb164e9104f0d3ba4f7e9ccb83ca7478d does not apply cleanly.
I would love to temporarily revert this commit on my end until a proper solution is made (so much for wishful thinking..).

actually, its in production and running fine with the exception of this commit. This is no theory.

If anything, the differences between mysql and postgresql should be moot if they both follow the sql standard and the sql standard is exclusively used in drupal. Then there would be no mysql this nor postgresql that.. (any issues then would be grounds for a bug in either database).

The surprising thing is...I thought we got rid of the code that special cases the alias on the base field in D7. Did that somehow make a return or did we not actually get rid of it?

I would have thought that was exactly what the d7 database api did a good job off fixing/handling so that programmers could follow the SQL standard and leave it to the database api to apply whatever workarounds for any weirdness done by either database.

Somehow, the mentioned commit reintroduces the problem.

After some playing around, I isolated the problem to this block of code in views/handlers/views_handler_field_entity.inc (around line #63):

<?php// Add the field if the query back-end implements an add_field() method, // just like the default back-end.if (method_exists($this->query, 'add_field')) {$this->field_alias = $this->query->add_field($this->table_alias, $this->base_field, ''); }?>

If I just comment this alias process out, the views start working properly.

The third parameter appears to be the problem.
Setting it to some form of empty() produces this problem.
If I were to set it to "dudd", then the query would work.

After some grepping, I think the following is the add_field() function in question:
views/plugins/views_plugin_query_default.inc:714

So here is what I think is happening:
1) both tables (node and node_revision) have 'vid' as a column name.
2) the previous block of code sets the alias to 'vid', which conflicts with the 'vid' of both tables.
3) Because the vid column already exists and is already defined, it is ambiguous.
By removing the "special case" we prevent the potential ambiguity.

Edit:
But no matter how I look at the query, I cannot understand why postgres is seeing it as ambiguous.

I have finally made sense of this error and why it is happening.
Postgresqls behavior is properly following the sql standard and there is a valid SQL error here.

The problem was not in the SELECT part of the statement, but instead in the GROUP BY part of the statement.

GROUP BY statement uses the original columns from any given TABLE and not the columns from the RESULT SET.

We should be defining GROUP BY columns using the TABLE.FIELD instead of just ALIAS to prevent this conflict from happening.
The attached patch is my first attempt at ensuring that the TABLE.FIELD is used instead of ALIAS in the group by statements.

Its seems that in certain cases, some of the fields in the SELECT statement may not belong to any column.
These SELECT variables (I do not know the proper terminology for this name) cannot appear in the GROUP BY as they do not already exist.

I added a preg_match test to ensure that the SELECT variables are not added to the GROUP BY.

They both have a 'vid' column.
The GROUP BY statement sees that there are 2 vid columns because it gets processed BEFORE the result set is generated.
This the ambiguity.
The ORDER BY statement is processed AFTER the result set is generated and as such there is only 1 vid column.

I hope this makes more sense than my previous attempts at explaining this.

(Side Note: Is it me or is there something wrong with the node table defining vid as bigint and the node_revision defining vid as integer and vice-versa for the nids?)

It seems that using $fields_array directly after calling $this->compile_fields() is what caused the test failures.
Non-table-column-fields are being included, such as COUNT(..).

I decided that the proper solution would be use $this->compile_fields() directly and so I made my changes there.

I changed the behavior of how $this->compile_fields() returns the fields.
It now returns an array of arrays, one sub-array being the usual non_aggregates and a new sub-array containing the actual fields prior to compilation.

I also had to change the behavior for when groupby gets added.
When any groupby field gets added, according to the SQL standards, all of the groupby fields must be added.
The behavior is to add all fields to the groupby if any single groupby is added or if aggregation is enabled and non-aggregates exist.

After fixing the typos, I found that there were some additional problems.
The fault for the additional problems is mine from this patch: #1431600: Invalid use of SUBSTRING() sql and placeholder_length for views_handler_argument_string.
I made two mistakes:
1) I did not add the field to the groupby array.
2) The fix in the patch applied to the functional code, but the simpletests were not updated to apply the fix (Meaning the tests themselves are still invalid).
- I will make a separate bug report for this tomorrow.

From #24, mistake #2 is not a mistake.
- I reviewed the code and saw nothing wrong and simpletest does in fact pull the correct data (I had my debug statement mis-typed).

The problem with my patch in #22 is that I am including aggregate fields along with non-aggregates in the GROUP BY.
It turns out that the standards allow for aggregates to not be specified in the GROUP BY.
The original code in this respect is perfectly fine.

The new patch is very simple and resolves the standards violation, thus avoiding "ambiguous column error" as far as I can tell.

The patch itself seems to like quite simply though i would like to be really sure, so it would be cool if you could some kind of simpletest code which shows that nothing will be broken in the future.is broken

Just putting in my experience testing this - it seems to work without flaw on my test case:

I have a set of relationships where I have one content type (Water Planning Region) which has multiple other content types (Water Use Types) with an entity reference. I am doing a view which lists the Region name and a column for each use type with a count of the number of users in that type. This patch seems to work perfectly.

It looks like this patch is waiting on me to write simpletest support.
I had completely lost track of this issue.

If anybody has any tests cases that we can use, please send them.
I need to come up with a bunch of test cases for this patch to get applied.
(And I need to first learn how simpletests works, but given then name I assume it will be simple).

Perhaps sending exported views that don't work before but do work after this patch would be helpfull.

It appears that the existing simpletests always passed because a contextual filter was used, which somehow changes the way in which the query is generated and bypasses the sql problem.

I have added a new simpletest called "testAggregateAmbiguity" to the groupby simpletest class.

The simpletest view is designed to trigger the ambiguity bug by doing the following:
1) pull in two different tables that have at least one column with the same name, such as: node and node_revision.
2) do not use contextual filters.
3) add an aggregate operation, such as COUNT()
4) select the nid from both the node and the node_revision tables.

If you wish to see the test fail, then revert the changes to plugins/views_plugin_query_default.inc.

Edit:
There is a grammar error with the added simpletest.
The test should read: 'Make sure there are no ambiguity problems with the group by operation.'

When filtering out non-aggregates for distinct tables, the $fieldname variable was being used instead of the $string variable.

The $fieldname variable potentially contains the alias (and if no alias, then it contains the original field name).
The $string variable contains the original field name.

The only exception to using $fieldname instead of $string appears to be when the original fieldname does not come from a table, but is instead a string (Note: string and $string represent different things here). Drupal seems to do this with things like SELECT 'node' AS field_data_field_event_group_node_entity_type ....

Please double-check my logic I am feeling rather dyslexic at the moment.

Hi. This issue still is causing problems with PostgreSQL and Views 7.x-3.10. Among the affected modules are Media (specifically the media browser, already commented above) but I just found today that Similar By Terms is having the very same problem, causing the module's block to come up empty.

The patch from #37 still works fine with 7.x-3.10. It would be great if this issue gets some attention, thanks!