While performing an upgrade from OJS 2.3.7 to 2.4.2 using a PostgreSQL database I ran into this error:
Query failed: ERROR: index row requires 14368 bytes, maximum size is 8191
The failing SQL is:
CREATE INDEX article_settings_name_value ON article_settings (setting_name, setting_value);
The error is triggered by abstracts in setting_value that apparently exceed 8k in size. Unfortunately, PostgreSQL B-tree indexes will not allow indexing fields with more than 8k of data.
This is a hard limit for B-tree-based indexes, and I could not see any obvious alternative index type that could be used instead.
I see in dbscripts/xml/indexes.xml that the MySQL CREATE INDEX limits the indexed prefix of each field (setting_name(50), setting_value(150)); unfortunately there is no direct equivalent in PostgreSQL.
It is, however, possible to limit the amount of data indexed for a field by building the index on an expression, which could use a built-in function such as substring.
Unfortunately, any lookup would then have to use the same expression or the index is not used, so this is probably not a workable solution.
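For comparison, such an expression index would look like the following (a sketch only; the 150-character limit is borrowed from the MySQL prefix lengths mentioned above). Note that the query must repeat the exact same expression for the planner to use the index, which is the objection above:

```sql
-- Index only the first 150 characters of the value.
CREATE INDEX article_settings_name_value
  ON article_settings (setting_name, substring(setting_value for 150));

-- The planner only matches this index when the query repeats the expression:
SELECT * FROM article_settings
 WHERE setting_name = 'title'
   AND substring(setting_value for 150) = substring('Some title' for 150);
```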
My current workaround is therefore to create a partial index that excludes rows with setting_name = 'abstract'.
I'll attach a patch that does this.
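For reference, the partial index in the patch takes roughly this shape (a sketch of the workaround described above, excluding the oversized abstract rows from the B-tree):

```sql
CREATE INDEX article_settings_name_value
  ON article_settings (setting_name, setting_value)
  WHERE setting_name <> 'abstract';
```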
More information on PostgreSQL indexes here:
http://www.postgresql.org/docs/8.4/static/indexes.html

Tom, thanks for investigating this. That particular index is only currently used by the PublishedArticleDAO::getBySetting function, which looks up an article based on a setting name and value. In turn this is currently only used by the public identifier code to find an article by public identifier.
The change you're proposing will result in the index not being used for this query, since (as you point out) the query won't use the same function as the index creation.
One option to optimize this in a database-independent and unobtrusive way would be to add a value_hash column that is set to e.g. the SHA1 hash of the setting_value when values are inserted or updated, then index (and query) that rather than the full setting_value.
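A minimal sketch of that idea follows. The column and index names are hypothetical, and the hash itself would be computed in application code (e.g. PHP's sha1()) whenever the row is inserted or updated:

```sql
-- Hypothetical hash column: 40 hex characters for a SHA1 digest.
ALTER TABLE article_settings ADD COLUMN setting_value_hash CHAR(40);

-- Index the fixed-size hash instead of the unbounded value.
CREATE INDEX article_settings_name_value_hash
  ON article_settings (setting_name, setting_value_hash);

-- Lookups hash the candidate value the same way before querying:
SELECT * FROM article_settings
 WHERE setting_name = 'some-setting'
   AND setting_value_hash = '...';  -- sha1 of the value being looked up
```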
However, for the moment I suspect that's over-optimizing. Until we run into performance problems with this approach, I'd suggest we simply create the following two indexes instead:
CREATE INDEX article_settings_name ON article_settings (setting_name);
Off the top of my head, this should change the use cases for this query from a full scan to a partial scan based on this index. Note that we'll probably have to make the same change for issue_settings.
Does that sound workable?

I'm not sure I understand exactly.
The change I've proposed is to exclude rows where setting_name = 'abstract' from the index; this is what is known in PostgreSQL as a partial index.
With such a partial index, all queries with "where setting_name =" will use the index *except* those where setting_name = 'abstract'.
I did this thinking that a query such as where setting_name = 'abstract' and setting_value = 'some text from the abstract' was unlikely.
With the addition of the index you propose, as far as I can see (using EXPLAIN in psql to examine query plans), no table scan is made on these queries, even with setting_name = 'abstract' and setting_value = 'some text from the abstract'.
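The plans can be checked with EXPLAIN; the thing to verify is that each query reports an index scan rather than a sequential scan over article_settings. A sketch, assuming both the partial index and the plain setting_name index exist:

```sql
-- Covered by the partial index on (setting_name, setting_value):
EXPLAIN SELECT * FROM article_settings
 WHERE setting_name = 'title' AND setting_value = 'Some title';

-- Excluded from the partial index, but still served by the
-- single-column setting_name index rather than a Seq Scan:
EXPLAIN SELECT * FROM article_settings
 WHERE setting_name = 'abstract'
   AND setting_value = 'some text from the abstract';
```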
You mentioned "the following two indexes" but only proposed one; did you mean to refer to the partial index I've proposed as the second?
The issue_settings table did not exhibit this issue in our installation. If a similar partial-index solution were to be implemented there, the 'description' field looks like the prime candidate for exclusion, judging by the current contents of our installation.

Thanks for the additional info, Tom. Can you confirm that the query does appear to use a partial index based on an execution plan? I'm speaking off the top of my head without trying this in PostgreSQL directly, so I could well be wrong.
("Following two indexes" was a typo -- I was proposing just the one.)

Tom, thanks. As the article_settings table is meant to be a fairly extensible facility, I do worry that other settings may spill over the size limit. What about whitelisting the settings that we want indexed as opposed to blacklisting just abstracts? Looking at the code, we have...
indexingState (used in SolrWebService)
medra::registeredDoi (used in DOIExportPlugin / MedraExportPlugin)
datacite::registeredDoi (used in DOIExportPlugin / DataciteExportPlugin)
This would mean something like...
CREATE INDEX article_settings_name_value ON article_settings (setting_name, setting_value) WHERE setting_name IN ('indexingState', 'medra::registeredDoi', 'datacite::registeredDoi');
This has the added bonus of being much more greppable, i.e. if someone's wondering what the index is for, it's much clearer.
Would you mind double-checking that this still plays nicely with the execution plan?

Thanks, Tom. I do prefer a whitelist approach rather than a blacklist approach and eventually an implementation using hashes would be more generic. We haven't done a particularly good job ensuring PostgreSQL compatibility and most of that is due to our scattershot testing approach. We are tinkering with automated regression testing and PostgreSQL would be a good candidate for that test suite as well; the development team and most of the user community focuses on MySQL. Meanwhile, I appreciate your diligence in helping us catch up.
I've also added a warning to the code self-documentation for getBySetting that should remind users to keep the indexing correct. This is low-risk if forgotten: it will only result in lower performance for certain queries.
I'll add this to our recommended patches list shortly.